STATIC PROGRAM ANALYSES FOR WEBASSEMBLY

by

Alan Romano

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2024

Copyright 2024 Alan Romano

Dedication

To my mom Celia and my dad Fidel, who have always given me their unconditional love and support. Words cannot express the thanks I owe you for all that you have done for me! To my brother Luis and my sister Andrea, thank you for constantly checking in on me and giving me your support throughout this process! To my partner Alex, thank you for your love and patience throughout this journey, and thank you for always being there for me during this sometimes stressful process! To all my family and friends who have been waiting for this day, thank you for believing in me! Your words of encouragement helped motivate me during the toughest moments of the program.

Acknowledgements

My six-year journey towards my PhD has been a long and tough process filled with many moments of struggle, hope, amazement, and luck. This process has been one of the most challenging things I have undertaken, and I am glad to acknowledge all those who have helped me reach this point.

First of all, I would like to thank my advisor, Dr. Weihang Wang. I would not be standing here without her constant support in both research, where we were able to lead eight projects towards publication, and my personal life, where I know I have found a lifelong friend and mentor. I am truly grateful for the opportunity she gave me as the first student to join her research group.

I would like to express my gratitude towards Dr. Chao Wang and Dr. Pierluigi Nuzzo, who served on my dissertation committee. I appreciate the time and effort they spent to provide feedback on my dissertation and for giving me helpful advice towards my future steps.

During my research, I was fortunate enough to have had productive collaborations with researchers from various institutions. I would like to express my thanks towards my collaborators, Dr. Yonghwi Kwon, Dr. Yunhui Zheng, Dr. Michael Pradel, Dr. Daniel Lehmann, Zihe Song, Sampath Grandhi, and Xinyue Liu.

I would not have been able to pursue my PhD without the encouragement and guidance from the Ronald E. McNair Post-baccalaureate Achievement program at the New Jersey Institute of Technology. I would like to thank Dr. Angelo Perna and Zara Williams. The lessons and advice they gave allowed me to navigate the challenges encountered throughout my PhD. I would also like to thank Dr. Michael Bieber for advising me during my undergraduate research and providing me with a great introduction to the research process.

I would not be here without the support of the faculty and staff here at USC. Thank you for your indispensable guidance during my PhD program and for your assistance during the transfer process! I also would like to thank the faculty and staff at the University at Buffalo, SUNY, for their support and for providing me with a solid foundation during the first four years of my PhD program.

Lastly, I would like to thank my family. Without their emotional and financial support, I would not have been able to complete my research journey. Thank you to my mom Celia, my dad Fidel, my brother Luis, my sister Andrea, and my partner Alex.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Static Program Analysis of WebAssembly
  1.2 Challenges and Opportunities
    1.2.1 Challenge 1: Understanding Compilation Issues
    1.2.2 Challenge 2: WebAssembly Program Comprehension
    1.2.3 Challenge 3: WebAssembly Security
  1.3 Outline
  1.4 List of Publications Included
Chapter 2: Background
  2.1 WebAssembly Technology
  2.2 WebAssembly Program Analysis
    2.2.1 WebAssembly Dynamic Program Analysis Techniques
    2.2.2 WebAssembly Static Program Analysis Techniques
    2.2.3 Intermediate Representations for WebAssembly
    2.2.4 Static Analysis Frameworks for Other Languages
Chapter 3: An Empirical Study of Bugs in WebAssembly Compilers
  3.1 Motivation and Contributions
  3.2 Data Collection
    3.2.1 Selecting WebAssembly Compilers
    3.2.2 Compiler Bug Collection
  3.3 Study I: Qualitative Study of Emscripten Issues
    3.3.1 RQ1: Development Challenges
    3.3.2 RQ2: Bug Causes
      3.3.2.1 Asyncify Synchronous Code Bug Causes
      3.3.2.2 Incompatible Data Type Bug Causes
      3.3.2.3 Memory Model Difference Bug Causes
      3.3.2.4 Other Infrastructures Bug Causes
      3.3.2.5 Emulating Native Environment Bug Causes
      3.3.2.6 Supporting Web APIs Bug Causes
      3.3.2.7 Cross-Language Optimizations Bug Causes
      3.3.2.8 Runtime Implementation Discrepancy Bug Causes
    3.3.3 RQ3: Bug Reproducing Analysis
    3.3.4 RQ4: Bug Fixing Strategies
      3.3.4.1 Asyncify Synchronous Code Bug Fix
      3.3.4.2 Incompatible Data Type Bug Fix
      3.3.4.3 Memory Model Difference Bug Fix
      3.3.4.4 Other Infrastructure Bug Fix
      3.3.4.5 Emulating Native Environment Bug Fix
      3.3.4.6 Supporting Web APIs Bug Fix
      3.3.4.7 Cross-Language Optimizations Bug Fix
      3.3.4.8 Runtime Implementation Discrepancy Bug Fix
      3.3.4.9 Unsupported Primitives Bug Fix
  3.4 Study II: Quantitative Study
    3.4.1 RQ5: Lifecycle of Bugs
    3.4.2 RQ6: Impact of Bugs
    3.4.3 Bugs in Existing Compiler Projects
    3.4.4 Testing and Fixing Bugs
      3.4.4.1 Size of Bug-Inducing Test Inputs
      3.4.4.2 Size of Bug Fixes
  3.5 Discussion
  3.6 Summary
Chapter 4: When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance
  4.1 Motivation and Contributions
  4.2 Background on WebAssembly Compilation and Execution Pipeline
    4.2.1 WebAssembly Compilation Pipeline
    4.2.2 WebAssembly Execution Pipeline
  4.3 Methodology
    4.3.1 C/C++ Source Programs
    4.3.2 Experiments Inspecting Inlining Effects
  4.4 Evaluation
    4.4.1 RQ1: Significance of Counterintuitive Effects
      4.4.1.1 Function Inlining Effects on x64 Architecture
    4.4.2 RQ2: Investigation of Function Characteristics Causing Counterintuitive Effects
      4.4.2.1 Function Inlining in LLVM
      4.4.2.2 Function Inlining in Binaryen
    4.4.3 RQ3: Impact of Hot Functions on Counterintuitive Effects
    4.4.4 Case Study on a Real-World Application
  4.5 Discussion
    4.5.1 Limitations
    4.5.2 Threats to Validity
      4.5.2.1 Internal Validity
      4.5.2.2 External Validity
      4.5.2.3 Construct Validity
  4.6 Summary
Chapter 5: Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis
  5.1 Motivation and Contributions
  5.2 System Design
    5.2.1 Abstraction Generator
    5.2.2 Applying Abstractions
  5.3 Classifier
    5.3.1 Encoding Abstractions as Features
    5.3.2 Training the Classifier
    5.3.3 Neural Network Architecture
  5.4 Data Collection and Handling
    5.4.1 Data Acquisition
    5.4.2 Alexa Top 1 Million Websites
    5.4.3 Chrome Extensions
    5.4.4 Firefox Add-ons
    5.4.5 GitHub Repositories
    5.4.6 Module-Level Categorization
    5.4.7 Function-Level Categorization
  5.5 Evaluation
    5.5.1 Function Name Dataset
    5.5.2 Classification Results
  5.6 Discussion
    5.6.1 Threats to Validity
      5.6.1.1 Internal Validity
      5.6.1.2 External Validity
      5.6.1.3 Construct Validity
  5.7 Summary
Chapter 6: MinerRay: Semantics-Aware Analysis for Ever-Evolving Cryptojacking Detection
  6.1 Motivation and Contributions
    6.1.1 In-Browser Cryptomining
    6.1.2 Evolving Cryptominers
      6.1.2.1 Variant Algorithm (uPlexa [305])
      6.1.2.2 JavaScript-Based Miners
      6.1.2.3 JS-WebAssembly Hybrid Miner (WebDollar [340])
    6.1.3 Contributions
  6.2 Threat Model
  6.3 System Design
    6.3.1 Programming Language Lifting
      6.3.1.1 Abstracting Stack Operations
      6.3.1.2 Abstracting Structured Control Flow Constructs
      6.3.1.3 Abstracting Memory Operations
    6.3.2 CFG Construction
      6.3.2.1 Intraprocedural CFG
      6.3.2.2 Interprocedural CFG (ICFG)
    6.3.3 Hash Function Detection
      6.3.3.1 Hash Function Semantics
      6.3.3.2 Matching Models to Programs
      6.3.3.3 User Consent Inference
  6.4 Evaluation
    6.4.1 Experimental Setup
      6.4.1.1 Implementation
      6.4.1.2 Dataset
      6.4.1.3 Cryptominer Detection Tools
    6.4.2 Cryptominer Detection Results
    6.4.3 Comparison Study
    6.4.4 User Consent Results
    6.4.5 Performance and Memory Overhead
  6.5 Discussion
    6.5.1 Evaluation Data Collection Methodology
    6.5.2 Generality of Our Technique
    6.5.3 Complementary to Existing Approaches
  6.6 Summary
Chapter 7: Wobfuscator: Obfuscating JavaScript Malware via Opportunistic Translation to WebAssembly
  7.1 Motivation and Contributions
  7.2 Design
    7.2.1 Transformations
      7.2.1.1 Obfuscating String Literals
      7.2.1.2 Obfuscating Arrays
      7.2.1.3 Obfuscating Function Names
      7.2.1.4 Obfuscating Calls
      7.2.1.5 Obfuscating If Statements
      7.2.1.6 Obfuscating For and While Loops
    7.2.2 Synchronous and Asynchronous WebAssembly Instantiation
    7.2.3 Applying Transformations
  7.3 Evaluation
    7.3.1 Experimental Setup
      7.3.1.1 Datasets
      7.3.1.2 JavaScript Malware Detectors
      7.3.1.3 JavaScript Obfuscation Tools
    7.3.2 Effectiveness in Evading Detection (RQ1)
      7.3.2.1 Comparison with Other Obfuscators
    7.3.3 Correctness of the Transformations (RQ2)
    7.3.4 Efficiency in Terms of Runtime and Code Size (RQ3)
      7.3.4.1 Translation Runtime
      7.3.4.2 Execution Time Overhead
      7.3.4.3 Code Size Overhead
  7.4 Discussion
    7.4.1 Limitations
    7.4.2 Mitigations
  7.5 Summary
Chapter 8: WAF, a WebAssembly Analysis Framework
  8.1 Motivation and Contributions
  8.2 Design
    8.2.1 Waf-Low
      8.2.1.1 Removing Data Types from WebAssembly Instructions
      8.2.1.2 Grouping Semantically-Similar Instruction Sets
      8.2.1.3 Removing the call_indirect Instruction
      8.2.1.4 Removing Implicit Function Returns
    8.2.2 Waf-Mid
      8.2.2.1 Abstracting Away Implicit Stack through Symbolic Execution
      8.2.2.2 Handling Scoping Statements
    8.2.3 Waf-High
      8.2.3.1 Building Control Flow Graphs
      8.2.3.2 Building Statement Dependency Graph
      8.2.3.3 Consolidating Stack Assignments
      8.2.3.4 Consolidating Local Variable Assignments Using Alive Variable Definition Set
      8.2.3.5 Converting Block Statements into If/Else Statements
      8.2.3.6 Converting Loop Statements into For/While/Do-While Loop Statements
    8.2.4 Downward Conversions Between IR Levels
      8.2.4.1 Waf-High to Waf-Mid
      8.2.4.2 Waf-Mid to Waf-Low
      8.2.4.3 Waf-Low to WebAssembly Text
  8.3 Applications of WAF
    8.3.1 WebAssembly Binary Decompilation
    8.3.2 Program Optimizations
      8.3.2.1 Hot Code Extraction
      8.3.2.2 Loop Re-Rolling
    8.3.3 Analyzing Cross-Language Obfuscation
    8.3.4 Malware Detection
    8.3.5 Traditional Compiler Analyses
      8.3.5.1 Control-Flow Analysis
      8.3.5.2 Live Variable Analysis
      8.3.5.3 Reaching Definition Analysis
    8.3.6 WebAssembly-Specific Analyses
      8.3.6.1 Global Variable Usage Analysis
      8.3.6.2 Static Memory Size Analysis
  8.4 Implementation and Usage
    8.4.1 Implementation
    8.4.2 Usage
  8.5 Evaluation
    8.5.1 RQ1: Practicality of WAF
      8.5.1.1 Development Effort for WAF Analyses
      8.5.1.2 Existing WebAssembly Static Analysis Frameworks
    8.5.2 RQ2: Binary Decompilation Effectiveness
      8.5.2.1 Dataset
      8.5.2.2 Decompilation Tools
      8.5.2.3 Code Complexity and Readability Metrics
      8.5.2.4 Results
    8.5.3 RQ3: Performance Optimizations Effectiveness
      8.5.3.1 Dataset
      8.5.3.2 Experimental Setup
      8.5.3.3 Results
      8.5.3.4 Loop Re-Rolling
      8.5.3.5 Hot Code Extraction
    8.5.4 Evaluating Cross-Language Malware Detection Analysis
      8.5.4.1 Datasets
      8.5.4.2 JavaScript Malware Detectors
      8.5.4.3 Results
    8.5.5 Evaluation of WebAssembly Malware Detection
      8.5.5.1 Comparison with MinerRay
      8.5.5.2 Dataset
      8.5.5.3 Results
    8.5.6 RQ4: Evaluating Framework Overhead
  8.6 Limitations
  8.7 Summary
Chapter 9: Related Work
  9.1 General WebAssembly Related
    9.1.1 WebAssembly Prevalence Studies
    9.1.2 WebAssembly Analysis Tools
  9.2 WebAssembly Compiler and Runtime Related
    9.2.1 Compiler Studies
    9.2.2 Compiler Optimizations Studies
    9.2.3 WebAssembly Compiler and Runtime Testing
  9.3 WebAssembly Program Comprehension Related
    9.3.1 Machine Learning on Binary Code
    9.3.2 Large Language Model Techniques
  9.4 WebAssembly Security Related
    9.4.1 WebAssembly Cryptominers
      9.4.1.1 Static Techniques
      9.4.1.2 Dynamic Techniques
      9.4.1.3 Machine Learning Techniques
    9.4.2 Obfuscation Studies and Techniques
      9.4.2.1 Obfuscated Malware Detection
      9.4.2.2 Obfuscation Detection
Chapter 10: Future Work and Conclusions
  10.1 Future Work
    10.1.1 Leveraging WAF in Neural Techniques
    10.1.2 Develop Cross-Language Program Optimizations
    10.1.3 Extend into Non-Web WebAssembly Use Cases
    10.1.4 Develop WebAssembly Application Bug Detection Techniques
  10.2 Conclusions
Bibliography

List of Tables

3.1 Statistics of Compiler Projects.
3.2 Findings and Implications of Our Study.
3.3 Bug Report Dataset.
3.4 Bugs Related to Development Challenges.
3.5 Asyncify Synchronous C/C++ Code Bugs.
3.6 Incompatible Data Types Bug Causes.
3.7 Information Included in the Bug Reports.
3.8 Bug Reports Where the Bugs Are Difficult to Reproduce.
3.9 Asyncify Synchronous C/C++ Code Bug Fixes.
3.10 Incompatible Data Types Bug Fixes.
3.11 Impact Categories for Each Compiler.
4.1 Findings and Implications of Our Study.
4.2 LLVM Test Suite SingleSource/Benchmarks.
4.3 Experiments Testing Function Inlining Effects.
5.1 Abstraction Rules.
5.2 WebAssembly Use Case Categories.
6.1 Abstraction Rules.
7.1 Transformation Functions.
7.2 Recall of Malware Detectors on Code Obfuscated by Wobfuscator. Lowest Recall in Bold.
7.3 Recall of Malware Detectors on Code Obfuscated by Wobfuscator and Other Obfuscators.
7.4 Correctness Validation Results.
7.5 Efficiency of Transformations.
8.1 WAF IR Units for Waf-Low, Waf-Mid, Waf-High.
8.2 Source Lines of Code (SLOC) for Analyses in WAF.
8.3 Code Complexity and Readability Metrics.
8.4 Samples Experiencing Improved Runtime Performance After Optimization.
8.5 Datasets for Evaluating Cross-Language Malware Detection Effectiveness.
8.6 Detection Rates of Samples Obfuscated by Wobfuscator then Transformed by WAF.
8.7 Cryptominer Detection Results.

List of Figures

2.1 WebAssembly Development Workflow.
3.1 Structure of Emscripten Compiler Toolchain.
3.2 Emscripten Issue #9823: Missing sleep Callback.
3.3 Bug Fix for Emscripten Issue #9823.
3.4 Emscripten Issue #9562: Incorrect i64 Legalization.
3.5 Emscripten Issue #5179: Missing Reference Updates.
3.6 Cumulative Distribution of Lifecycle of Bugs.
3.7 Cumulative Distribution of Input Sizes.
4.1 Chromium Tier-Up Process. In this example, function $main uses the Liftoff-generated code when first called as it is the only code available. $main calls $f1, which only has Liftoff code ready. $f2 uses the TurboFan-generated code as it is available at the first call. On the second call to $f1, its TurboFan-generated code is available and used for the call.
4.2 Function Inlining Slows Runtime Performance in Chromium. Using Emscripten with the O3 optimization flag, the green bars show the % runtime speedup in the samples when function inlining is enabled compared to when inlining is disabled. The blue bars show the % decrease in the number of function call sites when inlining is enabled.
4.3 Binaryen Function Inlining. In the original C code, function b calls function a. The WebAssembly code before inlining shows how both a() and b() are separate. After inlining, the instructions of a() are inlined into b().
4.4 random.cpp C++ and WebAssembly Code. (a) shows the C++ code of two functions, gen_random and main. (b) shows the WebAssembly output compiled with function inlining disabled. Function 13 is the main function, and inspecting its loop code reveals function 14 is gen_random. (c) shows function 13 from the module compiled in the Baseline experiment. The file differences between (b) and (c) show that function 14, gen_random, has been inlined into function 13, main.
4.5 Runtime Speedup of Samples in Experiment #1 with Chromium. The bars show the % speedup of the samples having both LLVM and Binaryen inlining passes disabled compared to when both inlining passes are enabled, i.e., the default version. The runtime speedups range from as low as 5.34% to as high as 50.61%, with an average speedup of 25.2%.
4.6 Runtime Speedup of Samples in Experiment #1 with Firefox. The runtime speedups range from as low as 5.02% in functionobjects.cpp to as high as 56.15% in matrix.cpp, with an average speedup of 15.07%.
4.7 Runtime Speedup of Samples in Experiment #2. The bars show the % runtime speedup of the x64 samples after having LLVM's inline pass disabled.
4.8 Runtime Speedup Comparing Experiment #3 with Baseline in Chromium. The bars show the % speedup in the samples' runtimes with the LLVM inline pass disabled compared to their runtime with the inline pass enabled.
4.9 Perf Record Output for random.cpp. (a) traces the browser function calls made during the Baseline experiment. Here, function 13 uses Liftoff code and occupies most of the execution time. (b) traces the browser calls in Experiment #3, and it shows that function 14 uses its TurboFan code.
4.10 Runtime Speedup Comparing Experiment #4 with Baseline in Chromium. The bars present the runtime speedup in samples after the Binaryen inlining-optimizing pass is disabled.
4.11 Runtime Speedup Comparing Experiment #4 with Baseline in Firefox.
4.12 Breakdown of Liftoff or TurboFan Compilers Used in O2. Each bar segment represents a function and its portion of total execution time. The left bar shows the Baseline execution, while the right bar shows the Experiment #2 execution.
4.13 Chromium Compilation Duration Among All Samples. For each function, the TurboFan (blue dots) and Liftoff (green dots) compiler duration is plotted against the code size. Darker dots indicate multiple functions of the same size also have the same duration.
4.14 Runtime Speedup of Experiments #4 & #5 Against Baseline in Chromium. The dark blue and green bars represent the % speedup of the sample when compiled with our patched Binaryen pass for O2 and O3, respectively. The light blue and green bars show the % speedup from Experiment #4 to serve as a reference point on the patched runtime impact. In 24 samples, our patch produces a speedup to a similar extent that disabling all inlining does.
4.15 Runtime Speedup of Experiments #4 & #5 Against Baseline in Firefox.
4.16 Libsodium.js Runtime Speedup of Experiment #1 in Chromium.
5.1 WASPur System Overview.
5.2 Training Metrics over 30 Iterations.
6.1 Overview of MinerRay.
6.2 Abstraction Merging Rules.
6.3 User Consent Call Graph on synonymus.cz.
6.4 Miners by Alexa Rank.
6.5 Miners by Cryptomining Services.
6.6 Comparison with State-of-the-Art Detectors.
7.1 Overview of Wobfuscator.
7.2 Synchronous WebAssembly Instantiation.
7.3 Instantiation of the Asynchronous Variant.
7.4 Baseline Recall vs. Obfuscated Recall.
8.1 WAF Overview.
8.2 sha256_update Processed with WAF. For the WebAssembly text shown in Figure 8.2b, the code is first processed into Waf-Low, shown in Figure 8.2c. In this level, the types have been removed and some instructions are consolidated. The IR is then lifted to Waf-Mid, where the implicit stack operations are made explicit by introducing stack, variable, and memory assignments. Finally, the IR is lifted to Waf-High, which introduces complex expressions and new semantic constructs not native to WebAssembly. The output resembles the source code text shown in Figure 8.2a for easier program comprehension.
8.3 Example of call_indirect Instruction Removed.
8.4 Example of StackAssignments Uses Being Replaced by Their Values.
8.5 WAF Duration for IR Generation.
8.6 WAF Memory Usage for IR Generation.

Abstract

WebAssembly is a recent standard for the web that aims to enable high-performance web applications that can run at near-native speeds. The standard has gained attention in both academia and industry for its ability to speed up existing user-facing web applications. Due to its well-defined and sound design, many static program analysis techniques have been developed to accomplish various purposes of WebAssembly analysis. However, we explore the current landscape of static program analyses and identify gaps within the current WebAssembly ecosystem. We find that current program optimizations applied on WebAssembly modules may lead to diminished performance. We also identify a lack of tools that help developers understand WebAssembly modules through robust binary decompilation. Finally, we find a gap in the ability to analyze cross-language WebAssembly applications across the two languages they are typically implemented in, i.e., WebAssembly and JavaScript.

In this dissertation, we present a novel WebAssembly Analysis Framework, or WAF. WAF is a static program analysis framework for WebAssembly modules that consists of multiple intermediate representations. Inspired by frameworks made for Java, the core of our framework lies in our three intermediate representations that each model the WebAssembly module at a different semantic level. This structure enables WAF to serve in multiple use cases, including program optimizations, binary decompilation, cross-language program analysis, and malware detection. We aim to show that our framework can improve static program analysis in the areas where the WebAssembly ecosystem is lacking.

Chapter 1
Introduction

Web applications are programs designed to be run by web browsers, and, prior to 2017, JavaScript was the only standard web language supporting general-purpose scripting [68]. As such, JavaScript (officially known as ECMAScript [119]) became the de facto programming language for the web. However, although flexible, JavaScript came to be used in scenarios for which it was not intended, leading to performance issues. As a remedy, a subset of JavaScript, asm.js, was created as an efficient compiled language for the web [129]. However, despite the performance advantages of asm.js, the code size limitations of this stopgap solution became apparent. In response, the asm.js standard was transformed into a new binary format capable of achieving fast performance and small code sizes [86], and this binary format is named WebAssembly (abbreviated as Wasm). WebAssembly is a bytecode for the web designed to efficiently perform the computationally intensive operations that are unsuited for JavaScript [123]. All major browsers, including Chrome, Firefox, Safari, and Edge, have included support for WebAssembly on their desktop and mobile browsers since November 2017 [212]. In December 2019, the World Wide Web Consortium (W3C) made WebAssembly an official web standard [338].
WebAssembly has progressed to run within several different platforms, including standalone desktop runtimes [332, 269], serverless functions [210, 251], IoT and edge computing [182, 183], and containerization systems [143].

Unlike JavaScript code, which is distributed as text, WebAssembly modules are distributed as compact binaries. These binary modules are created by compiling source code from a high-level language such as C, C++, Rust, or Go. WebAssembly compilers rely on existing compiler infrastructures rather than designing WebAssembly-specific optimizations from scratch. Differences between the execution environments of native applications and WebAssembly applications complicate the infrastructure required to generate and run WebAssembly modules. These differences can be significant enough to cause problems when analyses designed for traditional architectures are applied to WebAssembly modules.

WebAssembly modules also cannot directly access the Web APIs (e.g., the Document Object Model, Canvas, WebSockets, and WebWorkers APIs) [71]. Instead, the modules are expected to rely on JavaScript to access these functionalities. As a result, WebAssembly applications routinely make cross-language calls (between WebAssembly and JavaScript) to perform individual operations. This essentially introduces problems in both worlds, i.e., JavaScript and WebAssembly.

Lastly, the WebAssembly language itself is difficult to understand. Due to its stack machine architecture and low-level instructions, there is a steep learning curve for new WebAssembly developers and users. Few tools help these developers understand the purpose and functionality of the modules. As a result, manually inspecting WebAssembly modules remains a tedious and time-consuming process. In addition, malicious actors have capitalized on the dense nature of WebAssembly syntax and used the language to implement a malicious use case, in-browser cryptojacking [203].

This dissertation discusses static analysis techniques for WebAssembly across various use cases. We discuss the motivations for investigating these use cases as well as the static techniques developed to address them. Despite this research effort, the existing static analysis techniques have limitations, and we explore these issues. To overcome these limitations, we present a novel static program analysis framework for WebAssembly that is versatile enough to serve in multiple use cases, including program optimizations, binary decompilation, cross-language WebAssembly-JavaScript program analysis, and malware detection.

1.1 Static Program Analysis of WebAssembly

As a low-level language, WebAssembly is designed primarily with compactness and runtime efficiency in mind. However, this prioritization comes at the expense of readability and comprehension. On its own, the properties of WebAssembly, including its limited data types and stack machine design, make it difficult for a developer to understand WebAssembly code. In addition, its novel execution paradigm on the web introduces new challenges compared with existing web technologies. To handle these difficulties, program analysis techniques are valuable tools for gaining insight into WebAssembly applications.

Program analyses are broadly classified into two groups: dynamic analysis and static analysis. In dynamic analysis, the investigated program is executed, and relevant information is collected during its execution [22].
Dynamic analysis is a powerful tool for understanding how an executed program impacts the state of a software system. These techniques are very precise, as they model exactly what was executed. However, this class of analyses has its limitations. First, dynamic analyses require executing the program under investigation. This requirement is not ideal for malware analysis, where a malicious file should be flagged before it is run. Second, dynamic analysis techniques suffer from incomplete program coverage [89]. If program functionality is not exercised during program runs, dynamic techniques cannot collect data on that portion of the program. Third, dynamic analyses, specifically instrumentation approaches, often impose runtime overhead, limiting their usage in low-resource settings or in use cases where too much delay cannot be tolerated [229].

In static analysis, program information is collected without executing the investigated program [89]. This approach overcomes some of the limitations of dynamic analysis. Static techniques usually have lower runtime overhead and higher program coverage than dynamic techniques, offering a more complete model of program execution [229]. However, static techniques have their own limitations. First, runtime behavior is only approximated, which can make the results less accurate than those of a dynamic analysis [176]. As such, static techniques are not suited for evaluating the runtimes that execute the targeted applications. Second, it has been shown that modeling a program's complete runtime behavior through static program analysis is an undecidable problem [223]. Nevertheless, static program analyses can approximate useful and sophisticated models of program behavior.

In this dissertation, we discuss static analysis approaches for WebAssembly. Since the language is well-defined and has been shown to be sound [116], we argue that static techniques are better suited for aiding in understanding and analyzing WebAssembly modules. As such, we describe challenges in WebAssembly development, debugging, and maintenance that benefit from static analysis. Finally, we describe the utility of a general-purpose static analysis framework for WebAssembly.

1.2 Challenges and Opportunities

While static analysis can be useful for WebAssembly, the current analyses within the WebAssembly ecosystem face several open challenges. In addition, the novel design of WebAssembly provides new opportunities for improving analysis of the language. We discuss these challenges and opportunities to motivate the contributions of this dissertation.

1.2.1 Challenge 1: Understanding Compilation Issues

As a compiled language for the web, WebAssembly applications go through several steps from their source code being written to their execution. This series of steps is referred to as the WebAssembly compilation and execution pipeline [258]. The pipeline encompasses the several components that perform static program analyses to generate, optimize, and transform WebAssembly binary modules. WebAssembly compilers transform source code from a high-level language into a WebAssembly binary. These compilers are usually constructed on top of existing compiler infrastructures and toolchains, which parse source code into an intermediate representation (IR), link external libraries, apply program optimizations, and transform the IR into WebAssembly bytecode [238, 63].
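As a small illustration of how the emitted bytecode can already be examined without running it, the sketch below uses the standard WebAssembly JavaScript API to validate a compiled module and enumerate its imports and exports; the module path is a hypothetical placeholder rather than an artifact from this dissertation.

```javascript
// Minimal sketch: statically inspecting compiled WebAssembly bytecode in a
// browser. The path "module.wasm" is a hypothetical placeholder.
async function inspectModule(url) {
  const bytes = await (await fetch(url)).arrayBuffer();

  // WebAssembly.validate() checks that the bytes form a well-typed module
  // without instantiating or executing any of its code.
  if (!WebAssembly.validate(bytes)) {
    throw new Error(`${url} is not a valid WebAssembly module`);
  }

  // WebAssembly.compile() yields a Module object whose interface can be
  // queried purely from the binary's static structure.
  const module = await WebAssembly.compile(bytes);
  console.log('imports:', WebAssembly.Module.imports(module));
  console.log('exports:', WebAssembly.Module.exports(module));
}

inspectModule('module.wasm');
```

Such execution-free inspection is the simplest form of the static analyses discussed above; the rest of the pipeline concerns how and where the validated module is actually executed.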
The WebAssembly runtime environments encompass several platforms, namely browsers and standalone runtimes. WebAssembly applications initially ran only in browsers, such as Chromium and Firefox. These browsers typically leverage their existing JavaScript engines to support executing WebAssembly modules [308, 281]. These engines include compilers that generate machine code from the JavaScript or WebAssembly code. WebAssembly standalone runtimes execute applications outside of the browser. Similar to browsers, standalone runtimes typically execute the code through interpretation [172, 171] and by generating machine code from the bytecode [8, 331]. In addition, some runtimes also implement a virtual machine in order to execute the WebAssembly bytecode [269]. When WebAssembly applications run in browsers or on certain runtimes, the modules require additional JavaScript support code to perform fundamental tasks, such as instantiating the module itself or accessing external APIs [69, 66].

This pipeline is a complex system built on new and legacy technologies. As with all software systems, bugs can occur within the components of the pipeline, creating headaches for tool developers. It also remains an open question how valid one assumption made within this pipeline is: does applying legacy static analyses built for older architectures to WebAssembly produce optimal code? In addition, we ask the following: Can static analysis techniques aid in addressing bugs appearing within these components? What are the unique challenges that WebAssembly components face compared with those targeting traditional architectures?

1.2.2 Challenge 2: WebAssembly Program Comprehension

WebAssembly imposes a steep learning curve on new developers adopting the language. In addition, comprehending the language remains tedious even for experienced WebAssembly developers. This difficulty is due to several properties of the language. First, the language is composed of low-level instructions that each convey specific, but limited, semantic meaning [112]. Conceptualizing the functionality of a WebAssembly program involves building a comprehensive picture from the small details offered by the instructions. Second, the language offers a limited set of data types. WebAssembly has a total of seven types, consisting of four numeric types (i32, i64, f32, f64), two reference types (funcref, externref), and a vector type (v128) [113]. Compared with high-level languages such as JavaScript and Rust, these limited data types make it harder to recognize high-level abstractions such as strings or classes. Third, the instructions operate according to a stack machine architecture, so they push and pop values on a virtual stack. All intermediate computations must be performed on this stack and stored on the stack, in local or global variables, or in the linear memory. To statically identify the value at a given code location, the stack operations have to be unwound up to that point, which is a tedious task to perform. Fourth, as a tool for building web applications, WebAssembly is also expected to interact frequently with JavaScript. Since its release as an MVP (Minimum Viable Product), WebAssembly has lacked direct access to Web APIs, such as the DOM, Fetch, or IndexedDB [71]. Behavior implemented in WebAssembly applications may actually span the two languages, complicating analysis of the application further.
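To make this cross-language coupling concrete, the hedged sketch below shows typical JavaScript glue code for a hypothetical module: the module name, the exported compute function, and the imported report_result callback are illustrative assumptions, not code from this dissertation. The observable behavior, a DOM update driven by a WebAssembly computation, only emerges from the two languages together.

```javascript
// Hypothetical glue code: the module "app.wasm" performs its computation in
// WebAssembly but cannot touch Web APIs directly, so the DOM update is
// delegated to an imported JavaScript function.
const imports = {
  env: {
    // Invoked from inside the module; on the Wasm side this is only visible
    // as an import named env.report_result taking an i32 parameter.
    report_result: (value) => {
      document.getElementById('output').textContent = `Result: ${value}`;
    },
  },
};

WebAssembly.instantiateStreaming(fetch('app.wasm'), imports).then(({ instance }) => {
  // compute() is a hypothetical exported WebAssembly function. Modeling the
  // application requires following this call into the module and the
  // report_result callback back out of it.
  instance.exports.compute(42);
});
```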
Two languages with very different designs and semantic levels must be analyzed in tandem to accurately model the application. These four reasons introduce points of friction for comprehending WebAssembly application functionality. For the reasons described, WebAssembly is difficult to analyze on its own. This dissertation asks the following question: How can static analysis techniques help developers, new and experienced alike, understand the functionality described by WebAssembly modules?

1.2.3 Challenge 3: WebAssembly Security

As a recent standard, the novelty of WebAssembly can be exploited to hide functionality from uninformed or unsuspecting users. In fact, the language has been used to implement malicious applications, such as in-browser cryptojackers [170, 221, 259]. WebAssembly is adopted by malicious users for three reasons. First, the language achieves good runtime performance compared with JavaScript. Second, as discussed in Section 1.2.2, the language is difficult to understand, making it ideal for obscuring malicious functionality. Third, there are few static analysis tools aimed at general WebAssembly malware detection. For this reason, in this dissertation, we ask: How can static program analysis be leveraged to address malware appearing in WebAssembly?

1.3 Outline

The remainder of this dissertation is organized as follows. Chapter 2 introduces the WebAssembly standard and the current state of WebAssembly program analysis. Chapters 3 to 8 cover the breadth of our work on different aspects of WebAssembly program analysis and form the main content of this dissertation. These chapters correspond to six research projects and publications. Chapter 9 provides a survey of related work, and Chapter 10 discusses open questions, future directions for this work, and a conclusion to this dissertation.

Chapters 3 to 8 can be grouped into four high-level categories. The first category, containing Chapters 3 and 4, performs empirical studies to gain insight into WebAssembly compilers, including the prevalence of bugs in the compilers and incorrect assumptions in the program optimizations applied. The second category contains Chapter 5 and focuses on addressing WebAssembly program comprehension using static analysis techniques combined with machine learning. The third category consists of Chapters 6 and 7 and discusses security issues pertaining to WebAssembly. These chapters discuss both static WebAssembly malware detection and a WebAssembly attack scheme enabled by static transformations on the code. The fourth category, Chapter 8, introduces a novel general-purpose framework for WebAssembly designed to succinctly address these challenges facing WebAssembly. This framework builds on the previous research projects discussed in order to improve static analysis for several different use cases. We summarize the main contributions of the six projects discussed in Chapters 3 to 8.

Chapter 3 describes two studies we conduct on WebAssembly compiler bugs. The first study is a qualitative analysis of Emscripten, a widely used C/C++-to-WebAssembly compiler. We investigate 146 bug reports for the compiler to identify the unique challenges that WebAssembly compilers face that lead to new bugs. Our second study performs a quantitative analysis of three different WebAssembly compilers to understand the prevalence of bugs within these compilers.
We investigate the bugs along three dimensions: the lifecycle of these bugs, the impacts that they have on compiled programs, and the sizes of bug-inducing inputs and bug fixes applied. We find that the WebAssembly ecosystem lacks tools to adequately debug these complex compilers.

Chapter 4 describes an empirical study performed on a traditional compiler optimization, function inlining, that can counterintuitively introduce performance degradation in certain WebAssembly applications. That is, although this optimization is designed to improve runtime performance and speed up the program, we observe that applying the optimization worsens runtime performance by slowing down execution. We find that function inlining introduces counterintuitive behavior in 66 of 127 studied WebAssembly samples. We develop a simple patch for Binaryen's inlining pass to help mitigate this behavior, but we find better heuristics will be needed to optimally apply function inlining and other traditional optimization techniques.

Chapter 5 introduces WASPur, a machine-learning-based tool that leverages an intermediate representation of WebAssembly instructions to label the purposes of individual WebAssembly functions. This work proposes an intermediate representation that abstracts the underlying semantics of WebAssembly. We develop our tool by first constructing a diverse dataset of WebAssembly samples collected from real-world websites, Firefox add-ons, Chrome extensions, and GitHub repositories. We identify the purposes of these samples and form 12 different use case categories encompassing these samples. We build a neural network classifier that can accurately label an input WebAssembly function with a name according to its functionality, achieving an accuracy of 88.07%.

Chapter 6 presents an in-browser cryptojacking detection method, MinerRay. In this work, we propose an intermediate representation that abstracts the underlying semantics of miners written in WebAssembly and JavaScript. This IR supports cross-language analysis and is resilient to variation. We use this IR to implement a lightweight static analysis that infers the critical steps of hashing. We evaluate MinerRay on over 1 million websites and find 901 websites using WebAssembly-based cryptominers. We compare MinerRay against five state-of-the-art detectors: MineSweeper, CM-Tracker, Outguard, No Coin, and minerBlock. We find that MinerRay detects the most websites with the fewest errors among the detectors tested.

Chapter 7 presents our novel obfuscation attack scheme, Wobfuscator. Wobfuscator effectively translates portions of JavaScript into WebAssembly code, allowing critical semantic information in JavaScript files to be hidden. Our technique relies on a set of code transformations that translate selected JavaScript code locations into WebAssembly. We perform a comprehensive evaluation of Wobfuscator, showing that it can effectively evade state-of-the-art static malware detectors while preserving the semantics of the original code. We use this attack scheme to motivate the need for cross-language program analysis between JavaScript and WebAssembly.

Chapter 8 presents our WebAssembly Analysis Framework, or WAF. The core of our framework lies in three intermediate representations, Waf-Low, Waf-Mid, and Waf-High, that each model WebAssembly code at a different semantic level.
We evaluate the effectiveness of WAF in four distinct use cases, namely binary decompilation, program optimization, cross-language malware detection, and WebAssembly malware detection, to demonstrate how our framework can enable general-purpose static analysis.

1.4 List of Publications Included

The core of this dissertation is based on six research projects. Five of these research projects have been published in peer-reviewed conference proceedings. This dissertation reuses content from these published papers. We map the chapters of this dissertation to the publications as follows:

• Chapter 3: An Empirical Study of Bugs in WebAssembly Compilers. Alan Romano, Xinyue Liu, Yonghwi Kwon, and Weihang Wang. The 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2021.
• Chapter 4: When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance. Alan Romano and Weihang Wang. The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). 2023.
• Chapter 5: Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis [256]. Alan Romano and Weihang Wang. The 32nd ACM Web Conference (TheWebConf). 2023.
• Chapter 6: MinerRay: Semantics-Aware Analysis for Ever-Evolving Cryptojacking Detection [259]. Alan Romano, Yunhui Zheng, and Weihang Wang. The 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). 2020.
• Chapter 7: Wobfuscator: Obfuscating JavaScript Malware via Opportunistic Translation to WebAssembly. Alan Romano, Daniel Lehmann, Michael Pradel, and Weihang Wang. The 43rd IEEE Symposium on Security and Privacy (S&P). 2022.

The following publications were authored during the PhD but are not included in this dissertation.

• An Empirical Analysis of UI-Based Flaky Tests. Alan Romano, Zihe Song, Sampath Grandhi, Wei Yang, and Weihang Wang. The 43rd IEEE/ACM International Conference on Software Engineering (ICSE). 2021.
• WASim: Understanding WebAssembly Applications through Classification. Alan Romano and Weihang Wang. The 35th IEEE/ACM International Conference on Automated Software Engineering Demonstrations Track (ASE Demo). 2020.
• WasmView: Visual Testing for WebAssembly Applications. Alan Romano and Weihang Wang. The 42nd IEEE/ACM International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2020.

Chapter 2 Background

As our works rely on aspects of WebAssembly, we first present background information on WebAssembly. We describe the low-level syntax of the language and the relevant structures of a WebAssembly module. We then provide background on the current state of static program analyses for WebAssembly, which is the primary focus of this dissertation.

2.1 WebAssembly Technology

WebAssembly provides a compilation target for languages such as C, C++, and Rust [123]. The WebAssembly standard defines an assembly-like binary format, a textual representation of the binary format, and the JavaScript APIs to interact with WebAssembly modules. Each WebAssembly program is a single-file module with a clearly defined structure. Each module is composed of 10 sections that each describe different components of the module [114]:

1. Types - This section defines all function types used within the module, including the parameter and result data types.
2. Functions - This section defines all WebAssembly functions by their type, the local variables used, and the function body composed of a sequence of WebAssembly instructions.
3. Tables - This section defines the function tables used as the targets of indirect function calls, i.e., using call_indirect.
4. Memory - This section defines the properties of the linear memory sections of the module.
5. Globals - This section lists the global variables that are accessible across all functions in the module.
6. Elements - This section lists the function indices that will be used to initialize a specified function table.
7. Data - This section lists the byte sequences that will be used to initialize the specified linear memory sections.
8. Start - This section specifies the function, if any, that is called once the module initializes.
9. Imports - This section declares the functions imported from JavaScript that will be called within WebAssembly functions.
10. Exports - This section specifies which WebAssembly functions are exported to the host JavaScript context so that they can be invoked there.

WebAssembly instructions operate at a low level of abstraction, without, e.g., classes or complex objects. Instructions and functions are statically typed, but there are only seven primitive types: four numeric types implementing 32- and 64-bit integer and floating-point values, two reference types implementing function and external references, and one vector type implementing packed data processed by SIMD (Single Instruction, Multiple Data [101]) instructions. Instructions are executed on a virtual stack machine, i.e., they pop their operands and push their results onto an implicit evaluation stack. Instructions are simple and designed to map closely to hardware instructions, e.g., a WebAssembly i32.add instruction would be translated into an x86 addl. Since there is no garbage collector and only primitive scalar types, complex objects (strings, arrays, records, etc.) are stored in program-organized linear memory, which is essentially a growable array of untyped bytes.

Figure 2.1: WebAssembly Development Workflow. The figure shows the C++ source program example.cpp, the WebAssembly binary format example.wasm, the WebAssembly text format example.wat, and the JavaScript glue code used in a website or web application (e.g., extensions). The three steps are: (1) compile via the Emscripten compiler (emcc), (2) execute via JavaScript glue code to instantiate, and (3) translate via the WebAssembly Binary Toolkit (WABT).

WebAssembly does not provide a standard library, and for interaction with the "outside world," WebAssembly modules need to import functions from the host environment. In the browser, that host environment is JavaScript, so any reliance on Web APIs (e.g., the Document Object Model, WebSockets, and WebWorkers) requires complementary JavaScript code.
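A minimal sketch of fulfilling a module's imports and calling its exports from JavaScript follows; the module URL and the isEven export follow Fig. 2.1, while the env.now import is a hypothetical example of a Web API that remains reachable only through JavaScript.

const imports = {
  env: {
    now: () => performance.now(), // host function the module can call
  },
};
WebAssembly.instantiateStreaming(fetch("example.wasm"), imports)
  .then(({ instance }) => {
    console.log(instance.exports.isEven(4)); // exported functions are callable from JS
  });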
Non-primitive data, however, needs to be marshalled through the WebAssembly memory (which can be accessed from JavaScript as well). The WebAssembly API in JavaScript provides functions to compile and instantiate (that is, fulfill the imports of) WebAssembly modules and call exported WebAssembly functions [70].

A WebAssembly compiler is a tool that can generate WebAssembly binary modules from source code written in a high-level language. WebAssembly compilers are composed of a frontend that parses the source code into an intermediate representation (IR), an optional middle-end that optimizes the IR of the program, and a backend that generates WebAssembly binary code from the IR. In addition, WebAssembly compilers include bindings of existing libraries in order to support using standard libraries available in the source language within a WebAssembly runtime.

Fig. 2.1 shows a typical workflow of generating a WebAssembly program from the source code in C++ to the runtime usage on a website: (1) The C++ source program example.cpp shown in Fig. 2.1(a) defines a function isEven(). This C++ program is first compiled by a WebAssembly compiler, such as Emscripten (emcc) [65], to generate the resulting WebAssembly binary example.wasm as shown in Fig. 2.1(b). The binary format is how a WebAssembly module is delivered to and compiled by browsers. (2) To ease debugging, the WebAssembly binary can be translated to its text format (example.wat shown in Fig. 2.1(c)) by using a WebAssembly toolkit, such as the WebAssembly Binary Toolkit (WABT) [117]. The text format shows examples of WebAssembly instructions, such as get_local and i32.and, as well as the WebAssembly function $isEven(). (3) To deploy the WebAssembly binary on a website, JavaScript glue code, as shown in Fig. 2.1(d), that instantiates the example.wasm file is required. The JavaScript code calls the function WebAssembly.instantiateStreaming, which takes the parameter fetch("example.wasm") as the binary module source to instantiate. Finally, the returned module instance invokes the exported function isEven().

WebAssembly modules are not typically standalone files. Instead, they are combined with generated JavaScript wrapper/glue code. Since WebAssembly cannot start on its own and cannot directly interact with Web APIs, the glue code is responsible for importing the necessary functions used by the module. Additionally, the glue code can set up data structures necessary to implement the runtime provided by the native language, such as memory allocation, file system emulation, and socket emulation. The final output of a WebAssembly compiler includes (1) a WebAssembly module, (2) a JavaScript file that handles the module imports and runs the module, and (3) an HTML file that loads the module.

2.2 WebAssembly Program Analysis

Existing work has proposed tools enabling WebAssembly program analysis. We discuss the two main classes of WebAssembly program analysis: dynamic program analysis and static program analysis [176, 89].

2.2.1 WebAssembly Dynamic Program Analysis Techniques

Some existing works have proposed dynamic analysis techniques for WebAssembly. Taint analysis techniques [291, 104] follow the flow of information as the WebAssembly program executes. SEISMIC [327] instruments WebAssembly code at runtime to detect cryptominers within web pages. WASP [206] provides a symbolic execution engine for WebAssembly modules to enable concolic execution of WebAssembly programs.
Wasabi is a general-purpose dynamic analysis framework for WebAssembly that enables analyses such as call graph analysis, taint analysis, and memory access tracking [179].

2.2.2 WebAssembly Static Program Analysis Techniques

Another direction within this line of work has been static program analysis. One work presents a technique for constructing code property graphs of WebAssembly programs to detect security vulnerabilities within them [33]. An approach has been developed to recover high-level function type information from WebAssembly samples using neural networks [178]. A compositional static analysis technique has been developed for information flow analysis in WebAssembly [287]. A backwards program slicing approach for WebAssembly can construct a minimal clone of the program that still executes a specified code location [285]. Wassail [286] is a static analysis library for WebAssembly that can perform lightweight and heavyweight static analyses, including code querying, control-flow analysis, and data-flow analysis. Lehmann et al. [181] perform a comprehensive study on static call graph construction techniques for WebAssembly. WasmA [32] is a static analysis framework that provides user-friendly call, control-flow, and data-flow analyses for WebAssembly.

2.2.3 Intermediate Representations for WebAssembly

There are several intermediate representations (IRs) that exist to enable static analysis for WebAssembly. For example, when using an LLVM-based compiler such as Emscripten, the original source code is converted into LLVM IR [189], and this IR is used to generate WebAssembly bytecode. The Binaryen toolchain implements its own IR [337] to handle transforming and optimizing WebAssembly code. The WebAssembly Binary Toolkit (WABT) [318] uses its own IR [319] to transform WebAssembly modules between the binary and text formats, as well as to collect useful metadata on the modules.

2.2.4 Static Analysis Frameworks for Other Languages

Previous work has developed robust analysis frameworks for a variety of languages and purposes. Soot [310] is a Java bytecode optimization framework that heavily inspires our work. This framework implements three intermediate representations, Baf, Jimple, and Grimp, that each aim to enable different optimizations. PhASAR [272] implements a static analysis framework for C++ that makes program analysis more accessible by hiding the complexities of static analysis behind a user-friendly API. HybriDroid [175] and JN-SAF [341] implement static analysis frameworks for vulnerability detection in Android applications. Kashyap et al. [156] and Ko et al. [168] implement and evaluate static analysis frameworks for JavaScript applications.

Chapter 3 An Empirical Study of Bugs in WebAssembly Compilers

WebAssembly relies on WebAssembly compilers to construct the binary modules. The compilers are critical components, as they are responsible for transforming high-level source code implementations into compact and correct binary modules. However, how reliable are these compilers? Do they introduce issues into the output binaries they create? In this chapter, we investigate the prevalence of bugs in WebAssembly compilers. This is the first of two chapters that investigate Challenge 1 (Section 1.2.1). We conduct two empirical studies to understand the characteristics of the compiler bugs.
We conduct a qualitative analysis of bugs in the Emscripten compiler to understand the unique challenges that WebAssembly compilers face compared with traditional compilers. We also conduct a quantitative analysis of 1,054 bugs in three WebAssembly compilers: AssemblyScript, Emscripten, and Rustc/Wasm-Bindgen. We analyze these bugs along three dimensions: lifecycle, impact, and sizes of bug-inducing inputs and bug fixes. We use these studies to deepen our understanding of WebAssembly compiler bugs and the challenges that compiler developers face when analyzing both defective compiler implementations and defective WebAssembly binary modules.

This chapter takes large portions of its content from its corresponding publication [260]. The author of this dissertation is also the main author of the paper, performed the two empirical studies described, and wrote most of the paper content.

3.1 Motivation and Contributions

As WebAssembly is increasingly adopted for various applications, there is a growing ecosystem of compilers that support WebAssembly development. As shown in Table 3.1, there are currently 10 compilers available to support compiling programs written in different programming languages to WebAssembly [24].

Table 3.1: Statistics of Compiler Projects.
Compiler | Created | Source | LOC | Releases | Stars
AssemblyScript | 2017-09 | TypeScript | 140,548 | 131 | 10,935
Emscripten | 2011-02 | C/C++ | 226,196 | 363 | 20,191
Rustc/Wasm-Bindgen | 2010-06 | Rust | 1,013,824 | 98 | 58,276
Asterius | 2017-11 | Haskell | 92,078 | 0 | 1,537
Binaryen | 2015-10 | asm.js | 115,106 | 190 | 4,597
Bytecoder | 2017-04 | Java | 151,601 | 29 | 333
Faust | 2016-11 | Faust DSP | 320,723 | 48 | 1,199
Ilwasm | 2015-08 | .NET CIL | 4,549 | 0 | 344
Ppci-Mirror | 2016-11 | Python | 12,998 | 0 | 174
TinyGo | 2018-06 | Go | 4,910 | 20 | 7,399

Similar to compilers of native languages, WebAssembly compilers also contain bugs that can miscompile binary outputs [289]. These bugs are difficult to locate as they may be encountered only at the affected project's runtime. Compiler bugs can also waste development time when debugging an affected project before realizing that the bug is introduced by miscompilation. For these reasons, it is important to understand how reliable compiler projects are in discovering, understanding, and resolving bugs.

In addition to handling the bugs associated with traditional compilers, the developers of WebAssembly compilers face unique challenges that can introduce buggy behavior. For example, fully synchronous executions are not natively supported by browser engines, which differs from the execution model expected by C/C++. WebAssembly compiler developers should ensure that synchronous operations in C/C++ code are properly ported over to the asynchronous browser environment, as relying on asynchronous APIs to perform synchronous behavior can lead to issues. Moreover, JavaScript does not support all data types supported by WebAssembly. WebAssembly compilers have to support the compilation of data types across multiple target languages, as well as ensure that, during runtime, types are not used in incorrect ways.

In this chapter, we perform an empirical analysis of bugs in WebAssembly compilers to investigate the following research questions:

• RQ1: What new challenges exist in developing WebAssembly compilers and how many bugs do they introduce?
• RQ2: What are the root causes of these bugs?
• RQ3: How do WebAssembly compiler developers reproduce these bugs and what information is needed?
• RQ4: How do WebAssembly compiler developers fix bugs?
• RQ5: How long does it take to fix bugs in different compilers?
• RQ6: What are the impacts of the bugs in diverse compilers?

To answer these research questions, we first perform a qualitative study on 146 bugs in Emscripten to identify unique development challenges and understand the root causes, bug reproducing, and bug fixing strategies of these bugs. Next, we perform a quantitative study on 1,054 bugs among three WebAssembly compilers, namely AssemblyScript [15], Emscripten, and Rustc/Wasm-Bindgen. This study focuses on the lifecycle of the bugs, their impacts, and the sizes of the bug-inducing inputs and bug fixes. Based on the findings obtained from the two studies, we identify useful implications for WebAssembly compiler developers. Our findings and implications are summarized in Table 3.2. We hope that our findings can provide WebAssembly compiler developers with specific areas that introduce bugs into the compiler, provide details on these bugs and previous fixes to help in designing new fixes, and provide general project management suggestions to prevent the introduction of new bugs.

Table 3.2: Findings and Implications of Our Study.
1. Finding: Data type incompatibility bugs account for 15.75% of the 146 bugs (Section 3.3.2.2). Implication: Interfaces (e.g., APIs) passing values between WebAssembly and JavaScript caused type incompatibility bugs when their data types are mishandled in one of the languages. Such interfaces (e.g., ftell, fseek, atoll, llabs, and printf) require more attention.
2. Finding: Porting the synchronous C/C++ paradigm to the event-loop paradigm poses a unique challenge (Section 3.3.2.1). Implication: While automated tools support the synchronous-to-event-loop conversion (e.g., Asyncify), bugs in them may cause concurrency issues (e.g., race conditions, out-of-order events). Programs going through this conversion require extensive testing.
3. Finding: Supporting (or emulating) linear memory management models is challenging (Section 3.3.2.3). Implication: WebAssembly emulates the linear memory model (of the native execution environment). Many bugs reported in this regard require a particular condition (e.g., allocation of a large memory region to trigger heap memory size growth), calling for more comprehensive testing.
4. Finding: Changes of external infrastructures used in WebAssembly compilers lead to unexpected bugs (Section 3.3.2.4). Implication: Compiler developers should stay on top of developments that occur in the existing infrastructure used within the compiler. In particular, valid changes (in one context) of existing infrastructure can introduce unexpected bugs in WebAssembly. Rigorous testing is needed.
5. Finding: Despite WebAssembly being platform independent, platform differences cause bugs (Section 3.3.2.8). Implication: The default Emscripten Test Suite focuses on testing the V8 browser engine and Node.js, while there are bugs reported due to platform differences (e.g., caused by other browsers and OSes). The test suite should pay attention to covering broader aspects of the platform differences.
6. Finding: Unsupported primitives not properly documented lead to bugs being reported in the compiler (Section 3.3.4.9). Implication: WebAssembly compiler developers should pay attention to keeping the documentation consistent with the implementation (e.g., mentioning that sigsetjmp and function type bitcasting are not supported).
7. Finding: Some bug reports failed to include critical information, leading to a prolonged time of debugging (Section 3.3.3).
Implication: We observe that the current bug reporting practice can be improved. In particular, an automated tool that collects critical information (e.g., inputs, compilation options, and runtime environments) would significantly help in the bug reproduction process.
8. Finding: Bugs that manifest during runtime made up a significant portion (43%) of the bugs inspected (Section 3.4.2). Implication: Many bugs in the compilers cause runtime bugs in the compiled programs, which are more difficult to detect and fix. To mitigate these bugs, compiler developers should be sure to test the emitted modules in the test suites more exhaustively.
9. Finding: 77.1% of bug-inducing inputs were fewer than 20 lines, and developers manually reduce the size of inputs (Section 3.4.4). Implication: In many cases, bugs can be successfully reproduced by relatively small inputs that are fewer than 20 lines. Currently, developers often manually reduce large inputs. Automated bug-inducing input reduction (e.g., delta debugging) would be beneficial.

3.2 Data Collection

3.2.1 Selecting WebAssembly Compilers

We inspect WebAssembly compiler projects on GitHub using the curated awesome-wasm list [24] that includes 10 WebAssembly compilers currently available, as shown in Table 3.1. We focus on popular compilers that support general-purpose, high-level programming languages. Specifically, we prune out Faust [109] (a domain-specific audio DSP language) and Binaryen (whose asm.js source is low-level), as these source languages are not general or high-level. We also filter out compilers with fewer than 100,000 lines of code (Asterius [62], Ilwasm [105], Ppci-Mirror [303], and TinyGo [57]) and fewer than 50 releases (Bytecoder [277]) to focus on mature projects. To this end, our studies focus on three WebAssembly compilers: Emscripten [64], Rustc/Wasm-Bindgen [103], and AssemblyScript [15].

1. Emscripten compiles C/C++ to WebAssembly [87]. It originally targeted asm.js [14] – a precursor language to WebAssembly, so it precedes the creation of WebAssembly. It uses a modified Clang frontend and originally used Binaryen to provide the backend. It later adopted LLVM as the backend [355].
2. Rustc compiles Rust programs to WebAssembly [103]. As this compiler relies on the Wasm-Bindgen project [263] to provide bindings necessary for WebAssembly compilations, we include issues affecting both Rustc and Wasm-Bindgen in our count. We use the name Rustc/Wasm-Bindgen to highlight the combination of these two components.
3. AssemblyScript compiles a TypeScript-like language into WebAssembly [15]. It uses its own frontend and relies on Binaryen to handle the backend code generation.

In the qualitative study (Section 3.3), we aim to investigate WebAssembly compiler bugs in depth to answer the research questions RQ1, RQ2, RQ3, and RQ4. For this purpose, we choose Emscripten because it is the most mature and widely used WebAssembly compiler: (1) Emscripten was created earliest and has the largest numbers of stars and milestone releases compared with the others. It also has the greatest number of reported bugs (discussed in Section 3.4). (2) It dominates real-world usage [132].

3.2.2 Compiler Bug Collection

We collect bug reports from the three selected WebAssembly compiler projects' GitHub repositories through two methods. First, we use the GitHub Search API [274] to collect closed GitHub issues related to WebAssembly∗. Second, we use the GitHub REST API [140] to collect all the issues and pull requests for the projects.
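A minimal sketch of how this collection step could look, assuming the public GitHub REST issues endpoint (the repository name is a placeholder, and pagination and authentication are omitted for brevity; the keyword filter is described next):

const repo = "emscripten-core/emscripten"; // placeholder repository
const keywords = ["bug", "defect", "error", "fault"];

async function collectLikelyBugs(page = 1) {
  const res = await fetch(
    `https://api.github.com/repos/${repo}/issues?state=closed&per_page=100&page=${page}`,
    { headers: { Accept: "application/vnd.github+json" } }
  );
  const issues = await res.json();
  // Keep closed issues whose title or labels mention a bug-related keyword.
  return issues.filter((issue) =>
    keywords.some(
      (kw) =>
        issue.title.toLowerCase().includes(kw) ||
        issue.labels.some((label) => label.name.toLowerCase().includes(kw))
    )
  );
}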
We also collect the commits referenced in the timeline of each issue in order to find which files the issues affected in the repositories. After obtaining the full set of issues for each project, we use the keywords "bug", "defect", "error", and "fault" to identify the issues likely to be bugs.

Table 3.3: Bug Report Dataset.
Compiler | Start | End | Bugs | Unique Bugs | Commits
AssemblyScript | 2018-02 | 2020-12 | 136 | 107 | 174
Emscripten | 2015-06 | 2020-12 | 711 | 430 | 1,460
Rustc/Wasm-Bindgen | 2017-12 | 2020-12 | 207 | 158 | 245
Totals | | | 1,054 | 695 | 1,879

Qualitative Study Dataset (Emscripten). We extract all 430 closed bugs from the Emscripten project. We read the bug reports of these issues to only include those that are related to the challenges unique to WebAssembly compilers. Specifically, we check the root causes of the 430 bugs to determine whether a typical compiler targeting a native platform (e.g., GCC targeting x86-64) would need to deal with a similar root cause. If not, we consider them unique challenges to WebAssembly and include them in our dataset. This brings the final number of bugs to 146. This scale is on par with similar work involving manual inspection [77, 149].

Quantitative Study Dataset. As shown in Table 3.3, we obtain a total of 1,054 bug reports and 1,879 related commits from the three compilers' GitHub repositories. The second and third columns show the earliest and latest dates of the bugs considered for the dataset. The number of bugs from each compiler (i.e., after applying the filters) and the number of commits relating to the bugs are presented in the fourth and fifth columns, respectively. Note that we exclude bugs earlier than June 2015 from consideration as these precede the development of WebAssembly [115]. Also, there can be multiple bug reports for a single bug because they readdress previous issues for various reasons (e.g., an incomplete previous fix).

∗ Issues with keywords like "bug", "good first bug", and "breaking changes" with "WebAssembly", "wasm", and "wat".

Figure 3.1: Structure of Emscripten Compiler Toolchain. The toolchain takes input source code (.c/.cpp/.h) through an existing Clang frontend, an LLVM-transforms middle-end, and an existing LLVM/Binaryen backend to produce the output module (.wasm), alongside the library interface (WebAPIs and native environment (NE) emulation), utilities (e.g., optimization, minification, sanitizer), and the JS support code implementation (NE emulation runtime, FS, Math, WebAPIs).

3.3 Study I: Qualitative Study of Emscripten Issues

In the first study, we manually inspect Emscripten issues that contain bug-inducing code inputs to identify development challenges, bug causes, reproducing difficulties, and fixing strategies. Fig. 3.1 presents the architecture of Emscripten. It is built on top of existing compiler tools and infrastructures, with Clang being used to implement the frontend. LLVM is used to provide middle-end optimizations. Binaryen and LLVM provide the backend functionality. Although the three stages resemble a traditional compilation pipeline for C/C++ compilers, developers of Emscripten (and of any WebAssembly compiler in general) face unique challenges. Specifically, Emscripten provides implementations of the standard C and C++ libraries that emulate the functionality available on native platforms (e.g., file systems and threading). These emulation libraries implement the semantics of legacy system calls by leveraging functions from JavaScript runtime components.
For example, the FS library in Emscripten emulates traditional filesystem operations within the browser. Additionally, Emscripten provides libraries that allow C/C++ to call JavaScript functions at runtime. This is done to allow the C/C++ code to interact with the DOM and Web APIs, which are only accessible through JavaScript. It also includes several utilities supporting compilation or optimization of the input rather than parts of the source language or libraries. At the end of the compilation, a WebAssembly binary module is emitted along with the JavaScript support code to provide a full WebAssembly package.

Table 3.4: Bugs Related to Development Challenges.
Development Challenge | Count
1. Asyncify Synchronous Code | 12
2. Incompatible Data Types | 23
3. Memory Model Differences | 12
4. Bugs in Other Infrastructures | 25
5. Emulating Native Environment | 23
6. Supporting Web APIs | 17
7. Cross-Language Optimizations | 15
8. Runtime Implementation Discrepancy | 17
9. Unsupported Primitives | 2
Total | 146

3.3.1 RQ1: Development Challenges

WebAssembly compiler developers face a set of challenges that are unique to the new language. We develop categories for these challenges using an inductive coding approach [138] where we create categories based on the description of the underlying root cause. From this description, we determine whether it is a common compiler issue [289] or an issue unique to WebAssembly features. We iteratively add and refine categories to form distinct groups. As shown in Table 3.4, we generalize 9 unique WebAssembly compiler development challenges.

Challenge 1: Asyncify Synchronous C/C++ Code. Most basic operations in C/C++ are executed in a synchronous and blocking manner. However, fully synchronous executions are not supported by browser engines. Execution in browsers follows an event loop that does not block execution, to allow user interactions [59], which differs from the execution model expected by C/C++. In order to support compiling to this model, WebAssembly compilers need to provide additional tools to handle converting synchronous blocking code to fit the event-based asynchronous browser environment. However, we find that the implementations of these tools can be incorrect or inconsistent, causing various bugs. We observe that 12 issues were introduced by these tools.

Challenge 2: Incompatible Data Types. We find 23 issues that are caused by incompatibilities in the data types passed between the multiple languages involved in Emscripten compilation. This includes type incompatibilities during compilation between C and WebAssembly and type incompatibilities at runtime between WebAssembly and JavaScript.

Challenge 3: Memory Model Differences. WebAssembly has a different memory model than native environments. These differences can lead to issues when compiling to WebAssembly, and we find that 12 issues can be attributed to these differences.

Challenge 4: Other Infrastructures' Bugs. Emscripten is built on top of existing compiler infrastructures and tools. As a result, bugs can be reported in the Emscripten repository but may be found to be caused in the tool of another infrastructure. These existing infrastructures include frontend parsers, backend code generators, and WebAssembly VMs (e.g., V8).

Challenge 5: Emulating Native Environment. Emscripten provides libraries to seamlessly emulate native environment features that are not available on the web. These include filesystems, POSIX threads, and sockets.

Challenge 6: Supporting Web APIs.
In addition to emulating native environment libraries, Emscripten also provides APIs to support calling Web APIs from C/C++ code. These Web APIs include WebGL, the Fullscreen API, and IndexedDB, and these interfaces are called by existing C/C++ libraries such as OpenGL and SDL or by using the Emscripten-provided Web API bindings.

Challenge 7: Cross-Language Optimizations. Since Emscripten emits both a WebAssembly binary module and the supporting JavaScript runtime code, optimizers used on either output component must be able to collect usage information from both languages. These optimizers can contain bugs that hinder the optimization of the resulting module.

Challenge 8: Runtime Implementation Discrepancy. Some issues can arise from differences in the running environment. This includes differences between browsers, differences between browsers and runtimes, and differences in runtimes supporting ES5 and/or ES6.

Challenge 9: Unsupported Primitives. Some issues arise when users attempt to perform functionality that touches on limitations in WebAssembly. For example, Emscripten does not support the C primitive sigsetjmp, because WebAssembly does not support signals [306].

3.3.2 RQ2: Bug Causes

We investigate the Emscripten bugs to identify and analyze the types of root causes among the issues. We read the conversation on each issue's GitHub page to find what the developers reported the underlying issue to be. After identifying the root cause description for all issues, we generalize similar root causes into the challenges listed in Table 3.4. We create root cause categories by using a deductive coding approach beginning with root cause categories from existing work [197, 184, 293, 235, 365, 309]. We extend these categories to be more specific to WebAssembly compilers. To categorize these bugs, we read the issue reports to find what the compiler developers reported the underlying issues to be. We decide which category the underlying root cause most relates to. Note that some root causes may be related to more than one category. For example, if the bug root cause is an invalid type operation from another infrastructure, we classify it under Incompatible Data Types.

Table 3.5: Asyncify Synchronous C/C++ Code Bugs.
Asyncify Tool | Causes | Count
Emterpreter | Parsing Errors | 4
Emterpreter | Incorrect Emterpretify Stack State | 2
Emterpreter | Missing Features in Emterpreter | 3
Asyncify | Missing sleep Callback | 1
Animation | requestAnimationFrame Misuse | 1
IndexedDB | Flawed Filesystem Sync Operation | 1
Total | | 12

3.3.2.1 Asyncify Synchronous Code Bug Causes

There are 12 bugs in Emscripten tools that convert synchronous execution to asynchronous execution, as shown in Table 3.5. Specifically, 4 bugs are caused by parsing errors in the Emterpreter tool. 2 bugs are caused by the internal state management of the Emterpreter. 3 bugs are caused by unimplemented features in the Emterpreter. 1 bug is caused by an omitted sleep callback. 1 bug is caused by a misuse of the requestAnimationFrame browser function as a polling mechanism. 1 bug is caused by a flawed filesystem sync operation.

Fig. 3.2 gives an example [121] of the missing sleep callback bug. Emterpreter and Asyncify are two mechanisms provided by Emscripten to handle porting synchronous C/C++ code into event-based code compatible with the browser event loop. Asyncify [355] allows for asynchronous execution by modifying WebAssembly code to allow for pausing and resuming in the middle of execution.
Emterpreter [88] converts the input code into a bytecode format different from WebAssembly that is run in an interpreter that can be paused and resumed. According to the documentation, both methods should perform the same functionality. This bug happens because the emscripten_sleep API in Asyncify behaves differently from the emscripten_sleep_with_yield function in Emterpreter. In particular, emscripten_sleep in Asyncify does not actually call a sleep callback. This difference leads to issues in the SDL library, as it relies on these tools to handle streaming audio in the main loop. Audio chunks are enqueued through the sleep callback as shown in Fig. 3.2, so this lack of consistency leads to audio distortion in Asyncify. This issue is fixed by adding the changes shown in Fig. 3.3 to the Asyncify library to call sleep callbacks, making it consistent with the behavior in Emterpreter.

SDL_AudioSpec as;
as.callback = audio_callback;
void audio_callback(void *unused, Uint8 *stream, int len) {
  // push audio stream data to stream variable
}
while (true) {
  // calculate audio stream data
  emscripten_sleep_with_yield(1); // emscripten_sleep(1); - for asyncify
}
Figure 3.2: Emscripten Issue #9823: Missing sleep Callback.

+ sleepCallbacks: [], // functions to call every time we sleep
  ...
  handleSleep: function(startAsync) {
+   // Call all sleep callbacks now that the sleep-resume is all done.
+   Asyncify.sleepCallbacks.forEach(function(func) {
+     func();
+   });
    ...
  }
Figure 3.3: Bug Fix for Emscripten Issue #9823.

Table 3.6: Incompatible Data Types Bug Causes.
Data Type | Causes | Count
Native | Incorrect i64 Legalization | 7
Native | Unsupported Floating-Point or Precision Loss | 4
Native | Missing i32 Operation | 1
Custom | Incorrect C++ Atomic Types | 4
Custom | Invalid SIMD Type Operations | 4
Custom | Error Code Type Change | 1
Undefined | Undefined Cross-Language Type Function | 2
Total | | 23

3.3.2.2 Incompatible Data Type Bug Causes

We find 23 bugs within Emscripten that are a result of incompatible data types passed between the various languages involved in the compilation. As shown in Table 3.6, incompatible data type bugs result from root causes that can be grouped into three broad categories. The first group includes root causes involving native WebAssembly data types (i.e., i32, i64, f32, and f64). The second group involves types that are not native to WebAssembly, including C++ atomic types designed for threads [284], Single Instruction, Multiple Data (SIMD) values, and error code constants. The last category, Undefined Cross-Language Type Function, involves missing utility functions that fetch type information of compiled C/C++ values.

Fig. 3.4 gives an example [3] of the incorrect i64 legalization bug. This bug occurs when using a file pointer provided by the cstdio library and compiling the module with the option -s MAIN_MODULE=1. When compiling to WebAssembly, the browser sandbox prevents accessing the host filesystem. To get around this limitation, Emscripten provides a filesystem library, FS, implemented in JavaScript, that emulates most of the functionality provided by libc and libcxx. The files are either provided as a static asset to download or embedded within the JavaScript wrapper. When the code in Fig. 3.4 is compiled, the calls to perform file I/O are handled within this FS library on the JavaScript side.
#include <cstdio>
int main() {
  FILE* file_ = std::fopen("input.txt", "rb");
  if (file_) {
    std::fseek(file_, 0l, SEEK_END);
    std::fclose(file_);
  }
  return 0;
}
Figure 3.4: Emscripten Issue #9562: Incorrect i64 Legalization.

Since JavaScript does not natively support 64-bit integers, passing 64-bit integer values to JavaScript is usually handled by a method called legalization, which converts the 64-bit value into two 32-bit integers holding the low and high bits separately. Within the execution path to fseek(), an indirect call attempts to pass a WebAssembly i64 value to an exported WebAssembly function of a side module. The issue is that this other module's exported function has been wrapped in JavaScript code to support value legalization, so although the first module knows that the exported function's type is i64, the intermediate JavaScript function cannot accept the parameter. The issue is fixed by exporting legalized and non-legalized versions of WebAssembly functions so that function calls made through the indirect calls used here can pass i64 values to the appropriate function when legalization is not required.

3.3.2.3 Memory Model Difference Bug Causes

We observe 12 bugs in Emscripten that are a result of the differences in memory model between WebAssembly and a native environment. In a native environment, memory is allocated directly from the main memory, while WebAssembly uses the data structures available in the host VM to allocate a block of memory that functions as the module's linear memory. Specifically, there are 5 bugs that do not update the memory location after the memory is relocated. 2 bugs are caused by unnecessarily disabling memory growth in combination with another functionality, such as building standalone modules. 1 bug does not free unused resources after they are no longer needed, resulting in increased memory usage. Another bug attempts to access memory beyond the intended range. 1 bug shifts the boundaries of a memory buffer incorrectly. A bug incorrectly leaves zero-filled memory regions in the initial memory file, increasing the size of the file.

Fig. 3.5 shows an example [163] of the missing reference update bug. When a WebAssembly program allocates a large amount of heap memory, the memory might be relocated to a different location. If a program stores a memory location and does not update the location after the heap memory is relocated, it will cause a runtime exception because it refers to an invalid memory location. This bug occurs when both WebAssembly memory growth and file system functionality are used (e.g., via the MEMFS filesystem). When the WebAssembly module is initialized, the file content is stored in the heap section created in the module memory. The MEMFS filesystem is one of the Emscripten-emulated filesystems, and it supports in-memory file storage. It contains a reference to this location for future file operations. After the malloc(20000000), the memory is grown, and the heap is moved to a different location. However, the filesystem reference is not updated, and the file contents cannot be read.

#include <stdio.h>
#include <stdlib.h>
void main() {
  int c;
  malloc(20000000); // Enlarge memory
  FILE *fp = fopen("test.c", "r");
  while ((c = fgetc(fp)) != EOF)
    putchar(c);
}
Figure 3.5: Emscripten Issue #5179: Missing Reference Updates.
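Viewed from the JavaScript side, the hazard behind this bug can be sketched with the WebAssembly JS API (a hypothetical illustration, not the actual Emscripten FS code): a cached view over WebAssembly.Memory becomes detached once the memory grows, so stored references must be refreshed.

const memory = new WebAssembly.Memory({ initial: 1, maximum: 100 });
let heap = new Uint8Array(memory.buffer); // cached reference, like the FS library's

memory.grow(10); // e.g., triggered by a large malloc() inside the module

console.log(heap.byteLength);         // 0: the view over the old buffer is detached
heap = new Uint8Array(memory.buffer); // the reference must be re-created after growth
console.log(heap.byteLength);         // 11 pages of 64 KiB each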
The issue is fixed by forcibly enabling the --no-heap-copy flag, which stores the file system in a separate array to allow it to grow freely without worrying about filesystem references. However, this slows operations involving the mmap() syscall.

3.3.2.4 Other Infrastructures Bug Causes

We find 25 bugs reported in Emscripten that are caused by a tool of another infrastructure. The root causes of bugs related to the Other Infrastructures' Bugs challenge can be grouped by the infrastructure where the bug is caused. Specifically, we find 12 bugs affecting the LLVM Wasm backend and 1 bug affecting the LLVM C++ standard library. There are 5 bugs affecting Binaryen and 1 bug in Clang. We also observe 3 bugs located in Firefox, 2 bugs in V8, and 1 bug in Safari.

For example, a bug was introduced into Emscripten when an update in Clang changed a default behavior when compiling with the options -g3 or -g4 (Emscripten Issue #7883 [359]). Previously, Emscripten would use value names found in the LLVM IR to create the variable names in asm.js. An update in Clang discards these value names when generating the IR in order to improve performance. The Emscripten developers were not aware of this change in Clang. To fix this issue, the Emscripten developers utilize a new Clang flag (-fno-discard-value-names), which disables the new behavior and emits the value names in the IR.

3.3.2.5 Emulating Native Environment Bug Causes

We find that 23 bugs are related to the Emulating Native Environment challenge. Among the 23 bugs, 11 are in the emulated filesystem library and are caused by issues such as implicit dependencies, incorrect path resolving, and data truncation. 9 bugs are related to the pthread library and include issues such as thread scoping issues and incorrect termination. There are 3 bugs related to the socket library caused by issues such as unsupported functions.

3.3.2.6 Supporting Web APIs Bug Causes

We find that 17 bugs are caused by the challenge of Supporting Web APIs. 11 of the bugs are related to the WebGL APIs' behavior not matching OpenGL behavior. 2 bugs involving callback ordering are related to the Fullscreen API. 2 bugs impact the IndexedDB APIs by not handling possible errors. 2 bugs affect the Web APIs exposed through the SDL library.

3.3.2.7 Cross-Language Optimizations Bug Causes

We find that the Cross-Language Optimizations challenge produces 15 bugs. 9 bugs are caused by errors that lead to the optimizer marking a symbol as unused and removing it when it is needed by the code in the other language. 2 bugs are caused by syntactical mistakes in the optimizer code. 2 bugs are caused by errors in the optimizers' variable scope tracking that prevent them from identifying all unused variables within the scope.

3.3.2.8 Runtime Implementation Discrepancy Bug Causes

Our results show that the Runtime Implementation Discrepancy challenge is responsible for 17 bugs. 5 of these bugs are related to Chrome API changes and behavior discrepancies. 1 bug is related to Safari and its immutable native objects. 1 bug is related to unsupported features in Internet Explorer. 2 bugs related to NodeJS are caused by misusing V8 or built-in module APIs. 4 bugs related to runtimes that do and do not support ES6 are caused by the behavior changes made between ES5 and ES6, including module export immutability and newly introduced APIs.
4 bugs are related to other runtimes through causes such as lack of fallback support, API changes, and performance issues.

3.3.3 RQ3: Bug Reproducing Analysis

Reproducing a bug is often the first step of the debugging process. However, some bugs may require a particular input, environment setting, or compiler version. We analyze each bug report and the conversations in the bug issue to understand the challenges in reproducing bugs. Moreover, we inspect whether bug reports contain all the critical information or not.

Information in Bug Reports. Table 3.7 presents the critical information for bug reproduction included or discussed in the issues we check. "ID" represents the GitHub issue IDs. "Src." indicates the source code of a program that causes the bug. "JS" means the JavaScript code snippet required to run the compiled WebAssembly program. "Stack" means a stack trace of the buggy program. "GT" represents the bug's ground truth, which includes the exact error message or expected values that can determine whether the bug is reproduced or not. "Opt." and "Ver." represent the WebAssembly compiler options and versions used. "Env." means the runtime environment (e.g., browser name and version). "Wasm" means the compiled WebAssembly program. The symbol in the table means that the corresponding information is provided in the bug report; otherwise, it is not included in the report.

Table 3.7 shows the results from a subset of the bug reports. A complete list can be found in [216]. Observe that most bug reports include sources, stack traces, ground truth, and compiler options, while information on compiler versions and runtime environments is relatively less frequently included. Moreover, compiled WebAssembly files are rarely included. Our manual investigation shows that those reports including WebAssembly files are typically high-quality reports. From these observations, we realize that an automated approach to create informative bug reports is highly desirable. Specifically, when a compiler generates a WebAssembly program, information for all the columns of Table 3.7 can be collected to create a bug report file, similar to memory dump files containing various information about the environment.

Bug Reports Lacking Information. We further investigate bug reports that required significantly more effort in reproducing the bugs. Table 3.8 shows such cases. Note that we introduce the Þ symbol to represent information added after the initial report, requested mainly by the developers. It takes some time for developers to request additional information for the reports with many Þ symbols. Compared with Table 3.7, sources ("Src.") and ground truth ("GT") are not frequently included in those reports, while those are critical in reproducing the bugs. Compile options and compiler versions are not well provided, and none of the reports includes compiled WebAssembly programs. The last five cases are ones where developers express difficulty in reproducing the bug. In particular, the initial report of #7409 lacks critical information, leading to many conversations with the developer to provide the missing information.

Overall, our analysis shows that many of the bug reports lack sufficient information to quickly reproduce reported bugs. We find that this lack of information can lead to longer debugging time in the compiler project. Compiler developers should explore methods to ensure as much useful information as possible is reported in these bug reports.

Table 3.7: Information Included in the Bug Reports.
Cat. | ID | Src.1 | JS2 | Stack3 | GT4 | Opt.5 | Ver.6 | Env.7 | Wasm8
Incompatible Data Types | 3487, 3787, 3788, 3789, 3849, 3892, 4251, 5031, 5370, 6309, 7199, 7208
Asyncify Synchronous Code | 3141, 3908, 4046, 5716, 6724, 6727, 6738, 6804, 6818, 7988, 9823, 10051
Memory Model Differences | 3636, 3907, 5179, 5187, 5585, 6359, 7409, 8637, 9497, 9587, 9808, 10179
1. Src.: the source code is available. 2. JS: the JavaScript support code is available. 3. Stack: a stack trace is provided. 4. GT: the ground truth of the expected output is listed. 5. Opt.: the compiler options used are listed. 6. Ver.: the version of the compiler used is listed. 7. Env.: the runtime information is provided. 8. Wasm: the compiled WebAssembly binary is available.

Table 3.8: Bug Reports Where the Bugs Are Difficult to Reproduce.
ID | Src. | JS | Stack | GT | Opt. | Ver. | Env. | Wasm
IDs: 3778; 3857 (Þ); 3861 (Þ); 3892; 4046 (Þ); 4122; 4646; 5797 (Þ Þ Þ); 6169; 6442; 7146 (Þ Þ Þ); 7472 (Þ); 9091; 9319 (Þ); 9650; 10205 (Þ Þ); 10317 (Þ); 10385; 10675; 3824; 6534; 7409 (Þ Þ Þ Þ); 8001; 10233
1. Refer to Table 3.7 for column headers. 2. Þ means the respective information was added after the initial post.

3.3.4 RQ4: Bug Fixing Strategies

We investigate the different strategies used to fix these Emscripten bugs. We determine the bug fix strategy by reading the issue conversation to see if the developers explicitly mention the fix used. If it is not mentioned, we inspect the fixing commit, which is the last commit before the issue is closed. For each compiler challenge category, we group bugs with similar fixes into categories developed using an inductive coding approach on the fix descriptions. Note that in all categories we omit low-frequency categories.

Table 3.9: Asyncify Synchronous C/C++ Code Bug Fixes.
Asyncify Tool | Strategy | Count
Emterpreter | Improve State Checking | 2
Emterpreter | Improve Emterpreter Parsing | 4
Emterpreter | Improve Function Whitelisting | 2
Emterpreter | Throw Exception | 1
Asyncify | Add Sleep Callback | 1
Animation | Add Warning Message | 1
IndexedDB | Add Documentation | 1
Total | | 12

Table 3.10: Incompatible Data Types Bug Fixes.
Data Type | Strategy | Count
Native | Fix/Bypass Legalization | 5
Native | Add/Improve Type Support | 4
Native | Add Documentation | 2
Native | Provide Workaround | 1
Custom | Fix/Remove Emitted Type Operations | 6
Custom | Change Type Used | 2
Custom | Provide Workaround | 1
Undefined | Add Missing Function | 2
Total | | 23

3.3.4.1 Asyncify Synchronous Code Bug Fix

The fixing strategies used to resolve the Asyncify Synchronous C/C++ Code issues are listed in Table 3.9 and can be grouped by the tool used. Bugs caused by a fault in the Emterpreter tool were fixed by extending the internal state checking, improving the Emterpreter parsing, improving the function whitelisting functionality, or throwing an exception message to prevent misuse. Bugs caused by faults in the Asyncify tool were fixed by handling the missing sleep callback that led to inconsistent behavior compared with Emterpreter. Bugs caused by misuse of animation APIs were fixed by adding a warning message against the incorrect usage. Bugs caused by a fault related to IndexedDB were resolved by updating the documentation to mention the buggy behaviors.

3.3.4.2 Incompatible Data Type Bug Fix

The bug fixing strategies applied to Incompatible Data Types bugs, listed in Table 3.10, can be broken down by whether the root causes affected native WebAssembly types or special types not native to WebAssembly. To fix Native Types issues, the following fixing strategies were applied.
The Fix/Bypass Legalization strategy changes the JS-Wasm interfaces to either fix missing value legalization wrappers or disable unnecessary value legalization. The Add/Improve Type Support fixing strategy adds code to handle the unimplemented data types or improves the already-present code to handle a missing operation. The Add Documentation category describes the faulty behavior in the compiler documentation. In the Provide Workaround strategy, the compiler developers give the reporter a temporary solution to avoid triggering the fault while performing the originally intended action. To fix issues involving Custom Types, the following strategies were applied. The Fix/Remove Emitted Type Operations fixing strategy changes the faulty code to either fix or remove the invalid type or related operations that caused the fault. Change Type Used fixes entail changing the variable type used that caused the faulty behavior. The issues caused by Undefined Cross-Language Type Functions were fixed by adding the missing functions (Add Missing Function strategy).

3.3.4.3 Memory Model Difference Bug Fix

The bug fixing strategies applied to the 12 Memory Model bugs are as follows. 4 bugs fix the code obtaining memory references so that the references are updated when the linear memory undergoes a change, such as memory growth. 2 bugs are fixed by releasing unused memory objects. 2 bugs change the operations calculating the boundaries of memory regions to prevent going out of them. 2 bugs change the allocation method used to remove the faulty method.

3.3.4.4 Other Infrastructure Bug Fix

The bug fixing strategies applied to Other Infrastructure bugs can be grouped by the related project. We find that the bugs in this challenge are delegated to the other infrastructure to fix, including 11 wasm-ld bugs, 1 LLVM WebAssembly codegen bug, 1 libcxx project bug, 2 V8 bugs, 3 Firefox project bugs, 3 asm2wasm bugs, and 1 wasm-opt bug. The Use Workaround fixing strategy is used on 3 bugs related to Binaryen, Safari, and Clang to avoid calling the code triggering the bug in the other infrastructure.

3.3.4.5 Emulating Native Environment Bug Fix

The bug fixing strategies applied to the 23 Emulating Native Environment bugs are as follows. 6 bugs change the compiler options to automatically export the necessary dependency APIs when compiling. 3 bugs improve the functions gathering properties on the files or paths in the filesystem. 2 bugs remove any functionality that implicitly leads to a filesystem library dependency. 3 bugs change the code that sets global variables to also set those variables within the worker thread's scope. 2 bugs release used resources more reliably.

3.3.4.6 Supporting Web APIs Bug Fix

The fixing strategies applied to the 17 Supporting Web API bugs are as follows. 3 bugs add function cases to the list of supported WebGL extensions. 3 bugs wrap the faulty code in type checking to prevent accessing non-existent fields. 2 bugs were not fully fixed by the linked commits. 3 bugs change the event listeners used to avoid faulty behavior. 2 bugs impacting IndexedDB allow errors to be handled with try-catch statements rather than being hidden.

3.3.4.7 Cross-Language Optimizations Bug Fix

The fixing strategies applied to the 15 Cross-Language Optimization bugs are the following. 7 bugs change the code to prevent optimizers from changing the variable or field name so that it matches in both JavaScript and WebAssembly.
2 bugs add function definitions provided by the runtime environment to prevent the optimizer from marking the functions as undefined. 2 bugs emit a warning message describing the faulty behavior when it is called. 2 bugs correct typographical mistakes in the implementation of the optimizers. 3.3.4.8 Runtime Implementation Discrepancy Bug Fix The fixing strategies applied to the 17 Runtime Implementation Discrepancy bugs are as follows. 5 bugs patch the affected code to avoid the runtime behavior discrepancies. 4 bugs add code to handle the cases where a feature is not implemented so that execution can continue. 3 bugs change the code logic to conform to updated runtime APIs. 3 bugs fix the checks that determine what properties or operations the environment has or supports. 2 bugs change the code to enable the use of particular runtime behaviors that improve performance. 3.3.4.9 Unsupported Primitives Bug Fix subsubsec:unsup-primitive The only fixing strategy applied to WebAssembly Limitation bugs is the Provide Workaround strategy to implement the unsupported functionality through different WebAssembly features. 3.4 Study II: Quantitative Study sec:study2-quantitative In the second study, we perform a quantitative analysis on 1,054 bug reports collected from three compilers, AssemblyScript, Emscripten, and Rustc/Wasm-Bindgen, to understand the lifecycle of these bugs, the impacts that they have on compiled programs, the sizes of bug-inducing inputs, and the sizes of the fixes applied. The bug lifecycle shows how responsive compilers are in dealing with new bugs. Ideally, most 41 bugs should be solved within one day [289]; however, our results show this is not the case. Investigating the impacts that bugs can cause on miscompiled programs is important in understanding the severity of the bugs that these compilers face. Understanding the sizes of bug-inducing inputs reveals the average code complexity needed to trigger bugs in these compilers, providing guidance for designing test cases. Inspecting the sizes of bug fixes reveals how widespread the bug impact is in the code. 3.4.1 RQ5: Lifecycle of Bugs subsec:bug-lifecycle We analyze the duration between the time a bug is reported and the time the bug is fixed. We consider a bug as fixed when it is closed after a commit is referenced. If the bug is reopened, we use the time of the last closing event as the end of the duration. Fig. 3.6 presents the cumulative distribution of bug lifecycles. Rustc/Wasm-Bindgen, AssemblyScript, and Emscripten were able to fix 35.1%, 27.5%, and 23.6% of their bugs within 1 day, respectively. Within 10 days, the three compilers fixed over 50% of their bugs. These results show that the three compiler projects fall short of the ideal same-day fix turnaround time [289]. This should be taken into consideration when deciding to use WebAssembly in a production-level project. Figure 3.6: Cumulative Distribution of Lifecycle of Bugs. fig:bug_duration 42 Table 3.11: Impact Categories for Each Compiler. 
table:impact-categories Impact Assembly- Emscr- WasmScript ipten Bindgen Build-Time Build Error 23 54 18 Compile-Time Compile Error 50 151 63 Linker Error 0 2 22 Code Bloating 0 12 1 Runtime Crash 1 61 14 Data Corruption 0 7 2 Fail to Instantiate 2 5 4 Performance Drop 1 2 3 Hang 0 2 0 Incorrect Functionality 5 59 14 Other Runtime Error 25 75 17 Total 107 430 158 3.4.2 RQ6: Impact of Bugs subsec:bug-impacts We manually inspect all 695 unique bugs in the three compilers, listed in Table 3.11, to find out whether the errors occurred at the compiler build time, program compile time, or runtime: (1) Build-Time Errors prevent the compiler itself from successfully compiling [40, 29]. (2) Compile-Time Errors occur during the process that compiles source programs to WebAssembly binaries, including: (a) Compile Error fails to compile correct source programs (with no syntax errors) to WebAssembly binaries. (b) Linker Error fails to link the WebAssembly output with necessary libraries such as stdlib. (c) Code Bloating increases the size of the compiled WebAssembly file but does not affect the functionality [92, 2]. (3) Runtime Errors occur during the execution of a generated WebAssembly binary. The impacts of runtime errors include: (a) Crash leads to unrecoverable exceptions at runtime [73], halting the execution [298]. (b) Data Corruption corrupts the data stored in the output modules by losing or changing stored information [275]. (c) Failure to Instantiate fails to instantiate the WebAssembly binary because of inconsistencies with the wrapper code. (d) Performance Drop causes a noticeable slowdown in runtime performance when executing WebAssembly [99, 252]. (e) Hang stops responding to browser events [131]. (f) Incorrect 43 Functionality results in functionality inconsistent with what the compiled source code specified [108]. (g) Other Runtime Error does not fit into the above categories, such as missing debugging information [1]. Table 3.11 shows the number of bugs by their impacts for the three compilers. We observe a significant portion of runtime errors. Specifically, 49.1%, 34.2%, and 31.8% of the bugs in Emscripten, Wasm-Bindgen, and AssemblyScript, respectively. 3.4.3 Bugs in Existing Compiler Projects WebAssembly compilers often rely on components of existing compiler projects such as LLVM and Clang. We find 43 bugs are located in external compiler infrastructures, including 32 Emscripten bugs (16 LLVM, 12 Binaryen, 4 Clang), 10 Rustc/Wasm-Bindgen bugs (LLVM), and 1 AssemblyScript bug (Binaryen). Those bugs happen because the compiler developers misunderstand external projects’ behaviors [266, 85] or updates on the external projects break assumptions made by developers [358, 52]. Note that the counts for Emscripten differ than those in Section 3.3 due to differences in the bug selection criteria between the datasets. 3.4.4 Testing and Fixing Bugs subsec:fix-sizes 3.4.4.1 Size of Bug-Inducing Test Inputs 10 20 30 40 50 60 70 80 90 100 LOC in Test Case 25% 50% 75% 100% % of Bugs AssemblyScript Emscripten Rustc/Wasm-Bindgen Figure 3.7: Cumulative Distribution of Input Sizes. fig:test_case_loc 44 Fig. 3.7 shows the distribution of the lines of code of the bug-inducing inputs that are given in the issue postings to reproduce the bugs, including source code and compiler options. Note that in our dataset, only 340 (48.9%) issues include the bug-inducing inputs. 
A large portion of bug-inducing inputs in all three compilers (183, 53.8%) have 10 or fewer lines of code, and 262 (77.1%) bugs-inducing inputs have 20 lines or fewer. In some cases, we observe a large program was provided initially as a bug-inducing input [299, 201, 313]. Later on, multiple posts on the same issue gradually developed to minimize the size of the bug inputs [314, 342, 80]. This suggests that techniques that can minimize testing inputs [357, 198, 128, 139] are desirable. 3.4.4.2 Size of Bug Fixes We analyze the size of bug fixes in terms of the lines of code. Among all compilers, 58.4% of all bugs (Emscripten: 43.7%, Rustc/Wasm-Bindgen: 34.2%, AssemblyScript: 24.3%) of the bugs have been fixed with 10 or fewer LOC. Over 96% (Emscripten: 74.2%, AssemblyScript: 71% , Rustc/Wasm-Bindgen: 69%) of all bug fixes have 100 or less LOC. On the other hand, two compilers, AssemblyScript (213.6 LOC) and Emscripten (189 LOC), have the bug fixes with an average LOC greater than 100. These large fixes are usually the result of the compiler developers incorporating many changes into a single commit, rather than relating to the complexity of the issue. For example, in AssemblyScript, a bug involving missing functions when importing from a file was fixed in the same commit the developer cleaned the project, inflating the lines of code changed [94]. 3.5 Discussion sec:discussion Our findings can be found in Table 4.1, and we highlight the most insightful ones here. We also discuss the threats to validity and a limitation in our bug fix identification strategy. 45 Qualitative Study’s Findings. Through our qualitative study, we find several interesting trends in the Emscripten compiler bugs. Finding 1 shows that Incompatible Data Types bugs make up 15.75% of the 146 bugs inspected. We find many of these bugs originate from interfaces relating to string handling (e.g., printf) and filesystems (e.g., fseek) rather than numeric interfaces. This finding can help developers diagnose similar bugs that arise by providing them with code locations to investigate. Finding 4 shows that changes and bugs in existing infrastructures can cause bugs. Compiler developers need to follow the development of leveraged infrastructures more closely to prevent these bugs. Finding 7 reveals that many bug reports fail to include critical debugging information, including the compiler version, environment, or source code used that triggers the bug. Compiler developers should include automatic reporting tools to include this information when submitting a bug report to aid in debugging. Quantitative Study’s Findings. Through our quantitative study, we obtain some insights into these compiler projects. For example, Finding 9 shows that 77.1% of bug-inducing inputs used were less than 20 lines of code, and developers frequently reduce this manually. This suggests that many bugs in these compilers can be reproduced by small inputs, which favors the use of automated input reduction techniques. Threats to Validity. Similar to other empirical studies, our study is potentially subject to several threats, namely the representativeness of the chosen compilers, the generalization of the studied bugs, and the correctness of the analysis methodology. Regarding the representativeness of the chosen compilers, we choose three compilers that are the most popular and actively maintained WebAssembly compiler projects. Another threat concerns the generalization of the studied bugs. 
We uniformly use all bug issues satisfying the selection criteria stated in Section 3.2.2. We exclude bugs that were found to be irrelevant to WebAssembly after manual inspection. To ensure correct results, we only study fixed bugs because unfixed or unconfirmed reports may not be real bugs. Regarding the correctness of the analysis methodology, aside from the analysis of test case LOC and impact, we automate all other analyses mentioned in this paper. The manual inspections on bugs to identify 46 the sizes of test cases and impacts might be biased due to our inference of the test cases. To reduce this threat, three authors analyzed these bugs separately and discussed inconsistent results until an agreement was reached. Sizes of Bug Fixes. Bug fixes may also contain feature updates that are not relevant to the bugs. Moreover, fixes for some design bugs require significant changes in the underlying code base, resulting in large bug fixes. In general, identifying bug-fix relevant parts from a software patch is a challenging problem. In our studies, we do not aim to distinguish this, and we observe a few such cases result in large bug fixes. However, from our manual inspection results shows that those are exceptional cases, and they do not affect our key findings and observations. 3.6 Summary sec:bug-study-summary In this chapter, we conduct two empirical studies. In the first study, we perform a qualitative analysis on 146 bugs in Emscripten and analyze their root causes. We conduct a quantitative analysis on 1,054 bugs in three open-source WebAssembly compilers, namely Emscripten, Rustc/Wasm-Bindgen, AssemblyScript, and to reveal various aspects of these bugs, including lifecycles, impacts, locations, and sizes of bug fixes. Our analysis results can help researchers and WebAssembly compiler developers to gain a deeper understanding of WebAssembly compiler bugs and provide guidance toward effective testing and debugging techniques for WebAssembly compilers and applications. 47 Chapter 4 When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance chp:wasm-function-inlining As discussed in Chapter 3, WebAssembly applications rely on compilers to be constructed. Rather than being constructed from scratch, these compilers often leverage existing infrastructures to handle the frontend (parsing), middle-end (optimization), and backend (code generation) tasks of compilation. These middle-end optimization passes rely on static analyses to identify potential optimization sites that should benefit from the pass. It is assumed that WebAssembly fits seamlessly within this model as just another instruction set target. However, does this assumption actually hold up in practice? Are there unexpected downsides to leveraging existing compiler infrastructures with WebAssembly? In this chapter, we counter the validity of this assumption by performing an empirical study on the impact of function inlining optimizations on WebAssembly runtime performance. We inspect the inlining optimization passes of the LLVM and Binaryen infrastructures used in the Emscripten C/C++- to-WebAssembly compiler. Our investigation on 127 C/C++ samples from the LLVM test suite shows that 66 samples exhibit counterintuitive behavior due to function inlining, particularly from inlining hot functions into long-running functions. That is, rather than improving runtime performance as expected, applying the function inlining optimization leads to performance detriments. 
We find that existing optimizations should be revisited to factor in the unique characteristics of WebAssembly compilation and execution. 48 This chapter draws content from its corresponding publication of the same title [258]. The main author of this publication is also the author of this dissertation and designed the study, performed the evaluation, and wrote the majority of the published paper. 4.1 Motivation and Contributions WebAssembly compilers leverage the same compiler infrastructures as compilers of traditional languages. For example, the Emscripten C/C++-to-WebAssembly compiler [65], the Rustc compiler [343], and Intel’s oneAPI compiler [142] all use the LLVM [294] compiler infrastructure. Unfortunately, we observe that WebAssembly compilers leverage existing infrastructures without considering the differences between WebAssembly and native applications. Execution Pipeline Liftoff TurboFan JavaScript starts Wasm module Main Thread $f1 $f2 1 JavaScript calls $main 2 $main calls $f2 4 $main calls $f1 5 $f2 $f1 $f1 $main $main calls $f1 3 $f1 $main $f1 $f2 $f2 $main JavaScript $main Compilation Threads Figure 4.1: Chromium Tier-Up Process. In this example, function $main uses the Liftoff-generated code when first called as it is the only code available. $main calls $f1 which only has Liftoff code ready. $f2 uses the TurboFan-generated code as it is available at the first call. On the second call to $f1, its TurboFan-generated code is available and used for the call. fig:tier-up-process One of the substantial differences is that WebAssembly has the additional compilation layer at runtime running within browsers, generating the final machine code for WebAssembly instructions. Browsers, such as Chromium [48] and Firefox [282], typically include at least two WebAssembly compilers: a fast compiler emitting unoptimized code and a slow compiler emitting highly optimized code. Browsers use both compilers to ensure the machine code for WebAssembly functions is available early and can perform 49 faster once the optimized code is available. When the optimized code is ready, the code is tiered-up on the following function call invocation by replacing the unoptimized code with the optimized code. The tiering-up process only occurs on a function call because the unoptimized and optimized machine codes are not interchangeable. Figure 4.1 illustrates the Chromium tier-up process. This tiering-up process complicates the effects of traditional optimization techniques such as function inlining, which moves function code into function call sites to reduce context switches. In doing so, function inlining reduces the number of call invocation sites for the tiering-up process to occur. This can cause an undesirable side effect: slowing down the runtime performance. Figure 4.2 shows that function inlining leads to worse runtime performance for samples within the LLVM test suite [193]. This figure compares the runtimes of the samples compiled to the O3 optimization level with function inlining enabled to runtimes of the samples with function inlining disabled. The runtime can slow down by as much as 15.5×. In all the samples, function inlining decreases the number of call sites, leading to fewer opportunities for browsers to tier-up functions to more-optimized machine code. 
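To make the interaction concrete, the following minimal C sketch (our own illustration, not one of the test suite samples; the constants echo the random.cpp benchmark examined in Section 4.4.2) shows the pattern behind this effect:

static long step(long x) {            /* small, frequently called helper */
    return (x * 3877 + 29573) % 139968;
}

long run(long n) {
    long acc = 42;
    for (long i = 0; i < n; i++) {
        acc = step(acc);              /* each call is a point where the engine
                                         can switch step to its optimized code */
    }
    return acc;
}

As long as step remains a separate function, the engine can replace its baseline code with the optimized code at the next call once that code is ready. If the compiler inlines step into run, the loop body becomes part of run's single long-running invocation, which continues to execute the baseline (Liftoff) code it started with.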
2mm.c 3mm.c adi.c flops.c heapsort.c heapsort.cpp himenobmtxpa.c jacobi-1d-imper.c linpack-pc.c lpbench.c nsieve-bits.c puzzle.c ReedSolomon.c 0 20 40 60 80 100 400 700 1,000 1,300 1,600 Runtime % Increase 47 23 14 24 60 65 62 10 100 70 76 1,546 102 50 47 53 24 41 41 37 55 29 41 47 62 55 Runtime Number of Function Call Sites 0 20 40 60 80 100 400 700 1,000 1,300 1,600 Function Call % Decrease Figure 4.2: Function Inlining Slows Runtime Performance in Chromium. Using Emscripten with the O3 optimization flag, the green bars show the % runtime speedup in the samples when function inlining is enabled compared to when inlining is disabled. The blue bars show the % decrease in the number of function call sites when inlining is enabled. fig:wasm_og_v_noinline_O3 There has been much work studying compiler optimization techniques. Previous work has looked at the impacts of optimizations on specific platforms [28] and how optimizations affect SIMD performance [134]. 50 Table 4.1: Findings and Implications of Our Study. table:findings-implications-summary Findings Implications 1 We identify counterintuitive function inlining behavior between WebAssembly’s compilation pipeline and execution pipeline. We show that function inlining slows WebAssembly runtime performance in some samples by as much as 15.5×. 2 We find that function inlining can introduce counterintuitive behavior in 51.97% (66/127) of the studied WebAssembly modules. We investigate the characteristics of the inlined functions and find that larger code sizes can introduce the counterintuitive behavior. 3 We show that modifying Binaryen’s inlining pass to avoid inlining hot functions reduces the counterintuitive behavior in 73.21% (41/56) of the modules. This finding motivates improving inlining heuristics for hot functions. 4 We are the first study to perform an indepth investigation into the runtime impacts of a traditional optimization technique on WebAssembly binaries. Our findings can motivate future work investigating the effects of other traditional compiler optimizations on WebAssembly modules. Some works propose optimization selection strategies leveraging machine learning [155, 204]. While compiler optimizations are an active area of study, to the best of our knowledge, there are no systematic studies on the effects of function inlining in the WebAssembly compilation pipeline. We investigate function inlining used in Emscripten [65], a widely used C/C++ to WebAssembly compiler. We also inspect the inlining passes provided by two infrastructures used in Emscripten, LLVM [294] and Binaryen [337]. The findings and their implications of this study are listed in Table 4.1. 4.2 Background on WebAssembly Compilation and Execution Pipeline 4.2.1 WebAssembly Compilation Pipeline We illustrate the WebAssembly compilation pipeline using Emscripten [200], a compiler that converts C/C++ code to WebAssembly. Internally, Emscripten makes use of the Clang compiler [49] for its frontend and uses components from LLVM [294] and Binaryen [337] in its backend. Binaryen is a library providing tools to construct compiler infrastructures targeting WebAssembly. Binaryen defines an intermediate 51 representation (IR) that closely models WebAssembly and enables WebAssembly-specific optimizations such as IR flattening to remove nested side effects and function reordering to shrink the encoding of the most called functions [337, 356]. 
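Because Binaryen is distributed as a library, its passes can also be driven programmatically rather than only through wasm-opt. The following C sketch is a rough illustration using the binaryen-c API; it assumes buf and len already hold a WebAssembly binary in memory, omits error handling and serialization of the result, and is not how Emscripten itself invokes the pass:

#include <binaryen-c.h>
#include <stddef.h>

/* Apply Binaryen's function inlining pass (discussed below) to a module. */
void apply_inlining(char* buf, size_t len) {
    BinaryenModuleRef module = BinaryenModuleRead(buf, len);
    BinaryenSetOptimizeLevel(2);                        /* roughly -O2 */
    const char* passes[] = { "inlining-optimizing" };
    BinaryenModuleRunPasses(module, passes, 1);
    BinaryenModuleDispose(module);
}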
(func $a i32.const 123 i32.const 1024 i32.store) (func $b call $a i32.const 1024 i32.load return) (func $b i32.const 123 i32.const 1024 i32.store i32.const 1024 i32.load return) Before function inlining int x; function a() { x = 123; } function b(){ a(); return x } Source code After function inlining Figure 4.3: Binaryen Function Inlining. In the original C code, function b calls function a. The WebAssembly code before inlining shows how both a() and b() are separate. After inlining, the instructions of a() are inlined into b(). fig:function-inlining The Emscripten compilation pipeline begins by inputting the source code files into the Clang compiler. This compiler uses its Frontend component to parse the source files into an LLVM IR. The IR is passed to the Middle-end component, which implements several optimization passes, including the inline pass that performs function inlining. The pass moves instructions of a called function to the function call location, but first it checks to ensure that the inlined instructions can safely replace the call. For example, inlining indirect or external calls may break the program semantics. Additionally, the pass estimates the performance cost of inlining a function (e.g., a heuristic on the function’s code size) to determine if it is beneficial to inline. The Middle-end component passes the optimized IR to the CodeGen component to create a WebAssembly module. Next, the module is passed to Binaryen’s wasm-opt tool [337], which applies Binaryen’s set of optimization passes to the module. In Binaryen, function inlining is performed by the inlining-optimizing pass. Similar to the inline pass in LLVM, the inlining-optimizing pass moves function instructions into the location of the original call site if the calculated inlining cost is less than a threshold value. Differences between these passes include the IR structures that are inlined as LLVM can also inline its block structures. Besides, Binaryen can support partial inlining of early-return conditional statements [337]. 52 Figure 4.3 illustrates Binaryen’s function inlining. Finally, the compilation pipeline outputs the optimized WebAssembly binary and JavaScript support code. #define IM 139968 #define IA 3877 #define IC 29573 inline double gen_random(double max) { static long last = 42; last = (last * IA + IC) % IM; return( max * last / IM ); } int main(int argc, char *argv[]) { ... int N = 400000000; double result = 0; while (N--) { result = gen_random(100.0); } ... return(0); } (func (;13;) (param i32 i32) (result i32) (local i32 i32 f64) ... loop i32.const 3877 i32.mul i32.const 29573 i32.add i32.const 139968 i32.rem_s ... local.get 3 i32.const -1 i32.add local.tee 3 br_if 0 end ...) (export "main" (func (;13;) )) (func (;13;) (param i32 i32) (result i32) (local i32 f64) ... loop call 14 local.get 2 i32.const -1 i32.add local.tee 2 br_if 0 end ... (func (;14;) (param f64) (result f64) (local i32) i32.const 3877 i32.mul i32.const 29573 i32.add i32.const 139968 i32.rem_s ...) (export "main" (func (;13;) )) 1 2 3 4 5 6 7 8 9 10 22 23 24 25 26 38 39 201 202 203 225 226 227 228 229 230 231 243 244 245 246 247 248 290 101 102 103 125 126 127 128 129 130 131 132 154 155 156 157 158 159 160 161 162 180 (a) C++ source code (b) Wat not inlined (c) Wat inlined Figure 4.4: random.cpp C++ and WebAssembly Code. (a) shows the C++ code of two functions, gen_random and main. (b) shows the WebAssembly output compiled with function inlining disabled. 
Function 13 is the main function, and inspecting its loop code reveals function 14 is gen_random. (c) shows function 13 from the module compiled in the Baseline experiment. The file differences between (b) and (c) show that function 14, gen_random, has been inlined into function 13, main. fig:random-cpp-inlined-figure 4.2.2 WebAssembly Execution Pipeline The generated WebAssembly module and JavaScript files are run by a browser such as Chromium [48] or Firefox [218], which each have different internal compilers to generate machine code for the WebAssembly module. For example, Chromium is powered by the V8 JavaScript and WebAssembly engine [308], which includes two compilation engines to generate machine code for WebAssembly. The first compiler, Liftoff [21], is a single-pass compiler that emits machine instructions immediately after reading in a WebAssembly instruction at the expense of the number of optimizations that it applies. As a result, the Liftoff code can perform sub-optimally when executed. The second compiler, TurboFan [304], is a multi-pass compiler that applies several optimization passes to the machine code. While TurboFan generates faster code, this compiler takes much longer to generate code than Liftoff. To balance start-up speed with execution performance, 53 Chromium first generates code for WebAssembly functions with Liftoff and immediately starts the TurboFan compilation. When the TurboFan code for a function is ready, the function code tiers-up by replacing the Liftoff code with the TurboFan code. Firefox uses the SpiderMonkey JavaScript and WebAssembly engine [282] to handle WebAssembly execution. Similar to V8, SpiderMonkey contains two compilation engines for WebAssembly. The first compiler, Wasm-Baseline, performs a fast translation of WebAssembly instructions to machine code for quick startup. The second engine, Wasm-Ion, applies optimizations on the emitted machine code. SpiderMonkey follows the tiering-up scheme by using Wasm-Baseline to emit machine code quickly while Wasm-Ion generates better-performing machine code. 4.3 Methodology We aim to understand the counterintuitive effects of function inlining on WebAssembly program runtime. We define a counterintuitive effect as producing a binary with a slower runtime performance than if the optimization was disabled. Specifically, we focus on the following research questions: • RQ1 – Significance: How often does function inlining counterintuitively impact WebAssembly modules, and are the effects unique to WebAssembly? • RQ2 – Function Characteristics: Which characteristics of the inlined functions cause the counterintuitive behavior? • RQ3 – Quantification: How does excluding certain functions from inlining impact the counterintuitive effects? To answer these questions, we use samples from the LLVM test suite to perform five sets of experiments. Next, we discuss the C/C++ source programs and the experiments in detail. 54 Table 4.2: LLVM Test Suite SingleSource/Benchmarks. table:llvm-test-suite-stats Subdirectory #Samples #LOC Subdirectory #Samples #LOC Adobe-C++ 6 696 Misc-C++ 7 1,341 BenchmarkGame 8 549 Misc-C++-EH 1 16,846 CoyoteBench 4 1,294 Polybench 32 5,188 Dhrystone 2 767 Shootout 14 663 Linpack 1 552 Shootout-C++ 25 972 McGill 4 956 SmallPT 1 102 Misc 27 3,487 Stanford 11 1,332 Total 143 34,745 4.3.1 C/C++ Source Programs To measure the runtime performance impacts of different optimization configurations, we select 143 C/C++ samples totaling over 34,000 lines of code (LOC) from the LLVM test suite [190]. 
The test suite contains benchmarking samples measuring LLVM compilation performance. We focus on the samples within the SingleSource/Benchmarks directory, listed in Table 4.2, as these samples are designed to trigger optimizations and can be compiled by Emscripten without code changes. We select this test suite for its inclusion of samples used in prior works and its ease of compilation. This test suite includes samples from the Polybench benchmark suite [246], which was used by Jangda et al. to compare WebAssembly and native runtime [144]. The remaining samples implement samples of comparable computational complexity to those in the Polybench suite, such as Fibonacci number computation [191], Cholesky factorization [192], and Huffman compression [193]. In addition, the samples within the SingleSource directory of the test suite are designed so that only a single file is needed for compilation. The compilation settings needed to compile each sample are also well documented within the test suite repository. For these reasons, we choose the LLVM test suite samples for our experiments. We omit 16 samples that do not compile successfully with Emscripten, leaving 127 samples for our experiments. 55 4.3.2 Experiments Inspecting Inlining Effects We start our investigation by establishing baseline runtime measurements using the four optimization level options, i.e., O0, O1, O2, and O3 [237], available to an end user of the compiler. For our study, a binary is faster if its runtime is lower than that of another binary. We describe the details of each optimization level below. • O0: means no optimizations. The compiler compiles the source code without any attempt to optimize the code. • O1: applies basic optimizations, such as loop simplification and redundant instruction combinations [194, 12]. • O2: adds more passes than O1 while balancing between running time improvement and code size reduction. • O3: contains all optimizations in O2 and enables optimizations that increase code size to improve runtime. To identify counterintuitive behavior caused by function inlining, we disable function inlining in each of the optimization levels and measure the runtime in each sample using both Chromium and Firefox. We focus on finding WebAssembly-specific issues surrounding inlining optimizations, so for every sample, we compile to the native x64 architecture and measure the runtime using the same optimization levels. Our analysis focuses on samples where the runtime behavior is intuitive on the native architecture and counterintuitive on WebAssembly. We then separately disable the LLVM inline and Binaryen inliningoptimizing passes in each optimization level to understand how each pass introduces the counterintuitive behavior. To do so, we construct modified builds of the LLVM and Binaryen that selectively disable passes. For the samples compiled to the native x64 architecture, we only disable the inline pass in LLVM as Binaryen is for WebAssembly only. We measure the sample runtimes when the inlining passes are enabled and 56 Table 4.3: Experiments Testing Function Inlining Effects. table:methodology-overview Experiment Platform Opt. Levels LLVM Inlining Enabled Binaryen Inlining Enabled Baseline Wasm O0–O3 ✓ ✓ 1 Wasm O0–O3 ✗ ✗ 2 x64 O0–O3 ✗ N/A 3 Wasm O0–O3 ✗ ✓ 4 Wasm O2–O3 ✓ ✗ 5 Wasm O2–O3 ✓ Patched disabled. The runtime behavior is counterintuitive if disabling the inlining pass results in a faster runtime than if the inlining pass is enabled. 
To explore the causes of the counterintuitive behavior in the WebAssembly samples, we use the Linux perf tools to inspect the fine-grained runtime details of the execution. Specifically, we use the perf record tool to record which WebAssembly functions are executed and which browser compiler code they use. We also measure the overall percentage of execution time spent within these WebAssembly functions. We investigate the execution of native x64 samples using the perf stat tool. This tool records key hardware and software events, such as cache misses, branches taken, and CPU cycles, which allow us to understand the counterintuitive behavior. Table 4.3 presents the setups of our five experiments for inspecting the effects of function inlining. Similar to other studies [208, 207, 219, 243, 307], we use the average runtime of 10 runs for our analyses in all our experiments. • Baseline: We measure the runtimes of the 127 benchmarking samples compiled to WebAssembly with the optimization levels O0-O3 in their default modes, i.e., with both the LLVM inline and Binaryen inlining-optimizing passes enabled. • Experiment #1: We compare the runtimes of all WebAssembly samples with O0-O3 having the LLVM inline and Binaryen inlining-optimizing passes disabled. 57 • Experiment #2: We compile each sample to two versions of an x64 executable: one version with the LLVM inline pass enabled against another version with inline disabled. • Experiment #3: To understand which inlining pass contributes to the counterintuitive behavior more, our third experiment compiles the 127 WebAssembly samples with only the LLVM inline pass disabled. • Experiment #4: We compile the 127 samples with only the Binaryen inlining-optimizing pass disabled. • Experiment #5: We inspect the samples that experience counterintuitive effects in Experiment #4. We patch the inlining-optimizing pass to selectively disable inlining for hot functions. The runtimes of samples using this patched pass are compared against those using the original pass. 4.4 Evaluation In this section, we present findings and insights from our five sets of experiments investigating the counterintuitive function inlining effects in WebAssembly modules. Summary of Results. We find that disabling inlining in both LLVM and Binaryen causes counterintuitive behavior in 66 of the 127 (51.97%) samples run in Chromium and Firefox. Only 12 of these samples experience similar behavior on the native x64 platform, so we exclude these samples from our investigation∗ . We run two sets of experiments where one disables only LLVM’s inline pass and the other disables only Binaryen’s inlining-optimizing pass. LLVM’s pass impacts 44 samples, while Binaryen’s pass causes counterintuitive results in 56 samples. Further investigation of Binaryen’s inlining-optimizing pass identifies hot functions as a probable cause of the counterintuitive effect. To quantify their impact, we modify the pass to prevent inlining for hot functions. 41 of the 56 samples experience improved runtime with the patched pass, indicating that hot functions inlined into long-running functions are a major cause of the counterintuitive behavior and further work should focus on improving inlining heuristics for hot functions. ∗ Such cases are not unique to WebAssembly, hence not our focus. 
[Figure 4.5: bar chart per sample; y-axis: Relative Time (%); bars grouped by optimization level O0, O1, O2, O3.] Figure 4.5: Runtime Speedup of Samples in Experiment #1 with Chromium. The bars show the % speedup of the samples having both LLVM and Binaryen inlining passes disabled compared to when both inlining passes are enabled, i.e., the default version. The runtime speedups range from as low as 5.34% to as high as 50.61%, with an average speedup of 25.2%. fig:wasm_og_v_noinline
[Figure 4.6: bar chart per sample; y-axis: Relative Time (%); bars grouped by optimization level O0, O1, O2, O3.] Figure 4.6: Runtime Speedup of Samples in Experiment #1 with Firefox. The runtime speedups range from as low as 5.02% in functionobjects.cpp to as high as 56.15% in matrix.cpp, with an average speedup of 15.07%. fig:wasm_og_v_noinline_firefox
4.4.1 RQ1: Significance of Counterintuitive Effects subsec:counterintuitive-eval
Function Inlining Effects on WebAssembly To measure the effects of function inlining, we compile 127 samples from the SingleSource/Benchmarks directory of the LLVM test suite with optimization levels O0 - O3. Experiment #1 compiles each sample with the function inlining passes in both LLVM (inline) and Binaryen (inlining-optimizing) disabled. The resulting modules are run using both Chromium and Firefox. In Figure 4.5, we report the runtime speedup experienced by each sample when run in Chromium with the two inlining passes disabled. Note that the figure presents samples that contain at least one optimization level grouping that experiences a 5% speedup, i.e., a 5% decrease in runtime, after disabling inlining. For example, in the sample matrix.cpp, disabling inlining leads to a runtime speedup of 62.42% in O1 and 36.03% in O2. When compiled with inlining disabled, 32 samples become at least 5% faster (and 15 samples become 20% faster) in Chromium. Figure 4.6 shows that 51 samples with inlining disabled run at least 5% faster and 10 samples run at least 20% faster in Firefox. On average, these counterintuitive samples experience a 15.07% speedup.
[Figure 4.7: bar chart per x64 sample; y-axis: % Speedup; bars grouped by optimization level O0, O1, O2, O3.] Figure 4.7: Runtime Speedup of Samples in Experiment #2. The bars show the % runtime speedup of the x64 samples after having LLVM's inline pass disabled. fig:native_og_v_noinlining
The results in Figures 4.5 and 4.6 show function inlining causes counterintuitive behavior in certain WebAssembly modules.
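For reference, the speedup percentages reported in this section reduce to a simple relative comparison between the two builds of a sample. The helper below is our own sketch (not the study's measurement tooling) and assumes the mean over 10 runs per configuration described in Section 4.3.2:

static double mean(const double* runs, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += runs[i];
    return sum / n;
}

/* Percent speedup of the no-inlining build relative to the default build;
 * values above the 5% threshold are counted as counterintuitive behavior. */
double percent_speedup(const double* with_inlining,
                       const double* without_inlining, int n) {
    double base = mean(with_inlining, n);
    return (base - mean(without_inlining, n)) / base * 100.0;
}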
4.4.1.1 Function Inlining Effects on x64 Architecture We compare the effects of function inlining in WebAssembly against the inlining effects in native x64 binaries to determine whether these effects are unique to WebAssembly or common across multiple platforms. Experiment #2 compiles the samples to x64 binaries with inline in LLVM disabled. Figure 4.7 shows the samples experiencing the counterintuitive inlining behavior. Of the 127 samples, 15 samples experience similar counterintuitive effects with function inlining as those in WebAssembly. Of those 15 samples, 12 samples, highlighted in yellow, experience counterintuitive inlining effects on both WebAssembly and the native x64 platform. We inspect the execution details of the native binaries using the perf stat tool. We find that higher values in the cache-misses, all-loads-retired, and all-stores-retired events explain the counterintuitive behavior experienced by these samples. These execution statistics indicate that inlining leads to an increased number of cache misses (from the cache-misses events) and increased register pressure (from the all-loads-retired and all-stores-retired events) [144] during execution. These are known issues 60 of function inlining in native architectures [141]. Among the samples experiencing increased counts in these metrics, inlining increases cache-misses by 12.95%, all-loads-retired by 11%, and all-stores-retired by 9%. The small number of common samples in both WebAssembly and native x64 suggests that the counterintuitive effects seen in the WebAssembly samples are caused by different factors than in the native x64 platform. We continue our investigation of the counterintuitive WebAssembly samples to determine what these factors are. 4.4.2 RQ2: Investigation of Function Characteristics Causing Counterintuitive Effects subsec:rq2-characteristics The results in Section 4.4.1 show that the function inlining passes in Binaryen and LLVM can lead to counterintuitive behavior that is unique to WebAssembly. In this section, we investigate the causes of this behavior. Specifically, we seek to understand the characteristics of the inlined functions that lead to the counterintuitive effects we observe. We first manually inspect the compiled WebAssembly modules to understand how the code differences produce the behavior. However, the number of differences between the samples with both inlining passes enabled and disabled is large. Each sample contains an average difference of 24,774.73 LOC between the versions having inlining enabled and disabled. Combined with the terse syntax of WebAssembly, manual inspection of these samples proves to be extremely challenging. Hence, to ease the inspection process, we inspect the effects of a single inlining pass on the module, reducing the search space of code differences between the samples with enabled and disabled inlining. We also compile the samples with the LLVM and Binaryen inlining passes disabled separately to understand how each component introduces counterintuitive behavior and discuss the results of each component separately. 4.4.2.1 Function Inlining in LLVM Experiment #3 aims to show the impact of the LLVM inline pass on the counterintuitive effects. 
The results, shown in Figure 4.8, reveal that the disabling the inline pass leads to a Chromium runtime speedup of at 61 2mm.c 3mm.c ackermann.c chomp.c dt.c evalloop.c flops.c gemm.c gemver.c hello.c himenobmtxpa.c jacobi-1d-imper.c linpack-pc.c lpbench.c matrix.cpp nestedloop.c random.cpp reg_detect.c stepanov_ abstraction.cpp syr2k.c 0 10 20 30 40 50 60 % Speedup O0 O1 O2 O3 Figure 4.8: Runtime Speedup Comparing Experiment #3 with Baseline in Chromium. The bars show the % speedup in the samples’ runtimes with the LLVM inline pass disabled compared to their runtime with the inline pass enabled. fig:wasm_llvm_only_noinline least 5% in 20 different samples with the average runtime speedup being 17.61%. Of those 20 samples, 7 samples experience a runtime speedup of at least 20%. In Firefox, 32 and 7 samples experience runtime speedups of at least 5% and 20%, respectively. The runtime speedups in Firefox range from as low as 5.03% to high as 56.80% with an average speedup of 13.01%. Due to space constraints, we omit the Firefox results figure. Case Study (random.cpp). To exemplify the characteristics of the inlined functions, we investigate one of these affected samples, random.cpp, in depth. Recall that in its source and WebAssembly code presented in Figure 4.4, we saw that the code for the hot C++ function gen_random (wasm-function[14] in WebAssembly) is inlined into the C++ main function (wasm-function[13]). In this example, inlining is performed by LLVM’s inline pass Figure 4.9 shows the collected browser call trace from perf record for random.cpp when the inline pass is enabled and disabled. Figure 4.9(a) shows that when inline is enabled, wasm-function[13] consumes the largest WebAssembly runtime, i.e., it is the hottest function, and uses the Liftoff-generated code. In Figure 4.9(b), wasm-function[14] supplants wasm-function[13] as the function with the largest share of the overhead. Wasm-function[14] uses the TurboFan-generated code while wasm-function[13] is still running the Liftoff code. Combining the code snippet in Figure 4.4(c) with the call trace in Figure 4.9(a) shows that when inlining is enabled, the browser is stuck using the Liftoff-generated code of gen_random. Figure 4.4(b) 62 67.25% [.] Function:wasm-function[13]-13-liftoff 0.00% [.] Function:wasm-function[1476]-1476-turbofan 0.00% [.] Function:wasm-function[1477]-1477-turbofan 0.00% [.] Function:wasm-function[285]-285-turbofan 0.00% [.] Function:wasm-function[327]-327-turbofan Random.cpp O1 Inlining Enabled (a) Inline Enabled subfig:random-perf-O1-inliner-enabled 42.08% [.] Function:wasm-function[14]-14-turbofan 11.06% [.] Function:wasm-function[13]-13-liftoff 0.00% [.] Function:wasm-function[12]-12-liftoff 0.00% [.] Function:wasm-function[1507]-1507-turbofan 0.00% [.] Function:wasm-function[1510]-1510-liftoff 0.00% [.] Function:wasm-function[1524]-1524-liftoff Random.cpp O1 Inlining Disabled (b) Inline Disabled subfig:random-perf-O1-inlinerpass-disabled Figure 4.9: Perf Record Output for random.cpp. (a) traces the browser function calls made during the Baseline experiment. Here, function 13 uses Liftoff code and occupies most of the execution time. (b) traces the browser calls in Experiment #3, and it shows that function 14 uses its TurboFan code. 
fig:random-perf-O1-inliner 2mm.c adi.c almabench.c ary.cpp ary3.cpp bigfib.cpp chomp.c fftbench.cpp gemm.c gemver.c heapsort.c heapsort.cpp himenobmtxpa.c loop_unroll.cpp lpbench.c misr.c moments.cpp nestedloop.c nsieve-bits.c random.cpp RealMM.c ReedSolomon.c reg_detect.c sieve.cpp simple_types_ constant_folding.cpp simple_types_ loop_invariant.cpp sphereflake.cpp stepanov_ abstraction.cpp syr2k.c 0 10 20 30 40 50 % Speedup O2 O3 Figure 4.10: Runtime Speedup Comparing Experiment #4 with Baseline in Chromium. The bars present the runtime speedup in samples after the Binaryen inlining-optimizing pass is disabled. fig:wasm_binaryen_only_noinline and Figure 4.9b show that disabling inlining allows the gen_random function to use the TurboFan code because it is not bound to the main function. To sum up, separating the hot gen_random function from the long-running main function allows the hot functionality to use the more-efficient TurboFan code. The random.cpp case shows that the inline pass affects whether the hottest function in the module uses the code generated by the Liftoff or TurboFan compiler. We collect the perf record output of each sample in Figure 4.8 with inlining enabled and disabled. We find that in 12 out of 20 Chromium samples, the hottest function of the module uses Liftoff when inlining is enabled and TurboFan when inlining is disabled. This finding suggests that function inlining can prevent the more performant TurboFan code from being used. 4.4.2.2 Function Inlining in Binaryen subsec:eval-browser-compiler-inlining Experiment #4 aims to highlight the contribution that inlining-optimizing pass in Binaryen has on the counterintuitive behavior. Figure 4.10 shows that the inlining-optimizing pass in Binaryen causes 29 samples 63 2mm.c 3mm.c ary3.cpp atax.c bigfib.cpp cholesky.c chomp.c correlation.c covariance.c dry.c dynprog.c fdtd-2d.c fftbench.cpp fldry.c flops-2.c flops-4.c hash2.cpp himenobmtxpa.c lists.cpp lists1.cpp mandel-2.c methcall.cpp misr.c moments.cpp objinst.cpp Oscar.c ray.cpp richards_benchmark.c sieve.c sieve.cpp simple_types_ constant_folding.cpp simple_types_ loop_invariant.cpp sphereflake.cpp stepanov_v1p2.cpp stepanov_vector.cpp strcat.c symm.c syr2k.c Towers.c trmm.c 0 10 20 30 40 50 % Speedup O2 O3 Figure 4.11: Runtime Speedup Comparing Experiment #4 with Baseline in Firefox. fig:wasm_binaryen_only_noinline_firefox to experience counterintuitive runtime behavior in Chromium. Among these 29 samples, the smallest runtime speedup was 5.60% in reg_detect.c, the largest speedup was 50.0% in ReedSolomon.c, and the average speedup was 20.88%. It is important to note that since Experiment #4 inspects all 127 samples, these 29 samples are not a subset of the samples in Experiment #1. In Firefox, Figure 4.11 shows that 40 samples experience counterintuitive runtime speedup of at least 5%, with the smallest runtime speedup being 5.13% (dry.c) and the largest speedup being 42.39% (lists.cpp). Our analysis of the affected samples with perf record shows that they experience similar behavior as random.cpp exhibits with LLVM’s inline pass (shown in Figure 4.9). In 23 of the 29 Chromium samples, the hottest function in the sample version with inlining-optimizing enabled uses the Liftoff-generated code, while the hottest function in the sample version with inlining-optimizing disabled uses the TurboFan-generated code. 
This finding shows that, for both components, the same reason explains the effects of function inlining: function inlining can prevent the hot functions in a module from using the more performant browser compiler. Figure 4.12 shows the breakdown of executed functions within the Chromium-run O2 samples in Figure 4.5 that have a speedup of at least 20%. For each sample, the figure shows two bars: the left bar is breakdown of the function executed when both inlining passes are enabled (Baseline), while the right bar shows the breakdown when inlining is disabled (Experiment #1). Each segment represents a single function, and its height indicates its percentage of the total execution time (according to perf record). As 64 3mm.c heapsort.c heapsort.cpp himenobmtxpa.c lpbench.c matrix.cpp nsieve-bits.c random.cpp ReedSolomon.c 0 20 40 60 80 100 Percent of Total Execution Time (%) Inlining Liftoff Inlining TurboFan No-Inlining Liftoff No-Inlining TurboFan Figure 4.12: Breakdown of Liftoff or TurboFan Compilers Used in O2. Each bar segment represents a function and its portion of total execution time. The left bar shows the Baseline execution, while the right bar shows Experiment #2 execution. fig:wasm_og_v_noinlining_perf_compiler_breakdown Figure 4.12 shows, the samples experience similar counterintuitive behavior to Figure 4.9: the samples are limited to using Liftoff for the hot functions when inlined into a long-running function, while disabling inlining allows the hot functions to use the TurboFan code. The timing of the switch between Liftoff and TurboFan is determined by each compiler’s duration for a given function. The architecture of the single-pass Liftoff compiler [21] along with the graph construction used in the TurboFan compiler [214] suggest that function code size should influence the compiler duration. We plot the compilation duration of the TurboFan and Liftoff compilers against the function code size of all functions from the counterintuitive samples in Figure 4.13. Our results reveal that the duration of both compilers closely follow the increase in code size. However, the TurboFan duration is consistently an order of magnitude larger than the Liftoff duration for the same function size. Following this trend, an explanation on why Binaryen’s inlining-optimizing pass and LLVM’s inline pass lead to counterintuitive behavior is that inlining instructions to a call site increases the function code size. By increasing the code size of the long-running function, it quickly increases the compilation time of TurboFan and forces any inlined hot functionality to spend more time using the Liftoff code. In the case of a long-running function invoked only once, the increase in compilation duration decreases the 65 Figure 4.13: Chromium Compilation Duration Among All Samples. For each function, the TurboFan (blue dots) and Liftoff (green dots) compiler duration is plotted against the code size. Darker dots indicate multiple functions of the same size also have the same duration. 
fig:turbofan-compile-duration-all 2mm.c adi.c almabench.c ary.cpp ary3.cpp bigfib.cpp chomp.c fftbench.cpp gemm.c gemver.c heapsort.c heapsort.cpp himenobmtxpa.c loop_unroll.cpp lpbench.c misr.c moments.cpp nestedloop.c nsieve-bits.c random.cpp RealMM.c ReedSolomon.c reg_detect.c sieve.cpp simple_types_constant_folding.cpp simple_types_loop_invariant.cpp sphereflake.cpp stepanov_abstraction.cpp syr2k.c 0 10 20 30 40 50 % Speedup O2 No-Inlining O2 Patched O3 No-Inlining O3 Patched Figure 4.14: Runtime Speedup of Experiments #4 & #5 Against Baseline in Chromium. The dark blue and green bars represent the % speedup of the sample when compiled with our patched Binaryen pass for O2 and O3, respectively. The light blue and green bars show the % speedup from Experiment #4 to serve as a reference point on the patched runtime impact. In 24 samples, our patch produces a speedup to a similar extent that disabling all inlining does. fig:wasm_binaryen_patched likelihood that the TurboFan code will be ready by the time of the first, and only, invocation. Also, inlining hot functions into a function that is only invoked once, such as main, can prevent the inlined code from ever using the more-efficient TurboFan code. 4.4.3 RQ3: Impact of Hot Functions on Counterintuitive Effects In this section, we empirically quantify the impact of inlining hot functions in our dataset to better understand potential performance gain if the hot function inlining is disabled. Specifically, we modify Binaryen’s inlining-optimizing in the compiler to exclude hot functions (i.e., functions called in loops) 66 2mm.c 3mm.c ary3.cpp atax.c bigfib.cpp cholesky.c chomp.c correlation.c covariance.c dry.c dynprog.c fdtd-2d.c fftbench.cpp fldry.c flops-2.c flops-4.c hash2.cpp himenobmtxpa.c lists.cpp lists1.cpp mandel-2.c methcall.cpp misr.c moments.cpp objinst.cpp Oscar.c ray.cpp richards_benchmark.c sieve.c sieve.cpp simple_types _constant_folding.cpp simple_types _loop_invariant.cpp sphereflake.cpp stepanov_v1p2.cpp stepanov_vector.cpp strcat.c symm.c syr2k.c Towers.c trmm.c 0 10 20 30 40 50 % Speedup O2 No-Inlining O2 Patched O3 No-Inlining O3 Patched Figure 4.15: Runtime Speedup of Experiments #4 & #5 Against Baseline in Firefox. fig:wasm_binaryen_patched_firefox aead_chacha20poly1305 aead_chacha20poly13052 aead_xchacha20poly1305 auth auth2 auth6 box box_easy box_easy2 box_seal box_seed box2 box8 chacha20 core_ed25519 core2 core4 core5 core6 ed25519_convert generichash2 generichash3 kdf kx misuse onetimeauth onetimeauth2 pwhash_scrypt_ll scalarmult scalarmult_ed25519 scalarmult_ristretto255 scalarmult2 scalarmult5 scalarmult6 scalarmult8 secretbox secretbox_easy secretstream sign siphashx24 sodium_utils stream stream3 xchacha20 0 20 40 60 80 % Speedup O0 O1 O2 O3 Figure 4.16: Libsodium.js Runtime Speedup of Experiment #1 in Chromium. fig:libsodiumjs_chromium from being inlined. Then, we repeat the experiments with the modified pass to measure the performance improvement by the modification. Identifying Hot and Long-running Functions. To identify possible hot functions, we search for functions called within a loop as they will be called repeatedly. Specifically, we use the LLVM parser tools [50, 26] to obtain the abstract syntax tree (AST) of the source code. We then traverse the AST to identify a sub tree with a root node of For_Stmt containing a Call_Expr_Stmt node. Such a sub-tree represents a function call statement within a loop and the Call_Expr_Stmt’s call target is a hot function. 
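A plausible implementation of this search uses libclang's cursor visitors; the sketch below is our own (the program structure and output are illustrative, and it is not necessarily the exact tooling used in the study):

#include <clang-c/Index.h>
#include <stdio.h>

/* Print the callee of every call expression nested inside a for-loop;
 * such callees are treated as hot-function candidates. */
static enum CXChildVisitResult reportCalls(CXCursor c, CXCursor parent,
                                           CXClientData data) {
    if (clang_getCursorKind(c) == CXCursor_CallExpr) {
        CXString name = clang_getCursorSpelling(c);
        printf("hot candidate: %s\n", clang_getCString(name));
        clang_disposeString(name);
    }
    return CXChildVisit_Recurse;
}

static enum CXChildVisitResult findLoops(CXCursor c, CXCursor parent,
                                         CXClientData data) {
    if (clang_getCursorKind(c) == CXCursor_ForStmt)
        clang_visitChildren(c, reportCalls, NULL);   /* search the loop body */
    return CXChildVisit_Recurse;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    CXIndex idx = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        idx, argv[1], NULL, 0, NULL, 0, CXTranslationUnit_None);
    if (tu) {
        clang_visitChildren(clang_getTranslationUnitCursor(tu), findLoops, NULL);
        clang_disposeTranslationUnit(tu);
    }
    clang_disposeIndex(idx);
    return 0;
}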
While it is not a comprehensive method, we find that this criterion is suitable for identifying the hot functions within our samples. Note that we manually verified that all the identified functions are hot functions. The list of functions matching this criterion is used to prevent them from being inlined. We also exclude typical long-running functions that only execute once, such as the entry point (e.g., main) function. Patched inlining-optimizing Pass. In Section 4.4.2, we discuss how Binaryen’s inlining-optimizing pass can prevent hot functions from tiering up to the more-efficient TurboFan code. Hence, in this experiment, we 67 modify the inlining-optimizing pass to disable the inlining optimization for identified hot and long-running functions. Results. We use this patched inlining-optimizing pass on the samples that experience the counterintuitive behavior in Figures 4.10 and 4.11. We list the change in original runtime versus the runtime with the modified Binaryen pass in Figures 4.14 and 4.15. We also list the original runtime versus the runtime results with inlining-optimizing disabled to understand the impact of inlining optimizations except for the hot functions. We find that, in 41 of the 56 samples from both browsers, excluding the hot functions from the inlining leads to improved runtime performance. Figure 4.14 shows that the runtime speedups of the 24 improved samples in the 29 Chromium samples range from as low as 5.24% in stepanov_abstraction.cpp to as high as 53.49% in ReedSolomon.c, with an average speedup of 22.58%. Figure 4.15 shows that the runtime speedups of the 24 improved samples among the 40 Firefox samples range from as low as 5.85% in flops-2.c to as high as 45.39% in lists.cpp, with an average speedup of 16.16%. It is important to note that our simple heuristic for determining hot functions is not comprehensive. Nevertheless, we show that inlining heuristics following this direction can improve runtime performance. We hope our findings motivate the need for re-examined inlining heuristics for WebAssembly compilation. 4.4.4 Case Study on a Real-World Application We now investigate the effects of function inlining in one popular real-world WebAssembly application, Libsodium.js [81]. This project ports the Sodium cryptographic library [81] to the web using WebAssembly and JavaScript, and it has over 396,000 weekly downloads on the NPM package registry [186]. This library provides APIs that implement cryptographic functions, e.g., encryption, message signing, and hashing. In addition, we choose this project for our case study as the project includes links to its C/C++ source code and the compilation scripts. It contains 75 example programs that test the cryptographic APIs provided by the library, and we replicate Experiment #1 on these 75 programs. Figure 4.16 shows that function inlining 68 from the LLVM and Binaryen passes causes counterintuitive behavior in 44 programs on Chromium. The smallest runtime speedup was 5.07% in generichash2, the largest speedup was 86.46% in chacha20, and the average speedup was 16.76%. In Firefox, we find that 58 programs show similar behavior, and the average speedup is 20.36%. These results show that this inlining issue extends to real-world applications. 4.5 Discussion 4.5.1 Limitations Our investigation of WebAssembly performance suffers from two main limitations. First, the precision of our custom-built JavaScript measurement tool limits the depth of our investigation. 
Most browsers limit JavaScript timers to millisecond resolution [67], which is too coarse to measure a typical WebAssembly call. As a result, we focus on samples that have long running functions with runtimes in the magnitude of seconds. We also focus on samples with a percent decrease greater than 5% to account for the lack of precision. Our second limitation is that we only inspect two browsers, Chromium and Firefox. Inspecting each browser adds additional manual work, and we are limited by our budget of manual effort available. We accept this limitation as Chromium-based browsers and Firefox hold 74% of the browser market share [34]. 4.5.2 Threats to Validity 4.5.2.1 Internal Validity Our study results are subject to possible errors in the manual inspection processes. We manually inspect the emitted code to ensure that function inlining is present or omitted as per the tested configuration. We use the average of 10 runs to ensure changes are not caused by small runtime variations. Multiple factors, such as hardware, operating system, and system load, make it difficult to reproduce the exact runtime values we 69 record. However, we describe the steps used to establish our Baseline experiment. The counterintuitive behavior, relative to the baseline, should remain consistent across different experimental setups. 4.5.2.2 External Validity We use benchmarking samples from the LLVM test suite. As Emscripten is an LLVM-based compiler, we find that this collection of benchmarks curated by the LLVM development team is well-suited to assess the compilation effects caused by the inlining passes. The compiler benchmark samples also perform intensive computations, an intended use case of WebAssembly. 4.5.2.3 Construct Validity We identify the runtime impacts of function inlining optimizations by measuring the program runtime through browser execution timing, native execution timing, and event profiling tools. These measurement methods should highlight changes caused by different optimizations used in the samples. 4.6 Summary Function inlining optimizations in WebAssembly compilers fail to consider the presence of multiple browser compilers, leading to runtime performance issues. We provide the first in-depth investigation on the counterintuitive impact that function inlining can have on WebAssembly modules. Inlining can prevent hot functionality in the modules from leveraging optimized machine code if the functions are inlined into long running or seldomly invoked functions, leading to noticeable performance degradation of the whole application. We find that this behavior effects 66 out of 127 samples in the LLVM test suite and is caused by the inlining passes in both the LLVM and Binaryen components of Emscripten. We use our work to highlight the need to revisit existing static analyses used in optimization techniques for optimal WebAssembly usage. 70 Chapter 5 Automated WebAssembly Function Purpose Identification With Semantics-Aware Analysis chp:waspur Thanks to its design as a compiled language, WebAssembly is able to achieve fast runtime speeds, often within the range of native speeds [352]. Unfortunately, when compared with its complementary language JavaScript, WebAssembly is difficult to read and understand. This aspect makes it difficult to maintain existing WebAssembly code. In addition, as a web technology, some WebAssembly modules are naturally distributed by third parties. 
These WebAssembly modules need to be implicitly trusted by developers as verifying the functionality themselves may not be feasible. In this chapter, we address the second challenge discussed in Section 1.2, i.e., WebAssembly program comprehension (Section 1.2.2). We present WASPur, a tool to automatically identify the purposes of WebAssembly functions. To build this tool, we first construct an extensive collection of WebAssembly samples that represent the state of WebAssembly. Second, we analyze the dataset and identify the diverse use cases of the collected WebAssembly modules. We leverage the dataset of WebAssembly modules to construct semantics-aware intermediate representations (IR) of the functions in the modules. We encode the function IR for use in a machine learning classifier, and we find that this classifier can predict the similarity of a given function against known named functions with an accuracy rate of 88.07%. 71 This chapter shares content with its corresponding publication [256]. The author of this dissertation is also the main of author of the paper and handled the implementation and evaluation of the tool as well as the majority of the paper text. 5.1 Motivation and Contributions The WebAssembly standard defines a readable text representation of the module’s internal structure, including types, memory limits, and function definitions. Although readable, the text format still has a steep learning curve compared with high-level languages. There are two characteristics of WebAssembly that make it challenging for human readers to interpret. First, WebAssembly has only four numeric data types, i32, i64, f32, and f64, making the instruction sequences of several applications, such as string manipulation and cryptographic hashing, similar. Second, its stack machine design makes deriving the value of a variable at a given location difficult. The stack must be traced from a given location to identify the computed value at a specific code location. These two factors contribute to the difficulty of understanding WebAssembly code. Source maps can be used to find the corresponding functionality in a high-level source language. However, many WebAssembly modules, including malicious modules, are delivered through third-party services where the source code is not available [221]. For such cases, end users need to verify a WebAssembly module’s actual functionality manually. Previous work [221, 132] has looked at the purposes of WebAssembly samples. However, there has been little work to help developers understand the functionality implemented by a WebAssembly module. We develop an automated classification tool, WASPur, to help developers understand the intended functionality of individual WebAssembly functions within the applications. WASPur constructs abstractions on the semantic functionality of the module that are resilient to syntactic differences, and these abstractions are used in a machine-learning classifier to identify what functionality the WebAssembly functions implement. We use Chapter 5 to demonstrate that combining an intermediate representation, albeit a simple one, 72 with machine learning classification can be useful for identifying the purposes of WebAssembly functions. However, WASPur can only label WebAssembly code at the function level of granularity, and the labels must come from a predetermined set of classes. These limitations will highlight the need for a more robust method for understanding WebAssembly modules in the form of binary decompilation. 
Specifically, this work makes the following contributions:
• We propose an intermediate representation (IR) to abstract underlying semantics of WebAssembly applications that enables syntax-resilient analysis.
• We construct a dataset of diverse WebAssembly samples by crawling real-world websites, Firefox add-ons, Chrome extensions, and GitHub repositories.
• We perform a comprehensive analysis of the collected WebAssembly samples. We identify the purposes of these samples and classify them into 12 categories.
• We develop an automated classification tool, WASPur, that can accurately label a given WebAssembly function with an appropriate function name according to its functionality with an 88.07% accuracy rate.
5.2 System Design
To help developers understand the functionality implemented by WebAssembly modules, we develop WASPur, an automated tool that leverages a semantics-aware intermediate representation (IR) designed to capture the effects produced by WebAssembly instructions. WASPur classifies the functions in a WebAssembly module using two main components, as shown in Figure 5.1. The first component, the Abstraction Generator, collects the abstractions for all functions within the module to represent each function in our IR (Section 5.2.1). The second component, the Classifier, uses the sequence of abstracted IR units as input into a neural network classifier (Section 5.3). The Classifier is trained on the names of functions repeatedly found in WebAssembly modules and outputs the probability that an inspected function belongs to the group of similarly named functions.
Figure 5.1: WASPur System Overview. The Abstraction Generator turns a WebAssembly binary into an abstraction sequence; sequences with function names from the collected samples serve as training samples, and the Classifier predicts a function name for a new abstraction sequence without a name.
5.2.1 Abstraction Generator
The goal of our approach is to generate a high-level intermediate representation (IR) that can recover semantic meaning from the low-level WebAssembly bytecode. To produce the high-level IR, WASPur is constructed on a core set of abstraction rules:
Definition 1 (Abstraction rule) An abstraction rule is a tuple (S, a, Def) where:
• S represents one or more stack operations that simulate the effects of a WebAssembly instruction on the virtual stack,
• a is a transformation function that maps a WebAssembly instruction to a C-like abstraction, and
• Def is the definition set of all alive variables at the current code location.
We present abstraction rules that abstract WebAssembly bytecode into five groups:
1. Numeric Instructions: Perform numeric computations on stack values.
2. Parametric Instructions: Manipulate the virtual stack without additional computations.
3. Control Instructions: Change the control flow of the program using values from the stack.
4. Variable Instructions: Assign and fetch the local variables.
5. Memory Instructions: Assign and fetch memory values.
Table 5.1 shows a subset of WebAssembly instructions and how we abstract each group of instructions.
Table 5.1: Abstraction Rules.
Instruction | Stack Simulation(1,2,3) | Definition Set Update | Abstraction
Numeric Instructions:
i32.const c | push(c) | - | -
i64.mul | pop()→e1, pop()→e2, push(e1 × e2) | - | -
f32.eq | pop()→e1, pop()→e2, push(e2 = e1) | - | -
f64.max | pop()→e1, pop()→e2, push(max{e1, e2}) | - | -
i32.eqz | pop()→e, push(e = 0) | - | -
Parametric Instructions:
drop | pop()→e | - | -
Variable Instructions:
get_local v, get_global v | push(v) | - | -
set_local v, set_global v | pop()→e | Def(v) = Def(v) ∪ {e^l} | v = e;
tee_local v | pop()→e, push(e) | Def(v) = Def(v) ∪ {e^l} | v = e;
Memory Instructions:
i32.load | pop()→e, push(R32(e)) | - | -
i32.store | pop()→e1, pop()→e2 | Def(e2) = Def(e2) ∪ {addr(e1)^l} | W32(e2, e1);
i32.store8 | pop()→e1, pop()→e2 | Def(e2) = Def(e2) ∪ {addr(e1)^l} | W8(e2, e1);
Control Instructions:
loop g | - | - | g:
if I | pop()→e | - | if (e) {
br g | - | - | goto g;
br_if g | pop()→e | - | if (e) goto g;
block B ... end | - | - | B: { ... }
call f | paraNum(f)→n, for(i = n; i > 0; i--) pop()→ei | - | f(e1, e2, ..., en);
call_indirect | pop()→e, paraNum(e)→n, for(i = n; i > 0; i--) pop()→ei | - | f(e1, e2, ..., en);
1. retLen() gives the number of return values (either 1 or 0) of the current function.
2. paraNum(f) returns the number of parameters of function f.
3. func(e) returns the function name by checking the function table with index e.
Numeric Instructions
WebAssembly execution is based on a stack machine architecture, so our abstraction models WebAssembly instructions based on their effects on the stack. Modeling the stack allows our abstractions to capture the data and control flows of the program. The first group of abstractions models the instructions that perform numeric computations on stack values. For example, the instruction "i32.const c" pushes a 32-bit constant c onto the stack. The prefix i32 indicates that the data type of c is a 32-bit integer. Similarly, a 64-bit integer is prefixed by i64, and a 32-bit floating point number is prefixed by f32. We capture these instructions by applying them to symbolic values representing the parameters, variables, and loaded values.
Parametric Instructions
This group includes two WebAssembly instructions, "drop" and "select." These instructions can drop a value from the stack without performing other computations. We model these instructions through their effects on the virtual stack.
Control Instructions
WebAssembly supports control flow constructs such as if, loop, and block. Conditional statements such as "if I" are abstracted to "if (e)," where the condition e is popped from the stack. The abstractions defined within the scope of the if block are stored in a list of inner abstractions. We model the "block" and "loop" instructions using "block" and "for" abstractions, respectively. Both abstractions define a label, e.g., "loop g:," that encapsulates a block of code. The "br g" and "br_if g" instructions control whether the control flow will go to the beginning of the labeled block. To model this behavior, we also store any abstractions defined within the blocks into the list of inner abstractions. The "for" abstraction additionally contains the condition that controls whether the loop terminates.
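Before turning to the remaining instruction forms, the sketch below makes the stack-simulation side of these rules concrete: it replays a short instruction sequence against a virtual stack and emits the resulting C-like abstractions. It is a deliberately simplified illustration covering only a handful of opcodes, not the WASPur implementation.

```python
# Tiny virtual-stack interpreter over a few of the abstraction rules above.
def abstract(instrs):
    stack, out = [], []
    for op, *args in instrs:
        if op in ("i32.const", "i64.const"):
            stack.append(str(args[0]))
        elif op == "get_local":
            stack.append(args[0])
        elif op == "i64.mul":                    # numeric: pop two, push symbolic product
            e1, e2 = stack.pop(), stack.pop()
            stack.append(f"({e1} * {e2})")
        elif op == "drop":                       # parametric: stack effect only
            stack.pop()
        elif op == "set_local":                  # variable: emit "v = e;" abstraction
            out.append(f"{args[0]} = {stack.pop()};")
        elif op == "i32.store":                  # memory: emit W32(address, value)
            e1, e2 = stack.pop(), stack.pop()
            out.append(f"W32({e2}, {e1});")
    return out

# get_local $p0; i64.const 8; i64.mul; set_local $l1  ->  ['$l1 = (8 * $p0);']
print(abstract([("get_local", "$p0"), ("i64.const", 8),
                ("i64.mul",), ("set_local", "$l1")]))
```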
“call” instructions make direct calls with explicit function names, whereas “call_indirect” instructions make indirect calls using an index to the function table. We model both instructions with corresponding “call” and “call_indirect” abstractions. Variable Instructions Variable instructions can either load values from local or global variables or assign values to them. To model the instructions loading variables such as “get_global,” we use symbolic values to represent them on the stack. To model the instructions that assign values to variables, such as “set_local,” we use the “set” abstraction to record information on the targeted variable and assigned value. We also record the history of variable assignments in the variable definition set. For example, to handle the operation “set_local v,” our definition set would be updated to reflect that the local variable v currently contains the value e. Memory Instructions Similar to the variable instructions, there are memory instructions that can load (e.g., “i64.load”) or store (e.g., “i32.store”) values in the linear memory. We model the loading instructions using symbolic values 77 to represent the value referred to at a specific memory index. We model the memory store instructions by constructing a “store” abstraction that tracks the targeted memory location and value. We also record the history of these memory stores in the variable definition set. For example, “i32.store8” would be abstracted to “W8(e2, e1)”, where e2 is the source address and e1 is a destination address. The definition set is updated to reflect that the memory index e2 contains the value e1. We observe that WebAssembly applications typically use consecutive memory copy operations, so we merge consecutive “store” abstractions into a single abstraction based on two rules. Sequential writes to consecutive memory buffers are simplified to a “memcpy” abstraction by inferring the starting address, destination address, and memory length to copy from the “store” abstractions. The semantics of writing to consecutive memory buffers can also be realized by using a loop to write each byte and similarly inferring the addresses and memory size to copy. 5.2.2 Applying Abstractions sec:cfg-construction To apply our abstractions, we first build an intraprocedural control flow graph (iCFG) for each function. This graph contains the abstraction sequence constructed by traversing the instruction sequence of a single function. A small set of transformations are applied to condense abstractions, such as combining consecutive repeated operations into loops. After the iCFG of each function is built, we then construct an interprocedural CFG (ICFG) by linking individual iCFGs on “call” abstractions. A separate ICFG is built for each function, with the desired function being used as the starting point for graph traversals. We limit the depth to two levels of calls to prevent cycles caused by recursive functions. For “call_indirect” abstractions, we link all functions matching the declared function type. 78 5.3 Classifier sec:classifier_design Using the IR constructed from the program abstractions, the classifier determines the functionality of a function by predicting the function name of a function with a similar abstraction trace. We describe the details of how the abstractions are encoded and how the classifier is trained in the following sections. 5.3.1 Encoding Abstractions as Features The classifier uses a neural network model to predict labels for the given abstraction sequence. 
The input needs to be encoded into a numeric representation to be fed into the model. Our input into the neural network is the sequence of abstractions produced when traversing the interprocedural control-flow graph (ICFG) of the targeted function. The sequence is then treated as a string of the abstraction types, e.g., “set set for store if ...” This string is embedded as a numeric vector with an integer representing one of the eight abstraction types we define. The vector requires a predefined sequence length, so abstraction traces longer than this length are truncated. 5.3.2 Training the Classifier The classifier is trained on the generated function abstraction sequences of the collected WebAssembly files. To train and evaluate the classifier, the WebAssembly functions with non-minified names are grouped together by their abstraction sequences. We use these function names as the labels for classification. The label strings are encoded using a multi-hot encoding scheme to map each label to an index of a numeric vector. The classifier outputs a vector whose floating-point values correspond to the probabilities that a certain label should apply to the sample. The classifier is trained and evaluated by splitting the dataset into a training set of 80%, a validation set of 10%, and a test set of 10%. 79 5.3.3 Neural Network Architecture The neural network underlying our classifier takes in the abstraction sequence as input. An embedding layer encodes the abstraction sequence string as a numeric vector of at most 250 integers. Each hidden layer uses the fully connected Dense layer type provided by TensorFlow [5]. The output layer consists of 189 units that use the SoftMax activation function. The units in this layer correspond to the indices of the label values predicted by the network. The network is configured to use the cross-entropy loss between true and predicted labels as the loss function, and it uses the Adam gradient descent method [167] as the optimization algorithm. We configure the network to use 30 iterations when training the model. The classification performance of our model depends on using appropriate values for the hyperparameters. We tune the hyperparameters of the neural network model to identify a suitable configuration for predicting labels from the given abstraction sequence input. To identify the optimal number of hidden layer units, activation function, and number of layers for our model, we construct our neural network using different values of hyperparameters to identify the highest accuracy value that our classifier can attain. 5.4 Data Collection and Handling sec:waspur-data-collection We collect WebAssembly samples to build the training and evaluation datasets for our neural network model. We describe our process for collecting a diverse set of WebAssembly binary samples from various sources in the following section. We also describe these samples in detail and the use cases that they implement. 5.4.1 Data Acquisition We collect WebAssembly samples from four sources: (1) Alexa top 1 million websites, (2) 17,682 top Chrome extensions sorted by installed users, (3) 16,385 popular Firefox add-ons sorted by installed users, and (4) 112 million GitHub repositories. 80 5.4.2 Alexa Top 1 Million Websites We crawled the Alexa top 1 million websites from October 2018 to May 2020. For each website, we visited the homepage and all first-level subpages. 
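A crawl of this shape can be scripted in a few lines; the sketch below uses Selenium to drive the instrumented browser and is purely illustrative. The binary path, the flag plumbing through --js-flags, and the wait time are assumptions rather than the exact harness used to build this dataset.

```python
# Illustrative first-level crawl using a Chromium build that dumps decoded
# WebAssembly modules (the browser patch itself is not shown here).
import time
from urllib.parse import urlparse
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def crawl_first_level(homepage, chromium_path, wait_secs=5):
    opts = Options()
    opts.binary_location = chromium_path                 # assumed path to the patched build
    opts.add_argument("--js-flags=--dump-wasm-module")   # V8 flag; exact plumbing may differ
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(homepage)
        time.sleep(wait_secs)                            # give Wasm modules time to load
        hrefs = {a.get_attribute("href") or ""
                 for a in driver.find_elements(By.TAG_NAME, "a")}
        site = urlparse(homepage).netloc
        for link in hrefs:
            if urlparse(link).netloc == site:            # first-level subpages only
                driver.get(link)
                time.sleep(wait_secs)
    finally:
        driver.quit()
```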
We decided to limit the crawling to first-level subpages rather than all subpages because a full scan would require hours for complex websites that include thousands of subpages. To download WebAssembly binaries running on a page, we modified the Chromium browser version 77 with the “–dump-wasm-module” flag enabled to dump any WebAssembly module the browser decodes [302]. 5.4.3 Chrome Extensions We get WebAssembly samples from Chrome extensions by running all extensions with more than 1,000 users through the modified Chromium browser. We crawled the Chrome extensions from March 25 to March 30, 2019. It took one day to download all the Chrome extensions and four days to assess each extension. This resulted in a total of 17,682 Chrome extensions. 5.4.4 Firefox Add-ons Samples from Firefox add-ons were obtained by crawling the official Firefox Add-ons website to download the .xpi add-on archives. The .xpi archives were scanned for the files ending with “.wasm”. We crawled the Firefox add-ons on July 30, 2019. It took one day to download and scan all the add-ons. In total, 16,385 Firefox add-ons were analyzed. 5.4.5 GitHub Repositories We obtained WebAssembly samples from the Public Git Archive dataset [205] using the pga command line tool [110]. We specified the “--lang” filter to obtain the repositories using WebAssembly as the language 81 filter. We then scanned these repositories using the GitHub REST API [140] to find and download all WebAssembly binary files (.wasm) [140]. This process took one day and was performed on October 3, 2019. 5.4.6 Module-Level Categorization subsec:automatedclassifier Identifying the category (i.e., the intended purpose) of WebAssembly programs found in the wild is crucial for understanding the landscape. To discover the intended purposes of the samples, we manually inspect and label the files by relying on four types of information obtained from the WebAssembly binaries: import function names, export function names, internal function names, and file source. As shown in Table 5.2, we categorize the samples into 12 distinct categories. We find a variety of these categories across all the sources of WebAssembly samples that we investigated. Function names in the modules usually carry informative descriptions of the module’s use case. For a source file in a GitHub repository, the file’s use in the context of the project is used to identify the category features. For browser extensions, we looked at the extension’s description page and the WebAssembly and JavaScript files bundled in the extension archive to identify what the extension’s purpose was and what role the WebAssembly module played. In total, we produce more than 204,619 signatures from the import, export, and internal function names of the modules. 5.4.7 Function-Level Categorization sec:function-level-categorization Understanding the purposes of the individual WebAssembly functions comprising the module can also help developers understand the whole module as well. WebAssembly is a compiled language that usually undergoes compiler optimizations. In many cases, these optimizations minify the function names of the WebAssembly functions defined in the module, as well as in the accompanying JavaScript code. We observe 923 modules of 1,829 total modules use minified function names, removing a key piece of information from developers seeking to understand the module functionality. 82 Table 5.2: WebAssembly Use Case Categories. 
Category | Description
Compression | Performs data compression operations.
Cryptography | Performs cryptographic operations (e.g., hashing).
Game | Implements stand-alone online games.
Text Processing | Performs text or word processing.
Image Processing | Analyzes or edits images.
Numeric Processing | Provides commonly used mathematical or numeric functions.
Support Test Stub | Probes environment for WebAssembly support.
Standalone Apps | Independent standalone programs.
Auxiliary Library | Provides commonly used data structures or utility functions.
Cryptominer | Performs cryptocurrency-mining operations.
Code Carrier | Stores JavaScript/CSS/HTML payloads.
Unit Test | Ensures conformance to language specification.
To identify the intended functionalities of WebAssembly modules, we leverage the presence of function names in the collected modules. These function names can indicate the presence of common utilities from C, C++, Rust, and similar languages, such as malloc and strcmp. Other function names indicate that the functions implement application-specific behavior, such as the names AutoThresholdImage and IsPDF. We create labels from the names of functions that appear in at least two unique modules. We also condense similar function names, e.g., $malloc, $_malloc, and $memory.allocate, into a single label representing the group, e.g., malloc. We obtain 189 different function categories through this process. Function names appearing across multiple module categories indicate that these functions implement common utilities.
5.5 Evaluation
We train and evaluate WASPur on the dataset of 1,829 unique WebAssembly files that were collected from the Alexa top 1 million websites, Chrome extensions, Firefox add-ons, and GitHub repositories. To use these samples, we extract every function and build the abstraction IR for each function. We then encode the abstraction sequence as a numeric vector to use in the neural network model. To identify the neural network hyperparameters that provide the best predictive power, we measure the classification metrics of our model using different numbers of hidden units within a layer, different activation functions, and different numbers of layers.
Figure 5.2: Training Metrics over 30 Iterations (accuracy, precision, recall, and F1 score per training iteration).
5.5.1 Function Name Dataset
To evaluate WASPur, we use this dataset of WebAssembly modules to build a dataset of individual WebAssembly functions. For each function, we record its name, abstraction traversal sequence, and its parent module. We then use the function name to provide a label for the function, according to our description in Section 5.4.7. Our function dataset consists of 11,524,686 functions extracted from the 1,829 unique WebAssembly files. Of these functions, 151,662 have function names while 11,373,024 have no function names because of optimization or minification steps. We use the 151,662 labeled functions to train our classifier.
5.5.2 Classification Results
Figure 5.2 presents the training metrics obtained by the neural network using the best-performing hyperparameters: 3 hidden layers, 1024 hidden units per layer, and the ReLU activation function. After training the neural network, evaluating the network on a test set shows that the model can obtain a final accuracy of 88.07%. This metric describes how often the model correctly classifies a sample with the correct label.
The 84 precision of the model, which describes how often the model correctly outputs a label whenever it predicts that label, is 0.91. The recall of the model, which describes how often the model correctly outputs a label against all the samples that have that label, is 0.87. The model achieves an F1 score of 0.89. 5.6 Discussion 5.6.1 Threats to Validity sec:threats 5.6.1.1 Internal Validity Our study results are subject to possible errors in the manual inspection processes of labeling the WebAssembly samples and grouping the function names. These subjective steps can be biased due to our inference of the code’s intention and the compilation practices in the lack of documentation. To reduce this threat, two authors analyzed the samples separately and discussed inconsistent results until they agreed on the labels. 5.6.1.2 External Validity Our choice of when and where to search for WebAssembly samples may affect our results. While we have attempted to search for a large set of WebAssembly files from a variety of sources and during separate times, our findings may not be applied to other datasets. Moreover, we spend significant effort to manually analyze the samples. Specifically, we spent approximately 700 person-hours to analyze the 6,769 collected samples. 5.6.1.3 Construct Validity The metrics and measurement procedures we used to assess the prevalence of WebAssembly, its use cases, and portability practices can construct validity threats. We may have missed other measures and metrics that would better or further support our conclusions. To mitigate this threat, we examined our data via several 85 ways of measurement, including analyzing the group statistics of WebAssembly modules quantitatively and studying the individual use cases and indicative signatures qualitatively. 5.7 Summary We present WASPur, an automated classification tool that identifies the purpose of an individual WebAssembly function. The tool leverages a semantics-aware intermediate representation to identify the semantics of WebAssembly functions whose purposes are known. To train and evaluate the classifier, we construct a dataset of 6,769 WebAssembly samples, with 1,829 being unique, collected from real-world websites, Chrome extensions, Firefox add-ons, and GitHub repositories. We describe the use cases and file statistics found from these collected samples to gain insight into the dataset. We evaluate the classifier after training it on labeled functions and find that it achieves an accuracy of 88.07%. 86 Chapter 6 MinerRay: Semantics-Aware Analysis for Ever-Evolving Cryptojacking Detection chp:minerray In this chapter (as well as in Chapter 7), we begin to discuss static program analyses for WebAssembly that relate to security issues (Section 1.2.3). We explore how, thanks to its fast performance and difficulty in understanding, WebAssembly has been abused to launch large-scale cryptojacking attacks that secretly mine cryptocurrency in browsers. This chapter presents MinerRay, a generic scheme to detect malicious in-browser cryptominers. Instead of leveraging unreliable signatures or runtime features, MinerRay infers the essence of cryptomining behaviors that differentiate mining from common browser activities in both WebAssembly and JavaScript contexts. Additionally, to detect stealthy mining activities without user consent, MinerRay checks if the miner can only be instantiated from user actions. MinerRay is evaluated on over 1 million websites. 
It detects cryptominers on 901 websites, where 885 secretly start mining without user consent. In addition, we compare MinerRay with five state-of-the-art signature-based or behavior-based cryptominer detectors (MineSweeper, CMTracker, Outguard, No Coin, and minerBlock). The results show that MinerRay is effective and robust in detecting evolving cryptominers, yielding more true positives and fewer errors. 87 The content of this chapter largely come from the corresponding publication of the same name [259]. This dissertation’s author is also the main author of this paper. The author also implemented the MinerRay tool, performed the experiments evaluating the tool, and handle most of the writing for the published paper. 6.1 Motivation and Contributions 6.1.1 In-Browser Cryptomining Since Bitcoin [225] was introduced in 2009, over 2,300 cryptocurrencies have emerged [55]. In just a decade, cryptocurrencies have grown from a tiny niche to a huge industry with a $250 billion market capitalization [75]. Thanks to recent advances in web technologies, certain cryptocurrencies can be mined directly in the web browser. Since the first in-browser cryptomining service CoinHive was launched [53] September 2017, such services have become a possible revenue model for website owners. Simply by including a script, websites can use client browsers to make money [249]. However, this paradigm has been abused to mine cryptocurrencies without user consent. In Q4 2017, instances of cryptojacking have skyrocketed (increased by 8500%) [233], where cryptojacking malware that embedded CoinHive miners were found on over 30,000 websites [220]. Since then, in-browser cryptojacking has become one of the most prominent online threats [290, 203, 301]. Some miners use WebAssembly [316] for its performance advantage over JavaScript. For instance, Sumokoin [248] and Lethean [82] are built atop the CryptoNight hashing algorithm [276] or its variants like CryptoNight-Heavy [247]. Some are implemented in JavaScript. For example, CoinIMP [54] is leveraging asm.js [14] as an alternative to WebAssembly. Others use both JS and WebAssembly. For example, WebDollar [340] implements the Argon2 hash algorithm [30]. Moreover, mining services keep introducing new cryptominers and updating underlying hashing algorithms. 88 Despite their differences, the core of all these cryptomining algorithms is to compute hashes via hash functions. A hash function maps an input of arbitrary length to a fixed-length output. Such computations are usually done in an iterative manner. In particular, the input message M is partitioned into blocks of a specific size. Then, a compression function f is applied iteratively on each message block mi to compute the hash hi = f(hi−1, mi) for i = 1 to n. Therefore, each iteration is fairly segregated due to the data dependencies from its previous iteration. Compared to common programs running in browsers, this iterative procedure is unique and a likely indicator of cryptomining. Based on this observation, MinerRay looks for the semantics corresponding to the critical steps above and identifies hash functions. 6.1.2 Evolving Cryptominers In-browser cryptominers are becoming more diverse and sophisticated. For example, CryptoNight, one of the most popular cryptomining algorithms, has more than 9 versions originated from 3 variants [7]. In addition, cryptominers are written in more diverse languages. WebAssembly is no longer the only choice for in-browser cryptominers. 
The recent trend of developing less computation extensive PoW (Proof-of-Work) algorithms [91] also opens the opportunities for JavaScript (and asm.js) based cryptominers. In fact, there are already miners using both WebAssembly and JavaScript [340]. 6.1.2.1 Variant Algorithm (uPlexa [305]) uPlexa is an untraceable digital currency atop CryptoNight-UPX algorithm. It has gained considerable popularity. CryptoLoot switched from Monero to uPlexa in October 2019. Given Monero and uPlexa are based on different hashing algorithms (CryptoNight and CryptoNight-UPX respectively), we conjectured detection tools focusing on CryptoNight will fail to detect uPlexa. 89 6.1.2.2 JavaScript-Based Miners There are JavaScript based cryptominers including CoinIMP [54] and JSECoin [296]. CoinIMP is written in JavaScript implementing lyra2-webchain egalitarian algorithm [344]. In particular, it leverages asm.js implementation as an alternative to WebAssembly, providing better user-experience when WebAssembly technology is not available at runtime. CoinIMP and JSECoin vary greatly in program languages, signature functions and runtime behaviors. 6.1.2.3 JS-WebAssembly Hybrid Miner (WebDollar [340]) WebDollar uses both JavaScript and WebAssembly. The Argon2 hash function is implemented in WebAssembly, and its output is processed by JavaScript to iteratively computes the hash until the result contains the desired number of leading 0s. To detect such a hybrid miner, in addition to understanding both JavaScript and WebAssembly programs, the inter-operation should be analyzed to connect the JavaScript and WebAssembly programs. We develop a robust cryptojacking detector, MinerRay, which is resilient to variants of hashing algorithms and implementations. Unlike existing strategies based on fragile patterns such as the number of particular operations, execution time, or WebAssembly features, our technique relies on program semantics that are invariant across programming languages and implementations of cryptomining algorithms. Moreover, MinerRay distinguishes malicious cryptojackers from benign ones, by inspecting interactions between users and cryptomining modules. 6.1.3 Contributions This chapter makes the following contributions: • We propose an intermediate representation (IR) to abstract underlying semantics of miners written in WebAssembly and JavaScript, which supports cross-language analysis and is resilient to variants 90 (e.g., different versions of the same hashing algorithm or binaries generated by different compilers) (Section 6.3.1). • We develop a light-weight static analysis that infers the critical steps of hashing and reasons about user consent (Section 6.3.3). • We evaluate MinerRay on over 1 million websites. It identified miners on 901 websites connecting to 12 unique mining services, where 885 websites start mining without user consent. • We perform an extensive comparison study with five state-of-the-art detectors, namely: MineSweeper, CM-Tracker, Outguard, No Coin, and minerBlock. MinerRay detected the most websites with the least errors (false positives and false negatives combined). 6.2 Threat Model In-browser Cryptojacking. We assume crypto miners are delivered via web pages and steal clientside computing resources without consent. Any user visiting that page becomes a victim of in-browser cryptojacking attacks. Attackers can host such code on their websites or compromised websites. They can even deliver miners via malicious advertising or other third-party services. 
Cryptojacking outside of web browsers is out of scope.
Impact of User Consent. Existing cryptominer detectors often focus on identifying cryptomining activities without considering users' consent. Note that cryptomining activity itself is not malicious; it is malicious if the mining activity is done without users' awareness. MinerRay takes users' consent into account.
6.3 System Design
Figure 6.1 shows the workflow of MinerRay, consisting of three major components: Programming Language Lifting (Section 6.3.1), CFG Construction (Section 6.3.2), and Cryptojacking Detection (Section 6.3.3).
Figure 6.1: Overview of MinerRay. Its components include a binary converter/JS-Wasm compiler and program abstraction (Programming Language Lifting), intraprocedural and interprocedural CFGs (CFG Construction), and hash function inference plus a user consent call graph that produce the detection result (Cryptojacking Detection).
6.3.1 Programming Language Lifting
Cryptominers are usually implemented in multiple languages such as WebAssembly and JavaScript. Although the main module is usually written in one language, its cryptomining algorithm components can be written in both WebAssembly and JavaScript [340]. Such diversity imposes a significant challenge in analyzing cryptominers in the wild, as they can be implemented by a combination of the above diverse languages. To effectively analyze cryptominers, we first convert non-standard asm.js-derived WebAssembly to standard WebAssembly binaries. Then, we lift the standard WebAssembly into a high-level intermediate representation (Section 6.3.1). We then leverage existing JavaScript parsers to generate abstract syntax tree representations of the JavaScript files. We develop our detection analyses leveraging both representations to enable a cross-language analysis.
We introduce a set of abstraction rules to translate WebAssembly opcodes to our IR, which is later used to capture high-level semantics corresponding to the critical tasks in cryptominers. Table 6.1 shows a subset of the abstraction rules. Due to the space limit, we only list the key WebAssembly instructions: accessing local variables and selective unary/binary operations with operand(s) of type i32 (32-bit integer).
Table 6.1: Abstraction Rules.
Instruction | Stack Operation(1,2,3) | Def Set Computation | Abstraction
get_local v | push(v) | - | -
set_local v | pop()→e | Def(v) = Def(v) ∪ {e^l} | v = e;
tee_local v | top()→e | Def(v) = Def(v) ∪ {e^l} | v = e;
i32.const c | push(c) | - | -
i32.add | pop()→e1, pop()→e2, push(e1 + e2) | - | -
i32.shl | pop()→e1, pop()→e2, push(e2 ≪ e1) | - | -
i32.gt_s | pop()→e1, pop()→e2, push(e2 > e1) | - | -
i32.load | pop()→e, push(R32(e)) | - | -
i32.store | pop()→e1, pop()→e2 | Def(e2) = Def(e2) ∪ {addr(e1)^l} | W32(e2, e1);
loop g | - | - | g:
if I | pop()→e | - | if (e) {
br g | - | - | goto g;
br_if g | pop()→e | - | if (e) goto g;
return | if(retLen()==1) pop()→e | - | return e; or return
block B ... end | - | - | B: { ... }
call f | paraNum(f)→n, for(i = n; i > 0; i--) pop()→ei | - | f(e1, e2, ..., en);
call_indirect | pop()→e, paraNum(e)→n, for(i = n; i > 0; i--) pop()→ei | - | f(e1, e2, ..., en);
1. retLen() gives the number of return values (either 1 or 0) of the current function.
2. paraNum(f) returns the number of parameters of function f.
3. func(e) returns the function name by checking the function table with index e.
Operations
on global variables (get_global, set_global) and value types other than i32 (i64, f32, and f64) are modeled similarly and thus omitted. 6.3.1.1 Abstracting Stack Operations WebAssembly execution is based on a stack machine architecture, so, our abstraction models WebAssembly instructions based on the effects on the stack. It captures the data and control flows. In particular, for instructions that do not assign variables or contain control constructs, we only model their execution on a virtual stack. “i32.const c” pushes an i32 constant c onto the stack. “i32.shl” pops two values from the stack, performs a shift left operation and pushes the resulting i32 value back to the stack. “i32.load” pops the top value on stack as an address, reads 4 bytes from that address and pushes the resulting i32 value onto the stack. Note that the prefix i32 indicates the size of the operand is a 32-bit integer. For a 64-bit integer, i64 will be used as a prefix. 93 For assignments, we update the alive variable definitions and generate C-like abstractions. For example, “set_local v” pops a value e from the stack and assigns e to v. This instruction is abstracted to “v = e”. “tee_local” is similar to “set_local”, except that the top value on the stack is not popped. “i32.store” is modeled by “W32(e2, e1)”, where the e1 and e2 are the values popped from the stack firstly and secondly. 6.3.1.2 Abstracting Structured Control Flow Constructs WebAssembly supports control flow constructs such as block, loop and if. We model them as abstract control flow modules. Conditionals and loops are modeled by goto and guarded goto: “loop g:” creates a label g and is abstracted as “g:”. “if I” is abstracted to “if (e)”, where the condition e is popped from the stack. “br g” is a jump to label instruction. “br_if g” is a conditional jump to label g with the condition popping from the stack. A return instruction either returns a value that is popped from the stack or returns a special value “None”. Direct calls are parameterized with explicit function names whereas indirect calls are parameterized with an index to the function table. Function parameters are obtained from the stack. Note that each instruction is annotated with a label, denoting its line number in the program. (b) Merge Consecutive Writes in a Loop. W32 (da, R32 (sa)); W32 (da+4, R32 (sa+4)); . . . W32 (da+4i, R32 (sa+4i)); memcpy (da,sa,16i); L: W8 (da+idx, R32 (sa)); sa = sa + 1; idx = idx + 1; if (idx < idx+len) goto L; memcpy (da+idx, sa, len); sa = sa + len; idx = idx +len; (a) Merge Sequential Writes. Figure 6.2: Abstraction Merging Rules. fig:merge 94 6.3.1.3 Abstracting Memory Operations We observe that hash functions extensively use consecutive memory copy and access operations. We further merge consecutive memory buffer writes based on the rules summarized in Figure 6.2. Sequential writes to consecutive memory buffers are simplified to a single memset or memcpy statement (Figure 6.2(a)), where parameters “da” and “16i” represent the destination address and the number of bytes to be written respectively. The semantics of writing to consecutive memory buffers can also be realized by using a loop to write each byte (Figure 6.2(b)). 6.3.2 CFG Construction sec:cfg-construction Given the lifted IR obtained in Section 6.3.1, we extend our analysis to understand the entire program. Specifically, we construct a control flow graph for each procedure and link them together to obtain an interprocedural control flow graph for the program. 
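As a concrete illustration of the sequential-write merging rule of Figure 6.2(a) described in Section 6.3.1.3 above, the following simplified pass collapses runs of adjacent W32 abstractions into a single memcpy abstraction. It is illustrative only and operates on a toy tuple encoding of the abstractions, not on MinerRay's actual IR objects.

```python
# Collapse W32(da+0, R32(sa+0)), W32(da+4, R32(sa+4)), ... into memcpy(da, sa, n).
# Each abstraction is modeled as ("W32", dst_offset, src_offset) over fixed base addresses.
def merge_sequential_writes(abstractions, word=4):
    merged, run = [], []
    def flush():
        if len(run) > 1:
            merged.append(("memcpy", run[0][1], run[0][2], word * len(run)))
        else:
            merged.extend(run)
        run.clear()
    for a in abstractions:
        if (a[0] == "W32" and run
                and a[1] == run[-1][1] + word and a[2] == run[-1][2] + word):
            run.append(a)                      # extend the current consecutive run
        elif a[0] == "W32":
            flush(); run.append(a)             # start a new run
        else:
            flush(); merged.append(a)          # any other abstraction breaks the run
    flush()
    return merged

# Four adjacent word copies become one 16-byte memcpy.
ops = [("W32", 0, 0), ("W32", 4, 4), ("W32", 8, 8), ("W32", 12, 12)]
print(merge_sequential_writes(ops))   # [('memcpy', 0, 0, 16)]
```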
6.3.2.1 Intraprocedural CFG We first build an intraprocedural CFG for each function. Then, we identify loops and repetitive operations (i.e., manually unrolled loops) to further abstract them to higher level representations. For example, a “for” construct that stores the same constant value (e.g., 0) in consecutive memory will be replaced by a memset abstraction. 6.3.2.2 Interprocedural CFG (ICFG) We link intraprocedural CFGs together to build an interprocedural CFG (ICFG) and represent the entire program. There are two major challenges: (1) A recursive call in a program can introduce loops. When a recursive function call is found, we simply treat it as an iterative operation for further analyses. (2) An indirect function call may result in an imprecise interprocedural CFG. As targets of indirect calls are 95 only known at runtime, MinerRay conservatively assumes that any functions that match with the callee’s function type can be possible targets. This results in a larger ICFG but does not introduce false negatives. 6.3.3 Hash Function Detection sec:design-hash-function A hash function generally has these critical steps to (1) initialize hash state, (2) hash each message blocks, (3) store remaining data in a temporary buffer, (4) pad and hash the last partial block, and (5) generate the hash from the final state. As suggested in [84, 23, 240], almost all existing cryptographic hashes can be described as functions based on a block cipher, where steps 2, 3 and 4 are necessary in block cipher implementations. Therefore, the key idea of our approach is to see if a program exhibits semantics matching above steps. 6.3.3.1 Hash Function Semantics subsec:hashfunction_semantics_template We formulate the semantics of above critical steps as a template. Steps 1 and 5 model the initialization and the result fetching. They are not strong signals due to the variances in hashing algorithms and excluded as patterns. • Step-2 captures the iterative block hashing loop. In each iteration, the compression function is invoked on each block. The block size, the message pointer, the original message length and the remaining message length should be inferred. • Step-3 processes the remaining message. If the remaining message exists, it is copied to a temporary buffer. The temporary buffer and buffer length should be identified. • Step-4 represents the padding process for the remaining message. A memset or memcpy is used to fill out the buffer to the full block size, starting from the current temporary buffer length. 96 6.3.3.2 Matching Models to Programs Algorithm 1 describes the algorithm to identify the critical steps: D_HashEachBlock recognizes message block hashing loops. D_StorePartialBlock checks if a partial block is copied to a temporary buffer. D_PadLastBlock detects if the last partial block is padded. The detection algorithm takes the control flow graph (CFG) of the abstracted program and the common sizes of block ciphers (e.g., 256 bits) as input. The final output represents potential hash algorithms identified from the program. 
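To make the target of these three checks concrete, the code shape described by steps 2-4 looks roughly like the following sketch: an illustrative Merkle-Damgård-style driver with a stand-in compression function. It is not taken from any particular miner, and real schemes also encode the message length during padding.

```python
# Illustrative shape of an iterative hash driver: full blocks are compressed in a
# loop (step 2), a partial block is buffered (step 3) and padded (step 4).
BLOCK = 64                                   # e.g., a 512-bit block

def compress(state, block):                  # stand-in for the real compression function
    return [(s ^ b) & 0xFFFFFFFF for s, b in zip(state, block[:len(state)])]

def digest(message, state):
    remaining, offset = len(message), 0
    while remaining >= BLOCK:                # step 2: loop while full blocks remain
        state = compress(state, message[offset:offset + BLOCK])
        offset += BLOCK
        remaining -= BLOCK
    buf = bytearray(message[offset:])        # step 3: copy the partial block to a buffer
    buf += b"\x80" + b"\x00" * (BLOCK - len(buf) - 1)   # step 4: pad to a full block
    return compress(state, bytes(buf))

# Example: digest(b"message" * 100, state=[0] * 8)
```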
Algorithm 1: Inferring Hash Functions
Input: CFG(V, E), BSIZES[] = {256, 512, 1024}
Output: HashCandidates[], HashEachBlock[], StorePartialBlock[]
1  D_HashEachBlock(CFG(V,E), BSIZES):
2    for loop l ∈ V do
3      for size ∈ BSIZES do
4        cond_stmt ← conditional statement in l
5        de_stmt, in_stmt ← statement in l de/increased by size
6        if (de_stmt || in_stmt) && MoreFullBlocks(cond_stmt) then
7          block_size ← size
8          rem_msglen, block ← operand in de_stmt, in_stmt
9          msglen ← get last definition of rem_msglen
10         HashEachBlock = HashEachBlock ∪ {l, block_size, rem_msglen, block, msglen}
11 D_StorePartialBlock(CFG(V,E), HashEachBlock):
12   for construct l ∈ HashEachBlock do
13     for conditional node m ∈ Successors(l) do
14       cond_stmt ← conditional statement in m
15       if (PartialBlockExists(cond_stmt) && memcpy exists) then
16         tmp_buf, rem_msg, rem_msglen ← memcpy args
17         tmp_buflenptr ← store rem_msglen
18         StorePartialBlock = StorePartialBlock ∪ {l, m, tmp_buf, tmp_buflenptr}
19 D_PadLastBlock(CFG(V,E), StorePartialBlock):
20   for construct m ∈ StorePartialBlock do
21     for conditional node n ∈ Successors(m) do
22       cond_stmt ← conditional statement in n
23       if IsNotFullBlock(cond_stmt) && memcpy/memset exists then
24         if PadConstantsToBuffer(args of memcpy/memset) then
25           l ← HashEachBlock construct of m
26           HashCandidates = HashCandidates ∪ {l, m, n}
D_HashEachBlock iterates over all loop nodes in the CFG to detect nodes that partition a message into fixed-size blocks. Specifically, the function checks the loop condition and in/decrement expressions (lines 4-10). If a variable is decremented by a particular block size, it is a candidate for the remaining message length (rem_msglen). Similarly, the current message pointer block is inferred if a variable is increased by the block size. It compares the lengths of the remaining message and the full block in a conditional statement to match against hashing iteration predicates.
D_StorePartialBlock checks the successors of the nodes in HashEachBlock and looks for predicates guarding the statements that copy the remaining message to a temporary buffer. At line 15, a helper function PartialBlockExists checks if a partial block exists. The condition should either compare the remaining message length (rem_msglen) with 0 or compare the current message pointer (block) with the entire message size (msglen). If the partial block exists and a memcpy can be found within this node, the pointer to the temporary buffer (tmp_buf) and its length (tmp_buflenptr) can be inferred. Then, StorePartialBlock is updated with the ancestor loop node (belonging to HashEachBlock), the current predicate node, and the inferred variables.
D_PadLastBlock checks successors of the obtained nodes and looks for predicates padding the last partial block. IsNotFullBlock compares the temporary buffer length with the block size and checks memory operations (memcpy/memset). Particularly, if the destination address starts from "tmp_buf + tmp_buflenptr" and the length equals "block_size - rem_msglen", we consider the last partial block to be padded. Once the detection algorithm identifies the nodes representing the above hashing steps, MinerRay further checks if the nodes are within a loop construct to decide the presence of a cryptominer.
6.3.3.3 User Consent Inference
We consider miners that inform users of crypto mining activity legitimate.
If a detected miner does not seek consents from users, we say it is involved in a cryptojacking attack. To this end, we inspect a web page with miners and see if the user is informed or not. Specifically, we first look for simple strings like “mining", “CPU", etc... on the web page. To determine if the user permission is requested, we explore HTML user events and see if the miner instantiation can be triggered without user actions. 98 button.onClick = () => { xhr.onreadystatechange = function() { if (xhr.readyState === xhr.DONE) { p.postMessage({type: 'auth-success'}); } }; xhr.open('POST','authedmine.com/ auth/'); xhr.send('auth&key=' + siteKey); } 1 2 3 4 5 6 7 8 9 obj.onClick() Network Response parent.onMessage() miner.work() miner.hash() Figure 6.3: User Consent Call Graph on synonymus.cz. fig:consent-call-graph Figure 6.3 illustrates how synonymus.cz requests user consent using a pop-up message. The JavaScript snippet creates an onClick event handler for the “Allow for this session” button. When the button is clicked, the onClick handler will eventually triggers the miner instantiation, starts the WebWorker threads (Miner.work()), and starts mining by calling the Miner.hash(). Without losing generality, we assume user consent is collected by clicking HTML elements. We focus on exploring invocations of the onclick events of HTML objects that may instantiate WebAssembly cryptominers. However, because the JavaScript snippets instantiating cryptominers are usually obfuscated (especially in malicious scenarios), static methods are unlikely to work. Therefore, we develop a simple dynamic approach that explores the onclick events and checks if WebAssembly instantiation APIs such as WebAssembly.instantiate can be invoked. Although event explorations could be improved using the methods presented in [13, 230], we observed this simple approach is sufficient in practice, because, in part, most malicious websites instantiate WebAssembly miners without user interference. 99 6.4 Evaluation sec:minerray-eval 6.4.1 Experimental Setup 6.4.1.1 Implementation We use WABT [317] to convert WebAssembly binary to its text format. We develop a parser in Node.js to construct control flow graphs. To infer user consent, we use Esprima [130] to instrument JavaScript and build the dynamic call graphs. 6.4.1.2 Dataset MinerRay was evaluated on the top 1.2 million websites from the Alexa Top 10 Million Websites list. In total, we have crawled and investigated 1,246,074 websites. These websites were crawled starting from January 2019 to January 2020. For each website, we crawled the homepage and waited an additional 5 seconds after the page is fully loaded. We also visited all links on the homepage and crawled these sub-pages. We compiled the Chromium web browser with “–dump-wasm-module” flag to dump all WebAssembly binaries that it encounters. 6.4.1.3 Cryptominer Detection Tools We compare MinerRay with three state-of-the-art WebAssembly cryptomining detectors (MineSweeper [170], CMTracker [135] and Outguard [162]) and two signature-based browser extensions (No Coin [161] and minerBlock [74]). MineSweeper detects CryptoNight-based miners by counting the number of bit operations to recognize cryptographic hash functions. CMTracker detects cryptominers by calculating cumulative time spent on signature hash functions and profiling stack structures for repetitive behavioral patterns. 
If 10%+ of the execution time is spent on hash functions or a repeated call chain occupies 30%+ of the whole execution time, it is considered to be a miner. Outguard uses machine learning techniques to detect cryptominers. It develops a set of features (such as the presence of WebAssembly and signature hash functions, and the numbers of web workers/WebSocket connections) from the dynamic trace collected by an instrumented web browser. No Coin and minerBlock are blacklist-based techniques. No Coin uses a blacklist to block network requests matching the URL patterns on the list. minerBlock leverages both blacklist and script-scanning to look for potentially dangerous mining patterns, which makes it effective in detecting miners with code embedded into the JavaScript file.
6.4.2 Cryptominer Detection Results
In total, MinerRay found cryptominers on 901 websites. Figure 6.4 shows the number of sites serving cryptominers.
Figure 6.4: Miners by Alexa Rank (number of websites with detected miners per Alexa rank range).
We observed the number of cryptominers decreases for lower ranked websites. We investigate miner distributions on the landing pages or subpages. Among the 901 websites, we observed 560 miners on the landing pages and 341 on subpages. We manually check the purpose of the websites. Most of the detected sites are torrent, movie, and video streaming websites, which is probably because users are likely to stay longer on these streaming websites for more profits.
We observed 12 unique mining services. Figure 6.5 shows the number of sites using each service.
Figure 6.5: Miners by Cryptomining Services (number of websites per mining service).
Note that the same site could have used multiple miners that connected to different mining services. As shown in the figure, CoinHive was the most popular mining service deployed on 237 websites, followed by CryptoLoot found on 186 websites.
6.4.3 Comparison Study
Figure 6.6: Comparison with State-of-the-Art Detectors. Per tool: MinerRay 901 detected, 6 false alarms; MineSweeper 900 detected, 1,406 false alarms, 1 missed; CMTracker 809 detected, 1 false alarm, 92 missed; Outguard 810 detected, 2 false alarms, 91 missed; No Coin 742 detected, 159 missed; minerBlock 838 detected, 63 missed.
From our dataset, we scan 3,825 websites. Specifically, we run the five existing tools and MinerRay on the websites. To that end, we find that there are 2,306 websites that are detected by at least one detector. Among the 2,306 websites, we manually analyze each of them to verify whether the website conducts cryptomining or not. As shown in Figure 6.6, there are three outcomes: "Detected", "False Alarm", and "Missed". "Detected" means that the websites detected by each tool are correctly identified as cryptominers. "False Alarm" represents the cases that are not cryptominers. We manually verified all false alarms and observed that they are typically from online games, cryptography programs, and compression utilities. "Missed" indicates the websites have cryptominers but are not detected. Note that we manually verified all 2,306 websites detected by at least one detector in our experiment, which forms the ground truth of our experiment. As shown in Figure 6.6, MineSweeper and MinerRay detect the most samples correctly.
Other techniques miss many websites containing malicious cryptominers (92, 91, 159, and 63 by CMTracker, Outguard, No Coin, and minerBlock, respectively). However, MineSweeper significantly suffers from false alarms: it detects 1,406 websites that turn out to be benign. The results show that MinerRay outperforms existing techniques. It detects the most malicious websites without causing a significant number of false alarms. Note that MinerRay raises 6 false alarms, where it incorrectly identifies non-hashing or benign WebAssembly code as miners; these come from a game and a cryptographic library. Those cases follow patterns similar to the ones described in Section 6.3.3.1 (e.g., block-based data computations and copy operations), but their data content and sources differ from real cryptominers. To handle these cases, we may need to implement a data-flow analysis to understand more details about the data going through the computations. We leave this as future work.

6.4.4 User Consent Results sec:user-consent

Out of the 901 websites with cryptominers, we found that only 16 websites informed users of the background cryptomining. In particular, 13 of them (81%) started to mine automatically, and only 3 websites asked for consent before starting to mine. Nearly half of them (7 out of 16) do not offer a way to disable the miner, and 15 do not allow users to limit CPU usage. 5 websites present inconspicuous text warnings at the bottom of the page. Additionally, the text messages generally describe that a cryptocurrency (e.g., XMR) will be mined, which is difficult for a non-technical user to understand. Note that, if we consider those cryptominers that inform users as non-malicious, other cryptominer/cryptojacking detectors raise false alarms because of them. In particular, we find that the 16 websites MinerRay identified as informing users are all detected by the existing detectors, leading to 16 additional false alarms for every detector except MinerRay.

6.4.5 Performance and Memory Overhead

We report the average space and runtime overhead for scanning a sample program (e.g., JS and WebAssembly). The average file size MinerRay processed during the evaluation is 447.39 KB. The average memory overhead of MinerRay at runtime is 37.23 MB, which is fairly small and negligible on modern machines. To scan a program, MinerRay constructs both intra- and inter-procedural control flow graphs. The average time spent on graph construction is 427.78 ms. Then, MinerRay takes 1,398.6 ms to scan the program to determine whether it is a malicious cryptominer. Overall, the average scanning time is less than 1.9 seconds (1,825 ms).

6.5 Discussion

6.5.1 Evaluation Data Collection Methodology

The web crawlers we built visited the homepage of each website and waited an additional 5 seconds after the page was fully loaded. In addition, we visited all available links on the homepage and crawled these sub-pages to increase the chance of observing cryptominers. It is possible that cryptominers reside on particular pages that the web crawlers did not visit, or that they are triggered only under certain user navigation patterns or idle time. Even though testing a website with comprehensive coverage is orthogonal to our work, MinerRay may find more cryptominers if incorporated with more effective web crawling techniques.
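To make the data-collection setup concrete, the sketch below shows one way such a crawler could be written. It is a minimal illustration only, assuming Node.js with Puppeteer, a hypothetical path to the instrumented Chromium build, and that the dump flag can be forwarded to V8 via --js-flags; it is not the exact crawler used in our evaluation.

const puppeteer = require('puppeteer-core');
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function crawlSite(homepage) {
  const browser = await puppeteer.launch({
    executablePath: '/path/to/instrumented-chromium', // hypothetical build path
    args: ['--js-flags=--dump-wasm-module'],          // dump encountered Wasm binaries
  });
  const page = await browser.newPage();
  await page.goto(homepage, { waitUntil: 'load' });
  await sleep(5000);                                   // wait 5 seconds after the page loads
  // Collect the links on the homepage and visit each sub-page once.
  const links = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
  for (const link of new Set(links)) {
    try {
      await page.goto(link, { waitUntil: 'load' });
      await sleep(5000);
    } catch (err) {
      // Ignore navigation failures and continue with the remaining links.
    }
  }
  await browser.close();
}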
6.5.2 Generality of Our Technique As we discussed in Section 6.3.3, almost all existing cryptographic hash functions can be described as being based on a block cipher, where the semantics are necessary for block cipher implementations. Thus, our technique is general. However, to support a new language other than WebAssembly or JavaScript, the IR would need to be defined on how the steps within the technique are commonly implemented in the new 104 programming language. Depending on the complexity of the language, this may be challenging to create. Additionally, since other languages support many external libraries providing hashing functionality, it may increase the difficulty in creating a mapping to the IR that is complete. 6.5.3 Complementary to Existing Approaches Existing detectors rely on signatures or runtime features and do not consider program semantics. We believe considering semantics could be complementary to existing techniques. In addition, existing tools require manual efforts to complete the user consent analysis, while MinerRay can infer user consents by analyzing the program. 6.6 Summary This chapter presents MinerRay, an effective detection technique which automatically detects the probable existence of stealthy cryptominers on a website. Instead of focusing on particular URLs or signature functions, MinerRay identifies the presence of malicious cryptominers by inferring hash function semantics and reasoning about user consent of mining activity. We provide a systematic study of in-browser cryptominers on Alexa Top 1.2 Million websites. Our evaluation results show that MinerRay achieves high accuracy and is more effective than signature-based approaches in detecting stealthy cryptominers. In addition, we study the methods websites alerted users to mining activity and provided recommendations for better practices. 105 Chapter 7 Wobfuscator: Obfuscating JavaScript Malware via Opportunistic Translation to WebAssembly chp:wobfuscator Chapter 6 presents a malicious use case of WebAssembly, in-browser cryptojacking, as well as static detection technique to identify offending modules. In this chapter, we use program analysis to enable a novel JavaScript obfuscation technique leveraging WebAssembly. Through this attack scheme, we demonstrate the limitations of current static analysis approaches for WebAssembly applications as they lack support for an important feature: cross-language analysis. We describe Wobfuscator, the first technique for evading JavaScript malware detection by moving parts of the computation into WebAssembly. The code obfuscation technique transforms a given JavaScript file into a new JavaScript file and a set of WebAssembly modules. By changing the malicious JavaScript code, our work aims at evading static malware detectors. The rationale is that static detectors may be used on their own or serve as a filter for which scripts to analyze dynamically. That is, evading static malware detectors gives a huge benefit to attackers. Transforming parts of a JavaScript file into WebAssembly is far from trivial. JavaScript is dynamically typed, has complex objects, and provides direct access to browser APIs. In contrast, WebAssembly is statically typed, has only four low-level, primitive data types, and it can access browser APIs only indirectly 106 by importing them from JavaScript. 
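As a small illustration of this indirection (not taken from Wobfuscator; the module URL, import names, and export name are hypothetical), a WebAssembly module can only reach a browser API such as performance.now() through a JavaScript function handed to it in the import object:

// Inside an async context:
const importObject = {
  env: { now: () => performance.now() },   // browser API exposed to Wasm as an import
};
const { instance } = await WebAssembly.instantiateStreaming(
  fetch('module.wasm'), importObject);      // 'module.wasm' is a placeholder module
instance.exports.run();                      // any timing done inside Wasm goes through env.now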
Because of these fundamental differences, general JavaScript-toWebAssembly translation is practically impossible, which is reflected in the fact that WebAssembly never has been touted as a replacement for JavaScript but as a way to complement it [123]. The core technical contribution of Wobfuscator is a set of code transformations that extract carefully selected parts of behavior implemented in JavaScript for translation into WebAssembly. The approach is opportunistic in the sense that it translates JavaScript to WebAssembly where it helps to evade malware detectors, without compromising the correctness of the code. We evaluate Wobfuscator with a dataset of 43,499 malicious and 149,677 benign JavaScript files, as well as six popular JavaScript libraries. Our results show the following. First, the approach is effective at evading state-of-the-art, learning-based static malware detectors: Applying our transformations reduces the recall of the four studied detectors [254, 76, 97, 96] to 0.18, 0.63, 0.18, and 0.00, respectively. Second, the obfuscation preserves the semantics of the transformed code: Obfuscating six popular JavaScript libraries and running their 2,017 tests shows no observable changes in the behavior of the tested code. Finally, we find that our tool only takes on average 8.9 seconds to apply all the transformations to a project (with on average 4,152 lines of code) and adds on average 31.07% of overhead during runtime. Overall, these results show that Wobfuscator is practical for use in real-world programs. The content in this chapter is taken from its corresponding publication [261]. The author of this dissertation is also the main author of the corresponding paper and performed the implementation of Wobfuscator, designed and performed the evaluation of Wobfuscator, and handled the majority of the paper writing. 7.1 Motivation and Contributions Various kinds of attacks target the browser, e.g., drive-by malware [250, 72, 196], malicious code deployed via script-based browser augmentation markets [311], browser-based cryptomining without user consent [170, 107 264, 135], malicious browser extensions [44, 346], and browser-based phishing [10]. A recent report estimates that orchestrated phishing campaigns alone create 1.7 to 2 million malicious payload URLs each month [215]. To protect users against executing malicious scripts in their browser, JavaScript malware detectors warn about such scripts before or while executing them. One line of work statically analyzes scripts before they are executed [76, 254, 288, 97, 96], e.g., by intercepting them already in the network, as part of a browser, or as part of a separate anti-virus tool. As a response, attackers try to hide the maliciousness of scripts via obfuscation and evasion techniques [268, 280]. Progress by attackers and defenders leads to an arms race between increasingly sophisticated obfuscation and evasion techniques on one hand and increasingly effective malware detectors on the other hand. Currently, the most effective malware detection techniques use learning-based classifiers to distinguish malicious JavaScript files from benign JavaScript files [76, 254, 288, 97, 96]. While the focus on JavaScript historically makes sense, JavaScript is not the only language of the client-side web anymore. WebAssembly [123] is another language that is widely available in browsers. 
In addition to the many positive uses of WebAssembly, the language provides a new opportunity to attackers for evading malware detectors – an opportunity that, to the best of our knowledge, has not yet been explored. For these reasons, we develop Wobfuscator, a code obfuscation technique that hides JavaScript code by partially translating it into WebAssembly code. In summary, the contributions of this project are the following: • The first technique to use WebAssembly as a means for obfuscating the behavior of malicious JavaScript code. • A set of code transformations that translates carefully selected JavaScript code locations into WebAssembly. 108 Input: Malicious JavaScript Output: JavaScript + WebAssembly var foo = "evil"; eval(...); JavaScript AST Approach • Identify & check Translation sites • Generate WebAssembly • Rewrite JavaScript Parse Potential Translation Site (module (memory ...) (func + ...) Instantiation Site Async Wrapper Translation Site Figure 7.1: Overview of Wobfuscator. fig:overview • A comprehensive evaluation showing that the approach effectively evades state-of-the-art static malware detectors while preserving the semantics of the original code. 7.2 Design subsec:overview-challenges sec:design Figure 7.1 gives an overview of Wobfuscator. The input is a JavaScript file, which we parse into an AST. Next, the approach identifies potential translation sites, i.e., code locations that (i) are relevant for detecting malicious code and (ii) can be translated into WebAssembly in a semantics-preserving way. Instead of aiming at a general JavaScript-to-WebAssembly translation, the approach opportunistically targets only those code locations that fulfill these two requirements. To move behavior into WebAssembly, Wobfuscator generates WebAssembly code for each translation site and then transforms the JavaScript AST to utilize the generated code. The AST is transformed in three ways. First, at an instantiation site, we add code to load the WebAssembly module into the application. Second, at each of the selected translation sites, we modify the code to access properties and functions provided by the WebAssembly module(s). Third, at the root node of the AST, we conditionally wrap the script into an anonymous, async function, referred to as the async wrapper, to support asynchronous keywords in the code. The remainder of this section explains these transformations in detail. Finally, the output of Wobfuscator is the transformed JavaScript code along with one or more generated WebAssembly modules. 109 We encountered several challenges in designing transformations that move JavaScript behavior into WebAssembly modules. These challenges include: (i) handling the limited data types that can be passed between WebAssembly and JavaScript, (ii) handling JavaScript’s complex scope rules, and (iii) handling WebAssembly’s limited control-flow constructs. Because of these and other differences, Wobfuscator is based not on complete but opportunistic translation, i.e., transforming code where it helps to evade malware detectors, without sacrificing correctness. 7.2.1 Transformations subsec:transformations The goal of our approach is to generate WebAssembly modules that can reproduce the functional behavior of specific JavaScript code snippets. 
To produce these modules, Wobfuscator is constructed on a core set of transformation rules: def:transformation Definition 2 (Transformation rule) A transformation rule is a tuple (L, t, p) where: • L represents a set of code locations where the transformation may apply, • t is a transformation function that maps JavaScript at a code location in L to rewritten JavaScript code and one or more WebAssembly modules, and • p is a precondition for applying t, expressed as a predicate on a code location and its surrounding context. We present seven transformation rules that target different language features of JavaScript. The transformation rules target: (i) data literals (ii) function calls, and (iii) control flow constructs. Table 7.1 illustrates the transformation rules. The transformation rules use several JavaScript primitives to interact with WebAssembly: • instanWasm(source, impObj) instantiates a WebAssembly module from source and returns the module. The optional parameter impObj is an object containing the functions to be imported into the created WebAssembly module. 110 Table 7.1: Transformation Functions. table:transformation-functions Rule JavaScript (Before) JavaScript (After) WebAssembly (After) T1-StringLiteral 1 var s = "lit"; 1 // Instantiation Site 2 let m = instanWasm(source); 3 let buf = m.exports.memory.buffer; 4 // Translation Site 5 let startInd = m.exports.d1; 6 var s = loadStrFromBuf(buf, ,→ startInd); 1 (global $d1 (export "d1") ,→ (mut i32) (i32.const 0)) 2 (memory (export "memory") 1) 3 (data $data0 (i32.const 0) ,→ "lit\00")) T2-ArrayInitialization 1 var arr = new ,→ Array(); 2 arr[i1] = num1; 3 arr[i2] = num2; 1 // Instantiation Site 2 let m = instanWasm(source); 3 let buf = m.exports.memory.buffer; 4 // Translation Site 5 m.exports.f(); 6 var arr = loadArrFromBuf(buf, ,→ startInd, len); 1 (memory (export "memory") 1) 2 (func $f (export "f") 3 i32.const $i1 4 i32.const $num1 5 i32.store 6 i32.const $i2 7 i32.const $num2 8 i32.store) T3-FunctionName 1 eval(str); 1 // Instantiation Site 2 let m = instanWasm(source); 3 let buf = m.exports.memory.buffer; 4 // Translation Site 5 let startInd = m.exports.d1; 6 window[loadStrFromBuf(buf, ,→ startInd)](str); 1 (global $d1 (export "d1") ,→ (mut i32) (i32.const 0)) 2 (memory (export "memory") 1) 3 (data $data0 (i32.const 0) ,→ "eval\00")) T4-CallExpression(a) 1 f(a); 1 // Translation Site 2 let impObj = {imports: {impFunc: () ,→ => f(a)}}; 3 let m = instanWasm(source, impObj); 4 m.exports.f0(); 1 (func $f0 (export "f0") 2 call $impFunc) ;; JS ,→ import T4-CallExpression(b) 1 let r = f(a); 1 // Translation Site 2 let impObj = {imports: {impFunc: ,→ f}}; 3 let m = instanWasm(source, impObj); 4 let r = m.exports.f0(a); 1 (func $f0 (export "f0") (param externref) (result externref) ,→ ,→ 2 local.get $p 3 call $impFunc) ;; JS ,→ import T5-IfStatement 1 2 if(cond) { 3 stmt1; stmt2; ... 4 } else { 5 stmt3; stmt4; ... 6 } 1 // Translation Site 2 let impObj = {imports: { 3 imp1:() => {stmt1; stmt2; ...}, 4 imp2:() => {stmt3; stmt4; ,→ ...}}}; 5 let m = instanWasm(source, impObj); 6 m.exports.f(cond ? 1 : 0); 1 (func $f (export "f0") ,→ (param $p) 2 local.get $p 3 if ;; label = @1 4 call $imp1 ;; JS import 5 else 6 call $imp2 ;; JS import 7 end) T6-ForStatement T7-WhileStatement 1 for(init;cond;incre) ,→ { 2 stmt1; stmt2; ... 3 } 4 5 while(cond) { 6 stmt1; stmt2; ... 7 } 1 // Translation Site 2 init; 3 let impObj = { 4 imports: { 5 cond:() => {return cond ? 
1 : ,→ 0}, 6 incre:() => {incre}, 7 body:() => {stmt1; stmt2; ...} 8 } 9 }; 10 let m = instanWasm(source, impObj); 11 m.exports.f(); 1 (func $f (export "f0") 2 block $L0 3 loop $L1 4 call $cond ;; JS ,→ import 5 i32.eqz 6 br_if $L0 7 call $body ;; JS ,→ import 8 call $incre ;; JS ,→ import 9 br $L1 10 end 11 end) 111 • loadStrFromBuf(buffer, startIndex) creates a string from the buffer starting from the byte offset startIndex and ending at the first null byte (i.e., \00) after startIndex, where buffer is the WebAssembly module linear memory. • loadArrFromBuf(buffer, startIndex, length) creates an array from buffer of size length starting from the byte offset startIndex, where buffer is the WebAssembly linear memory. 7.2.1.1 Obfuscating String Literals sec:transformation string literals JavaScript malware frequently uses encoded strings to hide malicious code [349]. These encoded strings can be critical for malware detectors which learn the string patterns and their encoding schemes [76, 278]. To evade the detection of encoded strings, we define a transformation rule T1-StringLiteral (LT1, tT1, pT1) where: LT1. The code locations where transformation rule T1 may apply are all AST nodes of Literal type with string values. tT1. The transformation function is defined in row T1-StringLiteral in Table 7.1. To obfuscate a string literal “lit", tT1 generates a WebAssembly module that defines a memory and exports it to JavaScript (line 2). The memory is used to store the string literal “lit” at offset 0 (line 3). To reconstruct this string in JavaScript, a variable $d1 containing the offset is defined and exported (line 1). Each string is terminated with a null byte (i.e., \00), so it can be reconstructed by reading the linear memory from the starting index until the first null byte is found. The variable buf points to the memory exported by WebAssembly (line 3). At the translation site, a variable startInd gets the starting index of the string literal stored in linear memory (line 5). Finally, the string “lit" is reconstructed using the primitive loadStrFromBuf with arguments buf and startInd (line 6). 112 pT1. This transformation can be applied in locations where JavaScript allows for replacing a string literal with a function call. Specifically, this excludes string literals used in import or require statements, as such strings should be known when bundling modules together. 7.2.1.2 Obfuscating Arrays sec:transformation arrays Malicious files often contain arrays of numeric literals representing character codes used to reconstruct malicious strings. To obfuscate these arrays, we exploit the fact that the linear memory of WebAssembly modules is implemented through an array buffer, which can naturally map to a JavaScript numeric array, in transformation rule T2-ArrayInitialization (LT2, tT2, pT2) where: LT2. T2 may apply on NewExpression AST nodes (e.g., new Array) or ArrayExpression AST nodes representing array literal expressions (e.g., [1,2,3]). In addition, the transformation also condenses any following AssignmentExpression nodes used to initialize the array values into a single JavaScript function call (e.g., arr[1] = 42; arr[2] = 97;. . .). tT2. The transformation tT2 is defined in row T2-ArrayInitialization in Table 7.1. The original JavaScript code creates an array arr and initializes the elements arr[i1] and arr[i2] with numerical values num1 and num2, respectively. After the transformation, tT2 produces a WebAssembly module that creates a memory and exports it (line 1). 
A function $f is defined which stores the numbers, num1 and num2, at the specified offsets, i1 and i2, inside the linear memory (lines 2-8). The transformed JavaScript code instantiates a WebAssembly module (line 2) and creates a variable to read from the exported memory (line 3). At the translation site, the export function f is called to write num1 and num2 to the linear memory at offset i1 and i2 (line 5). Finally, loadArrFromBuf() to returns a JavaScript Array object containing the numerical values copied from the linear memory buffer specified by a starting index and length (line 6). pT2. This transformation can be applied to arrays initialized with numeric literals. Specifically, we apply the transformation only if the array initialization is one of the following: (i) a new Array expression 113 followed by assignment statements inserting only numeric literals; (ii) a new Array expression with numeric literal arguments; (iii) an ArrayExpression only containing numeric literals. 7.2.1.3 Obfuscating Function Names sec:transformation function ids Several built-in JavaScript functions, such as the notorious eval function, are commonly exploited by attackers. As a result, detectors may consider the names of these built-in functions suspicious and use them as part of the signatures for malware detection [254, 76, 18]. To evade detection, we remove suspicious function names from the JavaScript code through a transformation rule T3-FunctionName (LT3, pT3, tT3) where: LT3. T3 may apply on CallExpression nodes or NewExpression nodes that contain specific identifier names. tT3. The transformation function is defined in row T3-FunctionName in Table 7.1. To obfuscate the function name eval, tT3 removes the function name from JavaScript and stores it in WebAssembly linear memory. In the WebAssembly code, a global variable $d1 is defined and exported with a value of 0 (line 1), which is the starting index of the function name “eval” stored in the linear memory. Next, a memory is created and exported (line 2). To initialize the linear memory, a data section is defined that contains “eval” at offset 0 (line 3). In the transformed JavaScript code, it instantiates a WebAssembly module (line 2) and defines a variable to access the linear memory (line 3). The variable startInd is assigned the value of the exported d1 that contains the starting index of “eval” in the linear memory (line 5). Finally, loadStrFromBuf is used to create the string “eval,” and eval is called from the window object with str, the same argument used in the original eval() (line 6). pT3. This transformation can be applied to call expressions or new expressions referencing global functions accessible through the window object. Specifically, we identify eight global functions commonly used in malicious files and apply the transformation only if the identifier is in the following list: eval, 114 escape, atob, btoa, WScript, unescape, escape, Function, and ActiveXObject. While these functions are not inherently malicious, many of the analyzed malware files use these functions to decode and execute hidden code. 7.2.1.4 Obfuscating Calls sec:transformation calls Aside from the suspicious functions, we construct a transformation rule T4-CallExpression for general JavaScript function calls. This transformation converts function calls in JavaScript into a call of an exported WebAssembly function, which in turn performs a function call in WebAssembly to an imported JavaScript function. 
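To make this round trip concrete, the following self-contained sketch hand-assembles a minimal module with the same shape as the one produced by T4-CallExpression(a) in Table 7.1. The byte layout, the callee f, and the argument value are illustrative assumptions rather than the exact output of Wobfuscator:

// Equivalent text format:
//   (module (import "imports" "impFunc" (func))   ;; function index 0: the JS wrapper
//           (func (export "f0") call 0))          ;; function index 1: calls the import
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                         // type section: [] -> []
  0x02, 0x13, 0x01,                                           // import section, 1 entry
  0x07, 0x69, 0x6d, 0x70, 0x6f, 0x72, 0x74, 0x73,             //   module name "imports"
  0x07, 0x69, 0x6d, 0x70, 0x46, 0x75, 0x6e, 0x63, 0x00, 0x00, //   field "impFunc", func, type 0
  0x03, 0x02, 0x01, 0x00,                                     // function section: one func of type 0
  0x07, 0x06, 0x01, 0x02, 0x66, 0x30, 0x00, 0x01,             // export "f0" -> function index 1
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x10, 0x00, 0x0b,             // code: no locals; call 0; end
]);

function f(a) { console.log('original callee executed with', a); } // stand-in for the original function

// The original call site `f(42);` becomes:
const impObj = { imports: { impFunc: () => f(42) } };  // anonymous wrapper keeps the call in JavaScript
const m = new WebAssembly.Instance(new WebAssembly.Module(bytes), impObj);
m.exports.f0();                                        // replaces the direct call to f in the source

Running the last line still executes f(42), but the call site visible in the JavaScript source is now an export call on a WebAssembly instance rather than a direct call to f.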
Unlike T3-FunctionName, which completely removes suspicious function names from JavaScript, this transformation modifies the context of the function being used in call sites that AST-based malware detectors use when scanning for malicious code. There is a trade-off in this transformation between compatibility with the WebAssembly Minimum Viable Product (MVP) version and the amount of transformable function calls. Thus, we create two variations of this transformation: T4-CallExpression(a) is fully compatible with the WebAssembly MVP (i.e., uses no language extensions) but can only be applied on functions that do not return a value; T4-CallExpression(b) transforms functions with return values but requires the WebAssembly Reference Types proposal [338]. However, as this proposal is now included in the WebAssembly specification (as of version 2.0 [339, 111]), this method provides a different transformation rule for calls that leverages the latest features of WebAssembly. T4-CallExpression(a) is a transformation rule (LT4a, pT4a, tT4a) where: LT4a. LT4a are CallExpression nodes containing the identifier and arguments of a function call. tT4a. tT4a is defined in row T4-CallExpression(a) in Table 7.1. To obfuscate a JavaScript function call f(a), tT4a moves the function call into an anonymous function that is imported into WebAssembly (line 2). At the original call site, a WebAssembly export function f0 is called (line 4). In the WebAssembly code, the function that wraps the JavaScript function call is imported as $impFunc. The function $f0 is exported 115 and calls the imported function $impFunc (lines 1-2). Note that the anonymous function used to wrap the original JavaScript function always has the same type signature, i.e., a void function with no parameters. Thus, the same WebAssembly module can be compiled once and reused for every replaced function call, changing only the import object containing the appropriate JavaScript function. pT4a. This transformation can be applied to locations where the call expressions do not have a return value that is used in an assignment or in another expression, as the primitive data types of WebAssembly, i32, i64, f32, f64 cannot represent all possible JavaScript function return values. The generalized variant T4-CallExpression(b) is a transformation rule (LT4b, pT4b, tT4b) where: LT4b. LT4b is the same as LT4a. tT4b. tT4b is defined in row T4-CallExpression(b) in Table 7.1. The WebAssembly Reference Types proposal [338] adds the WebAssembly value type externref that can be used to pass references of arbitrary JavaScript values to and from WebAssembly. With this new type, the transformation only needs to import a reference to the transformed function. Specifically, the transformed JavaScript code only imports the function reference f into WebAssembly (line 2). In the original call expression, the function f is replaced with a WebAssembly export function f0 that takes in the argument a of the original call and returns any value that the original function outputs (line 4). In the WebAssembly code, the export function $f0 takes in a parameter of type externref and returns a value of type externref (line 1). Inside $f0, it calls the imported function $impFunc with a value that is the argument passed into $f0 (lines 2-3). The benefits of this transformation over T4a include supporting more function calls by including those with return values and moving more behavior into WebAssembly than T4a. pT4b. 
This variant of the call transformation can be applied only if: (i) the callee function does not contain a reference to this; (ii) the arguments of the call expression cannot contain a reference to this; (iii) the data dependencies of the variables referenced within the callee cannot contain this; (iv) the callee function is not a method of a literal value; and (v) the callee function is not the special functions bind or 116 super. Conditions (i)-(iv) must be met because the value of this is changed when calling the function from within the WebAssembly module, which can lead to incorrect behavior. Condition (v) is needed as bind and super have restrictions on how they are called, so these functions cannot be passed as imports to the WebAssembly module. 7.2.1.5 Obfuscating If Statements sec:transformation if By leveraging the control-flow instructions in WebAssembly, the behavior of if-else statements in JavaScript can be moved to WebAssembly, removing the syntactic information while preserving the semantics. To this end, we present transformation rule T5-IfStatement (LT5, pT5, tT5) where: LT5. The transformation rule applies to IfStatement nodes. tT5. The transformation is defined in row T5-IfStatement in Table 7.1. At the translation site, tT5 wraps the code within the if- and else-blocks in two anonymous functions that are imported into the WebAssembly module (lines 2-4). A WebAssembly export function f is called and the result of the test condition of the original if-statement is converted to a (zero or one) integer that is passed as the argument to f (line 6). Within the WebAssembly module, the two functions wrapping the code within the if- and elseblocks are imported as $imp1 and $imp2 (lines 4,6). The exported function $f takes in an integer parameter which will be either zero or one to function as a Boolean (lines 1-7). $f contains if-else instructions that are decided by the function parameter p. If p is non-zero, the if instruction calls $imp1 that contains the code originally in the if-block. Similarly, if p is zero, then the else instruction calls $imp2 containing the code originally in the else-block. By leveraging the if-else instructions in WebAssembly, the original semantics are preserved while the use of the control-flow statement is hidden from the JavaScript syntax. pT5. The if statements can be transformed if the code blocks do not include the keywords break, continue, return, yield, or throw. These keywords are not compatible with moving the inner code blocks into functions. 117 7.2.1.6 Obfuscating For and While Loops sec:transformation for To obfuscate for loops, we define transformation rule T6-ForStatement (LT6, pT6, tT6): LT6. The transformation locations where this rule applies are ForStatement nodes, which represent C-style for loops. tT6. The transformation is defined in row T6-ForStatement in Table 7.1. At the translation site, tT6 wraps the loop condition, increment and body in three JavaScript functions that will be imported to the WebAssembly module (lines 3-8). The loop counter initialization can be hoisted out of the loop scope safely (line 2). Similar to the T5-IfStatement transformation, the three JavaScript functions that are used to wrap the loop condition, increment, and body are imported as $cond, $body, and $incre, respectively (lines 4, 7, 8). The export function $f emulates the behavior of the for loop using the loop instruction and the three import functions. The original code site is replaced with a call to $f. pT6. 
The precondition is the same as pT5. Similar to for loops, we obfuscate while loops with rule T7-WhileStatement (LT7, pT7, tT7) where: LT7. The rule applies to WhileStatement nodes. tT7. The transformation, defined in row T7-WhileStatement in Table 7.1, is similar to tT6. The only differences are that in tT7, there is no loop increment nor loop counter initialization. pT7. The precondition is the same as pT5 and pT6. 7.2.2 Synchronous and Asynchronous WebAssembly Instantiation sec:sync async For each transformation rule in Section 7.2.1 (aside from T4-CallExpression(b)), we develop two variants differing in whether they instantiate the WebAssembly module synchronously or asynchronously, i.e., the implementation of the instanWasm() primitive. The synchronous variants implement the primitive by using the WebAssembly.Module and WebAssembly. Instance constructor functions. Figure 7.2 shows the code, where in line 1, variable m is set to the compiled 118 1 let m = new WebAssembly.Module( 2 new Uint8Array(decodeBase64('...'))); 3 return new WebAssembly.Instance(m, impObj); Figure 7.2: Synchronous WebAssembly Instantiation. fig:sync-variant WebAssembly.Module object. The WebAssembly.Module constructor accepts a typed array containing the module bytes. Hence, we encode the module bytes into a base64 string, which is decoded to a typed Uint8Array at runtime (line 2). On line 3, the module object m, along with the import object, is then passed into the WebAssembly.Instance constructor. The returned instance object can be utilized by the transformation functions. Since this is standard, synchronous code, there are no restrictions on how to integrate instantiation into the original JavaScript application. However, because the WebAssembly.Module constructor can block the JavaScript main thread, browser vendors discourage this method. Chromium in particular even limits the input module to at most 4KB in size [195] and throws an exception otherwise. To get around this limitation, each synchronous transformation can emit one or more WebAssembly modules. Specifically, transformations T3-FunctionName, T4-CallExpression(a), T5-IfStatement, T6- ForStatement, and T7-WhileStatement only emit one WebAssembly module per file transformed. T1- StringLiteral, T2-ArrayInitialization, and T4-CallExpression(b) can emit one or more modules since the data stored within the modules can grow larger than 4KB. When a single WebAssembly module grows too large, e.g., because it contains many string literals in its data section, we split it into multiple modules to keep each under 4KB. For string literals larger than 4KB, the string literal is split across multiple modules and joined in the string reconstruction phase. We also support asynchronous instantiation of the WebAssembly modules, which is the method that browser providers recommend. Asynchronous instantiation has several benefits, including unrestricted module size and the ability to put the generated WebAssembly modules in separate files. These benefits allow each transformation to only emit a single WebAssembly module. 119 1 async function someFunction(){ 2 // Transformation site: ... 3 await (async () => { 4 let mod = await WebAssembly.instantiateStreaming( 5 fetch("generated_module.wasm"), impObj); 6 //Transformation function code... 7 })() 8 } Figure 7.3: Instantiation of the Asynchronous Variant. fig:async-variant For these variants, the instanWasm() primitive is implemented via the WebAssembly.instantiateStreaming() function, as shown in Figure 7.3 (line 4). 
This API spawns compilation on a separate thread, thus not blocking the main thread of execution. Since this API returns a Promise, we need to add the await keyword to allow the Promise to resolve before continuing. Line 6 represents a placeholder for the transformation code of T1-T7. The await keyword can only be employed in asynchronous functions, so we wrap the instantiation in an async anonymous function (line 3). Similarly, since the anonymous function is an async function, its invocation must also have the await keyword added (line 3). The enclosing function, someFunction, now contains the await keyword, so the function definition must have the async keyword added (line 1). Elsewhere in the code, any function calls to someFunction would also require adding the await keyword. As this example shows, inserting the async/await keywords to a translation site causes these keywords to be propagated to functions and call sites throughout the file. This keyword propagation makes the asynchronous transformations non-trivial to design and implement. Specifically, we encountered three code locations that are difficult to propagate the async/await keywords to. First, anonymous functions used as parameters in other function calls, e.g., .map, are difficult to handle. Depending on the return type of the anonymous function, the await keyword may need to be added within the called function’s definition or to the function invocation. Second, class constructors cannot be made async, so a check must be done to detect if a constructor is in the call chain of any function. Third, if a 120 transformed function is exported from a module, any other files using the function as an import must be checked for functions and function calls to add async and await to. All of the transformation rules have both synchronous and asynchronous variants except for T4- CallExpression(b). T4-CallExpression(b) relies on an WebAssembly reference type features that impose complex preconditions. Adding the asynchronous restrictions to this may break the original semantics and lead to incorrect transformations. T4-CallExpression(b) exposes more translation sites than T4-CallExpression(a), increasing the number of edge cases that can be encountered. We leave this combination as future work. 7.2.3 Applying Transformations We now present the overall algorithm for applying these transformations to a given JavaScript AST. The input to the algorithm is a list of transformation rules and the AST of the original JavaScript file. The algorithm consists of three steps: (a) Identifying AST nodes where transformations should be applied, i.e., translation sites; (b) rewriting the AST by modifying the subtrees rooted at the translation sites; and (c) adding code to the AST root to instantiate the generated WebAssembly modules. The algorithm outputs the transformed AST corresponding to the obfuscated JavaScript code. Identifying AST Nodes as Translation Sites To identify translation sites, we perform a pre-order traversal of the AST starting at the root node. For each visited node n, the algorithm iterates through the list of transformation rules and checks which rules are applicable. A transformation rule (L, t, p) is applicable if the node n is in the set L of code locations and if the precondition p holds for n. A set of translation site nodes is produced for each transformation rule. Rewriting AST Subtrees After identifying all translation site nodes, the next step is to rewrite the subtrees rooted at these nodes. 
The algorithm applies transformations based on the size of the syntactic structures they target. Specifically, we iterate through the transformation rules in this order, applying each 121 rule to all applicable subtrees before moving on to the next rule: T1-StringLiteral, T2-ArrayInitialization, T3-FunctionName, T4-CallExpression, T5-IfStatement, T6-ForStatement, T7-WhileStatement. This ordering ensures that transformations targeting finer-grained syntactic structures, such as string and array literals, are performed prior to transformations targeting coarser-grained structures, such as loops. If more coarsegrained transformations were applied first, the change could prevent more fine-grained transformations from being applied. For each rule (L, t, p), the algorithm visits all translation site nodes and applies the transformation function t, which modifies the AST in-place and yields a WebAssembly module used in the rewritten code. The output of this step is the rewritten AST and a set W of WebAssembly modules. Adding WebAssembly Instantiation Code The final step is adding code to instantiate the WebAssembly modules W. To this end, the algorithm inserts statements at the beginning of the script, i.e., at the root of the AST. We encode each module in W as a base64 string and add statements that decode and instantiate the modules. 7.3 Evaluation We evaluate Wobfuscator and its ability to obfuscate malicious JavaScript using opportunistic translation to WebAssembly along the following main research questions: • RQ1 – Effectiveness: How effective is the approach at evading state-of-the-art JavaScript malware detectors and which transformations are most effective? How does our approach compare with other state-of-the-art obfuscators? • RQ2 – Correctness: Do our code transformations preserve the semantics of the transformed code? • RQ3 – Efficiency: How much runtime and code size overhead do the transformations impose, and how long does applying the transformations take? 122 To investigate these questions, we perform a comprehensive analysis on the effectiveness of our transformations in evading detection. We evaluate state-of-the-art detection tools against Wobfuscator on a large dataset of malicious and benign files. We evaluate the obfuscation advantage produced by Wobfuscator by comparing our approach against state-of-the-art open-source obfuscation tools. Lastly, we use the extensive test suites of widely used and mature npm modules to verify the correctness of our tool and demonstrate the runtime and code size overhead are acceptable for real-world usage. 7.3.1 Experimental Setup We describe the prerequisites necessary to evaluate our approach along the three dimensions described above. We describe our constructed dataset of JavaScript samples as well as the sets of state-of-the-art JavaScript malware detectors and state-of-the-art JavaScript obfuscators inspected. 7.3.1.1 Datasets Due to different requirements, we use different datasets of JavaScript programs for different research questions. To answer RQ1, we need to train and apply state-of-the-art JavaScript malware detectors to large sets of real-world benign and malicious JavaScript code. The benign code consists of 149,677 files from the JS150k dataset [253]. The malicious code consists of 43,499 samples, with 2,674 samples from VirusTotal [315], 39,450 samples from the Hynek Petrak JavaScript malware collection [242], and 1,375 samples from the GeeksOnSecurity malicious JavaScript dataset [202]. 
These datasets are broken down further by the malware categories that they contain, such as trojans, ransomware, droppers. Answering RQ2 and RQ3 requires executing code before and after applying our transformations. We use a dataset of popular and large JavaScript projects on NPM with their test suites. To identify suitable projects, we select from the most depended-upon NPM modules [4] six modules that contain extensive test suites (first column of Table 7.4). 123 7.3.1.2 JavaScript Malware Detectors subsec:javascript-malware-detectors We evaluate our obfuscation technique against four state-of-the-art, static, learning-based JavaScript malware detectors. To train them, we split the benign and malicious datasets into training, validation, and test sets containing 70%, 15%, and 15% of the samples, respectively. We follow the steps provided by each project to train the detection models with the desired configuration. Cujo [254] is a hybrid JavaScript malware detector that detects drive-by download attacks. It performs a lexical analysis of JavaScript files run on a website as well as a dynamic analysis by monitoring abstracted runtime behaviors. For our evaluation, we use the static detection part, based on a reimplementation of Cujo provided by Fass et al. [19]. Zozzle [76] is a mostly static in-browser detection tool that uses syntactic information, such as identifier names and code locations, obtained from a JavaScript AST to identify malicious code. These features are input to a Bayesian classifier to label the samples as benign or malicious. We rely on a reimplementation of Zozzle provided by Fass et al. [20]. JaSt [97] is a static detector of malicious JavaScript that uses syntactic information from the AST to produce n-grams of sequential nodes to identify patterns indicative of malicious behavior. We use the implementation made available on the project’s GitHub page [17]. JStap [96] is a static malware detector that leverages syntax, control-flow, and data-flow information by creating an AST, a Control Flow Graph (CFG), and a Program Dependency Graph (PDG), depending on the configuration. The tool extracts features either by constructing n-grams of nodes or by combining the AST node type with its corresponding identifier/literal value. In our evaluation, we focus on the PDG code abstraction with both the n-grams and values feature extraction modes. We use the implementation available on GitHub [18]. 7.3.1.3 JavaScript Obfuscation Tools subsec:javascript-obfuscation-tools We compare Wobfuscator against four open-source state-of-the-art JavaScript obfuscation tools. JavaScript Obfuscator [107] is a JavaScript obfuscation tool that supports multiple obfuscation techniques including variable renaming, dead code injection, and control-flow flattening. Gnirts [11] focuses on mangling string 124 literals within JavaScript files. Jfogs [322] is an obfuscation tool that focuses on removing function call identifiers and parameters from call sites. JSObfu [152] is an obfuscator that supports converting function identifiers and string literals into expressions that evaluate to constants. This obfuscator also supports character escaping, whitespace removal, and more. All the experiments on a desktop containing an Intel Core i7 CPU@3.20GHz w/ 32 GB of memory running Ubuntu 20.04. 
7.3.2 Effectiveness in Evading Detection (RQ1) subsubsec:effectiveness-our-approach To evaluate the effectiveness of Wobfuscator at evading static malicious JavaScript detectors, we compare the detectors’ performance on the original input programs against their performance after our obfuscation has been applied. Since the detectors classify each program as benign or malicious, the usual metrics of binary classifiers apply: precision and recall. Precision is the number of true positives (correctly identified malicious programs) divided by the number of all raised alarms (correct or not), and recall is the number of true positives divided by the number of all malicious programs in the dataset. That is P rec= T P T P +F P , Rec= T P T P +F N . A good malware detector should offer both high precision and high recall. Low precision indicates a high number of false positives, which would cause the system to block and break benign scripts and commonly used websites. Such a tool would not be adopted by actual users. Low recall means few of the actual malicious programs are detected, limiting the usefulness of the detector. The main goal of our obfuscation is to reduce the recall of detectors. Some detectors fail to parse some of the original and transformed code samples due to outdated or incomplete support of the JavaScript language. Since this is an implementation-specific detail of these detectors rather than a result of their detection methodology, we choose to exclude these samples from the count rather than mark them as false negatives. As a result, the denominators of the recall results differ depending on the detector and the applied transformations. 125 Table 7.2: Recall of Malware Detectors on Code Obfuscated by Wobfuscator. Lowest Recall in Bold. table:detection-results Technique Cujo Zozzle JaSt JStap (NGrams) JStap (Values) Baseline: No transformation 0.98 (5,548/5,649) 0.66 (3,598/5,453) 0.99 (5,076/5,108) 0.99 (4,483/4,524) 0.98 (4,439/4,524) Individual transformations: Sync, T1-StringLiteral 0.61 (1,623/2,644) 0.62 (3,387/5,453) 0.66 (3,393/5,108) 0.36 (1,539/4,257) 0.43 (1,839/4,257) Sync, T2-ArrayInitialization 0.94 (4,050/4,292) 0.66 (3,593/5,450) 0.85 (4,360/5,105) 0.86 (3,890/4,505) 0.89 (4,009/4,505) Sync, T3-FunctionName 0.67 (2,780/4,159) 0.65 (3,550/5,453) 0.69 (3,512/5,108) 0.57 (2,747/4,810) 0.72 (3,463/4,810) Sync, T4-CallExpression(a) 0.71 (3,040/4,285) 0.64 (3,507/5,453) 0.38 (1,943/5,108) 0.37 (1,723/4,633) 0.78 (3,613/4,633) Sync, T4-CallExpression(b) 0.58 (2,385/4,115) 0.63 (3,424/5,453) 0.44 (2,253/5,108) 0.23 (1,058/4,586) 0.73 (3,369/4,586) Sync, T5-IfStatement 0.82 (3,513/4,301) 0.64 (3,505/5,453) 0.89 (4,535/5,108) 0.83 (3,717/4,501) 0.93 (4,178/4,501) Sync, T6-ForStatement 0.90 (3,877/4,299) 0.66 (3,578/5,453) 0.92 (4,720/5,108) 0.87 (3,872/4,465) 0.98 (4,360/4,465) Sync, T7-WhileStatement 0.90 (3,904/4,321) 0.66 (3,598/5,453) 0.96 (4,882/5,108) 0.98 (4,410/4,502) 0.98 (4,412/4,502) Combined transformations: All sync (using T4(a)) 0.18 (416/2,255) 0.63 (3,450/5,450) 0.22 (1,104/5,105) 0.00 (1/4,235) 0.18 (766/4,235) All sync (using T4(b)) 0.19 (415/2,205) 0.63 (3,428/5,450) 0.18 (931/5,105) 0.00 (0/4,243) 0.08 (350/4,243) All async 0.28 (1,490/5,297) 0.65 (3,524/5,453) 0.20 (1,085/5,453) 0.00 (4/4,612) 0.22 (959/4,267) Results Table 7.2 shows the recall of the detectors described in Section 8.5.4.2 (columns) when run on code obfuscated by our transformations (rows). 
The first row gives each detector’s recall without our obfuscation, which serves as a baseline. The middle part of the table shows results from applying only one kind of transformation at a time. For example, the second row shows that applying our synchronous transformation technique T1-StringLiteral on the test set of malicious samples, Cujo achieves a recall of 0.61, i.e., a significant reduction compared to the baseline of 0.98. The results show that different translation techniques are more effective against some detectors rather than others. For each detector, the lowest recall score is bold-faced to reveal the best-performing individual transformation technique. For example, we find that T1-StringLiteral performs best for Cujo, Zozzle, and JStap in values mode, T4-CallExpression(a) performs best for JaSt, and T4-CallExpression(b) performs best for JStap in n-grams mode. Since each transformation rule is effective at reducing the recall for at least one detector, all transformation rules are integral to the effectiveness of our approach. We explain the reasons why some detection tools disfavor certain transformation rules over others. Cujo performs a lexical analysis on malware files, so it learns the suspicious features of strings that are indicative 126 0.98 0.66 0.99 0.99 0.98 0.18 0.63 0.22 0.00 0.18 0.00 0.20 0.40 0.60 0.80 1.00 Cujo Zozzle JaSt JStap (NGrams) JStap (Values) Baseline All sync (using T4(a)) Figure 7.4: Baseline Recall vs. Obfuscated Recall. fig:detection-recall of malware. Since T1-StringLiteral removes these identifiers from the file, Cujo is not able to find suspicious tokens in the file. Zozzle identifies malicious combinations of syntax features and the context they are used in. Since T1-StringLiteral removes the usage of malicious strings, such as shell code and dynamic code, malicious files that have the majority of their code leveraging these strings will have indicative features removed from their AST, allowing them to evade Zozzle. JaSt uses n-grams of AST syntax nodes (with a length of four by default) to detect malicious patterns. T4-CallExpression(a) replaces a single CallExpression node with multiple nodes handling the WebAssembly module instantiation. Since this translation adds more nodes than the length of sliding window, it can throw off the detection of known malicious n-gram patterns. JStap in its n-grams mode generates a program dependency graph (PDG) by adding edges representing data flow to the AST of an input file. Since T4-CallExpression(b) (and other transformations) route data flow through the generated WebAssembly module, it hinders the n-gram features that JStap can extract when performing its data flow analysis. JStap in its values mode is most affected by the T1-StringLiteral transformation since, in this mode, it relies on literals when selecting features to extract. T1-StringLiteral removes some literals. The lower part of Table 7.2 shows results from applying all transformation rules at once. We report results for three transformation combinations. “All sync (using T4(a))” and “All sync (using T4(b))” mean all synchronous transformations with T4(a) being used and with T4(b) being used, respectively. “All async” means all transformations in Table 7.1 (aside from T4(b)) in their asynchronous variant. We find that 127 Table 7.3: Recall of Malware Detectors on Code Obfuscated by Wobfuscator and Other Obfuscators. 
table:obfuscator-comparison Obfuscator Cujo Zozzle JaSt JStap (NGrams) JStap (Values) JavaScript Obfuscator 1.00 (4,406/4,415) 0.70 (3,807/5,453) 0.81 (4,153/5,108) 0.43 (2,005/4,717) 0.62 (2,947/4,717) Gnirts 0.98 (5,548/5,649) 0.66 (3,598/5,453) 0.99 (5,076/5,108) 0.99 (4,483/4,524) 0.98 (4,439/4,524) Jfogs 0.77 (3,515/4,562) 0.66 (3,584/5,453) 0.00 (26/5,453) 0.00 (16/5,025) 0.56 (2,826/5,025) JSObfu 1.00 (4,994/5,008) 0.84 (4,467/5,324) 0.29 (1,456/4,979) 0.01 (20/3,667) 0.66 (2,420/3,667) Wobfuscator (best recall) 0.18 (416/2,255) 0.62 (3,387/5,453) 0.18 (931/5,105) 0.00 (4/4,612) 0.08 (350/4,243) combining all transformation rules greatly reduces the recall of the detectors. In particular, with the “All sync (using T4(a))” set of transformations, Cujo, Zozzle, JaSt, JStap (NGrams), and JStap (Values) have a recall of 0.18, 0.63, 0.22, 0.00, and 0.18, respectively. Because of its performance and compatibility with the WebAssembly MVP language, we select “All sync (using T4(a))” to be the default configuration for Wobfuscator. Figure 7.4 visualizes the results for “All sync (using T4(a))”. 7.3.2.1 Comparison with Other Obfuscators subsubsec:obfuscator-comparison To demonstrate how Wobfuscator compares against currently available JavaScript obfuscators, we evaluate four obfuscation tools on the same dataset used in Section 7.3.2. We collect the precision and recall values obtained by the five malware detection tools when evaluated on a dataset obfuscated by each tool. Similar to Section 7.3.2, some detectors fail to parse certain obfuscated files, leading to different denominators in the values within the same detector column. Results Table 7.3 shows the recall values of the detection tools (columns) when run on code obfuscated by each of the four obfuscation tools described in Section 7.3.1.3 (rows). The last row shows the best recall values obtained by Wobfuscator. The results show that Wobfuscator outperforms current obfuscators when compared on the recall reduction of malware detectors. The only exception occurs when Jfogs is evaluated against JaSt. In this case, Jfogs’ recall rate of 0.00 outperforms Wobfuscator’s recall rate of 0.18. Jfogs’ obfuscation primarily replaces 128 Table 7.4: Correctness Validation Results. table:correctness-results Project Version LoC T1 T2 T3 T4(a) T4(b) T5 T6 T7 Total # of Tests # of Tests Impacted Validation of synchronous transformations: Lodash 5.0.0 21,178 193/801 160/208 0/0 87/467 299/467 47/187 0/0 38/61 408 322 Chalk 4.1.0 319 64/68 0/23 2/2 21/108 26/108 4/25 1/1 2/2 54 54 Commander 7.2.0 1,153 155/163 0/28 0/0 130/394 87/394 91/153 4/4 2/3 632 592 Debug 4.3.2 505 141/149 2/8 0/0 20/95 43/95 16/29 2/5 0/0 14 13 Async 3.2.0 787 30/57 0/21 0/0 89/209 108/209 31/86 5/6 5/8 675 659 Validation of asynchronous transformations: Node-Fetch 3.0.0beta.10 970 174/212 1/17 0/0 49/264 - 26/94 0/0 0/0 234 204 Total - 24,912 757/1,450 163/305 2/2 396/1,537 654/1,537 189/480 12/16 47/74 2,017 1,844 identifiers and literals with new intermediate variables, so Wobfuscator could be used to compliment Jfogs. For example, Jfogs moves string literals into variables, but it does not alter or remove the strings from the file. Using the T1-StringLiteral transformation, the string literals can be completely removed from the JavaScript file, reducing the syntactic information available to the detectors. 7.3.3 Correctness of the Transformations (RQ2) subsubsec:correctness-validation The transformations we apply change the syntactic structure of the program. 
Naturally, such changes could affect program semantics, potentially making the obfuscated program behave differently from the original program and thus breaking functional correctness. To validate that the correctness of the program is preserved, we leverage the comprehensive test suites of existing widely used JavaScript projects. We apply the transformations to the tested code and then validate if the transformed code still passes its tests. Results The results for the test suite runs are shown in Table 7.4. This table lists the tested project, its version, and the number of translation sites where rules T1–T7 are applied to. The last two columns list the total number of tests in the test suite, and the number of tests that are impacted by at least one transformation. All tests in each project pass successfully, showing that our obfuscations are semantics-preserving. 129 Columns 4-11 of Table 7.4 show the number of transformed code locations that meet the preconditions of the transformation out of the total number of available code locations relevant to the transformation, regardless of whether they satisfy the preconditions. The last two columns of Table 7.4 show that of the 2,017 unit tests in the five test suites, 1,844 of them (91.42%) rely on a function that is impacted by at least one transformation rule. The results show that our transformation rules are applicable to code locations used in the real-world. 7.3.4 Efficiency in Terms of Runtime and Code Size (RQ3) The Wobfuscator transformations we propose re-implement native JavaScript functionalities in WebAssembly modules, such as calling a function, performing a while loop, initializing an array, etc. As a result, there will be an impact on the performance of the translated programs. To quantify the performance impact of our transformations, we use the test suites of the six modules described in Section 7.3.3. In addition, the code size increase caused by the transformations is also analyzed, counting both added JavaScript and WebAssembly code. Table 7.5: Efficiency of Transformations. table:transformation-overhead Project Translation time Execution time Code size (bytes) LoC Time Original Overhead Original Overhead Synchronous Validation Lodash 21,178 29.58s 3.51s +25.81% 135,402 +139.84% Chalk 319 0.81s 4.03s +7.01% 14,935 +166.70% Commander 1,153 0.51s 3.97s +49.95% 74,269 +146.96% Debug 505 1.14s 0.61s +3.24% 21,395 +154.70% Async 787 5.42s 16.83s +2079.21% 28,925 +363.52% Asynchronous Validation Node-Fetch 970 16.03s 4.11s +14.76% 51,960 127.16% Average 4,152 8.92s 5.68s +31.07% 54,481 +170.42% 130 7.3.4.1 Translation Runtime First, we measure the time taken for performing the transformations on the project files. The times are measured with the time command available in Linux, averaged over ten repetitions. We compute the total transformation time of a project by summing the times to convert each JavaScript file used in the project. The transformation time results are presented in Table 7.5. The table shows that for the largest project, Lodash with 21,178 lines of code, the average time to apply all the synchronous transformations is 29.58 seconds. For the smallest project, Chalk with 319 lines of code, the average time to apply the transformations is only 0.81 seconds. In addition, we find that among all the projects in Table 7.5, the average time to apply all the transformations is only 8.92 seconds. These low transformation times show that Wobfuscator is practical for JavaScript obfuscation. 
7.3.4.2 Execution Time Overhead The execution overhead time is the increase in runtime to complete the execution of the test suites of the transformed projects. We use the time command to measure the runtime of the project test suite before and after the transformations are applied, reporting averages over ten repeated measurements. The execution time results are presented in Table 7.5. Our transformations add a performance overhead that ranges from an increase of 3.24% to an increase of 2,079%. While the highest overhead number is large, it is important to note that this large runtime originates from one test within the async project that concurrently applies an asynchronous function to a collection of 1,048,576 elements. In most cases, it is unlikely that malware samples will follow such an execution pattern that incurs this large overhead. On average, Wobfuscator adds a performance overhead of 31.07%. 131 7.3.4.3 Code Size Overhead The code size overhead is the increase in code size between the original file and transformed output among all code files within a project. On average, applying all of the transformations among the project, the code size increased by 170.42%. Overall, the code size overhead is acceptable for practical applications. 7.4 Discussion This section discusses the limitations and possible mitigations to defend against Wobfuscator. 7.4.1 Limitations Wobfuscator targets malware detectors based on static analysis, and despite its effectiveness in bypassing them, is unlikely to be equally effective for dynamic analysis-based detectors. The transformations move some behavior into WebAssembly while leaving the ultimate runtime behavior intact. That is, a dynamic detector that, e.g., observes browser API calls made by a website will observe the same behavior with and without our obfuscation. However, in practice static detectors are much easier to deploy (e.g., as network proxies or browser extensions), whereas observing dynamic behavior is more complex to set up and expensive at runtime. Another limitation is that the approach applies transformations only to some of the given code. If a code location does not fulfill the preconditions for a specific transformation, then it cannot be transformed. Conservatively guarding transformations is crucial to ensure that our approach preserves the semantics of the given code, but also limits its applicability. Finally, our obfuscation relies on WebAssembly being available in the browser. With WebAssembly support in 94% of all installed browsers, this limitation is likely to be acceptable in practice. To ensure that the obfuscated malware runs as expected, an attacker could check for WebAssembly support and load the obfuscated code only if the language is supported. 132 7.4.2 Mitigations We discuss three mitigation strategies aimed at detecting malware despite our obfuscation. The first is dynamic analysis-based malware detection. Because our approach preserves the original JavaScript behavior, many runtime characteristics that dynamic detectors focus on [268] are not affected by the obfuscation. WebAssembly code invokes web APIs through JavaScript, which means the call will be visible to any runtime analysis that wraps the API functions or intercepts them within the browser. However, dynamic malware detectors often impose a non-negligible runtime overhead and may miss malware that hides its malicious behavior in specific configurations. The second mitigation strategy is based on the defender knowing the details of our obfuscation. 
Since the WebAssembly usage in the obfuscated code, e.g., loading many small modules, may be abnormal, it is possible to define heuristic rules to detect that Wobfuscator was applied. In a similar vein, one could include code obfuscated with our technique in the training data used to learn a malware classifier. The main drawback of these mitigations is that obfuscation does not imply maliciousness. There are legitimate reasons for obfuscating code, e.g., protecting intellectual property. Hence, classifying all code obfuscated by our technique, or any other obfuscation technique, as malicious is likely to cause an unacceptably high number of false positives. Finally, the third mitigation strategy is to jointly analyze JavaScript and WebAssembly. For detectors based on traditional program analysis, both static and dynamic, a joint analysis would reason about how data and control flows between the two languages. Likewise, learning-based detectors, such as those used in our evaluation, could feed code in both languages into their models. 133 7.5 Summary Much work has focused on identifying JavaScript malware using static analysis. However, these techniques ignore recent web standards available to attackers, namely WebAssembly. To bypass static detectors, we present Wobfuscator, an obfuscation approach built on a set of seven transformation rules that opportunistically translate specific parts of JavaScript code into functionally identical WebAssembly modules. We evaluate our transformations against four state-of-the-art static JavaScript malware detectors and show that our approach effectively reduces the recall on real malware samples. We show that our technique outperforms other obfuscation tools only based on JavaScript. Finally, we use the test suites of six NPM packages to validate the correctness of our transformations and show their low performance overhead. Our results show that current static detectors are ineffective against techniques that implement cross-language code obfuscation, motivating future work addressing this challenge. 134 Chapter 8 WAF, a WebAssembly Analysis Framework chp:waf-ir Chapters 3 to 7 outline current static program analysis techniques and the challenges that they encounter within the WebAssembly ecosystem. To circumvent these limitations, we introduce WAF, a WebAssembly Analysis Framework designed to enable novel WebAssembly optimizations, program comprehension analyses, and cross-language program analysis between WebAssembly and JavaScript. The IR is based on the multi-level IR for Java, Soot [310]. This framework designs three different intermediate representations, Baf, Jimple, and Grimp, of increasing syntactic richness and semantic complexity. Similarly, we design WAF to be composed of three different IR levels: Waf-Low, Waf-Mid, and Waf-High. Using these IRs, we aim to enable the following static program analyses for WebAssembly: • Novel Program Optimizations • Rich Binary Decompilation for WebAssembly Understanding • Simplified Cross-Language Program Analysis • Streamlined Malware Detection Analyses This dissertation’s author has implemented the framework described in this chapter. In addition, the author has performed all the experiments evaluating the framework along the discussed research questions. 135 8.1 Motivation and Contributions As we discuss in Section 1.2.2, there are challenges when trying to understand and analyze WebAssembly applications. 
Due to low-level features including its stack machine architecture and limited data types, there is a steep learning curve when adopting WebAssembly for developers, so developers may turn to program analysis techniques to gain more insight into their own or external programs. Indeed, static analysis approaches can prove useful here as they enable program inspection without having to execute the program. These approaches can help conceptualize WebAssembly at a higher level than through its terse, low-level instructions. Static analysis frameworks help users implement these analyses without having to start from scratch. However, there exists some limitations when trying to use currently available static analysis frameworks with WebAssembly. First, robust static analysis frameworks that exist for other binary/bytecode formats, such as Soot for Java bytecode [310], PhASAR for C++ [272], and JSAI for JavaScript [156], cannot be used with WebAssembly bytecode. These frameworks expect high-level properties on the code, such as managed memory objects, high-level classes, and string primitives, that are difficult to recreate in WebAssembly. These differences in WebAssembly hinder utilizing existing frameworks for robust static analysis. Second, there are several intermediate representations (IRs) for WebAssembly, such as LLVM IR [189], WABT IR [319], and Binaryen IR [337]. However, these IRs are designed for compilation from high-level languages to WebAssembly and remove semantic and contextual details in the process. For example, information related to loop constructs in C/C++ is lost when converting WebAssembly to these IRs. As a result, these IR formats model WebAssembly code with limited semantic depth. Static analyses benefiting from semantic details, such as binary decompilation, are hindered by this low semantic level. In addition, program optimizations built with these IRs are limited in the available contextual information of the targeted code, leading to misoptimizations [258]. 136 Third, existing works have proposed static program analyses [33, 178, 287, 285, 286] for WebAssembly. Namely, WasmA [32] and Wassail [286] are static analysis frameworks that provide limited control-flow and data-flow analyses for WebAssembly. While these frameworks provide useful static analyses such as control flow analysis and taint analysis, they do not aim to enrich the WebAssembly code with higher-level semantic details. As such, binary decompilation methods built atop these frameworks are limited in their readability and utility. Also, these frameworks cannot transform WebAssembly code, making them unsuitable for writing program optimizations. These frameworks cannot analyze data across WebAssembly and JavaScript, which is important as the two languages interact frequently in web applications. Finally, existing works has implemented dynamic WebAssembly program analysis. For example, the Wasabi framework [179] enables general-purpose dynamic analysis techniques, and several other works propose dynamic analysis techniques performing taint analysis [104, 291], concolic execution [206], and malware detection [327]. However, applying dynamic analysis to binary decompilation may suffer from limited coverage collection, high runtime overhead, and limited work on semantic recovery from execution traces. While dynamic analysis can be used to identify problematic code suitable for optimization, dynamic techniques cannot change the underlying binary to fix these issues at compile-time. 
To address these limitations in current static analysis approaches, we develop WAF, a WebAssembly Analysis Framework that aims to improve the performance and comprehension of WebAssembly applications. To the best of our knowledge, it is the first general-purpose static program analysis framework for WebAssembly that can serve in multiple applications, including program optimizations, binary decompilation, cross-language program analysis, malware detection, traditional compiler analyses, and WebAssembly-specific analyses. WAF is designed to take in standard WebAssembly bytecode as input, enable analysis on the code, apply transformations, and output standard WebAssembly bytecode. The core of our framework lies in our three intermediate representations (IRs), Waf-Low, Waf-Mid, Waf-High, that each model the WebAssembly module at a different semantic level. Specifically, Waf-Low aims to facilitate 137 conversion to our higher-level IRs by reducing the number of instruction types (e.g., removing data types and grouping logically distinct instruction sets). The goal of Waf-Mid is to simplify program analysis by abstracting away the WebAssembly stack machine with three-address code, allowing a variable’s value to be analyzed without having to trace the stack machine. Waf-High constructs a concise syntax that resembles a high-level language such as C or JavaScript by removing unnecessary statements (e.g., stack assignments and redundant variable assignments) where possible. We evaluate the ease of use for WAF against existing static analysis frameworks for WebAssembly. We also evaluate its ability to enrich semantic detail by using it for the tasks of binary decompilation, program optimizations, cross-language program analysis, and WebAssembly malware detection. Finally, we show that our framework achieves good runtime performance for use in a typical development environment. In summary, this chapter makes the following contributions: • We develop WAF, the first general-purpose static program analysis framework for WebAssembly that lifts the semantic detail of the language to a higher level. • Our framework can serve in multiple applications, including binary decompilation, program optimizations, cross-language program analysis, malware detection, traditional compiler analyses, and WebAssembly-specific analyses. • We evaluate the effectiveness of WAF in the task of binary decompilation against three state-of-the-art WebAssembly decompilation tools and show that WAF is able to improve readability over these tools. • We demonstrate using WAF to implement program optimizations that benefit from the contextual information reconstructed by our IRs. We find these optimizations can produce noticeable runtime improvements in select WebAssembly programs. • We show WAF is performant, generating each IR in as little as 162.89 milliseconds on average. 138 Input Wat File WAF Wasm Detail Parser Output Modified Wat File Get (global) 0 Set 1 Const 16 Set 2 Get 1 Get 2 Sub $$s0 = $global_0 $local_1 = $$s0 $$s0 = 16 $local_2 = $$s0 $$s0 = $local_1 $$s1 = $local_2 $$s0 = $$s0 - $$s1 Waf-Low Waf-Mid Waf-High $global_0 - 16 1. Remove implicit stack 2. Group instructions 1. Remove stack assignments 2. Consolidate locals 1. Unfold compound expressions 2. Add stack assignments 1. Remove assignments Figure 8.1: WAF Overview. 
fig:waf-workflow 8.2 Design To address these challenges for WebAssembly static analysis techniques, we design WAF, a program analysis framework designed to facilitate the analysis and optimization of WebAssembly applications. WAF is designed to take in standard WebAssembly bytecode as input, enable analysis on the code, apply transformations to the code, and output the transformed code as standard WebAssembly bytecode. To construct a comprehensive analysis framework, we design three levels of IRs, Waf-Low, Waf-Mid, and Waf-High, that each represent WebAssembly at a different semantic level for different purposes. The workflow of WAF is shown in Figure 8.1. We describe these three IRs in the following sections. We also exemplify the transformation between the levels through the code snippet presented in Figure 8.2. This snippet presents C source code implementing [61] the SHA-256 algorithm [100]. 8.2.1 Waf-Low To efficiently transform WebAssembly code to and from our framework, we develop the Waf-Low intermediate representation. Each WebAssembly instruction is parsed into the IR as internal WafInstruction objects listed in Table 8.1. As IR in the Figure 8.2c shows, Waf-Low closely follows the design of the WebAssembly text format and keeps several properties from the standard WebAssembly bytecode, such as the virtual stack machine. However, Waf-Low differs from WebAssembly bytecode in several key factors, namely: (i) removing instruction data types, (ii) grouping instructions into semantically-similar subsets, (iii) removing 139 Table 8.1: WAF IR Units for Waf-Low, Waf-Mid, Waf-High. table:waf-units IR Level Instruction Group IR Units Low Numeric Const, CountZeros, PopCnt, Add, Sub, Mul, Div, Rem, And, Or, Xor, Shl, Shr, Rotl, Rotr, Abs, Neg, Srt, Ceil, Floor, FloatTrunc, Nearest, Min, Max, CopySign Memory MemorySize, MemoryGrow, MemoryFill, MemoryCopy, MemoryInit, DataDrop, Load, Store, Control Nop, Unreachable, Block, End, Loop, If, Else, Br, Br_If, Br_Table, Return, Call, Call_Indirect Logical Eqz, Eq, Ne, Lt, Gt, Le, Ge Conversion Extend, Wrap, Trunc, Demote, Promote, Convert, Reinterpret Vector Any_True, All_True, Pmin, Pmax, BitSelect, Extract_Lane, Replace_Lane, Splat, Shuffle, Swizzle, Narrow, Add_Sat, Sub_Sat, Avgr, Q15MulR_Sat, ExtMul, ExtAdd_Pairwise, Dot, LoadZero, LoadSplat, LoadLane, StoreLane Table TableGet, TableSet, TableSize, TableGrow, TableFill, TableCopy, TableInit, ElemDrop, Parametric Drop, Select Reference Null, IsNull, RefFunc Variable GetVar, SetVar, TeeVar Mid Expression StackConst*, IfResultVariable*, LocalGlobalVariableReference*, FunctionParamReference*, MemoryAddressReference*, StackOperation* Assignment VariableAssignment, MemoryAssignment, StackAssignment Scoping BlockStatement, LoopStatement, IfStatement Parametric DropStatement, SelectExpression* Control CallExpression*, CallIndirectExpression*, NopStatement, UnreachableStatement, ReturnStatement, BranchStatement, BranchIfStatement, BranchTableStatement Memory MemorySizeExpression*, MemoryGrowStatement, MemoryFillStatement, MemoryInitStatement, MemoryCopyStatement, DataDropStatement Table TableGetExpression*, TableSetStatement, TableSizeStatement, TableGrowStatement, TableFillStatement, TableInitStatement, TableCopyStatement, ElemDropStatement Reference IsNullExpression*, RefFuncExpression*, NullExpression* High Control BreakStatement, ContinueStatement, SwitchStatement Specialized Loops WhileStatement, DoWhileStatement, ForStatement 140 void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len) { 
WORD i; for (i = 0; i < len; ++i) { ctx->data[ctx->datalen] = data[i]; ctx->datalen++; if (ctx->datalen == 64) { sha256_transform(ctx, ctx->data); ctx->bitlen += 512; ctx->datalen = 0; } } } (func $sha256_update (type 7)(param i32 i32 i32) (local i32 i32) block ;; label = @1 local.get 2 i32.eqz br_if 0 (;@1;) i32.const 0 local.set 3 loop ;; label = @2 local.get 0 local.get 0 i32.load offset=64 i32.add local.get 1 ... local.get 2 i32.ne br_if 0 (;@2;) end end) a) C Source Code b) WebAssembly Text (func $sha256_update (type 7) (local i32 i32) block__1: { $$s0 = $local_2 $$s0 = $$s0 == 0 branch_if [0] ($$s0) $$s0 = 0 $local_3 = $$s0 loop_@2: { $$s0 = $local_0 $$s1 = $local_0 $$s1 = [$$s1 + 64] $$s0 = $$s0 + $$s1 $$s1 = $local_1 ... $$s0 = $local_4 $$s1 = $local_2 $$s0 = $$s0 != $$s1 branch_if [0] ($$s0) } }) d) Waf-Mid (func $sha256_update (type 7) (local i32 i32) Block label=@1 Get 2 Eqz Br_If 0 Const 0 Set 3 Loop label=@2 Get 0 Get 0 Load offset=64 Add Get 1 ... Get 2 Ne Br_If 0 End End) c) Waf-Low (func $sha256_update (type 7) (local i32 i32) block__1: { branch_if [0] ($local_2 == 0) $local_3 = 0 loop__2: while($local_4 != $local_2)){ [$local_0 + [$local_0 + 64]] = [$local_1 + $local_3] $local_4 = [$local_0 + 64] + 1 [$local_0 + 64] = $local_4 block__3: { branch_if [0] ($local_4 != 64) $sha256_transform($local_0, $local_0) [$local_0 + 64] = 0 [$local_0 + 72] = [$local_0 + 72] + 512 } $local_4 = $local_3 + 1 $local_3 = $local_4 } }) e) Waf-High class Const { DataType: i32 Value: 0 } class StackAssignment { StackHeight: 0 DataType: i32 Value: StackOperation { Operation: Eqz ValuesConsumed: [$$s0] } } class WhileStatement { BlockLabel: '@2' ScopeDepth: 2 ResultType: [] Condition: ($local_4 != $local_2) InnerStatements: [ VariableAssignment, MemoryAssignment, ... ]} 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 … 51 52 53 54 55 1 2 3 4 5 6 7 8 9 10 11 12 13 14 … 51 52 53 54 55 1 2 3 4 5 6 7 8 9 10 11 12 13 14 … 53 54 55 56 57 58 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Figure 8.2: sha256_update Processed with WAF. For the WebAssembly text shown in Figure 8.2b, the code is first processed into Waf-Low, shown in Figure 8.2c. In this level, the types have been removed and some instructions are consolidated. The IR is then lifted to Waf-Mid, where the implicit stack operations are made explicit by introducing stack, variable, and memory assignments. Finally, the IR is lifted to Waf-High, which introduces complex expressions and new semantic constructs not native to WebAssembly. The output resembles the source code text shown in Figure 8.2a for easier program comprehension. fig:running-ex-waf 141 the call_indirect instruction, and (iv) removing implicit function returns. These changes condense 437 WebAssembly instructions into 99 Waf-Low units. 8.2.1.1 Removing Data Types from WebAssembly Instructions sec:waf-low-remove-data-types The first change that Waf-Low IR makes on standard WebAssembly bytecode is removing the data types from the instructions. For example, the WebAssembly instructions i32.const and f64.const push constant values of two different data types, i32 and f64. To simplify analysis, we remove the restriction of a single data type on the instructions and replace these two instructions with a single Const unit, as shown in shown in Line 7 ofFigure 8.2c. As the figure shows, the data type information is stored in the operands rather than the instruction, so the data type of the instruction can be retrieved when needed. 
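To make this concrete, the sketch below shows how a typed instruction pair such as i32.const and f64.const could collapse into a single Const unit whose type lives in its fields rather than in its name. The DataType and Value fields mirror the Const class excerpt in Figure 8.2, but the surrounding parsing helper is an illustrative assumption, not the framework's exact implementation.

```typescript
// Supported WebAssembly MVP value types.
type WasmValueType = "i32" | "i64" | "f32" | "f64";

// A single Waf-Low unit replacing i32.const, i64.const, f32.const, and
// f64.const; the data type is stored in the unit, not encoded in an opcode.
class Const {
  constructor(
    public DataType: WasmValueType,
    public Value: number | bigint,
  ) {}
}

// Illustrative parsing step: both typed instructions map to the same unit.
function parseConst(opcode: string, operand: string): Const {
  const dataType = opcode.split(".")[0] as WasmValueType; // "i32.const" -> "i32"
  const value = dataType.startsWith("i") ? BigInt(operand) : Number(operand);
  return new Const(dataType, value);
}

// parseConst("i32.const", "16") and parseConst("f64.const", "3.14") both
// produce Const units, so later analyses can treat them uniformly.
```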
Type removal enables several instructions to be replaced by one of our WafInstructions. 8.2.1.2 Grouping Semantically-Similar Instruction Sets The second change consolidates the WebAssembly instructions into a smaller set of logically-similar instruction groups. Several instructions have multiple variants, e.g., the i32.store8 and i32.store16 variants of the i32.store instruction that specify different memory sizes to store. To handle variants of i32.store, we create one Store WafInstruction with a memory size field within the unit to represent all variants. We create similar fields for units of instructions that also have variants. This process groups instructions and their functionally similar variants together. 8.2.1.3 Removing the call_indirect Instruction The call_indirect instruction performs indirect calls whose target function is only known at runtime, similar to function pointers in C/C++, and is difficult to analyze statically. Unlike other languages, all potential indirect call targets in WebAssembly must be listed in the module’s function table. As a result, we can replace call_indirect with multiple call instructions for every possible function table target. 142 Get 0 Load offset=32 Const 3 And Call_Indirect 1 Const 1 Eq Get 0 Load offset=32 Const 3 And Const 1 Eq Get 4 Const 0 Eq If Call 36 End Get 4 Const 1 Eq If Call 65 End ... (a) Before Call_Indirect Removal (b) After Call_Indirect Removal Figure 8.3: Example of call_indirect Instruction Removed. fig:waf-low-call-indirect-example Figure 8.3 shows an example of the call_indirect removal. Figure 8.3a shows a Call_Indirect unit, and Figure 8.3b shows the unit replaced by new If blocks that explicitly check the indexing runtime value against each applicable table index. Each If block contains a Call unit for the function at the table index. By replacing call_indirect with its potential targets, we can statically analyze each potential function that could be invoked by the original instruction. 8.2.1.4 Removing Implicit Function Returns If a function type specifies a result value and the function body does not end with a return, WebAssembly will implicitly use the top stack value as the function result value. We find that these implicit function returns complicate dependency analyses in our higher-level IRs. To simplify analysis, we add a return instruction to the end of a function if it implicitly returns a function value. Adding this instruction converts the implicit returns into explicit returns. 8.2.2 Waf-Mid Our second level of IR, Waf-Mid, is built atop Waf-Low. Unlike Waf-Low, the focus of this IR level is to abstract away the stack machine from the WebAssembly code and replace it with a three-address code format. We also further condense the 99 Waf-Low units into 39 Waf-Mid units presented in Table 8.1. The 143 Waf-Mid format, exemplified in Figure 8.2, eases program analysis by allowing a variable’s value to be analyzed without the hassle imposed by tracing the stack machine. For example, on Line 13 of Figure 8.2d, we can see that the value $$s0 used in the addition points to the last assigned value, local_0. Meanwhile, on Line 13 in Figure 8.2b, we would collect the operands of the i32.add instruction by unwinding the previous instructions on Lines 12, 11, and 10 until we collect two stack values. This process becomes tedious when inspecting multiple locations. We describe our process of abstracting away the virtual stack in the following section. 
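Before turning to how the stack is abstracted away, the sketch below shows how a lifted statement such as $$s0 = $$s0 == 0 might look as plain objects. The StackAssignment and StackOperation field names follow the class excerpts in Figure 8.2; the helper types around them are simplifying assumptions made only for illustration.

```typescript
// A symbolic reference to a stack slot introduced during lifting.
interface StackRef {
  kind: "stack";
  index: number; // 0 for $$s0, 1 for $$s1, ...
}

// An operation whose operands were popped from the virtual stack.
interface StackOperation {
  Operation: "Eqz" | "Add" | "Sub" | "Ne"; // small subset for illustration
  ValuesConsumed: StackRef[];
}

// A three-address statement assigning a value to a stack variable,
// mirroring the StackAssignment excerpt shown in Figure 8.2.
interface StackAssignment {
  StackHeight: number;
  DataType: "i32" | "i64" | "f32" | "f64";
  Value: StackOperation | number;
}

// $$s0 = ($$s0 == 0), as produced when lifting `local.get 2; i32.eqz`.
const eqzStatement: StackAssignment = {
  StackHeight: 0,
  DataType: "i32",
  Value: { Operation: "Eqz", ValuesConsumed: [{ kind: "stack", index: 0 }] },
};
```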
8.2.2.1 Abstracting Away Implicit Stack through Symbolic Execution We use symbolic simulation techniques to emulate the execution of WafInstructions within a function. In Waf-Mid, we create WafStatement objects to represent each three-address code statement. Additionally, we create WafExpression objects that are part of other WafStatement, such as Const units. Our process for symbolically executing Waf-Low is as follows. For each WafInstruction, we define a function, fstack, that describes the stack effects on the current state of the virtual stack. These stack effects include operands popped from the stack and computed results pushed to the stack. fstack is defined based on the type of a WafInstruction: 1. For WafInstructions that have no effect on the virtual stack (e.g., br), fstack performs no actions. 2. For WafInstructions that use local variables, global variables, or linear memory values, fstack pushes symbolic references to these values onto the virtual stack. To simplify the higher-level IR construction, we use an alive variable definition set to track variable set instructions and memory assignments so that they contain an identifier indicating the variable uses refer to the same variable assignment. By doing so, we can easily track what value within the history of variable sets a particular reference is pointing to. 144 3. For the Const instruction that pushes known constant values to the stack, fstack constructs a stack constant reference to push to the virtual stack. We symbolically simulate each function by initializing a virtual stack V0 and applying fstack for each of its Waf-Low units of the function on the virtual stack V0. During the execution, we use the symbolic stack values affected by fstack to create three-address code statements. Assignments are WafStatements with a value being assigned (i.e., a right-hand side) and a location being assigned to (i.e., a left-hand side). For example, if the virtual stack of an Add instruction consists of stack values {s0, s1} then we can represent this instruction with the StackAssignment (a subclass of Assignment) s0 = s1 + s0. Line 5 in Figure 8.2d ( $$s0 = 0) shows an example of a StackAssignment explicitly denoting a comparison operation. Iterating over the WafInstruct- ions produces a series of Assignments and other WafState- ments that represent all stack values explicitly with stack variables rather than implicitly through stack operands. 8.2.2.2 Handling Scoping Statements Some WebAssembly control instructions, such as loop and if, are structured, i.e., they enclose a block of instructions in a new scope. We model these structured instructions in the following ways: 1. For structured instructions block, loop, and if, we initialize a field, InnerStatements, within their IR constructs, i.e., BlockStatement, LoopStatement (shown in Line 9 of Figure 8.2d), and IfStatement, to track statements in their scope. 2. For instructions defined within an else instruction’s scope, we define the field ElseStatements within IfStatement units to track these instructions as else instructions must always be paired with a corresponding if instruction. 145 3. These scoping instructions can also specify a result type that will be at the top of the stack after it and its inner statements are executed. We push symbolic references that represent results from scoped instructions to the stack when simulating execution. 8.2.3 Waf-High Our third level of IR, Waf-High, is built on top of Waf-Mid and shares WafStatements with this lower IR level. 
As shown in Table 8.1, Waf-High adds six additional units to the 39 Waf-Mid units for a total of 45 units. The main goal of Waf-High is to present the IR units into a concise syntax that resembles a high-level language such as C or JavaScript. This is done by applying four transformations to the units produced by Waf-Mid. The first two transformations reduce the syntax of the IR by removing all stack variables and consolidating local variable uses. The second two transformations reformat the IR and introduce new semantic constructs not native to WebAssembly in order to represent useful semantic information. We describe these transformations in depth. 8.2.3.1 Building Control Flow Graphs sec:control-flow-graph-design We rely on control flow graphs (CFGs) to lift Waf-Mid to Waf-High. To construct the CFGs, we traverse the Waf-Mid statements of a function. Every statement becomes nodes in the control flow graph, and the first statement acts as the starting node of the graph. Directed edges are added between consecutive statements to represent the normal flow of execution. When we encounter Branch-, BranchIf-, or BranchTableStatements, we add additional branching edges pointing to the next statement that would be executed if the branch is taken. Unlike goto in C, the flow of execution does not go to the beginning of labeled code block. Instead, the execution continues at the end of the block, so the edge points to the following statement after the targeted ScopingStatement. The exception to this flow is LoopStatements 146 $$s0 = $local_4 $$s1 = $local_5 [$$s0 + 28] = $$s1 [$local_4 + 28] = $local_5 (a) Before StackAssignment Consolidation (b) After StackAssignment Consolidation Figure 8.4: Example of StackAssignments Uses Being Replaced by Their Values. fig:waf-mid-stack-consol where the flow does continue at the beginning of the block to repeat execution, forming a cycle in the graph. 8.2.3.2 Building Statement Dependency Graph To make Waf-High concise, we remove as many intermediate statements from Waf-Mid as possible, such as stack assignments and redundant local variable assignments. To do this safely, we must ensure that the semantics of the original program are not altered after removing statements. For each function, we construct a dependency graph on the IR units of Waf-Mid. For example, if a WafStatement assigns a constant value to a variable, then in the dependency graph, the node representing the WafStatement has edges representing dependencies on the nodes for the constant value and the variable reference. By keeping track of which statements are dependencies of others, we can safely remove statements that are not the dependencies of other statements. 8.2.3.3 Consolidating Stack Assignments We first use the dependency graph to remove all StackAssignment units. For statements that have dependencies on a stack assignment, we remove the dependencies by replacing the stack assignment’s usage with its assigned value, i.e., its right-hand side. This process is demonstrated in Figure 8.4. The execution and validation schemes of WebAssembly ensure that all stack variables are used within some other instruction, dropped from the stack, or used as function result values, so we can safely substitute stack variable assignments with their assigned values. As such, this change can significantly reduce the number of WafStatements. 
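A minimal sketch of this stack-assignment consolidation is shown below. It assumes simplified expression and statement shapes rather than the framework's actual WafStatement classes, and it omits the dependency-graph bookkeeping that guards against removing assignments whose values are still needed elsewhere; the substitution itself reproduces the example in Figure 8.4.

```typescript
// Simplified expression and statement shapes, for illustration only.
type Expr =
  | { kind: "stackRef"; index: number }   // $$s0, $$s1, ...
  | { kind: "localRef"; index: number }   // $local_4, ...
  | { kind: "const"; value: number }
  | { kind: "binary"; op: string; left: Expr; right: Expr };

type Stmt =
  | { kind: "stackAssign"; index: number; value: Expr }
  | { kind: "memAssign"; address: Expr; value: Expr };

// Replace every stack reference with the expression most recently assigned
// to that slot, then drop the now-redundant stack assignments.
function consolidateStackAssignments(stmts: Stmt[]): Stmt[] {
  const defs = new Map<number, Expr>();

  const subst = (e: Expr): Expr => {
    if (e.kind === "stackRef") return defs.get(e.index) ?? e;
    if (e.kind === "binary")
      return { ...e, left: subst(e.left), right: subst(e.right) };
    return e;
  };

  const out: Stmt[] = [];
  for (const s of stmts) {
    if (s.kind === "stackAssign") {
      defs.set(s.index, subst(s.value)); // remember the folded right-hand side
    } else {
      out.push({ ...s, address: subst(s.address), value: subst(s.value) });
    }
  }
  return out;
}

// Mirrors Figure 8.4: $$s0 = $local_4; $$s1 = $local_5; [$$s0 + 28] = $$s1
// becomes [$local_4 + 28] = $local_5.
```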
147 8.2.3.4 Consolidating Local Variable Assignments Using Alive Variable Definition Set Similar to stack assignments, we can also remove some local variable assignments by replacing usages of local variables with their active assigned value. To replace the local variables, we traverse over the WafStatement units of a function. We maintain a variable definition set to store the last assigned value of a local variable. When we encounter local variable assignments, we update the current value in this set with the value specified in the assignment. When we encounter a local variable usage within other IR units, we replace this reference with the local’s value in the definition set. 8.2.3.5 Converting Block Statements into If/Else Statements subsubsec:block-to-if WebAssembly uses the if and else instructions to implement conditional statement functionality. It also uses br and br_if instructions to implement jump and conditional jump functionality, which is similar to the goto keyword in C. As conditional statements are more-easily understood by developers than jumps, we seek to convert suitable BlockStatements to If-/ElseStatements. To convert the statements, we first construct a control-flow graph for a function. We perform a depthfirst traversal on the CFG and form a copy of the IR by appending statements as they are encountered. When we encounter a node with two outgoing edges (i.e., a Branch- or BranchIfStatement), we create a new IfStatement and add statements to the InnerStatements and ElseStatements fields as they are encountered along the traversal paths of the two outgoing edges. The path stops either when there are no more edges or when a node contains more than one incoming edge, which indicates a statement reachable by both paths, i.e., a statement after the block. We replace the original two-edged node with the new IfStatement in the CFG. This process removes Branch-, BranchIf-, and BranchTableStatements. We can then remove any unused BlockStatements by counting the remaining statements targeting the BlockStatements and replacing those with no targets by their InnerStatements. 148 8.2.3.6 Converting Loop Statements into For/While/Do-While Loop Statements subsubsec:loop-to-do-while In WebAssembly, loop statements performing repeated execution of code blocks are implemented using loop. To improve the readability of these loop statements in Waf-High, we convert these loop blocks into looping statements available in JavaScript: while, do-while, and for loops. We use the same control flow graph construction process as described in Section 8.2.3.5. We identify any cycles formed in the CFG as looping statements and store them in units representing the three different looping statement types. The appropriate loop type to use is determined by the locations of the branching statements that jump either out of the loop or to the beginning of the loop. We identify different patterns among these statements that allow us to transform a loop statement into a While-, DoWhile-, or For-Statement IR unit. In Figure 8.2e, we can see that the LoopStatement in Waf-Mid is lifted to a WhileStatement with the iteration condition added. 8.2.4 Downward Conversions Between IR Levels After any transformations are applied, each IR level must also be able to lower back down to semanticallylower levels. This process is necessary to output a usable WebAssembly text or binary module. In this section, we describe this downward conversion process between IR levels. 
8.2.4.1 Waf-High to Waf-Mid
In order to lower between the two IR levels, two main changes must be undone. First, the assignment consolidations must be unfolded. We can do this by traversing each WafStatement and moving nested statement units into their own StackAssignments. Second, the For-, While-, and DoWhileStatements created from the LoopStatement conversions (Section 8.2.3.6) need to be changed back into their original statement type. To remove them, we can convert these units to LoopStatement units as they share many of the same fields, and we can move any conditional units back into the InnerStatements field of the new unit. Waf-High shares its remaining IR units with Waf-Mid, so no additional changes are needed.
8.2.4.2 Waf-Mid to Waf-Low
We lower Waf-Mid to Waf-Low by handling three cases within the conversion process. First, to lower StackAssignments, we have to lower their Value fields, which contain WafExpressions. Each WafExpression has a corresponding WafInstruction, so we map one unit to the other. Second, we lower the ScopingStatements themselves, and their InnerStatements fields are traversed to lower each unit to a Waf-Low unit. Third, all other WafStatement units have corresponding Waf-Low units. The conversion process for these units involves transferring the fields of the WafStatement to the appropriate fields of the corresponding WafInstruction.
8.2.4.3 Waf-Low to WebAssembly Text
Since Waf-Low is similar to WebAssembly text, each WafInstruction unit can be converted down to a corresponding WebAssembly instruction. The data type prefixes can be recovered through fields in the WafInstruction unit. Similarly, some WafInstruction units contain additional fields that allow them to recover different variations of an instruction. For example, the Load unit contains SignInterpretation and MemSize fields that allow it to represent multiple instructions, e.g., i32.load8_s and i32.load16_u.
8.3 Applications of WAF sec:applications-of-waf
We design our WAF framework to serve in several applications. We use WAF to implement binary decompilation tools, program optimizations, cross-language program analysis, WebAssembly malware detection, traditional compiler analyses, and WebAssembly-specific program analyses.
8.3.1 WebAssembly Binary Decompilation subsec:waf-binary-decompilation
WebAssembly programs are compiled from high-level source languages, such as C, C++, and Rust. The resulting binary modules are distributed to users. Without the original source code of a WebAssembly module, understanding the functionality of the module is a difficult and tedious task. Binary decompilation can help unfamiliar developers understand the code by converting it into a familiar language, such as C. Ideally, the decompilation should not only convert the instructions into valid C syntax but also recover semantic information that simplifies understanding the code. Leveraging the semantic lifting ability of Waf-High, we implement a binary decompilation tool that allows printing WebAssembly as C code. The generated C output string is compatible with standard C compilers, such as GCC [295] and Clang [56]. This conversion tool should help developers who are unfamiliar with WebAssembly code and need to understand the functionality of an unknown WebAssembly dependency. For each statement, we implement a .toCLikeString method that prints the Waf-High IR unit out as a C-compatible string.
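As a rough illustration, the fragment below sketches what such a per-unit printer could look like for two Waf-High unit shapes. Only the .toCLikeString naming and the idea of emitting compiler-ready C text come from the framework description; the unit interfaces shown here are simplified assumptions.

```typescript
// Simplified Waf-High unit shapes, assumed for illustration only.
interface CPrintable {
  toCLikeString(): string;
}

class VariableAssignment implements CPrintable {
  constructor(public target: string, public value: string) {}
  toCLikeString(): string {
    return `${this.target} = ${this.value};`;
  }
}

class WhileStatement implements CPrintable {
  constructor(public condition: string, public inner: CPrintable[]) {}
  toCLikeString(): string {
    const body = this.inner.map((s) => "  " + s.toCLikeString()).join("\n");
    return `while (${this.condition}) {\n${body}\n}`;
  }
}

// new WhileStatement("local_4 != local_2",
//   [new VariableAssignment("local_3", "local_3 + 1")]).toCLikeString()
// yields a while loop that a C compiler accepts inside a function body.
```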
To represent memory accesses from WebAssembly, we insert a global MEM variable into the C-like string and print the memory accesses as array accesses. To access memory, table, data, and element statement functions, we represent them as function calls to methods of global MEMORY, TABLE, DATA, and ELEM structures. 8.3.2 Program Optimizations subsec:waf-program-optimizations In addition to analysis, WAF can enable WebAssembly code transformations, such as those needed for program optimizations. Previous work has found that traditional program optimizations may introduce counterintuitive runtime effects into WebAssembly modules [258]. To showcase the code transformation abilities of WAF, we implement two optimization passes that can mitigate the performance degradation caused by these counterintuitive passes. 151 8.3.2.1 Hot Code Extraction subsubsec:application-hot-code-extract We use WAF to identify “hot code” within existing functions. Hot code is critical code where most of a function’s (and possibly the entire program’s) runtime execution is spent. When hot code is inlined into long-running functions or functions invoked only once, counterintuitive performance impacts can occur [258]. As a result, our optimization pass aims to extract this hot code into a separate function. To perform this process, we use our Waf-Mid IR to identify looping statements that are called within hot functions, i.e., the parent function itself is called within a different loop. Once these loops are identified, we extract them out into a separate function. We replace the old loop site with a new CallExpression. We also insert a new function type into the WebAssembly Types section if needed. 8.3.2.2 Loop Re-Rolling subsubsec:application-loop-rerolling Another program optimization that we implement aims to reverse the effects of loop unrolling. For the same reasons as function inlining, we find that the expanded function size caused by loop unrolling can prevent the browser from switching the code generated by the two WebAssembly compilers during execution. To counteract this traditional optimization, our optimization pass condenses repeated series of instructions into one looping statement. This pass operates at the function level and begins by analyzing the root Waf-Mid IR scope. To simplify the search for repeated sequences, we map the sequence of IR unit types into a string of ASCII characters. We then search all possible consecutive substrings of the sequence string to find the most repeated substring, if any. For multiple repeated substrings with the same count of repetitions, we choose the substring that appears earliest in the string. For the identified repeated substring, we analyze the IR units of the first and second repeated instances to identify fields in the IR units that change across repetitions. We then introduce a ForStatement unit to wrap the units of the first repeated instance, and we modify the unit’s fields to vary based on the new loop counter. Finally, we splice the IR sequence to include the new loop 152 statement and remove the proceeding repetitions from the sequence. To identify all applicable loop sites, we also recursively traverse the InnerStatements ofScopingStatements as they may also contain repeated unit sequences that can be re-rolled into a single loop. 8.3.3 Analyzing Cross-Language Obfuscation subsec:waf-cross-language WebAssembly applications are expected to frequently share data and control-flow with JavaScript during the course of their execution. 
As a result, WebAssembly applications are typically cross-language applications. Novel obfuscation techniques, e.g., Wobfuscator (Chapter 7), leverage the cross-language nature of WebAssembly applications and the lack of WebAssembly analysis tools to hide certain parts from JavaScript-based detectors, i.e., malware detectors. These obfuscation techniques can move data literals, function calls, conditional statements, and other constructs from JavaScript to WebAssembly to avoid any static detectors analyzing the JavaScript files. This cross-language obfuscation necessitates suitable cross-language analysis.
WAF can use Waf-High to enable such an analysis. The syntax of Waf-High is much closer to a high-level language such as JavaScript compared with the syntax of the other IR levels. This property makes it easy to lift the Waf-High syntax to a JavaScript-like syntax that is compatible with existing JavaScript parsers, e.g., Esprima [151]. Using Waf-High, a WebAssembly-to-JavaScript translation can be performed to standardize a cross-language WebAssembly application to JavaScript. This translated application can then be analyzed using existing JavaScript techniques to infer properties of the original cross-language application.
To analyze cross-language obfuscation, if a JavaScript file is obfuscated to hide parts of its logic in WebAssembly, we can first generate the Waf-High IR for the module. Then, for each called export function in the JavaScript file, we generate an Esprima-compatible string of the function's IR. We then replace each export call in the JavaScript file with the function IR string. This transformed JavaScript file is fully compatible with JavaScript parsers, so it can leverage any existing JavaScript detection techniques. This approach can help mitigate the dangers posed by cross-language obfuscation.
8.3.4 Malware Detection subsec:waf-malware-detection
In addition to targeting next-generation cross-language malware, we also design our IRs to be used for the detection of malware implemented in WebAssembly only. Since we implement call graph, control-flow, and data-flow analyses, our IRs enable traditional program analysis techniques for malware detection. We plan to showcase the ability of WAF to implement a detection technique for the most common malicious use case of WebAssembly modules: cryptojacking. By combining the information given by our three different intermediate representations, malware detection techniques can be constructed that go beyond WebAssembly binary analysis.
To highlight the flexibility of the WAF IR, we can implement an existing WebAssembly malware detection algorithm using Waf-High. We re-implement the MinerRay detection algorithm described in Algorithm 1. This algorithm identifies the critical hashing steps of cryptominers implemented in WebAssembly (Section 6.3.3). WAF contains most of the units contained in the MinerRay IR, so we can port the existing algorithm over to the new IR. After this process, we still need to handle one missing IR unit, MemCpy. Since WAF is designed to be a general-purpose framework, we can use the available analyses to implement a MemCopyAnalysis class in WAF. This analysis can identify repeated memory accesses that either copy between two memory locations or store a memory literal into a memory region. With this analysis, we have all the tools to implement Algorithm 1.
8.3.5 Traditional Compiler Analyses
To showcase the flexibility of WAF, we implement commonly-used program analyses for WebAssembly programs.
Specifically, we implement analysis classes that enable control-flow analysis, live variable analysis, and reaching definition analysis. 8.3.5.1 Control-Flow Analysis A control-flow analysis describes the flow of control through the program’s execution. A specialized data structure, the control-flow graph (CFG), is used to represent and query properties on the control flow. We implement a control-flow analysis class in WAF using our Waf-Mid IR that generates the control-flow graph for a given function. The control flow of a WebAssembly function is very well-defined through the WebAssembly control instructions, e.g., call, if, br, etc... As such, the CFG can be constructed by traversing the Waf-Mid and representing the changes in control flow by control instructions through edges in the CFG, as described in Section 8.2.3.1. 8.3.5.2 Live Variable Analysis A live variable analysis identifies the active variables at a given location within the code and is typically used in compilers to consolidate variable and register uses. We perform a live variable analysis on a function by constructing a CFG and traversing it using depth-first search. We maintain variable definition sets that represent variables defined up to each point of execution. When a new variable is defined, a new definition set is created and associated with the following statements until another new variable set is constructed. 8.3.5.3 Reaching Definition Analysis Throughout function execution, operations can change the values that variables can take on. An operation that sets a variable’s value is said to define that variable, and the definition that is read at a given point is 155 said to reach the given code location. A reaching definition analysis identifies all the reaching definitions for the active variables at a given code location. We implement such a reaching definition analysis class leveraging the VariableAssignment units of Waf-Mid. 8.3.6 WebAssembly-Specific Analyses WAF is designed to handle the unique characteristics of WebAssembly code. We implement two analysis classes that report metrics on function properties that can enable further optimization. 8.3.6.1 Global Variable Usage Analysis WebAssembly functions can access global variable values that are set in a different location within the WebAssembly module or even within the host context, i.e., JavaScript. Global variable accesses in functions can produce side-effects on the module state when invoked through another function. This usage prevents functions from being “pure” function [217]. Identifying pure functions within a program can help reduce bugs and enable additional optimizations on the function [160]. As such, it is useful to gather properties when global variables are used in a function, such as global variable retrievals and stores. 8.3.6.2 Static Memory Size Analysis The WebAssembly standard provides mechanisms to increase or decrease the size of a module’s linear memory. These alterations to the linear memory can alter existing memory layouts if they are made under the assumption of a static memory size. Bugs can arise in WebAssembly applications if the linear memory is shared between JavaScript and WebAssembly but WebAssembly resizes the linear memory, leading to a stale data mapping [260, 164]. For this reason, we implement a static memory size analysis class that scans all of the functions within the module to identify if any include code to alter the linear memory size. 
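A minimal sketch of such a scan is shown below. The analyzeModule entry point and the WafMidIR field follow the naming conventions described in this chapter, and MemoryGrowStatement is the Waf-Mid unit listed in Table 8.1; the exact shapes of the IR objects and the Name field are assumptions made for illustration.

```typescript
// Simplified views of the framework objects; field names follow the text,
// but the exact types are assumptions.
interface WafMidStatement {
  kind: string;                        // e.g., "MemoryGrowStatement"
  InnerStatements?: WafMidStatement[]; // present on scoping statements
}

interface FunctionDetails {
  Name: string;
  WafMidIR: WafMidStatement[];
}

// Recurse into scoping statements (blocks, loops, ifs) as well.
function containsMemoryGrow(stmt: WafMidStatement): boolean {
  if (stmt.kind === "MemoryGrowStatement") return true;
  return (stmt.InnerStatements ?? []).some(containsMemoryGrow);
}

// Report every function that may resize the module's linear memory.
class StaticMemorySizeAnalysis {
  static analyzeModule(functions: FunctionDetails[]): string[] {
    return functions
      .filter((f) => f.WafMidIR.some(containsMemoryGrow))
      .map((f) => f.Name);
  }
}
```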
This analysis can be leveraged with advanced bug detection approaches for WebAssembly bugs. We scan the Waf-Mid IR of each function for MemoryGrow units and report any functions that contain these units.
8.4 Implementation and Usage
8.4.1 Implementation
We develop a prototype of our framework as a TypeScript project [145] designed to run on Node.js [102]. We choose TypeScript as it is a popular language among web developers, who are a likely audience to encounter WebAssembly modules. We leverage the Graphology library [244] as the underlying graph data structure of our control flow analysis.
8.4.2 Usage
WAF is designed to be used either as a standalone CLI tool or as a library in other applications. Our CLI tool can be run with Node.js. In this prototype, we provide commands to generate and print out the different WAF IR texts for a given WebAssembly file. Each command takes in a file path pointing to a WebAssembly text file to process. To use WAF as a library, we provide classes to process a WebAssembly text (.wat) file through our IRs. To start, we define the method readWatFile(file_path) in our WasmModuleDetails class, which parses a WebAssembly text file into a basic IR containing information on the different sections of the module. Within this class, the field InternalFunctions contains FunctionDetails objects, which are representations of each function defined within the module. These objects will also contain the different WAF IRs when they are generated. Once the basic IR is created, the WasmModuleDetails class provides another method, generateWafLowIRUnits(), to start the Waf-Low generation process. Once this method is run, the Waf-Low IR units are available in the WafLowIR field of the FunctionDetails objects. Another method, generateWafMidIRUnits(), constructs the Waf-Mid IR units for each function in the module. Similarly, the resulting IR will be available in the WafMidIR field of the FunctionDetails objects. Finally, the method generateWafHighIRUnits() on the WasmModuleDetails class will generate the Waf-High units and store them in the WafHighIR field of the FunctionDetails objects. Each of our analyses and transformations is implemented as its own class. The static methods analyzeModule and analyzeFunction are defined for each analysis. The static methods transformModule and transformFunction are defined for each transformation.
8.5 Evaluation
We evaluate WAF and its ability to serve in different applications through the following research directions:
• RQ1 - Practicality: How practical is WAF for implementing analyses?
• RQ2 - Decompilation: Does WAF reconstruct useful semantic details for the task of WebAssembly binary decompilation? How complex and readable is the decompiled code? How does it compare with existing state-of-the-art decompilation tools?
• RQ3 - Optimization: Besides analysis, how does WAF perform when transforming WebAssembly code for the task of program optimization?
• RQ4 - Cross-Language Analysis: How well can WAF raise the semantic level of WebAssembly code to analyze data flow between JavaScript and WebAssembly? How well does WAF do for the task of cross-language malware detection?
• RQ5 - WebAssembly Malware Detection: Is WAF flexible enough to support implementing existing WebAssembly static analyses? Can WAF implement a malware detection analysis?
• RQ6 - Overhead: What is the runtime overhead of the WAF framework?
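To connect the evaluation that follows to the library interface described in Section 8.4.2, the snippet below sketches the typical end-to-end call sequence. The class, method, and field names come from that section, while the package name, the constructor call, and the input path are illustrative assumptions.

```typescript
// Hedged usage sketch based on the API described in Section 8.4.2; the
// package name and input path are placeholders.
import { WasmModuleDetails } from "waf";

const details = new WasmModuleDetails();        // construction style assumed
details.readWatFile("./examples/sha256.wat");   // parse sections into the basic IR

// Generate the three IR levels; each pass fills a field on the
// FunctionDetails objects held in details.InternalFunctions.
details.generateWafLowIRUnits();   // -> fn.WafLowIR
details.generateWafMidIRUnits();   // -> fn.WafMidIR
details.generateWafHighIRUnits();  // -> fn.WafHighIR

// Count the functions for which all three IR levels were generated;
// analyses and transformations are then invoked through their static
// analyzeModule/analyzeFunction and transformModule/transformFunction methods.
const ready = details.InternalFunctions.filter(
  (fn) => fn.WafLowIR && fn.WafMidIR && fn.WafHighIR,
).length;
console.log(`Generated WAF IRs for ${ready} functions`);
```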
8.5.1 RQ1: Practicality of WAF subsec:framework-ease-of-use The goal of WAF is to make a static analysis framework available for developers to implement various analyses on WebAssembly applications. As such, we evaluate how much coding effort is required to 158 Table 8.2: Source Lines of Code (SLOC) for Analyses in WAF. table:analysis-sloc Analysis/Transformation SLOC Count WAF Wassail WasmA Call Graph Analysis 99 185 254 Control Flow Analysis 200 1,013 348 Live Variable Analysis 50 - - Reaching Definition Analysis 99 - - Binary Decompilation Printout 63 - - Loop Rerolling 280 - - Hot Code Extraction 434 - - Esprima-Compatible Printout 45 - - JavaScript-WebAssembly Combination 357 - - Cryptominer Detection Analysis 394 - - Import/Export Call Graph 19 - - Global Variable Usage Analysis 72 - - Static Memory Size Analysis 39 - - implement analyses on top of WAF given the core framework. Since we aim to make WAF an easy-to-use static analysis framework for novice and experienced users alike, we need to investigate how difficult it is to implement analyses with WAF. In addition, we compare this development effort with existing static analysis frameworks for WebAssembly. 8.5.1.1 Development Effort for WAF Analyses We evaluate the effort required to implement each of our described analyses in Section 8.3. To demonstrate the development effort incurred, we list the source lines of code (SLOC) count used to implement each analysis or transformation using WAF in Table 8.2. As the table shows, most of our analyses can be implemented with fewer than 100 lines of code, signaling low development effort. Our most complex analyses, Loop Re-Rolling and Hot Code Extraction, require 280 and 434 lines of code, respectively, to implement, which is reasonable effort for complex analyses. 159 8.5.1.2 Existing WebAssembly Static Analysis Frameworks We compare this development effort against two existing static analysis frameworks for WebAssembly, Wassail and WasmA. Wassail [286] is a WebAssembly static analysis and inspection library that aims to enable lightweight and heavyweight analyses on WebAssembly applications. WasmA [32] is a performant static analysis framework for WebAssembly that can perform analyses such as call, control-, and data-flow graphs. Since WAF, Wassail, and WasmA implement call graph and control-flow graph analyses with their frameworks, we compare the development effort incurred by each implementation. We present the development effort in terms of SLOC count in Table 8.2. The results show that WAF uses 99 lines of code to implement the analysis. Meanwhile, Wassail and WasmA require 185 (86.87% increase) and 254 (156.57% increase) lines of code, respectively. In addition, to implement control flow analysis, WAF requires 200 lines of code, lower than either 1,013 (406.50% increase) lines required by Wassail or the 348 (74.00% increase) required by WasmA. These results demonstrate that WAF can lower the development effort required to implement static analysis when compared with existing frameworks. 8.5.2 RQ2: Binary Decompilation Effectiveness WAF implements binary decompilation by generating a C string from WebAssembly code. WAF’s semantic lifting should improve readability and comprehension compared with the low-level text. To evaluate the effectiveness of this binary decompilation, we compare the code complexity and readability metrics of the C string against decompiled code generated by three state-of-the-art WebAssembly decompilation tools, Wasm2C [320], LLVM-CBE [58], and W2C2 [224]. 
8.5.2.1 Dataset
For this experiment, we need a dataset of C source code files that can be compiled to WebAssembly. We leverage the collection of C files from the DeepFix dataset [120]. This dataset is composed of C files collected from an introductory university-level programming course. These files were collected using a programming tutoring system called Prutor [78]. The dataset consists of 53,478 C files gathered from users implementing ninety-three different programming tasks. Of these files, 6,971 samples cannot compile, so we exclude them from our dataset. We choose to evaluate the effects of the optimization levels O0, O1, O2, O3, Os, and Oz on the decompilation performance. For each sample, we extract the main function from the WebAssembly text files. Since this function is present in all our samples, we use it as the basis for the decompilation comparison.
8.5.2.2 Decompilation Tools
We compare the decompiled output generated by WAF against three state-of-the-art decompilation tools. The first two tools, Wasm2C and LLVM-CBE, leverage existing IRs for WebAssembly to perform decompilation. Wasm2C [320] is a tool that leverages an existing IR for WebAssembly, WABT IR [319], to transform WebAssembly code into compatible C code. As such, it is an effective tool to measure the suitability of WABT IR for the task of binary decompilation. The second tool, LLVM-CBE [58], decompiles LLVM IR to C. To use this tool for WebAssembly decompilation, the WebAssembly binaries have to be transformed back to LLVM IR, which is not officially implemented within LLVM. There are third-party projects aiming to decompile WebAssembly back to LLVM IR, such as the Wasm-to-LLVM prototype tool [146] and the aWsm compiler [122], but both tools are under active development. We therefore bypass this step by compiling the C source directly to LLVM IR and then using LLVM-CBE to decompile the IR back to C. Rather than focusing on the limitations of in-progress tools, this approach evaluates the effectiveness of the LLVM IR itself for binary decompilation usage. The third tool, W2C2 [224], converts WebAssembly code into portable C code.
8.5.2.3 Code Complexity and Readability Metrics subsec:readability-metrics
Previous work on code complexity and readability has proposed several metrics that use different features of a program to estimate its complexity and readability. We describe the five complexity and readability metrics we use below.
• Source Lines of Code (SLOC): Counts non-empty lines in the program source code, excluding comments [231]. A lower SLOC implies increased readability.
• Maximum Depth of Nesting: Counts the number of nested control structures (e.g., branches and loops) in the program. Nested code beyond three levels is difficult to understand [213, 154]. Larger nesting depths indicate decreased code readability.
• Shannon Byte Entropy: Measures the amount of information or uncertainty contained by a signal or event [245]. On a random variable X composed of terms x_i, it is defined as $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$ [279]. Higher byte entropy indicates higher code readability.
• Halstead Difficulty: Measures the difficulty in understanding a program from the operators and operands in its source code [124]. A program with a higher Halstead difficulty than another has higher code complexity and may be more challenging to understand.
• McCabe Cyclomatic Complexity: Measures the number of linearly independent paths through the program's source code [211].
Table 8.3: Code Complexity and Readability Metrics (averages per optimization level; OG = original source, L-CBE = LLVM-CBE).

  Source Lines of Code
  Opt   OG      WAF     Wasm2C   W2C2     L-CBE
  O0    24.47   63.00   460.42   572.97   143.31
  O1    24.47   54.92   178.66   204.54   122.91
  O2    24.46   51.61   186.95   215.04   163.77
  O3    24.41   53.43   190.98   219.69   169.97
  Os    24.46   42.59   144.87   167.46   124.57
  Oz    24.47   36.81   141.18   158.91   105.44

  Maximum Depth of Nesting
  Opt   OG     WAF    Wasm2C   W2C2   L-CBE
  O0    2.25   2.38   1.00     0.39   2.09
  O1    2.25   2.02   1.00     0.39   2.08
  O2    2.26   1.99   1.31     0.63   2.05
  O3    2.25   1.99   1.31     0.63   2.03
  Os    2.25   1.83   1.30     0.63   2.05
  Oz    2.25   2.00   1.33     0.73   2.07

  Byte Entropy
  Opt   OG     WAF    Wasm2C   W2C2   L-CBE
  O0    0.53   0.62   0.58     0.60   0.58
  O1    0.53   0.62   0.58     0.60   0.61
  O2    0.53   0.62   0.57     0.60   0.61
  O3    0.53   0.62   0.57     0.60   0.61
  Os    0.53   0.62   0.57     0.60   0.61
  Oz    0.53   0.62   0.57     0.60   0.61

  Halstead Difficulty
  Opt   OG      WAF     Wasm2C   W2C2    L-CBE
  O0    31.19   13.73   20.10    15.78   29.24
  O1    31.20   14.17   18.16    14.16   30.87
  O2    31.21   16.33   25.58    17.08   33.66
  O3    31.14   16.35   25.64    17.12   33.75
  Os    31.20   15.27   22.48    15.57   31.50
  Oz    31.20   14.56   21.78    14.76   28.31

  McCabe Cyclomatic Complexity
  Opt   OG     WAF    Wasm2C   W2C2   L-CBE
  O0    4.73   4.27   3.23     1.80   7.38
  O1    4.74   4.70   3.43     1.92   7.67
  O2    4.74   5.36   4.29     2.33   8.87
  O3    4.72   5.53   4.39     2.39   9.07
  Os    4.74   4.42   3.71     2.00   7.62
  Oz    4.74   3.51   3.10     1.64   5.94

8.5.2.4 Results

We present the binary decompilation results in Table 8.3. For each optimization level, we present the average code metrics of the original source code samples and of the samples after they are compiled to WebAssembly and decompiled with WAF, Wasm2C, and W2C2. In addition, we compile the samples into LLVM IR and decompile them with LLVM-CBE to collect metrics on its decompiled code. Overall, WAF generally produces more readable code than Wasm2C, W2C2, or LLVM-CBE. Looking at the source lines of code, WAF produces more compact decompiled code at every optimization level. For example, with level O0, the average SLOC count of the original samples is 24.47, and WAF produces an average SLOC count of 63.00, a 1.57x increase over the original. Meanwhile, Wasm2C produces code with an average SLOC count of 460.42 (a 17.82x increase), W2C2 generates code with an average SLOC count of 572.97 (a 22.42x increase), and LLVM-CBE decompiled code has an average SLOC count of 143.31 (a 4.86x increase). For the maximum depth of nesting, all tools have an average nesting depth below 3, so the decompiled code does not suffer from excessive nesting. However, we note that Wasm2C and W2C2 have much lower nesting depths than the original code (1.00 and 0.39 at level O0, respectively), indicating that the generated code may make too little use of semantic constructs such as loops and conditional statements. Finally, for byte entropy, the results show that the decompiled code generated by WAF has a higher byte entropy (0.62), i.e., more information, than the decompiled code of Wasm2C (0.58), W2C2 (0.60), and LLVM-CBE (0.58).
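For clarity, the multipliers above are relative increases over the original source, computed from the level-O0 averages in Table 8.3; a minimal sketch of the calculation is shown below.

// Relative SLOC increase over the original source, using the O0 averages
// reported in Table 8.3.
const avgSloc = { original: 24.47, waf: 63.00, wasm2c: 460.42, w2c2: 572.97, llvmCbe: 143.31 };

function relativeIncrease(decompiled, original) {
  return (decompiled - original) / original;
}

for (const [tool, sloc] of Object.entries(avgSloc)) {
  if (tool === 'original') continue;
  console.log(`${tool}: ${relativeIncrease(sloc, avgSloc.original).toFixed(2)}x increase`);
}
// waf: 1.57x, wasm2c: 17.82x, w2c2: 22.42x, llvmCbe: 4.86x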
The results for the code complexity metrics are more nuanced. For example, at level O0, WAF produces code with lower Halstead difficulty (13.73) than either Wasm2C (20.10), W2C2 (15.78), or LLVM-CBE (29.24). However, while WAF generates code with lower McCabe cyclomatic complexity (4.27) than LLVM-CBE (7.38), WAF produces samples with higher complexity than Wasm2C (3.23) and W2C2 (1.80). These differences can be explained by the different properties these metrics measure. Halstead difficulty is derived from the counts of unique and total operators and operands in the source code. Since WAF's C string only uses the small subset of C keywords and operators needed to represent the simpler control flow in WebAssembly, the original source code may include more unique operators than the decompiled code, which can lead to an overall lower Halstead difficulty measure than the original source. McCabe cyclomatic complexity measures the number of linearly independent paths within a program's control flow graph. The value obtained by WAF (4.27), close to that of the original source code (4.73), indicates that a similar number of conditional constructs is used. Meanwhile, the low values of Wasm2C (3.23) and W2C2 (1.80) reinforce the observation that their decompiled code lacks conditional statements. Overall, the results indicate that WAF is capable of producing compact code with an overall complexity comparable to that of the original source code.

8.5.3 RQ3: Performance Optimizations Effectiveness

We develop two program optimization techniques that aim to improve WebAssembly runtime performance. Our first technique, hot code extraction (Section 8.3.2.1), separates hot code out of a larger function into a new function to avoid the counterintuitive runtime impacts that function inlining can cause. Our second technique, loop re-rolling (Section 8.3.2.2), converts a series of repeated, similar statements into a loop construct to reduce the code size of the parent function. Reducing the code size can mitigate counterintuitive runtime effects of WebAssembly.

8.5.3.1 Dataset

We leverage benchmarking samples from the LLVM test suite [193] to test the runtime duration changes caused by our program optimizations. We use samples within the SingleSource/Benchmarks directory, which includes samples from the Polybench benchmark suite [246]. The samples perform complex numeric computations, such as Fibonacci number computation [191], Cholesky factorization [192], and Huffman compression [193]. Since 16 samples do not compile with Emscripten, we use 127 samples to run our experiments.

8.5.3.2 Experimental Setup

We run the samples in Chromium and instrument the WebAssembly JavaScript APIs to insert timing code on every WebAssembly export call, as sketched below. We run each sample 100 times and report the average runtime duration for each sample. We also test each program when compiled with optimization levels O2 and O3, as these optimization levels in the Emscripten pipeline apply the counterintuitive optimizations that our techniques aim to mitigate.
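The following is a minimal sketch of the kind of export-call timing instrumentation described above, using the standard WebAssembly JavaScript API and performance.now(). It illustrates the approach rather than reproducing our exact instrumentation code.

// Wrap every exported WebAssembly function so each call records its duration.
async function instantiateWithTiming(wasmBytes, importObject, timings) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, importObject);
  const timedExports = {};
  for (const [name, value] of Object.entries(instance.exports)) {
    if (typeof value !== 'function') {
      timedExports[name] = value; // memories, tables, and globals pass through unchanged
      continue;
    }
    timedExports[name] = (...args) => {
      const start = performance.now();
      const result = value(...args);
      timings.push({ export: name, ms: performance.now() - start });
      return result;
    };
  }
  return timedExports;
}

// Usage sketch, assuming `bytes` holds the module and `imports` its import object:
// const timings = [];
// const exports = await instantiateWithTiming(bytes, imports, timings);
// exports.main();
// console.log(timings);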
8.5.3.3 Results

8.5.3.4 Loop Re-Rolling

In Table 8.4, we list the program runtime durations before and after applying each program optimization technique. We list only the samples showing a runtime duration decrease greater than 10%.

Table 8.4: Samples Experiencing Improved Runtime Performance After Optimization (Original and Optimized runtime durations with their percent change, for levels O2 and O3).

  Loop Re-Rolling
  Sample          O2 Original   O2 Optimized   O2 % Change   O3 Original   O3 Optimized   O3 % Change
  flops-5.c       606.60        51.14          -91.57%       608.02        64.52          -89.39%
  revertBits.c    171.48        53.64          -68.72%       170.61        53.83          -68.45%
  durbin.c        16.37         13.78          -15.82%       16.58         13.60          -17.94%
  gramschmidt.c   7,212.67      5,537.64       -23.22%       7,107.27      5,515.76       -22.39%

  Hot Code Extraction
  Sample          O2 Original   O2 Optimized   O2 % Change   O3 Original   O3 Optimized   O3 % Change
  evalloop.c      1,576.46      1,412.10       -10.43%       1,609.36      1,388.18       -13.74%
  revertBits.c    171.48        150.89         -12.01%       170.61        151.59         -11.15%
  ary.cpp         46.82         38.65          -17.44%       47.68         41.15          -13.69%
  methcall.cpp    4,697.54      4,203.36       -10.52%       4,649.94      4,210.10       -9.46%
  random.cpp      1,962.98      1,739.60       -11.38%       1,927.03      1,454.10       -24.54%
  moments.cpp     79.37         63.68          -19.77%       81.17         64.14          -20.99%

We find 4 samples experiencing improved runtime performance when we apply the loop re-rolling optimization pass. In particular, durbin.c experiences the smallest speedup, with duration decreases of 15.82% and 17.94% for O2 and O3, respectively. Meanwhile, flops-5.c experiences the largest speedup, at 91.57% and 89.39% for O2 and O3, respectively. These results show that certain samples can benefit significantly from the loop re-rolling optimization.

8.5.3.5 Hot Code Extraction

We list 6 samples that experience improved program runtime performance after applying hot code extraction. With level O2, the smallest runtime performance improvement is seen in evalloop.c at 10.43%, while the largest improvement is seen in moments.cpp at 19.77%. With level O3, the smallest improvement greater than 10% is seen in revertBits.c at 11.15%, and the largest speedup is seen in random.cpp at 24.54%.

8.5.4 Evaluating Cross-Language Malware Detection Analysis

We apply the cross-language program analysis abilities of WAF to the specific task of detecting cross-language malware. Namely, we use the tool Wobfuscator (Chapter 7) to create cross-language JavaScript-WebAssembly malware from malicious JavaScript samples. We use state-of-the-art detection tools on a large dataset of malicious files under three configurations: (i) a baseline configuration with no obfuscation applied, (ii) samples transformed with Wobfuscator, and (iii) samples transformed with Wobfuscator and then processed by WAF's cross-language combination transformation.

Table 8.5: Datasets for Evaluating Cross-Language Malware Detection Effectiveness.

  Dataset                    # Samples (Files)
  Benign
    JS150k                   149,677
  Malicious
    VirusTotal               2,674
    Hynek Petrak             39,450
    GeeksOnSecurity          1,375
    Total Malicious          43,499

8.5.4.1 Datasets

We need to train the tested JavaScript malware detectors on large sets of real-world benign and malicious JavaScript code. Table 8.5 summarizes the datasets we use. For the benign samples, we use 149,677 files from the JS150k dataset [253]. For the malicious samples, we use 2,674 samples from VirusTotal [315], 39,450 samples from the Hynek Petrak JavaScript malware collection [242], and 1,375 samples from the GeeksOnSecurity malicious JavaScript dataset [202], for a total of 43,499 samples.

8.5.4.2 JavaScript Malware Detectors

We evaluate our transformation technique against four state-of-the-art, static, learning-based JavaScript malware detectors used to evaluate Wobfuscator. As done in Chapter 7, we split the benign and malicious datasets into training, validation, and test sets containing 70%, 15%, and 15% of the samples, respectively. We re-list the four JavaScript malware detectors below for convenience:

• Cujo [254]: A hybrid JavaScript malware detector performing lexical analysis on JavaScript files. We use the static detection part, based on a re-implementation of Cujo provided by Fass et al. [19].
• Zozzle [76]: An in-browser detection tool that uses syntactic information, such as identifier names and code locations, obtained from a JavaScript AST. We rely on a re-implementation of Zozzle provided by Fass et al. [20].

• JaSt [97]: A static malicious JavaScript detector that uses AST n-grams to identify patterns indicative of malicious behavior. We use the implementation made available on the project's GitHub page [17].

• JStap [96]: A static malware detector that leverages syntax, control-flow, and data-flow data from ASTs, control flow graphs (CFGs), and program dependency graphs (PDGs). The tool extracts features either by constructing n-grams of nodes or by combining the AST node type with its corresponding identifier/literal value. We evaluate both the n-grams and values feature extraction modes using the implementation available on GitHub [18].

8.5.4.3 Results

We present the results of the cross-language malware detection in Table 8.6. For each malware detection tool, we present the sample detection rates for four different Wobfuscator transformation techniques. For each technique, we present the detection rate after Wobfuscator is applied, the detection rate after the Wobfuscated samples are processed with the cross-language combination technique provided by WAF, and the percent change in detection rate between the Wobfuscated samples and the WAF-processed samples.

Table 8.6: Detection Rates of Samples Obfuscated by Wobfuscator then Transformed by WAF.

  Zozzle (baseline: 66.67%)
  Sample Configuration                       Wobfuscated   WAF-Transformed   % Change
  Wobfuscated (Calls Only)                   65.54%        65.89%            0.53%
  Wobfuscated (Ifs Only)                     65.52%        65.87%            0.54%
  Wobfuscated (Fors Only)                    66.38%        66.75%            0.56%
  Wobfuscated (Combined Calls, Ifs, Fors)    65.33%        63.54%            -2.73%

  Cujo (baseline: 98.00%)
  Sample Configuration                       Wobfuscated   WAF-Transformed   % Change
  Wobfuscated (Calls Only)                   70.23%        76.56%            9.01%
  Wobfuscated (Ifs Only)                     77.37%        86.39%            11.65%
  Wobfuscated (Fors Only)                    90.38%        90.80%            0.47%
  Wobfuscated (Combined Calls, Ifs, Fors)    65.87%        70.77%            7.44%

  JaSt (baseline: 99.88%)
  Sample Configuration                       Wobfuscated   WAF-Transformed   % Change
  Wobfuscated (Calls Only)                   37.91%        67.42%            77.83%
  Wobfuscated (Ifs Only)                     82.35%        87.94%            6.80%
  Wobfuscated (Fors Only)                    91.96%        96.90%            5.36%
  Wobfuscated (Combined Calls, Ifs, Fors)    32.43%        40.34%            24.39%

  JStap, N-grams (baseline: 99.82%)
  Sample Configuration                       Wobfuscated   WAF-Transformed   % Change
  Wobfuscated (Calls Only)                   30.02%        59.20%            97.19%
  Wobfuscated (Ifs Only)                     73.97%        79.83%            7.93%
  Wobfuscated (Fors Only)                    80.16%        94.69%            18.12%
  Wobfuscated (Combined Calls, Ifs, Fors)    9.12%         27.88%            205.72%

  JStap, Values (baseline: 98.54%)
  Sample Configuration                       Wobfuscated   WAF-Transformed   % Change
  Wobfuscated (Calls Only)                   76.61%        77.73%            1.46%
  Wobfuscated (Ifs Only)                     92.71%        93.19%            0.51%
  Wobfuscated (Fors Only)                    97.25%        97.24%            -0.01%
  Wobfuscated (Combined Calls, Ifs, Fors)    75.62%        76.31%            0.90%

As the results show, for some detectors, such as JaSt and JStap, the detection rates improve significantly after applying WAF to the samples, while for other detectors the detection rates change little. Since JaSt and JStap (N-grams) both rely heavily on JavaScript n-grams, the large change in detection rates suggests that WAF's cross-language combination technique is well suited to reconstructing the syntactic structure of the functionality moved into WebAssembly by Wobfuscator. Since Zozzle, Cujo, and JStap in the identifier-values configuration rely on lexical information, the lack of change in their detection results suggests that WAF's transformation cannot fully recover the exact tokens removed by Wobfuscator. Further work could be done to improve the cross-language translation performed by WAF to cover more of the lexical tokens available in JavaScript.
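To illustrate the kind of AST node-type n-gram features that JaSt and JStap (N-grams) rely on, the following is a minimal sketch that extracts such n-grams from a JavaScript sample using the Esprima parser; it illustrates the feature family, not the detectors' actual feature extraction code.

// Extract n-grams of AST node types from a JavaScript sample.
// Requires the esprima package (npm install esprima).
const esprima = require('esprima');

// Depth-first walk that records the type of every AST node visited.
function nodeTypes(node, out = []) {
  if (node === null || typeof node !== 'object') return out;
  if (Array.isArray(node)) {
    for (const child of node) nodeTypes(child, out);
    return out;
  }
  if (typeof node.type === 'string') out.push(node.type);
  for (const key of Object.keys(node)) nodeTypes(node[key], out);
  return out;
}

function astNGrams(source, n) {
  const types = nodeTypes(esprima.parseScript(source));
  const grams = [];
  for (let i = 0; i + n <= types.length; i++) {
    grams.push(types.slice(i, i + n).join(' '));
  }
  return grams;
}

console.log(astNGrams('if (x) { eval(payload); }', 3));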
8.5.5 Evaluation of WebAssembly Malware Detection

We evaluate the effectiveness of WAF at implementing a malware detection algorithm, specifically a WebAssembly cryptojacking detection algorithm. We start by constructing a dataset of known cryptojacking modules. We then process them through an analysis pass created using WAF, and we report the detection rate of the cryptojacking detection analysis.

8.5.5.1 Comparison with MinerRay

We evaluate our cryptomining detection analysis implemented in WAF against our cryptojacking detection tool, MinerRay. Both techniques are similar in that they use an intermediate representation to abstract the WebAssembly semantics. However, the IRs of WAF are more robust than the one used in MinerRay, so WAF should be able to handle a wider variety of WebAssembly samples and identify cryptomining details within them.

Table 8.7: Cryptominer Detection Results.

  Detector    Total Cryptominer Samples   Identified   Undetected   Errored
  WAF         45                          34           2            9
  MinerRay    45                          16           0            29

8.5.5.2 Dataset

To evaluate the cryptojacking detection analysis, we leverage a dataset of WebAssembly cryptominer modules used in WASPur [256]. This dataset includes 45 WebAssembly modules implementing cryptojackers used in 900 websites. Since both MinerRay and WAF perform offline analysis of WebAssembly binary modules for cryptojacking detection, we leverage this set of known cryptomining modules to test their detection performance.

8.5.5.3 Results

We present the detection results in Table 8.7. The table shows that of the 45 cryptomining WebAssembly samples in the dataset, the cryptomining detection analysis implemented using WAF detects 34 samples, incorrectly misses two samples, and is unable to parse 9 samples. We note that the parsing errors are a result of WAF being in its prototype stage of implementation, and they can be addressed to handle these samples. Meanwhile, MinerRay detects only 16 of the 45 samples and encounters an error when parsing 29 of the WebAssembly samples. These results show that WAF is flexible enough to implement a WebAssembly malware detection approach that detects 34 of the 36 samples it successfully parses (94.44%).

8.5.6 RQ4: Evaluating Framework Overhead

We demonstrate how efficient WAF is in terms of the duration it needs to process a WebAssembly file and the memory it occupies. To evaluate these metrics, we apply WAF to a large dataset of diverse WebAssembly files. We measure the time and memory used when our framework constructs the Waf-Low, Waf-Mid, and Waf-High IRs. We also investigate how each IR's outputs change in size through the lifting transformations. We perform our measurements on a desktop computer with an Intel i9 CPU at 5 GHz and 64 GB of RAM running Ubuntu 22.04.

Figure 8.5: WAF Duration for IR Generation (IR generation time in ms versus Wat byte size in KB, on log-scale axes, for Waf-Low, Waf-Mid, and Waf-High).

WebAssembly Sample Dataset

To evaluate the resource overhead of WAF, we leverage two existing datasets of WebAssembly samples collected from the wild. The first dataset, the WasmBench dataset [132], contains 8,461 unique WebAssembly samples from a variety of sources, including websites, GitHub repositories, WebAssembly package managers, browser extensions, and more. The second dataset, the WASPur dataset [256], contains 1,829 unique WebAssembly modules found in websites, Chrome extensions, Firefox add-ons, and GitHub repositories, used to train an automated classification tool. Combining these two datasets provides us with a large number of diverse WebAssembly samples.
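As a rough illustration of how per-sample duration and memory measurements of this kind can be collected in a Node.js environment, the sketch below times a single IR-construction call and records the change in heap usage. The buildIR function is a hypothetical placeholder for the framework's entry point, not WAF's actual API.

// Measure wall-clock duration and heap growth around one IR construction.
// `buildIR` is a hypothetical stand-in for the framework entry point.
function measure(buildIR, watText) {
  if (global.gc) global.gc(); // steadier baseline when run with node --expose-gc
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();

  const ir = buildIR(watText);

  const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaMb = (process.memoryUsage().heapUsed - heapBefore) / (1024 * 1024);
  return { ir, durationMs, heapDeltaMb };
}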
Duration When Using WAF

We measure the runtime performance of WAF by recording the duration required to generate each of our IR levels for each WebAssembly sample. Figure 8.5 plots the duration to generate each of our three IR levels in milliseconds against the WebAssembly text file size in kilobytes. On average, it takes 77.70 ms to generate Waf-Low, 105.48 ms to generate Waf-Mid, and 162.89 ms to generate Waf-High. As the results show, for each of our IR levels the generation time grows with increasing WebAssembly text size, but overall, each IR can be generated in less than a few seconds.

Figure 8.6: WAF Memory Usage for IR Generation (memory usage in MB versus Wat byte size in KB, on log-scale axes, for Waf-Low, Waf-Mid, and Waf-High).

Memory Usage When Using WAF

We also measure the memory usage of our framework when generating each of our IR levels. Figure 8.6 plots the amount of memory in megabytes used when generating each IR level. The figure shows that Waf-Low generally remains under 100 MB of memory. Our Waf-Mid and Waf-High IRs require more memory and can grow up to 1 GB of memory usage. We note that while this amount of memory is high, our current implementation is a prototype, and the memory efficiency of our IR units can be improved as future work. Nevertheless, the memory requirements of our IRs are not out of reach for modern development machines.

8.6 Limitations

This section discusses the limitations of our current implementation. Similar to other static analysis techniques, our framework suffers from some limitations in its analysis ability. As a static analysis approach, we are limited to information available within the module's text files; we cannot leverage values that can only be known at runtime. However, we can reason about some static values. For example, when we replace call_indirect uses, we can reason about the possible values of the index operand, as it can only target valid indices within the function table.

In addition, we are currently limited by the lack of static analysis techniques to recover high-level type information from WebAssembly binaries. While previous work has been done to recover these types [178], the technique only achieves an accuracy rate of 75.2%, which is not reliable enough for robust analyses. For this reason, our analyses currently cannot reliably recover the high-level data structures present in the source code of a WebAssembly module.

8.7 Summary

There exist several limitations within the current WebAssembly ecosystem that introduce new challenges for statically analyzing WebAssembly applications. To address these challenges, this chapter presents WAF, a novel WebAssembly Analysis Framework. The core of WAF consists of three intermediate representations: Waf-Low, Waf-Mid, and Waf-High. We describe how each IR level gradually lifts the WebAssembly text representation to make it easier for developers to reason about WebAssembly semantics. We demonstrate that WAF is flexible enough to be used in a variety of applications, including semantically rich binary decompilation, novel program optimizations, cross-language program analysis, and WebAssembly malware detection. We hope our WAF framework will enable new static analysis designs for WebAssembly that address the limitations of the current ecosystem and the new challenges that this young language may bring.
Chapter 9

Related Work

In this chapter, we describe the existing research that is closely related to the three WebAssembly static analysis challenges discussed in this dissertation.

9.1 General WebAssembly Related

Haas et al. [123] explain the motivation and benefits of introducing a new bytecode language to the Web. The Wasabi framework [179] is an important tool for dynamic analysis of WebAssembly introduced early in the WebAssembly ecosystem. Several works aim to analyze the WebAssembly specification to prove its correctness and support verifying the language [333, 354, 336, 270, 271]. Previous work has even leveraged WebAssembly as a software sandbox itself [31]. In addition, some works have investigated the runtime performance of WebAssembly [144, 352, 283, 326], while other works have studied WebAssembly energy usage [79, 93, 321].

9.1.1 WebAssembly Prevalence Studies

Prior work studies the prevalence of WebAssembly by retrieving WebAssembly samples through web crawling [221]. Hilbig et al. perform an empirical study on the real-world usage of WebAssembly binaries [132].

9.1.2 WebAssembly Analysis Tools

Existing work develops tools to analyze WebAssembly applications. Wasabi is a framework to dynamically analyze WebAssembly code via instrumentation [179]. Wizard is a tool for non-intrusive dynamic WebAssembly instrumentation [300]. WASim and WASPur are tools that automatically identify the purposes of WebAssembly modules and functions [257, 256]. WasmA [32] and Wassail [286] are two static analysis frameworks for WebAssembly that enable lightweight and heavyweight analyses, including control flow analysis, data flow analysis, and taint analysis. BREWasm is a framework to ease binary rewriting for WebAssembly [41].

9.2 WebAssembly Compiler and Runtime Related

9.2.1 Compiler Studies

Previous studies investigate the prevalence of compiler bugs [289, 260] and survey compiler testing approaches [46]. Other studies develop compiler testing techniques, such as equivalence modulo inputs [173, 174] and skeletal program enumeration [360].

9.2.2 Compiler Optimizations Studies

Existing work studies the impacts of different optimizations on specific processor architectures [28] and on high-level synthesis [60]. Some work proposes optimization frameworks improving SIMD performance [134]. Other works leverage machine learning techniques for optimization selection [155, 204]. Theodoridis et al. describe improvements to LLVM inlining heuristics in native applications [297].

9.2.3 WebAssembly Compiler and Runtime Testing

An active area of research focuses on developing methods to effectively test WebAssembly compilers, runtimes, and applications. Studies have investigated the presence of bugs in WebAssembly compilers [260] and runtimes [329]. Some works leverage differential fuzzing approaches to detect WebAssembly runtime bugs [364, 42, 127], to detect server-side WebAssembly performance issues [147], and to find WebAssembly compiler misoptimization problems [188].

9.3 WebAssembly Program Comprehension Related

9.3.1 Machine Learning on Binary Code

Machine learning techniques have been applied to binary and source code to assist in program analysis [234, 199, 262, 209, 325], categorize software [187], identify vulnerabilities [148], and detect malicious executables [273, 9, 16, 106, 267]. Prior work leverages recurrent neural networks for decompilation [157]. Cao et al. use neural networks to improve decompilation of optimized binaries [43].
Similar techniques have been developed for WebAssembly to recover high-level type information from the binaries using neural networks [178].

9.3.2 Large Language Model Techniques

With the advancement of neural approaches, pre-trained large language models (LLMs) have been leveraged for various tasks traditionally left to static program analysis. Large language models have shown impressive performance on the tasks of code generation [98, 232, 330, 159] and binary decompilation [350, 241, 158, 136]. Naturally, LLMs have gained traction for the task of WebAssembly binary decompilation and reverse engineering. Benali [25] explores WebAssembly decompilation using a Neural Machine Translation (NMT) model [312] versus a Long Short-Term Memory (LSTM) model [133]. Some works improve on neural decompilation by recovering high-level semantic constructs [185]. Similar works have leveraged language models to enable WebAssembly reverse engineering through natural language descriptions [137].

9.4 WebAssembly Security Related

Several works analyze WebAssembly security. Previous works have surveyed the existing techniques addressing WebAssembly security [166]. Prior work proposes specification and compiler extensions to improve security [334, 83, 335, 150, 226]. Some security tools perform authoring [335]. Other works identify vulnerabilities in WebAssembly applications [177, 180, 362] or propose attack strategies using WebAssembly [261]. Recent work uses semantic reconstruction to aid in detecting JavaScript-WebAssembly multilingual malware [345]. Some works investigate software diversification to improve WebAssembly binary security [35, 36, 37]. There has also been prior work identifying vulnerabilities in WebAssembly-based smart contracts [363, 47].

9.4.1 WebAssembly Cryptominers

9.4.1.1 Static Techniques

Static detection techniques inspect program behaviors to see whether the patterns match known cryptominers [222, 265, 170, 327]. For example, Minesweeper detects WebAssembly-based cryptominers within the Alexa top 1 million [170] by counting the number of bit operations and comparing them against the signatures observed in CryptoNight. SEISMIC [327] detects WebAssembly cryptominers by counting the number of executed arithmetic operations. Aside from WebAssembly, static analysis techniques have been developed that can identify cryptographic applications. Aligot [38] detects obfuscated cryptographic primitives by comparing their input-output parameters. CryptoHunt [347] uses symbolic execution to identify cryptographic functions in obfuscated binary code.

9.4.1.2 Dynamic Techniques

Several approaches use behavioral analysis to detect cryptominers. Papadopoulos et al. [239] capture several metrics when visiting a website, such as memory consumption, CPU thread usage, and system temperature, to identify cryptominers. CloudRadar [361] collects features from hardware performance counters to identify the execution of cryptographic applications and to detect side-channel attacks in cloud systems. Gröbert et al. [118] identify cryptographic primitives and keys in binary programs with dynamic binary instrumentation. CMTracker [135] captures data from a hashing-function profiler and a stack-structure profiler to identify miners. Rüth et al. [264] augment No Coin with dynamic analysis afterward to identify mining sites that get past the static filter.
Bijmans et al. [27] also use a modified Chromium build to capture WebAssembly modules and inspect WebSocket messages in a large-scale study to identify cryptominers hidden in websites. CPU usage analysis looks for abnormally high CPU usage [222, 265, 170, 292], because miners are likely to cause a spike in CPU usage. Although greedy miners can be found this way, the check is easy to circumvent by throttling [90].

9.4.1.3 Machine Learning Techniques

Other approaches use machine learning classification techniques to identify cryptominers. Musch et al. [221] create n-gram sequences from WebAssembly binaries to use as features for classification. Outguard [162] collects several features from an instrumented browser to classify sites. RAPID [255] captures resource usage and API event data through instrumented HTML APIs to create features for an SVM classifier. MINOS [227] uses a convolutional neural network (CNN) model to detect WebAssembly-based cryptojackers.

9.4.2 Obfuscation Studies and Techniques

Obfuscation techniques have been observed in various programming languages and software domains for both malicious and benign purposes. Previous works categorize obfuscation techniques applied to malicious code [353], while others compare the effectiveness of different obfuscation techniques [45, 125, 324]. Some studies analyze the usage of obfuscation techniques specifically in JavaScript code by investigating the obfuscation techniques used in real-world malicious and benign files [349, 280]. Harnes and Morrison [126] explore the effects of WebAssembly obfuscation on evading cryptojacking detection. There is little work proposing new obfuscation attacks for JavaScript code. Fass et al. [95] construct HideNoSeek, which rewrites the ASTs of malicious programs into the ASTs of known benign programs to avoid detection. The authors evaluate HideNoSeek on 91,020 samples against VirusTotal, Yara, JaSt, Zozzle, and Cujo. The obfuscation technique is able to achieve a 99.98% false negative rate against the detectors.

9.4.2.1 Obfuscated Malware Detection

An active area in academic research produces static analysis techniques designed to identify malicious behavior even in the presence of obfuscated code. One class of static detection tools uses lexical and syntactic information derived from JavaScript files in order to identify features that indicate malicious code [254, 76, 97, 39, 278]. Other techniques build on this syntactic information by incorporating control-flow and data-flow analysis [96] or by adding dynamic analysis to confirm the presence of malware [348, 323]. Other static detection techniques analyze the JavaScript source code through machine-learning and deep-learning approaches [328, 228]. Other malware detection techniques dynamically analyze programs to identify malicious behaviors. Some techniques focus on collecting runtime statistics to construct models that identify malware [72, 268, 351]. Other techniques leverage symbolic execution [169] or forced execution [165] to trigger malware hidden behind complex input sequences. Wobfuscator is unlikely to reduce the detection rate against these detectors, as we do not significantly change the runtime behavior.

9.4.2.2 Obfuscation Detection

Some existing work focuses only on detecting obfuscation rather than obfuscated malware. NOFUS [153] and JSOD [6] use syntactic and contextual information as features for a machine learning classifier to detect obfuscation.
Sarker et al. [268] develop a hybrid approach by instrumenting browser APIs and determining whether the traced API call corresponds to a static code location.

Chapter 10

Future Work and Conclusions

10.1 Future Work

10.1.1 Leveraging WAF in Neural Techniques

As described in Chapter 5, our work shows that leveraging an intermediate representation abstracting a few WebAssembly instructions leads to strong classification performance when combined with machine learning techniques. Since the IRs in WAF offer richer semantic detail than that simpler IR, we plan to explore the use of WAF in conjunction with machine learning classifiers for WebAssembly code purpose identification. Additionally, the IRs of WAF, especially Waf-High, may be useful for code comprehension techniques when combined with large language models such as ChatGPT [236]. This combination may allow large language models to overcome context length limitations compared to handling raw WebAssembly text. The use of a richer IR can also help build on the approach used in WASPur to construct a more granular WebAssembly code purpose identification tool.

10.1.2 Develop Cross-Language Program Optimizations

WebAssembly applications are typically cross-language applications by default. However, the optimization stages of compilers typically analyze the generated JavaScript and WebAssembly code separately. As WAF is designed for cross-language program analysis, our framework could be used to implement optimizations that view WebAssembly applications holistically.

10.1.3 Extend into Non-Web WebAssembly Use Cases

Currently, WAF and its applications are designed for usage within a web context. For example, the cross-language program analysis assumes that JavaScript is used as the host language. This assumption may not hold if the WebAssembly application is run within a standalone virtual machine such as WAVM [269] or Wasmtime [8]. Rather than interacting with JavaScript, WebAssembly applications may interact with the host through a different method, such as the WebAssembly System Interface (WASI) [51]. WAF could be extended to model the interaction between WebAssembly and WASI implementations, constructing a holistic approach for these non-Web WebAssembly modules.

10.1.4 Develop WebAssembly Application Bug Detection Techniques

As discussed in Chapter 3, WebAssembly compilers can produce bugs in the WebAssembly applications that they generate. These bugs are usually only found at runtime, as they require interaction between JavaScript and WebAssembly to occur. We can leverage the cross-language analysis abilities of WAF to implement static techniques that detect inconsistencies between WebAssembly and JavaScript implementations. For example, incorrect data value legalization (Section 3.3.2.2) could be detected by analyzing the calls occurring between the two languages.

10.2 Conclusions

This dissertation explores static program analyses for WebAssembly. We investigate three challenges and opportunities for static program analysis techniques within the WebAssembly ecosystem. First, we explore the unique issues faced by the infrastructure supporting WebAssembly compilation and execution (Chapters 3 and 4). Second, we apply static analysis techniques to aid in WebAssembly program comprehension and report on the obstacles future approaches must overcome to improve their performance (Chapter 5).
Third, we inspect the security risks associated with WebAssembly malware and the possible avenues to leverage static analyses to mitigate these risks (Chapters 6 and 7). Finally, Chapter 8 presents WAF, a multi-purpose static program analysis framework for WebAssembly designed to address these challenges. The core of WAF lies in the three intermediate representations, Waf-Low, Waf-Mid, and Waf-High, that lift the WebAssembly module to higher semantic levels and abstract away the low-level WebAssembly syntax. We demonstrate the usage of WAF in four use cases: program optimization, binary decompilation, cross-language program analysis, and malware detection. We evaluate the effectiveness of WAF in each of the targeted use cases. Our general-purpose static analysis framework can leverage its semantic-lifting IR levels to address the open challenges present within the WebAssembly ecosystem. 183 Bibliography [1] -G4 and Wasm2js Should Generate JS Source Map or Be an Error · Issue #8743. 2020. [2] -Mbulk-Memory Generates a 500 Byte Zero Segment Embedded into .Wasm · Issue #8899. 2020. [3] -S MAIN_MODULE=1 + Upstream + Function Pointer Calls Using I64 => TypeError: Cannot Pass I64 to or from JS · Issue #9562. 2021. url: https://github.com/emscripten-core/emscripten/issues/9562 (visited on 01/30/2021). [4] 262588213843476. Npm Rank. Gist. url: https://gist.github.com/anvaka/8e8fa57c7ee1350e3491 (visited on 08/10/2021). [5] Martín Abadi et al. “TensorFlow: A System for Large-Scale Machine Learning”. In: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. OSDI’16. USA: USENIX Association, Nov. 2, 2016, pp. 265–283. isbn: 978-1-931971-33-1. [6] Ismail Adel AL-Taharwa et al. “JSOD: JavaScript Obfuscation Detector: JSOD”. In: Security and Communication Networks 8.6 (Apr. 2015), pp. 1092–1107. issn: 19390114. doi: 10.1002/sec.1064. (Visited on 07/18/2021). [7] Algorithms. XMRig, 2020. url: https://xmrig.com/docs/algorithms. [8] Bytecode Alliance. Wasmtime. url: https://wasmtime.dev/ (visited on 09/12/2023). [9] Kevin Allix et al. “Empirical Assessment of Machine Learning-Based Malware Detectors for Android”. In: Empirical Software Engineering 21 (Dec. 2014), pp. 1–29. doi: 10.1007/s10664-014-9352-6. [10] Mohamed Alsharnouby, Furkan Alaca, and Sonia Chiasson. “Why Phishing Still Works: User Strategies for Combating Phishing Attacks”. In: International Journal of Human-Computer Studies 82 (2015), pp. 69–82. doi: 10.1016/j.ijhcs.2015.05.005. [11] anseki. Gnirts. url: https://github.com/anseki/gnirts (visited on 11/17/2021). [12] Antoine. Answer to "Clang Optimization Levels". Stack Overflow. Mar. 21, 2013. url: https://stackoverflow.com/a/15548189 (visited on 08/16/2022). 184 [13] Shay Artzi et al. “A Framework for Automated Testing of JavaScript Web Applications”. In: ICSE. 2011, pp. 571–580. [14] Asm.Js. 2014. url: https://asmjs.org. [15] AssemblyScript. url: https://www.assemblyscript.org/ (visited on 10/03/2021). [16] M. A. Atici, S. Sagiroglu, and I. A. Dogru. “Android Malware Analysis Approach Based on Control Flow Graphs and Machine Learning Algorithms”. In: 2016 4th International Symposium on Digital Forensic and Security (ISDFS). Apr. 2016, pp. 26–31. doi: 10.1109/ISDFS.2016.7473512. [17] Aurore54F. JaSt - JS AST-Based Analysis. url: https://github.com/Aurore54F/JaSt (visited on 08/10/2021). [18] Aurore54F. JStap: A Static Pre-Filter for Malicious JavaScript Detection. url: https://github.com/Aurore54F/JStap (visited on 08/10/2021). [19] Aurore54F. 
Lexical-Jsdetector. url: https://github.com/Aurore54F/lexical-jsdetector (visited on 08/10/2021). [20] Aurore54F. Syntactic-Jsdetector. url: https://github.com/Aurore54F/syntactic-jsdetector (visited on 08/10/2021). [21] Clemens Backes. Liftoff: A New Baseline Compiler for WebAssembly in V8 · V8. url: https://v8.dev/blog/liftoff (visited on 03/14/2022). [22] Thoms Ball. “The Concept of Dynamic Analysis”. In: ACM SIGSOFT Software Engineering Notes 24.6 (Nov. 1999), pp. 216–234. issn: 0163-5948. doi: 10.1145/318774.318944. (Visited on 04/22/2024). [23] Timo Bartkewitz. Building Hash Functions from Block Ciphers, Their Security and Implementation Properties. 2009. [24] Matteo Basso. Mbasso/Awesome-Wasm. 2019. url: https://github.com/mbasso/awesome-wasm. [25] Adam Benali. An Initial Investigation of Neural Decompilation for WebAssembly. 2022. url: https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-310093 (visited on 06/21/2023). [26] Eli Bendersky. Parsing C++ in Python with Clang. url: https://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang (visited on 09/01/2022). [27] Hugo L.J. Bijmans, Tim M. Booij, and Christian Doerr. “Inadvertently Making Cyber Criminals Rich: A Comprehensive Study of Cryptojacking Campaigns at Internet Scale”. In: 28th USENIX Security Symposium (USENIX Security 19). Santa Clara, CA: USENIX Association, Aug. 2019, pp. 1627–1644. isbn: 978-1-939133-06-9. url: https://www.usenix.org/conference/usenixsecurity19/presentation/bijmans. 185 [28] Aart JC Bik, David L Kreitzer, and Xinmin Tian. “A Case Study on Compiler Optimizations for the Intel® Core TM 2 Duo Processor”. In: International Journal of Parallel Programming 36.6 (2008), pp. 571–591. [29] Binaryen Port Failing to Compile on Windows · Issue #4821. 2020. [30] Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich. Argon2: The Memory-Hard Function for Password Hashing and Other Applications. 2017. url: https://www.cryptolux.org/images/0/0d/Argon2.pdf. [31] Jay Bosamiya, Wen Shih Lim, and Bryan Parno. “Provably-Safe Multilingual Software Sandboxing Using WebAssembly”. In: (). [32] Florian Breitfelder et al. “WasmA: A Static WebAssembly Analysis Framework for Everyone”. In: 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Mar. 2023, pp. 753–757. doi: 10.1109/SANER56733.2023.00085. (Visited on 12/13/2023). [33] Tiago Brito et al. “Wasmati: An Efficient Static Vulnerability Scanner for WebAssembly”. In: Computers and Security 118.C (July 1, 2022). issn: 0167-4048. doi: 10.1016/j.cose.2022.102745. (Visited on 03/10/2023). [34] Browser Market Share Worldwide. StatCounter Global Stats. url: https://gs.statcounter.com/browser-market-share (visited on 09/01/2022). [35] Javier Cabrera Arteaga. “Software Diversification for WebAssembly”. In: (2024). url: https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-342751 (visited on 04/21/2024). [36] Javier Cabrera Arteaga et al. “CROW: Code Diversification for WebAssembly”. In: Proceedings 2021 Workshop on Measurements, Attacks, and Defenses for the Web. Workshop on Measurements, Attacks, and Defenses for the Web. Virtual: Internet Society, 2021. isbn: 978-1-891562-67-9. doi: 10.14722/madweb.2021.23004. (Visited on 04/21/2024). [37] Javier Cabrera-Arteaga et al. “Wasm-Mutate: Fast and Effective Binary Diversification for WebAssembly”. In: Computers & Security 139 (Apr. 1, 2024), p. 103731. issn: 0167-4048. doi: 10.1016/j.cose.2024.103731. 
(Visited on 04/21/2024). [38] Joan Calvet, José M. Fernandez, and Jean-Yves Marion. “Aligot: Cryptographic Function Identification in Obfuscated Binary Programs”. In: Proceedings of the 2012 ACM Conference on Computer and Communications Security. CCS ’12. New York, NY, USA: ACM, 2012, pp. 169–182. isbn: 978-1-4503-1651-4. doi: 10.1145/2382196.2382217. [39] Davide Canali et al. “Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages”. In: Proceedings of the 20th International Conference on World Wide Web - WWW ’11. The 20th International Conference. Hyderabad, India: ACM Press, 2011, p. 197. isbn: 978-1-4503-0632-4. doi: 10.1145/1963405.1963436. (Visited on 07/18/2021). [40] Can’t Build Incoming 64bits · Issue #4105. 2020. 186 [41] Shangtong Cao et al. “BREWasm: A General Static Binary Rewriting Framework for WebAssembly”. In: Static Analysis. Ed. by Manuel V. Hermenegildo and José F. Morales. Cham: Springer Nature Switzerland, 2023, pp. 139–163. isbn: 978-3-031-44245-2. doi: 10.1007/978-3-031-44245-2_8. [42] Shangtong Cao et al. “WRTester: Differential Testing of WebAssembly Runtimes via Semantic-aware Binary Generation”. Version 1. In: (2023). doi: 10.48550/ARXIV.2312.10456. (Visited on 03/21/2024). [43] Ying Cao et al. “Boosting Neural Networks to Decompile Optimized Binaries”. In: Proceedings of the 38th Annual Computer Security Applications Conference. ACSAC ’22. New York, NY, USA: Association for Computing Machinery, Dec. 5, 2022, pp. 508–518. isbn: 978-1-4503-9759-9. doi: 10.1145/3564625.3567998. (Visited on 06/21/2023). [44] Nicholas Carlini, Adrienne Porter Felt, and David Wagner. “An Evaluation of the Google Chrome Extension Security Architecture”. In: Proceedings of the 21st USENIX Conference on Security Symposium. Security’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 7–7. url: http://dl.acm.org/citation.cfm?id=2362793.2362800. [45] Mariano Ceccato et al. “A Large Study on the Effect of Code Obfuscation on the Quality of Java Code”. In: Empirical Software Engineering 20.6 (Dec. 2015), pp. 1486–1524. issn: 1382-3256, 1573-7616. doi: 10.1007/s10664-014-9321-0. (Visited on 07/18/2021). [46] Junjie Chen et al. “A Survey of Compiler Testing”. In: ACM Computing Surveys 53.1 (Feb. 5, 2020), 4:1–4:36. issn: 0360-0300. doi: 10.1145/3363562. (Visited on 07/13/2022). [47] Weimin Chen et al. “WASAI: Uncovering Vulnerabilities in Wasm Smart Contracts”. In: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 2022, pp. 703–715. [48] Chromium. url: https://www.chromium.org/Home/ (visited on 03/14/2022). [49] Clang C Language Family Frontend for LLVM. url: https://clang.llvm.org/ (visited on 01/11/2022). [50] Clang.Cindex — Libclang 14.0.6 Documentation. url: https://libclang.readthedocs.io/en/latest/_modules/clang/cindex.html (visited on 09/01/2022). [51] Lin Clark. Standardizing WASI: A System Interface to Run WebAssembly Outside the Web – Mozilla Hacks - the Web Developer Blog. Mozilla Hacks – the Web developer blog. url: https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webassembly-system-interface (visited on 09/12/2023). [52] Sam Clegg. Error on Invalid Archive Member. 2018. url: https://github.com/emscripten-core/emscripten/pull/6961. [53] Coinhive: A Crypto Miner for Your Website. 2019. url: https://coinhive.com. [54] CoinIMP 0% Fee JavaScript Mining, Browser Mining, Browser Miner. 2020. url: https://www.coinimp.com/. 187 [55] CoinMarketCap. 2019. url: https://coinmarketcap.com/. [56] Clang Community. 
Clang C Language Family Frontend for LLVM. url: https://clang.llvm.org/ (visited on 01/31/2024). [57] Compiler Internals. 2020. url: https://tinygo.org/compiler-internals/. [58] Julia Computing. JuliaHubOSS/Llvm-Cbe. JuliaHubOSS. url: https://github.com/JuliaHubOSS/llvm-cbe (visited on 04/10/2024). [59] Concurrency Model and the Event Loop - JavaScript | MDN. url: https://developer.mozilla.org/en-US/docs/Web/JavaScript/EventLoop (visited on 01/29/2021). [60] Jason Cong et al. “A Study on the Impact of Compiler Optimizations on High-Level Synthesis”. In: International Workshop on Languages and Compilers for Parallel Computing. Springer, 2012, pp. 143–157. [61] Brad Conte. Crypto-Algorithms/Sha256.c at Master · B-Con/Crypto-Algorithms. GitHub. url: https://github.com/B-Con/crypto-algorithms/blob/master/sha256.c (visited on 04/10/2024). [62] Contents - Asterius Documentation. 2020. url: https://asterius.netlify.app/. [63] Emscripten Contributors. About Emscripten. Emscripten 3.1.55-git (dev) documentation. 2024. url: https://emscripten.org/docs/introducing_emscripten/about_emscripten.html (visited on 04/22/2024). [64] Emscripten Contributors. Emscripten. 2020. url: https://github.com/emscripten-core/emscripten. [65] Emscripten Contributors. Main — Emscripten 3.1.55-Git (Dev) Documentation. url: https://emscripten.org/ (visited on 03/18/2024). [66] MDN contributors. Loading and Running WebAssembly Code - WebAssembly | MDN. Nov. 20, 2023. url: https://developer.mozilla.org/en-US/docs/WebAssembly/Loading_and_running (visited on 04/23/2024). [67] MDN contributors. Performance.Now() - Web APIs | MDN. url: https://developer.mozilla.org/en-US/docs/Web/API/Performance/now (visited on 09/02/2022). [68] MDN contributors. The Web and Web Standards - Learn Web Development | MDN. MDN Web Docs. Oct. 8, 2023. url: https://developer.mozilla.org/enUS/docs/Learn/Getting_started_with_the_web/The_web_and_web_standards (visited on 04/22/2024). [69] MDN contributors. Using the WebAssembly JavaScript API - WebAssembly | MDN. Nov. 20, 2023. url: https://developer.mozilla.org/en-US/docs/WebAssembly/Using_the_JavaScript_API (visited on 04/23/2024). [70] MDN contributors. WebAssembly - WebAssembly | MDN. Nov. 20, 2023. url: https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface (visited on 04/23/2024). 188 [71] MDN contributors. WebAssembly Concepts - WebAssembly | MDN. Jan. 18, 2024. url: https://developer.mozilla.org/en-US/docs/WebAssembly/Concepts (visited on 04/21/2024). [72] Marco Cova, Christopher Kruegel, and Giovanni Vigna. “Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code”. In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10. New York, NY, USA: Association for Computing Machinery, Apr. 26, 2010, pp. 281–290. isbn: 978-1-60558-799-8. doi: 10.1145/1772690.1772720. (Visited on 08/13/2021). [73] Crash Using Haskell ‘read‘ Function · Issue #60 · Tweag/Asterius. 2020. [74] CryptoMineDev. minerBlock. url: https://github.com/xd4rker/MinerBlock. [75] CryptoSlate. European Central Bank: Bitcoin Isn’t a Threat, Cryptocurrency Not a New Asset Class. 2019. url: https://cryptoslate.com/european-central-bank-bitcoin-isnt-a-threatcryptocurrency-not-a-new-asset-class/. [76] Charlie Curtsinger et al. “{ZOZZLE}: Fast and Precise {In-Browser} {JavaScript} Malware Detection”. In: 20th USENIX Security Symposium (USENIX Security 11). 2011. 
url: https://www.usenix.org/conference/usenix-security-11/zozzle-fast-and-precise-browserjavascript-malware-detection (visited on 08/27/2022). [77] Ann Yuan Daniel Smilkov Nikhil Thorat. Introducing the WebAssembly Backend for TensorFlow.Js. 2020. url: https://blog.tensorflow.org/2020/03/introducing-webassembly-backend-for-tensorflow-js.html. [78] Rajdeep Das et al. “Prutor: A System for Tutoring CS1 and Collecting Student Programs for Analysis”. 2016. arXiv: 1608.03828. [79] João De Macedo et al. “WebAssembly versus JavaScript: Energy and Runtime Performance”. In: 2022 International Conference on ICT for Sustainability (ICT4S). 2022 International Conference on ICT for Sustainability (ICT4S). June 2022, pp. 24–34. doi: 10.1109/ICT4S55073.2022.00014. (Visited on 04/21/2024). [80] Yury Delendik. DWARF Information Does Not Contain Absolute File Locations for Generics. 2018. url: https://github.com/rust-lang/rust/issues/54408. [81] Frank Denis. Libsodium.Js. url: https://github.com/jedisct1/libsodium.js (visited on 02/01/2023). [82] The Lethean developers. Lethean Cryptocurrency. 2019. url: https://lethean.io/. [83] Craig Disselkoen et al. “Position Paper: Progressive Memory Safety for WebAssembly”. In: Proceedings of the 8th International Workshop on Hardware and Architectural Support for Security and Privacy. HASP ’19. New York, NY, USA: Association for Computing Machinery, June 23, 2019, pp. 1–8. isbn: 978-1-4503-7226-8. doi: 10.1145/3337167.3337171. (Visited on 07/13/2022). [84] Morris Dworkin. Block Cipher Techniques. NIST, 2017. url: https://csrc.nist.gov/projects/block-cipher-techniques. 189 [85] Edwin Cheng. Bin-Crate Build Fail with New LLD Linker in Windows. 2018. url: https://github.com/rust-lang/rust/issues/48948. [86] Brendan Eich. From ASM.JS to WebAssembly. 2019. url: https://brendaneich.com/2015/06/from-asm-js-to-webassembly/. [87] Emscripten: An LLVM-to-Web Compiler. url: https://github.com/emscripten-core/emscripten. [88] Emterpreter — Emscripten 1.39.11 Documentation. url: https://emscripten.org/docs/porting/emterpreter.html (visited on 04/24/2021). [89] Michael D. Ernst. “Static and Dynamic Analysis: Synergy and Duality”. In: Workshop on Dynamic Analysis (WODA). 2003. [90] Shayan Eskandari et al. “A First Look at Browser-Based Cryptojacking”. In: CoRR abs/1803.02887 (2018). arXiv: 1803.02887. url: http://arxiv.org/abs/1803.02887. [91] Ethereum Core Devs to Move Forward with ASIC-Resistant PoW Algorithm. 2019. url: https://cointelegraph.com/news/ethereum-core-devs-to-move-forward-with-asic-resistant-powalgorithm. [92] Exclude Zero-Initialized Values from .Mem File · Issue #3907. 2020. [93] prefix=de useprefix=false family=Macedo given=João Gonçalves. “On the Performance of WebAssembly”. MA thesis. Apr. 1, 2022. url: https://repositorium.sdum.uminho.pt/handle/1822/86805 (visited on 04/21/2024). [94] prefix=found on useprefix=false family=WasmBoy given=Weird Importing Bug I. Weird Importing Bug I Found on WasmBoy. 2018. url: https://github.com/AssemblyScript/assemblyscript/issues/29. [95] Aurore Fass, Michael Backes, and Ben Stock. “HideNoSeek: Camouflaging Malicious JavaScript in Benign ASTs”. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. CCS ’19: 2019 ACM SIGSAC Conference on Computer and Communications Security. London United Kingdom: ACM, Nov. 6, 2019, pp. 1899–1913. isbn: 978-1-4503-6747-9. doi: 10.1145/3319535.3345656. (Visited on 07/18/2021). [96] Aurore Fass, Michael Backes, and Ben Stock. 
Abstract
WebAssembly is a recent web standard that aims to enable high-performance web applications running at near-native speed. The standard has gained attention in both academia and industry for its ability to speed up existing user-facing web applications. Because of its well-defined and sound design, many static program analysis techniques have been developed for WebAssembly. However, when we examine the current landscape of these analyses, we identify several gaps in the WebAssembly ecosystem. We find that program optimizations applied to WebAssembly modules may lead to diminished performance. We also identify a lack of tools that help developers understand WebAssembly modules through robust binary decompilation. Finally, we find a gap in the ability to analyze cross-language WebAssembly applications across the two languages they typically combine, namely WebAssembly and JavaScript.
In this dissertation, we present a novel WebAssembly Analysis Framework, or WAF: a static program analysis framework for WebAssembly modules built around multiple intermediate representations. Inspired by analysis frameworks for Java, WAF centers on three intermediate representations, each modeling a WebAssembly module at a different semantic level. This structure enables WAF to support multiple use cases, including program optimization, binary decompilation, cross-language program analysis, and malware detection. We aim to show that our framework can improve static program analysis in the areas where the WebAssembly ecosystem is lacking.
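As a purely illustrative sketch (not taken from the dissertation, and not WAF's actual design), the following Python example shows the general idea behind representing the same WebAssembly code at two semantic levels: a flat, stack-based instruction sequence as it appears in the binary, and a higher-level expression tree obtained by folding away the operand stack. The class names, the lift function, and the tuple-based instruction encoding are hypothetical and exist only for illustration.

# Minimal sketch: lifting a flat WebAssembly-style instruction list (low level)
# into a nested expression tree (higher level). Names are hypothetical.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Const:            # a constant operand, e.g. i32.const 2
    value: int

@dataclass
class LocalGet:         # a local-variable read, e.g. local.get 0
    index: int

@dataclass
class BinOp:            # a binary operation over two sub-expressions, e.g. i32.add
    op: str
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Const, LocalGet, BinOp]

def lift(instructions: List[tuple]) -> Expr:
    """Simulate the operand stack to fold flat instructions into one expression tree."""
    stack: List[Expr] = []
    for opcode, *operands in instructions:
        if opcode == "i32.const":
            stack.append(Const(operands[0]))
        elif opcode == "local.get":
            stack.append(LocalGet(operands[0]))
        elif opcode in ("i32.add", "i32.mul"):
            rhs, lhs = stack.pop(), stack.pop()   # stack order: rhs was pushed last
            stack.append(BinOp(opcode, lhs, rhs))
        else:
            raise NotImplementedError(opcode)
    assert len(stack) == 1, "expected a single result value on the stack"
    return stack[0]

# (local.get 0) * 2 + 1, written as the flat stack-machine sequence a compiler emits
flat = [("local.get", 0), ("i32.const", 2), ("i32.mul",), ("i32.const", 1), ("i32.add",)]
print(lift(flat))
# BinOp(op='i32.add', lhs=BinOp(op='i32.mul', lhs=LocalGet(index=0), rhs=Const(value=2)), rhs=Const(value=1))

The point of such a layered structure is that analyses and transformations can choose the level that matches their needs: optimizations and decompilation benefit from the tree-shaped view, while instruction-level tools can keep working on the flat sequence.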
Asset Metadata
Creator: Romano, Alan Jesus (author)
Core Title: Static program analyses for WebAssembly
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2024-08
Publication Date: 05/28/2024
Defense Date: 05/20/2024
Publisher: Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag: MinerRay, OAI-PMH Harvest, program analysis, static program analysis, WAF, web, WebAssembly, Wobfuscator
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Wang, Weihang (committee chair), Nuzzo, Pierluigi (committee member), Wang, Chao (committee member)
Creator Email: ajromano@usc.edu, alan.rom25.95@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC113967449
Unique identifier: UC113967449
Identifier: etd-RomanoAlan-13030.pdf (filename)
Legacy Identifier: etd-RomanoAlan-13030
Document Type: Dissertation
Rights: Romano, Alan Jesus
Internet Media Type: application/pdf
Type: texts
Source: 20240528-usctheses-batch-1162 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu