Constraint-based Program Analysis for Concurrent Software

by

Chungha Sung

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

May 2021

Copyright 2021 Chungha Sung

Dedication

To my beloved parents, Kwanghyun and So Hyun, and my brother, Ho Young, for their endless support, encouragement and love.

Acknowledgements

Pursuing a Ph.D. has been one of the most rewarding and valuable experiences of my life. It was truly challenging, and I could not have finished it without the help and support of the many people I became acquainted with in my pursuit of the Ph.D.

First and foremost, I would like to thank my advisor Chao Wang for his dedication, patience and continuous support in guiding me to become an independent researcher. His attitude toward research, with sincere passion and endless curiosity, has always helped me build the research foundation to overcome numerous challenges and uncover new inventions. His creativity never ran short: he never stopped challenging my ideas with his profound vision and insight from diverse perspectives. Having to defend my arguments against his challenges motivated me to hone my ideas and construct better arguments, which led to achieving strong research results and publishing top-tier research papers. Furthermore, despite our relationship being that of a student and advisor, he respected me and gave me the room I needed to deal with whatever problems arose during difficult times in my personal life. That helped me immensely to stay on this journey despite wanting, and even coming close, to dropping out. I truly believe that being able to start and finish this journey under his guidance is a blessing and a gift, and I will never forget a single piece of advice or help offered to me. He has inspired me to become an excellent researcher and a good mentor, just as he was to me, and I believe that this is the best way to express the never-ending gratitude I have towards him.

I would also like to thank all the committee members of my defense, proposal and qualifying exam: Nenad Medvidović, Sandeep Gupta, William G.J. Halfond, Jyotirmoy V. Deshmukh and Mukund Raghothaman. When I was having a hard time forming a coherent idea for the dissertation, they never hesitated to spend their time helping me build a constructive story from different views. Every discussion with them over the last year helped me realize what I was missing to stand on my own as a Ph.D. All of their lessons were truly valuable; I could not have learned them without their thoughtful advice and comments on my presentation and dissertation.

Throughout my internships, I met many valuable people outside of school. I wish Shuvendu Lahiri would accept my deepest appreciation for his passionate mentoring and critical advice during my Microsoft Research internship in the summer of 2019. And I would like to thank Mike Kaufman, Pallavi Choudhury, Jessica Wolk, Mark Marron, Nachi Nagappan, Hitesh Kanwathirtha, and Madan Musuvathi, who always engaged deeply in many discussions with me toward a research goal during the internship. To Henry Cox and Alex Pai, I appreciate your thoughtful guidance, mentoring and bike riding during the internship in the summer of 2018 in Boston.

Thanks also go to my many Korean friends who were always with me, supporting me mentally and physically. To Seoungmo Kim, I always miss our gym life.
To Mincheol Sung, I hope you will defend your dissertation soon. To my beloved friends I met at Virginia Tech, Yuhyun Song, Hyunwoo Kang, Heonjoong Lee, Yoonchang Sung, Sungjae Ohn and Hyuncheol Kim, I heartily appreciate your friendship. To my best friends in Korea who always answered my phone calls day and night as if they were in the US, Taejoon Song, Jiyong Park, Heechul Chae, Sehwan Lee, Jaekoo Park, Wonho Jang and Seokhyung Hong, I always wish for our bright future. Especially to Yongsang Yu, I was always encouraged by your cheers and words, which will live in my heart forever. And I would like to thank Naeun Park (with Aengdoo and Hayang), Dongjin Huh, Dojoon Park, Sungwoo Choo, Hyesang Park, Sunghyun Park, Sanghoon Chae, Sungbo Park and Dongho Lee for their help in overcoming many difficulties in my life in LA. To everyone here, please forgive me for not being able to spend more lines.

I want to thank all members of the RSS group: Jingbo Wang, Markus Kusano, Shengjian (Daniel) Guo, Chenggang Li, Lu Zhang, Tonmoy Roy, Meng Wu, Zunchen Huang, Yannan Li and Brandon Paulsen. I still miss the many small parties with Markus and Daniel that helped me adapt to life in the US during my first several years. Life in LA would not have been the same without Jingbo, who never turned me away whenever I asked to bend her ear about my problems. All the discussions and chats I had with the members were memorable, and I will definitely miss them.

Finally, I would like to thank my beloved parents and my brother. To Ho Young Sung and his wife, Jieun, I wish your family, and especially my nephew Haon Sung, all the best. Had it not been for my family, I do not know that I would have completed the Ph.D.; it is because of their support, understanding and unconditional love.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Motivation
    1.1.1 Enumeration of possible interleavings in concurrent software is not scalable
    1.1.2 Merging possible interleavings in concurrent software is not accurate
  1.2 Insight and Hypothesis
    1.2.1 Insight 1: Analyzing interferences in advance is useful in concurrency reasoning
    1.2.2 Insight 2: Using constraint-based program analysis to analyze interference
    1.2.3 Hypothesis
  1.3 Contributions
    1.3.1 Accurate modular abstract interpretation for interrupt-driven programs
    1.3.2 Fast and approximate semantic diffing of multi-threaded programs
    1.3.3 Efficient testing for web applications with DOM-event dependency
  1.4 Outline
Chapter 2: Background and Related Work
  2.1 Constraint-based Program Analysis
    2.1.1 Overview
    2.1.2 Related Work on Datalog-based Declarative Program Analysis
  2.2 Related Work on Testing and Verification of Concurrent Software
    2.2.1 Verification of Interrupt-driven Software
      2.2.1.1 Modular Abstract Interpretation Based Approaches
      2.2.1.2 Model Checking Based Approaches
      2.2.1.3 Testing Based Approaches
      2.2.1.4 Other Approaches
    2.2.2 Semantic Diffing of Sequential and Concurrent Programs
      2.2.2.1 Semantic Diffing of Sequential Programs
      2.2.2.2 Semantic Diffing of Concurrent Programs
    2.2.3 Testing Web Applications
Chapter 3: Accurate Modular Verification of Interrupt-driven Software
  3.1 Background and Motivation
    3.1.1 Modeling of Interrupts
    3.1.2 Modular Abstract Interpretation
    3.1.3 Motivating Examples
      3.1.3.1 Examples for Testing Approaches
      3.1.3.2 Examples for Model Checking Approaches
      3.1.3.3 Examples for Modular Abstract Interpretation Approaches
      3.1.3.4 Examples for Interrupt Semantic-aware Modular Abstract Interpretation
      3.1.3.5 Summary of the Examples
  3.2 Constraint-based Checking of Infeasible Interferences
    3.2.1 Rules for NoPreempt
    3.2.2 Rules for CoveredLoad
    3.2.3 Rules for InterceptedStore
    3.2.4 Rules on Examples
    3.2.5 Soundness of the Analysis
  3.3 Optimizing Modular Abstract Interpretation
    3.3.1 Pruning Infeasible Interferences
    3.3.2 The Running Example
  3.4 Implementation and Experiments
    3.4.1 Experimental Setup
    3.4.2 Results: Infeasible Pairs
    3.4.3 Results: Optimizing Modular Abstract Interpretation
  3.5 Summary
Chapter 4: Scalable Semantic Diffing of Concurrent Programs
  4.1 Background and Motivation
    4.1.1 Partial Trace Comparison
    4.1.2 Motivating Examples
      4.1.2.1 An Example with Lock-Unlock Changes
      4.1.2.2 An Example with Signal-Wait Changes
      4.1.2.3 Applying Our Method to the Examples
  4.2 Constraint-based Synchronization Analysis
    4.2.1 Rules for Intra-thread Dependency
    4.2.2 Rules for Inter-thread Dependency
    4.2.3 Rules for Signal-Wait Dependency
    4.2.4 Rules for Ad Hoc Synchronization
    4.2.5 Rules for Transitive Closure
    4.2.6 Rules for Lock-enforced Critical Section
    4.2.7 Rules for Read-from Relation
    4.2.8 Soundness of the Analysis
  4.3 Optimizing the Semantic Diffing Computation
    4.3.1 Symmetric Difference
    4.3.2 Differences at Higher Ranks
    4.3.3 The Running Example
  4.4 Implementation and Experiments
    4.4.1 Experimental Setup
    4.4.2 Results: Optimizing Semantic Diffing on Small-sized Benchmarks
    4.4.3 Results: Optimizing Semantic Diffing on Large-sized Benchmarks
    4.4.4 Answering Research Questions
  4.5 Further Discussion
  4.6 Summary
Chapter 5: Efficient Testing for Web Applications with DOM Event Dependency Analysis
  5.1 Background and Motivation
    5.1.1 Web Applications
    5.1.2 JavaScript Statements
    5.1.3 Points-to Analysis
    5.1.4 Call-graph Construction
    5.1.5 Dependency Relations
    5.1.6 Motivating Example
      5.1.6.1 DOM Event Dependency on the Example
      5.1.6.2 Improved Web Application Testing with DOM Event Dependency
  5.2 Constraint-based Dependency Analysis
    5.2.1 Rules for Points-To Analysis
    5.2.2 Rules for DOM Event Dependency Analysis
    5.2.3 Soundness of the Analysis
  5.3 Improving Automated Testing
    5.3.1 Pruning Redundant Event Sequences
    5.3.2 The Running Example
  5.4 Implementation and Experiments
    5.4.1 Experimental Setup
    5.4.2 Results: Dependency Analysis
    5.4.3 Results: Improving Artemis
    5.4.4 Results: Redundancy Removal in Test Case Generation
  5.5 Further Discussion
    5.5.1 The Size of the Worklist
    5.5.2 Factors that May Affect the Execution Time
  5.6 Summary
Chapter 6: Conclusion and Future Work
  6.1 Summary
  6.2 Future Work
    6.2.1 Optimizing Regression Test Selection
    6.2.2 Query Optimization in the Datalog Solver
References

List of Tables
  3.1 Comparing IntAbs with testing and prior verification methods on the programs in Fig. 3.3 and Fig. 3.6.
  3.2 MustNotReadFrom rules based on InterceptedStore, CoveredLoad and Priority.
  3.3 Benchmark programs used in our experimental evaluation.
  3.4 Results of total and filtered store-load pairs using IntAbs.
  3.5 Results of comparing IntAbs with state-of-the-art techniques on 35 interrupt-driven programs.
  4.1 Experimental results on the first set of benchmark programs.
  4.2 Experimental results on the second set of benchmark programs.
  5.1 Results of the static DOM-event dependency analysis.
  5.2 Results of comparing Artemis and Artemis+JSdep with a fixed number of iterations.
  5.3 Results of blocked sequence ratio (step 500).
  5.4 Running Artemis and Artemis+JSdep for 10 minutes.
  5.5 Execution time of Artemis and Artemis+JSdep after 500 iterations.

List of Figures
  1.1 An example program with four concurrent events and an assertion.
  1.2 An example program with three concurrent events and an assertion.
  1.3 Possible interleavings before executing line 3 in Figure 1.2.
  1.4 Overview of the constraint-based program analysis.
  2.1 Overview of the constraint-based program analysis.
  3.1 IntAbs – iterative verification framework for interrupt-driven programs.
  3.2 The interleavings (after stmt1) allowed by interrupts and threads.
  3.3 An example program with three interrupt handlers and assertions.
  3.4 Some possible interrupt sequences for the program in Fig. 3.3.
  3.5 Some possible store-to-load data flows during abstract interpretation.
  3.6 An example program with three interrupt handlers, where the first two assertions always hold but the last assertion may fail.
  3.7 Examples for each case in Table 3.2.
  3.8 A small example with a loop.
  4.1 Overview of our semantic diffing method.
  4.2 Example programs with synchronization differences (lock-unlock).
  4.3 Example programs with synchronization differences (signal-wait).
  4.4 Analysis steps for programs in Figs. 4.3(a) and 4.3(b).
  4.5 Ad hoc synchronization (cond = false initially).
  4.6 Differences of abstract traces: +n+12 (left) and +n+21 (right).
  4.7 Illustrating the first two rank-2 inference rules.
  4.8 Illustrating rank-2 rules related to lock-unlock.
  4.9 Example programs with rank-2 differences.
  4.10 Steps of our analysis for the programs in Fig. 4.9.
  4.11 Code from rtl8169-3: the original (left) and changed (right) versions.
  4.12 Various differences between two programs with over- and under-approximated behaviors.
  4.13 Example program with T+ and T−.
  5.1 Overall flow of DOM-event dependency analysis.
  5.2 Example: data/control dependencies in the DOM.
  5.3 Example HTML page and associated JavaScript file.
  5.4 DOM event dependencies for the example in Figure 5.3.
  5.5 Event sequences explored by Artemis for Figure 5.3.
  5.6 Example for JavaScript code normalization.
  5.7 Input relations defined to specify our analysis.
  5.8 Datalog rules for the points-to analysis.
  5.9 Datalog rules for data dependency analysis.
  5.10 Datalog rules for DOM dependency analysis.
  5.11 The worklist size compared to Artemis by the number of iterations.
  5.12 Normalized execution time of Artemis+JSdep compared to Artemis after 500 iterations.
  5.13 The execution time of Artemis by the length of test sequences executed.
  5.14 The execution time of Artemis by covered lines of code and read/written properties.
  6.1 Example test cases with events.
  6.2 Impacted events based on interference after updating event1 in Figure 6.1.

Abstract

Concurrent software is ubiquitous in modern computer systems: it is used everywhere, from small computing devices such as smartphones to large systems such as clouds. However, writing correct concurrent software is notoriously difficult, as the number of program states to reason about can be extremely large due to interleavings between concurrent events. Moreover, testing and verifying concurrent software by enumerating the enormous number of interleavings is unrealistic. Thus, automated reasoning techniques for concurrent software that rely on exhaustive enumeration of interleavings often lack scalability, and those that do not rely on exhaustive enumeration often lack accuracy.

This dissertation shows that substantial improvements to existing testing and verification techniques for concurrent software can be made by constraint-based analysis of the interleavings between concurrent events. The key insight is that an analysis of feasible/infeasible interleavings can be integrated into existing techniques to achieve more efficient reasoning. Following this insight, I designed and implemented a constraint-based program analysis for the interleavings based on a declarative program analysis framework. With the analysis, I improved three testing and verification techniques. The first technique is a modular abstract interpretation for interrupt-driven software, whose accuracy is drastically improved by the analysis of infeasible interleavings between interrupt handlers. The second technique is a semantic diffing method for concurrent programs, whose scalability can be significantly enhanced with the help of the analysis of feasible interleavings. The third technique is a systematic web application testing method, for which the analysis of feasible interleavings between events can help achieve significantly higher testing coverage. We conducted an experimental evaluation of all three techniques, and our results show that the proposed constraint-based analysis can improve their effectiveness in terms of both scalability and accuracy.

Chapter 1: Introduction

Parallel computing hardware has become ubiquitous. It is in both small devices such as smartphones and laptops, and large systems such as clouds, clusters and supercomputers. The growing number of parallel computing devices requires programmers to write correct and efficient concurrent software in order to fully utilize this parallel computing hardware. Although concurrent software was introduced decades ago, writing good-quality concurrent software that is correct and efficient is still notoriously hard.

A main challenge is the interleaving between concurrent events. Due to interleaving, the order of concurrent events is not deterministic, allowing interference between the shared-memory accesses of concurrent events. Therefore, sequential reasoning is no longer effective for concurrent software. Moreover, the interleaving between concurrent events occurs non-deterministically: the interference between concurrent events might follow different orders in different executions. Also, the number of interleaving schedules is often astronomically large, i.e., exponential in the number of events. Due to the enormous number of possible interleavings, it is not practical to reason about concurrent software manually. Many concurrent bugs have been found in the real world, leading to serious problems.
For example, data races in Mozilla's Firefox web browser possibly led to reading uninitialized memory [103], buffer overflows [104], and use-after-free bugs [105]. Similarly, an Intel processor had a race-related concurrency bug in its firmware which gave attackers access to write to the processor's flash memory [70]. Furthermore, a race bug in the Therac-25 caused a serious disaster [139]. To prevent concurrent bugs in advance, there is a growing need for efficient and scalable reasoning techniques for concurrent software. Although there are many existing techniques for testing and verification of concurrent software, they still face the challenges mentioned above. This motivates us to develop a more efficient and scalable framework that can help improve a wide range of testing and verification techniques for concurrent software.

1.1 Motivation

A major challenge in reasoning about concurrent software is state explosion due to the large number of interleavings between concurrent events. As mentioned above, the number of possible interleavings grows exponentially in the number of concurrent events and actions. Furthermore, since a program may behave differently under different interleavings, we have to make sure that the program implements its intended behavior under all possible interleavings. For these reasons, many testing and verification techniques for concurrent software either do not scale when they enumerate possible interleavings, or are not accurate when they merge possible interleavings. Let us use examples to show the challenges in enumerating interleavings and merging interleavings.

1.1.1 Enumeration of possible interleavings in concurrent software is not scalable

e1() { b = z + 1; assert(b != 4); }
e2() { z = y + 1; }
e3() { y = x + 1; }
e4() { x = 1; }

Figure 1.1: An example program with four concurrent events and an assertion.

Figure 1.1 shows a program with four concurrent events, from e1 to e4. They read and write shared variables x, y, z, and b. In the program, we want to check if the assertion statement in e1 is violated by enumerating possible interleavings. As there is only one instruction in each event, we have 4! possible schedules to consider if we assume each event is executed once (e.g., e1-e2-e3-e4, e1-e2-e4-e3, ...). Out of the 4! schedules, only one sequence, e4-e3-e2-e1, violates the assertion: x is assigned 1 in e4, y is assigned 2 in e3, z is assigned 3 in e2, and b is assigned 4 in e1. This will be the last sequence generated if the sequence generation follows lexicographical order [8].

However, it is not practical to generate and test all possible schedules, as the number of test sequences grows exponentially in the number of concurrent events and the number of instructions in each event. For example, 10 concurrent events with one instruction each yield 3,628,800 (= 10!) sequences, and even for the small program in Figure 1.2, there are more than 400,000 sequences to consider. As a result, it is not practical to cover all test sequences. To make testing more efficient, one way is to skip test sequences that are not useful or are redundant. For example, once we test e3-e4-e2-e1, we can skip e3-e2-e4-e1, as both sequences produce the same values of the shared variables before executing the assertion statement in e1. Unfortunately, it is challenging to decide which test sequence can be skipped safely due to the shared-memory interference between concurrent events. As mentioned, we can skip e3-e2-e4-e1 after testing e3-e4-e2-e1, but not e4-e3-e2-e1, as it results in different values of the shared variables.
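To make the cost of enumeration concrete, the following minimal C sketch (written for this discussion; it is not part of the dissertation's tooling) brute-forces all 4! schedules of the events in Figure 1.1, assuming each event runs exactly once and atomically, and reports every schedule whose final state violates the assertion b != 4:

    #include <stdio.h>

    static int x, y, z, b;

    /* Run one of the four events from Figure 1.1. */
    static void run(int e) {
        switch (e) {
        case 1: b = z + 1; break;   /* e1; the assertion reads b afterwards */
        case 2: z = y + 1; break;   /* e2 */
        case 3: y = x + 1; break;   /* e3 */
        case 4: x = 1;     break;   /* e4 */
        }
    }

    int main(void) {
        int violations = 0, total = 0;
        /* Four nested loops enumerate all permutations of {e1,e2,e3,e4}. */
        for (int p = 1; p <= 4; p++)
          for (int q = 1; q <= 4; q++)
            for (int r = 1; r <= 4; r++)
              for (int s = 1; s <= 4; s++) {
                if (p == q || p == r || p == s ||
                    q == r || q == s || r == s)
                    continue;               /* skip non-permutations */
                x = y = z = b = 0;          /* reset the shared state */
                int order[4] = {p, q, r, s};
                for (int i = 0; i < 4; i++) run(order[i]);
                total++;
                if (b == 4) {               /* assert(b != 4) would fail */
                    printf("violating schedule: e%d-e%d-e%d-e%d\n", p, q, r, s);
                    violations++;
                }
              }
        printf("%d of %d schedules violate the assertion\n", violations, total);
        return 0;
    }

Running it prints the single violating schedule e4-e3-e2-e1 out of 24; with 10 single-instruction events the same loop structure would already face 10! = 3,628,800 schedules, which is why exhaustive enumeration does not scale.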
1.1.2 Merging possible interleavings in concurrent software is not accurate

e1() {
  1: lock(a);
  2: x = 0;
  3: assert(x < 3);
  4: unlock(a);
}
e2() {
  5: x = 1;
  6: c = false;
}
e3() {
  7: lock(a);
  8: if (!c)
  9:   x = 10;
  10:  x = 2;
  11: unlock(a);
}

Figure 1.2: An example program with three concurrent events and an assertion.

Instead of testing all interleavings individually, some concurrency reasoning techniques merge all interleavings together before analyzing them. For example, modular abstract interpretation [101, 99] maintains abstract values of shared variables by merging possible interleaving schedules. Let us consider the example in Figure 1.2 to describe the merging of possible schedules. In the figure, there are three concurrent events named e1, e2 and e3, and they access shared variables x and c. They use lock and unlock instructions with a synchronization variable a to enforce atomic regions. Our goal is to check if the assertion statement in e1 can be violated.

Instead of testing all concrete interleavings, of which there can be up to 400,000, let us consider the possible interleavings before the assertion executes. Figure 1.3 shows example schedules before the assert is checked. For example, the assertion at line 3 might be executed right after executing line 8, line 5, line 9 or line 10. As we consider interleavings and control flow, merging the possible values of x across the possible interleavings yields the set of values that the read of x at line 3 may observe. As there are four assignments to x, we might assume that all of the values (i.e., 0, 1, 2 and 10) can be read from x at line 3 through the control flow and the interleavings between events. So, we assume the assertion may be violated.

... → ... → ... → if (!c)  (line 8)  → assert(x < 3) (line 3)
... → ... → ... → x = 1    (line 5)  → assert(x < 3) (line 3)
... → ... → ... → x = 10   (line 9)  → assert(x < 3) (line 3)
... → ... → ... → x = 2    (line 10) → assert(x < 3) (line 3)

Figure 1.3: Possible interleavings before executing line 3 in Figure 1.2.

Although merging possible interleavings offers more scalable reasoning, it may produce bogus results. In Figure 1.2, we assume the assertion may be violated, in particular due to the assignment at line 9, as x is updated to 10. However, the value assigned at line 9 is not read at line 3 in any concrete interleaving, due to the atomic regions enforced by lock and unlock. Specifically, after executing line 9 in e3, we cannot execute any instruction in e1 until unlock(a) executes in e3, as all instructions in e1 are covered by the lock. Therefore, merging possible interleavings might produce bogus warnings, meaning less accurate verification results. Furthermore, it may be difficult for users to analyze the reported warnings and determine whether they are due to infeasible interleavings introduced during the merging of interleavings.

1.2 Insight and Hypothesis

In this section, we present the key insight shown through our research and the hypothesis that this dissertation tests in order to achieve the goal of improving testing and verification for concurrent programs.

1.2.1 Insight 1: Analyzing interferences in advance is useful in concurrency reasoning

In software testing and verification, the two main actions to consider are read and write, as the order of read and write actions may affect the states of the program being verified or tested. In concurrent software, the interference between the shared-memory accesses of concurrent events occurs non-deterministically, with different orders of reads and writes in different executions.
To reason about these interferences, one way is to enumerate possible execution traces, but that is not scalable; another way is to approximate them, but that is not accurate, as discussed in the motivation. In both cases, the main problem is that they do not consider the interferences between concurrent events in advance. If we analyze the interferences between concurrent events in advance, we can use them to improve the testing or verification techniques. Specifically, feasible interferences help to reduce the number of interleavings to test using a method such as partial order reduction [40, 144], and infeasible interferences help to reduce the range of possible values of shared variables. For example, in Figure 1.1, changing the subsequence e2-e4 into e4-e2 in e3-e2-e4-e1 does not change the program state at the end, as e2 and e4 read and write different variables (i.e., e2 reads y and writes a value to z, and e4 writes a value to x). Also, based on the semantics of atomic regions, we would conclude that the value written to x at line 9 is never read at line 3 in Figure 1.2. In this way, various reasoning techniques can benefit from interference analysis.

Figure 1.4: Overview of the constraint-based program analysis.

1.2.2 Insight 2: Using constraint-based program analysis to analyze interference

Another insight is to leverage a constraint-based program analysis framework to analyze interference. Interference between concurrent events may be analyzed in a separate process that is independent of the application. Thus, instead of implementing the same analysis again and again for different applications, we leverage an off-the-shelf process for the analysis and plug it into various reasoning techniques. Figure 1.4 shows an overview of this insight. The two blocks above the dashed line represent an overview of existing reasoning techniques, where concurrent programs are the input to various testing and verification methods. The block under the dashed line represents the use of a constraint-based program analysis framework that analyzes interferences in concurrent programs and passes the information to the testing or verification techniques. Thus, we can use the same type of interference information in different applications without implementing the analysis again and again.

1.2.3 Hypothesis

Based on these insights, we state the hypothesis of this dissertation as follows:

Analyzing interferences between concurrent events with constraint-based program analysis helps to improve the performance of automated testing and verification techniques for concurrent software significantly.

To test the hypothesis, I first implemented a constraint-based program analysis framework by developing rules to analyze feasible and infeasible interference between concurrent events. Then, I imported the results of the analysis from the framework into various testing and verification techniques for concurrent software. In particular, we chose three representative concurrency reasoning techniques: modular abstract interpretation of interrupt-driven software, testing of web applications, and semantic diffing of multi-threaded programs. The evaluations are conducted by comparing the performance of each technique, with and without integrating our analysis results, in terms of both scalability and accuracy.
The results demonstrated that our analysis helps to improve accuracy and scalability and, in most cases, maintains the soundness of the techniques with negligible overhead. These results confirm the hypothesis of this dissertation.

1.3 Contributions

This dissertation chooses three concurrency reasoning techniques to test my hypothesis: modular abstract interpretation of interrupt-driven software, testing of web applications, and semantic diffing of multi-threaded programs. The three techniques are the applications of concurrency reasoning techniques shown in Figure 1.4. For each technique, I develop constraint rules to analyze feasible and infeasible interference between concurrent events. Then, by integrating the analysis results with the techniques, I conduct empirical evaluations to assess the effectiveness of my approach. Let me elaborate on the contributions for each technique.

1.3.1 Accurate modular abstract interpretation for interrupt-driven programs

Modular abstract interpretation has been widely used for verifying multi-threaded programs [81, 79, 101, 99, 100, 98, 102]. Unfortunately, when it is used to verify interrupt-driven programs, it produces too many false positives. This dissertation proposes a more accurate modular abstract interpretation method for conducting static verification of interrupt-driven programs. First, I design and implement a constraint-based program analysis for soundly and efficiently identifying and pruning infeasible interference between interrupts. Then, I integrate the method into modular abstract interpretation [101, 99], which achieves more accurate verification results with our method. Finally, I conduct an experimental evaluation on a large number of interrupt-driven programs to demonstrate the effectiveness of my method.

1.3.2 Fast and approximate semantic diffing of multi-threaded programs

Semantic diffing of programs plays an important role in analyzing unexpected behaviors in evolving software. However, the existing approach for diffing concurrent programs [15] suffers from scalability issues, since there are too many interleavings to consider between threads. This dissertation proposes a fast and approximate analysis to compute semantic differences in multi-threaded programs. To achieve scalability, I design and implement rules in a constraint-based program analysis to analyze infeasible interference between threads. Then, I utilize the analysis results to compute semantic differences between two versions of a multi-threaded program. Finally, I conduct an experimental evaluation comparing against the state-of-the-art semantic diffing technique [15] to show the high accuracy and low overhead of semantic diffing augmented with our analysis.

1.3.3 Efficient testing for web applications with DOM-event dependency

Testing web applications involves triggering many concurrent UI events. As there are numerous events to trigger, existing testing methods such as Artemis [8], which triggers possible events systematically, may not be able to achieve high code coverage. This dissertation proposes a method to eliminate redundant tests and thus improve test coverage using DOM-event dependencies in web application testing. First, I implement a constraint-based analysis framework to statically analyze DOM-event dependencies, which capture feasible interferences between UI events. Then, I integrate the analysis into a popular web application testing tool, Artemis [8]. Based on the analysis results, it soundly prunes redundant tests by utilizing partial order reduction [40, 144].
Finally, we conduct experimental evaluations on a large set of web applications to demonstrate the efficiency of the static analysis method and its effectiveness in improving automated testing.

1.4 Outline

The remainder of this dissertation is organized as follows. First, I present the technical background and prior work on the three testing and verification techniques in Chapter 2. Then, I present the main contributions of this dissertation in the next three chapters, namely more accurate abstract interpretation in Chapter 3, more scalable semantic diffing of concurrent programs in Chapter 4, and efficient testing of web applications in Chapter 5. Lastly, I conclude the dissertation and discuss future work in Chapter 6.

In the main chapters, Chapter 3 presents our method for achieving more accurate modular abstract interpretation for interrupt-driven programs by integrating an interference analysis, which is the first application in the thesis. The work in this chapter has been published [134] in the main track of the IEEE/ACM International Conference on Automated Software Engineering. Chapter 4 presents fast and approximate semantic diffing of concurrent programs achieved by an interference analysis. The work in this chapter has been published [135] in the main track of the IEEE/ACM International Conference on Automated Software Engineering. Chapter 5 presents more efficient web application testing by an interference analysis between DOM objects in web applications. The work in this chapter has been published [133] in the main track of the ACM SIGSOFT International Symposium on the Foundations of Software Engineering.

Chapter 2: Background and Related Work

This chapter introduces the necessary background information that is used throughout the dissertation and related work on the three applications this dissertation focuses on. Section 2.1 discusses the background of the constraint-based program analysis framework. In Section 2.2, we discuss related work for each target application: verification of interrupt-driven programs, semantic diffing of concurrent programs, and testing of web applications.

2.1 Constraint-based Program Analysis

In this section, we review the basics of the constraint-based program analysis framework and related work on constraint-based program analysis.

2.1.1 Overview

A constraint-based program analysis framework is typically implemented with Datalog. Datalog is a logic programming language that in recent years has been widely used for declarative program analysis [27, 149, 84, 90, 50, 53, 16, 160, 4].

Figure 2.1: Overview of the constraint-based program analysis.

There are several advantages to making use of Datalog. First, a Datalog program is polynomial-time solvable. Second, the fixed-point computation inside a Datalog solver maps naturally to the fixed-point computations in program analysis algorithms.

Figure 2.1 shows an overview of the constraint-based program analysis framework with examples. The framework consists of three components: program facts, inference rules and a constraint solver. Program facts are relations that encode the structural information of the program, such as the order between two instructions in a control-flow graph, or the variables read/written at program points. Inference rules are recursive relations that are used to express fixed-point algorithms.
And the constraint solver performs the fixed-point computation with the inference rules and the program facts by inferring new relations.

Consider a relation named PO(a, b), which represents the program order of two immediately adjacent instructions a and b, while HB(c, d) means c must happen before d. First, we write down the Datalog facts based on the control-flow graph: PO(s1, s2), PO(s1, s3), PO(s2, s4), PO(s3, s4), PO(s4, s5). Then, we write down the Datalog inference rules:

HB(a, b) ← PO(a, b)
HB(c, e) ← HB(c, d) ∧ HB(d, e)

Here, the left arrow (←) separates the inferred Datalog facts on the left-hand side from the existing Datalog fact(s) on the right-hand side. The first rule says the program-order relation implies the must-happen-before relation. The second rule says the must-happen-before relation is transitive. The last component (the constraint solver) is a Datalog solver, which, based on the above facts and rules, will compute the maximal set of new relations (i.e., the HB relation in the example). By sending a query to the Datalog solver, one may confirm that HB(s1, s5) indeed holds whereas HB(s2, s3) does not. In this way, we can design rules and choose the level of structural information encoded as facts depending on the application and purpose.

2.1.2 Related Work on Datalog-based Declarative Program Analysis

Datalog-based static program analysis was proposed by Whaley and Lam [149]. They introduced a framework for implementing points-to analyses as database queries [84] using Binary Decision Diagrams (BDDs). Livshits and Lam [90] and Naik et al. [107] used similar techniques to detect security errors and data races. Bravenboer and Smaragdakis [16] also formulated a points-to analysis as database queries; they solved them using a non-BDD-based engine.

For verifying and testing multi-threaded programs, Farzan and Kincaid [30] used BDD-based Datalog to perform an interference analysis to support parameterized concurrent programs in thread-modular abstract interpretation. Kusano and Wang [81, 79] used Datalog to obtain flow-sensitivity within threads to improve the accuracy of thread-modular abstract analysis for concurrent programs under the sequentially consistent memory model and relaxed memory models. Also, Guo et al. [46, 45] used Datalog to extract program dependency information to improve the symbolic execution tool KLEE [18].

Furthermore, Datalog analysis has been used for security analysis. Guarnieri and Livshits [42] implemented a Datalog-based analysis procedure in the GATEKEEPER tool to statically enforce security policies. Wang et al. [145] leveraged Datalog to analyze power side-channel leaks in the compiler's register re-allocations. Also, Paulsen et al. [112] implemented a taint analysis using Datalog to track vulnerable sinks in web applications to mitigate compression side-channel leaks.

2.2 Related Work on Testing and Verification of Concurrent Software

In this section, we present related work on each of the applications that the dissertation focuses on.

2.2.1 Verification of Interrupt-driven Software

We consider four types of approaches to the verification of interrupt-driven software. The first type is modular abstract interpretation based approaches. The second type is model checking based approaches. The third type is testing based approaches, and the fourth type is other approaches not categorized by the first three types.
2.2.1.1 Modular Abstract Interpretation Based Approaches

Modular abstract interpretation, an abstract interpretation [24] based verification technique for concurrent programs, was used by Miné [101]. An extension of his initial work [102, 98, 100] was used to prove the absence of data races, deadlocks, and other runtime errors in real-time software by adding priorities while targeting the OSEK/AUTOSAR operating systems. Specifically, it tracks the effects of mutexes, yields and the scheduler state based on execution traces to figure out reachability, while using priorities to make the analysis more accurate. However, the technique has not been thoroughly evaluated on practical benchmarks. Regehr et al. [118] proposed to use context-sensitive abstract interpretation of machine code to guarantee stack safety for interrupt-driven programs.

2.2.1.2 Model Checking Based Approaches

Wu et al. [151] leveraged (bounded) model checking tools to detect data races in interrupt-driven programs. Kroening et al. [77] also improved the CBMC bounded model checker to support the verification of interrupt-driven programs. However, they only search a bounded number of execution steps, and thus cannot prove the validity of assertions. Wu et al. [150] also proposed a source-to-source transformation technique similar to Regehr [117]: it sequentializes interrupt-driven programs before feeding them to a verification tool. However, due to the bounded nature of the sequentialization process, the method is only suitable for detecting violations, not for proving the absence of such violations.

There are also formal verification techniques for embedded software based on model checking [59, 154, 143, 141, 47, 127, 140]. For example, Schlich and Brutschy [127] proposed the reduction of interrupt handler points based on partial order reduction when model checking embedded software. Vórtler et al. [140] proposed, within the Contiki system, a method for modeling interrupts at the level of hardware-independent C source code and a new modeling approach for periodically occurring interrupts. They then verify programs with interrupts using CBMC, which is again a bounded model checker. This means the technique is also geared toward detecting bugs and thus cannot prove properties. Furthermore, since it models periodic interrupt invocations only, the approach cannot deal with non-periodic invocations.

2.2.1.3 Testing Based Approaches

Regehr [116] proposed a testing framework that schedules the invocation of interrupts randomly. Higashi and Inoue [55] leveraged a CPU emulator to systematically trigger interrupts to detect data races in interrupt-driven programs. However, it may be practically infeasible to cover all interleavings of interrupts using this type of technique.

Wang et al. [146, 147] proposed a hybrid approach that combines static program analysis with dynamic simulation to detect data races in interrupt-driven programs. Although the approach is useful for detecting bugs, it cannot be used to obtain proofs, i.e., proofs that assertions always hold.

2.2.1.4 Other Approaches

Schwarz and Müller-Olm [128] proposed a static analysis technique for programs synchronized via the priority ceiling protocol. The goal is to detect synchronization flaws due to concurrency induced by interrupts, especially data races and the transactional behavior of procedures. However, it is not a general-purpose verification procedure and cannot prove the validity of assertions.
Kotker and Seshia [76] extended a timing analysis procedure from sequential programs to interrupt-driven programs with a bounded number of context switches. As such, it does not analyze all behaviors of the interrupts. Furthermore, the user needs to come up with a proper bound on the number of context switches and specify the arrival times of the interrupts.

2.2.2 Semantic Diffing of Sequential and Concurrent Programs

In this section, we present related work on statically computing the semantic differences of sequential and concurrent programs.

2.2.2.1 Semantic Diffing of Sequential Programs

For sequential programs, Jackson and Ladd [60] proposed a method for computing semantic differences by summarizing and comparing the dependencies between input and output. Godlin and Strichman [41] proposed the use of inference rules to prove the equivalence of two programs. In the SymDiff project, Lahiri et al. [82, 83] developed a language-agnostic assertion checking tool for computing the differences of imperative programs. In the context of incremental symbolic execution [113], various change-impact analysis techniques were used to identify instructions that are affected by a code modification and to use that information to compute the corresponding test inputs [96]. However, these methods are not directly applicable to concurrent programs.

2.2.2.2 Semantic Diffing of Concurrent Programs

For concurrent programs, Joshi et al. [69] proposed the use of the failure frequencies of assertions to compare two programs, while the general framework of refinement checking [2] could also be applied to traces of two programs. However, these techniques are limited to individual executions. Change-impact analysis [87] was also applied to concurrent programs, e.g., in regression testing [157], prioritized scheduling [61], and incremental symbolic execution [45]. However, these techniques focus on reducing the cost of testing and analysis as opposed to identifying synchronization differences. Bouajjani et al. [15] compute the differences between the partial data-flow dependencies of two concurrent programs using a bounded model checker. However, the method is costly; furthermore, it requires code instrumentation to insert assertions so they can be verified using a model checker. For example, it took about 30 minutes on a program that can be analyzed by our method in less than a second.

2.2.3 Testing Web Applications

Zheng et al. [161] developed a method for modeling AJAX APIs to check for possible bugs when there are asynchronous requests. Meyerovich and Livshits [97] also developed a method for fine-grained security policies in the browser. Jensen et al. [65] proposed a type inference algorithm for JavaScript-based web applications, which tracks DOM elements and browser APIs based on their IDs and types but does not compute the dependency relations. Feldthaus et al. [31] proposed a method for constructing approximate call graphs but completely ignored their interactions with the DOM. Madsen et al. [93] proposed a static analysis procedure that can infer the behavior of framework APIs, but it targeted JavaScript-based applications in Windows 8 only. Madsen et al. [94] developed a static analysis procedure for event-driven Node.js applications, but these are server-side applications as opposed to client-side applications. The methods proposed by Arlt et al. [7] and Cheng et al. [21] for testing Java-based GUI applications are significantly different in that they do not model the registration, modification, and removal of event handlers (the focus of our work).
Instead, they assume that all event handlers are pre-installed, and thus focus on analyzing only the data dependencies between these handlers. This assumption may be reasonable for some Java-based GUI frameworks, but it is not valid for JavaScript-based web applications.

There is a large body of work on pointer analysis, flow analysis, and type inference for JavaScript that was not based on Datalog or designed specifically for analyzing interactions with the HTML DOM. For example, Chugh et al. [22] proposed a staged information flow analysis for JavaScript to detect certain security violations in client-side code. Sridharan et al. [132] proposed a technique called correlation tracking to improve points-to analysis. Guha et al. [43, 44] proposed a static flow analysis for detecting AJAX intrusions, and for typing local control and state. Wei and Ryder [148] developed a set of blended analysis tools, using both dynamic and static analyses to improve the points-to analysis. Andreasen and Møller [6] extended the TAJS analysis framework by adding a static dataflow analysis to infer and exploit determinacy information; this improves the type inference and call-graph construction for JavaScript programs using jQuery. TAJS itself builds upon the classic monotone framework of Kam and Ullman [71] using a specialized analysis lattice structure. Alimadadi et al. [5] developed a change impact analysis capturing the interplay between JavaScript code changes and the HTML DOM, but their method is valid only for the given dynamic execution, whereas our method is static and therefore valid for all possible executions.

In addition to improving the performance of Artemis, the result of our static DOM event dependency analysis may be used to improve a wide range of dynamic analysis tools. For example, the SymJS tool of Li et al. [88] relies on symbolic execution to generate test inputs for JavaScript-based web applications, but does not leverage the result of any static dependency analysis procedure. The Kudzu tool of Saxena et al. [126] uses a virtual-machine-based symbolic execution procedure to analyze client-side JavaScript code injection. The Jalangi tool developed by Sen et al. [129] provides a generic framework for implementing dynamic analysis techniques for JavaScript, e.g., concolic testing, but lacks the capability of conducting static analysis. Nguyen et al. [108] proposed a delta-debugging based method for reducing the redundant parts of a test case generated by a symbolic execution tool. Jensen et al. [63] developed a stateless model checking tool for systematic testing of event-driven applications with a fixed data input. However, these methods focus on dynamic analysis, whereas our work focuses on static analysis and is therefore complementary.

Chapter 3: Accurate Modular Verification of Interrupt-driven Software

In this chapter, we propose a static verification tool geared toward proving the absence of bugs based on abstract interpretation [24]. The main advantage of abstract interpretation is the sound approximation of complex constructs such as loops, recursion and numerical computations. However, although abstract interpretation techniques have been successfully applied to sequential [25] and multi-threaded software [98], they have not been able to precisely model the semantics of interrupt-driven software.
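To make the claim about loops concrete, consider a minimal illustration (our own, not taken from the dissertation's benchmarks) of how an abstract interpreter over the interval domain summarizes all iterations of a loop at once, something a bounded technique must approximate by unrolling:

    #include <assert.h>

    int main(void) {
        int i = 0;
        while (i < 10) {   /* interval domain: i ∈ [0, 9] inside the loop */
            i++;           /* ... and i ∈ [1, 10] after the increment     */
        }
        assert(i == 10);   /* provable: exit condition gives i ∈ [10, 10] */
        return 0;
    }

A sound abstract interpreter can prove the assertion for every execution without enumerating the ten iterations; a bounded model checker must unroll the loop and, for loops without a static bound, cannot prove such a fact at all.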
Interrupts have been widely used in safety-critical embedded computing systems, information processing systems, and mobile systems to interact with hardware and respond to outside stimuli in a timely manner. However, since interrupts may arrive non-deterministically at any moment to preempt the normal computation, they are difficult for developers to reason about. The situation is further exacerbated by the fact that interrupts often have different priority levels: high-priority interrupts may preempt low-priority interrupts but not vice versa, and interrupt handlers may be executed in a nested fashion. Overall, methods and tools for accurately modeling the semantics of interrupt-driven software are still lacking.

Broadly speaking, existing techniques for analyzing interrupts fall into two categories. The first category consists of techniques based on testing [116, 55], which rely on executing the program under various interrupt invocation sequences. Since it is often practically infeasible to cover all combinations of interrupt invocations, testing will miss important bugs. The second category consists of static verification techniques such as model checking [17, 127, 151, 140, 77], which rely on constructing and analyzing a formal model. During the modeling process, interrupt-related behaviors such as preemption are considered. Unfortunately, existing tools such as iCBMC [77] need to bound the execution depth to remain efficient, which means shallow bugs can be detected quickly, but these tools cannot prove the absence of bugs.

At a high level, interrupts share many similarities with threads, e.g., both interrupt handlers and thread routines may be regarded as sequential programs communicating with others via the shared memory. However, there are major differences in the way they interleave. For example, in most of the existing verification tools, threads are allowed to freely preempt each other's execution. In contrast, interrupts often have various levels of priority: high-priority interrupts can preempt low-priority interrupts but not vice versa. Furthermore, interrupts with the same level of priority cannot preempt each other. Thus, the behavior manifested by interrupts has to be viewed as a subset of the behavior manifested by threads.

To accurately analyze the behavior of interrupts, we develop IntAbs, an iterative abstract interpretation framework for interrupt-driven software. The framework always analyzes each interrupt handler in isolation before propagating the result to other interrupt handlers, and the per-interrupt analysis is iterated until the results on all interrupt handlers stabilize, i.e., they reach a fixed point. Thus, in contrast to traditional techniques, it never constructs up front the monolithic verification model that often causes exponential blowup. For this reason, our method is practically more efficient than these traditional verification techniques.

The IntAbs framework also differs from prior techniques for statically analyzing interrupt-driven software, such as the source-to-source transformation-based testing approach proposed by Regehr [117], the sequentialization approach used by Wu et al. [150], and the model checking technique implemented in iCBMC [77]. For example, none of these existing techniques can soundly handle infinite loops or nested invocations of interrupts, or prove the absence of bugs.
Although some prior abstract interpretation techniques [100] over-approximate the interrupt behavior, they are either non-modular or too inaccurate, e.g., by allowing too many infeasible store-to-load data flows between interrupts. In contrast, our approach precisely models the preemptive scheduling of interrupts to identify infeasible data flows. As shown in Fig. 3.1, by pruning away these infeasible data flows, we can drastically improve the accuracy of the overall analysis. IntAbs provides not only a more accurate modeling of the interrupt semantics but also a more efficient abstract interpretation framework. We have implemented IntAbs in a static analysis tool for C/C++ programs, which uses Clang/LLVM [3] as the front-end, Apron [62] for implementing the numerical abstract domains, and μZ [56] for checking the feasibility of data flows between interrupts. We evaluated IntAbs on 35 interrupt-driven applications with a total of 22,541 lines of C code. Our experimental results show that IntAbs can analyze the behavior of interrupts efficiently as well as more accurately by removing a large number of infeasible data flows between interrupts.

In summary, the main contributions of this chapter are:

• A new abstract interpretation framework for conducting static verification of interrupt-driven programs.
• A method for soundly and efficiently identifying and pruning infeasible data flows between interrupts.
• The implementation and experimental evaluation on a large number of benchmark programs to demonstrate the effectiveness of the proposed techniques.

The remainder of this chapter is organized as follows. We first discuss the background and motivation of this chapter in Section 3.1. Next, we present our new method for checking the feasibility of data flows between interrupts in Section 3.2, followed by our method for integrating the feasibility checking with abstract interpretation in Section 3.3. We present our experimental evaluation in Section 3.4. Lastly, we summarize the chapter in Section 3.5.

[Figure 3.1: IntAbs – the iterative verification framework for interrupt-driven programs. The interrupt-driven program and the query feed into the abstract interpretation of each interrupt, which produces invariants as well as Datalog facts; the facts and the Datalog rules drive the feasibility checking (μZ), whose results are propagated to the other interrupts.]

3.1 Background and Motivation

In this section, we describe how interrupt-driven programs are modeled in our framework by comparing their behavior to the behavior of threads. Then, we review the basics of the prior abstract interpretation techniques that we aim to improve. After that, we use examples to illustrate the problems of prior techniques such as testing, model checking, and thread-modular abstract interpretation. Finally, we explain how our method overcomes these problems on the examples.

3.1.1 Modeling of Interrupts

We consider an interrupt-driven program as a finite set T = {T_1, ..., T_n} of sequential programs. Each sequential program T_i, where 1 ≤ i ≤ n, denotes an interrupt handler. For ease of presentation, we do not distinguish between the main program and the interrupt handlers. Globally, the sequential programs in T are executed in a strictly interleaved fashion. Each sequential program may access its own local variables; in addition, it may access a set of global variables, through which it communicates with the other sequential programs in T.

run0() {
  stmt1;
  stmt2;
}
run1() {
  stmt3;
  stmt4;
}

• Possible traces for interrupts:
  – stmt1 → stmt2 → stmt3 → stmt4
  – stmt1 → stmt3 → stmt4 → stmt2
• Possible traces for threads:
  – stmt1 → stmt2 → stmt3 → stmt4
  – stmt1 → stmt3 → stmt4 → stmt2
  – stmt1 → stmt3 → stmt2 → stmt4

Figure 3.2: The interleavings (after stmt1) allowed by interrupts and threads.
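As a cross-check of the trace sets in Fig. 3.2, the following sketch enumerates the interleavings allowed under the two semantics; it is a hypothetical illustration (assuming run1 has higher priority and, once started, runs to completion), not part of the tool.

from itertools import combinations

run0 = ["stmt1", "stmt2"]
run1 = ["stmt3", "stmt4"]

def thread_traces(a, b):
    # All interleavings of two threads: choose the positions of a's statements.
    n = len(a) + len(b)
    for pos in combinations(range(n), len(a)):
        trace, ai, bi = [], 0, 0
        for i in range(n):
            if i in pos:
                trace.append(a[ai]); ai += 1
            else:
                trace.append(b[bi]); bi += 1
        yield tuple(trace)

def interrupt_traces(low, high):
    # The high-priority handler may preempt the low-priority one at any
    # point, but it executes atomically because nothing can preempt it.
    for cut in range(len(low) + 1):
        yield tuple(low[:cut] + high + low[cut:])

threads = set(thread_traces(run0, run1))
interrupts = set(interrupt_traces(run0, run1))
assert interrupts <= threads  # interrupt behavior is a subset of thread behavior

print(sorted(t for t in threads - interrupts if t[0] == "stmt1"))
# [('stmt1', 'stmt3', 'stmt2', 'stmt4')] -- the thread-only trace of Fig. 3.2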
The interleaving behavior of interrupts is a strict subset of the interleaving behavior of threads (cf. [77]). This is because concurrently running threads are allowed to freely preempt each other's executions, which is not the case for interrupts. Consider the example program in Fig. 3.2, which has two functions named run0 and run1. If they were interrupts, where run1 has a higher priority level than run0, then after executing stmt1, there can only be two possible traces. The first one is for run1 to wait until run0 ends, and the second one is for run1 to preempt run0. If they were threads, however, there can be three possible traces after executing stmt1. In addition to the traces allowed by interrupts, we can also execute stmt3 in run1, then execute stmt2 in run0, and finally execute stmt4 in run1. This third trace is infeasible for interrupts because the high-priority run1 cannot be preempted by stmt2 of the low-priority run0.

Since the interleaving behavior of interrupts is a strict subset of the interleaving behavior of threads, it is always safe to apply a sound static verification procedure designed for threads to interrupts. If the verifier can prove the absence of bugs by treating interrupts as threads, then the proof is guaranteed to be valid for interrupts. The result of this discussion can be summarized as follows:

Theorem 1. Since the interleaving behavior of interrupts is a subset of the interleaving behavior of threads, proofs obtained by any sound abstract interpretation over threads remain valid for interrupts.

However, the reverse is not true: a bug reported by a verifier that treats interrupts as threads may not be a real bug, since the erroneous interleaving may be infeasible. In practice, there are also tricky corner cases in the interaction of interrupts, such as nested invocations of handlers, which call for a more accurate modeling framework for interrupts. Interrupts may be invoked in a nested fashion, as shown by case3 in Fig. 3.4, which complicates the static analysis. Here, we say interrupts are nested when a second handler function is invoked before the first handler function returns, and a third handler function is invoked before the second handler function returns. Such nested invocations are possible, for example, if the corresponding interrupts have different priority levels, where the innermost interrupt has the highest priority level. This behavior is different from thread interleaving; with numerous corner cases, it requires the development of dedicated modeling and analysis techniques.

3.1.2 Modular Abstract Interpretation

Existing methods for modular abstract interpretation are designed almost exclusively for multi-threaded programs [101, 99, 98, 81]. Typically, the analyzer works on each thread in isolation, without creating a monolithic verification model as in the non-modular techniques [30, 150], to avoid the up-front complexity blowup. At a high level, the analyzer iterates through threads in two steps: (1) analyzing each thread in isolation, and (2) propagating results from the shared-memory writes of one thread to the corresponding reads of other threads.

Let the entire program P be a finite set of threads, where each thread T is represented by a control-flow graph ⟨N, n0, δ⟩ with a set of nodes N, an entry node n0, and the transition relation δ.
Each pair (n, n′) ∈ δ means control may flow from n to n′. Each node n is associated with an abstract memory-state over-approximating the possible concrete states at n. We assume the abstract domain (e.g., intervals) is defined as a lattice with appropriate top (⊤) and bottom (⊥) elements, a partial-order relation (⊑), and widening/narrowing operators to ensure that the analysis eventually terminates [24]. We also define an interference I that maps a variable v to the values stored into v by some thread T.

Algorithm 1 shows how a thread-local analyzer works on T assuming some interferences I provided by the environment (e.g., writes in other threads). It treats T as a sequential program. Let S(n) be the abstract memory-state at node n, n0 be the entry node of T, and W be the set of nodes in T left to be processed. The procedure keeps removing a node n from the work-list W and processing it until W is empty (i.e., a fixed point is reached). If node n corresponds to a shared-memory read of variable v, then the transfer function tfunc (Line 7) assumes that n can read either the local value (from S) or a value written by another thread (the interference I(v)). The transfer function tfunc of an instruction n takes some memory-state as input and returns a new memory-state as output; the new memory-state is the result of executing the instruction in the given memory-state. Otherwise, i.e., if n is a local read, the transfer function tfunc uses the local memory-state (Line 9), as in the abstract interpretation of any sequential program. The analysis result (denoted S) is an over-approximation of the memory states within T assuming interferences I.

Algorithm 1 Local analysis of T with prior interferences I.
 1: function AnalyzeLocal(T = ⟨N, n0, δ⟩, I)
 2:   S ← ⊥                        ▷ map from nodes to states
 3:   W ← {n0}                     ▷ set of nodes to process
 4:   while ∃ n ∈ W do
 5:     W ← W \ {n}
 6:     if n is a shared-memory read of variable v then
 7:       s ← tfunc(n, S(n) ⊔ I(v))
 8:     else
 9:       s ← tfunc(n, S(n))
10:     for all ⟨n, n′⟩ ∈ δ such that s ⋢ S(n′) do
11:       S(n′) ← S(n′) ⊔ s
12:       W ← W ∪ {n′}
13:   return S

The procedure that analyzes the entire program is shown in Algorithm 2. It first analyzes each thread, computes the interferences, and then analyzes each thread again in the presence of these interferences. The iterative process continues until a fixed point on the memory-states of all threads is reached. Initially, S maps each node in the program to an empty memory-state ⊥. S′ contains the analysis results after one iteration of the fixed-point computation. The function Interf returns the interferences of thread T, i.e., a map from each variable v to all the (abstract) values stored into v by T. Each thread T ∈ P is analyzed in isolation by the loop at Lines 4–9. Here, we use ⊎ to denote the join (⊔) of all memory-states on the matching nodes.

Algorithm 2 Analysis of the entire program, i.e., a set of T's.
 1: function AnalyzeProg(P)
 2:   S ← map all nodes to ⊥
 3:   S′ ← S
 4:   repeat
 5:     S ← S′
 6:     for all T ∈ P do
 7:       I ← ⊎ Interf(T′, S) for each T′ ∈ P, T′ ≠ T
 8:       S′ ← S′ ⊎ AnalyzeLocal(T, I)
 9:   until S′ = S
10: function Interf(T = ⟨N, n0, δ⟩, S)
11:   I ← ⊥
12:   for all n ∈ N do
13:     if n is a shared memory write to variable v then
14:       I(v) ← I(v) ⊔ tfunc(n, S(n))
15:   return I

This thread-modular abstract interpretation framework, while more efficient than monolithic verification, is potentially less accurate. For example, a load l may see any value written into the shared memory by a store s, even if there does not exist a path in the program where l observes s.
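Below is a minimal executable sketch of the interference mechanism of Algorithms 1 and 2, under strong simplifying assumptions: straight-line handlers, constant stores, and a set-of-constants domain instead of intervals (so one round of interference propagation suffices). It is a hypothetical illustration, not the IntAbs implementation, and the mini-language and labels are made up. Run on the program of Fig. 3.3 below, it shows how every load joins the stores of all other handlers.

def analyze_local(handler, env, interf):
    # Alg. 1, simplified: a load joins local values with the interferences.
    state = dict(env)                  # var -> set of possible constants
    reads = {}
    for idx, (op, var, val) in enumerate(handler):
        if op == 'store':
            state[var] = {val}
        else:                          # 'load'
            reads[idx] = state.get(var, set()) | interf.get(var, set())
    return reads

def interf_of(handler):
    # Alg. 2's Interf, simplified: all values a handler may store.
    out = {}
    for op, var, val in handler:
        if op == 'store':
            out.setdefault(var, set()).add(val)
    return out

env = {'x': {0}, 'y': {0}}                                        # initial values
handlers = {
    'irq_H': [('load', 'y', None)],                               # assert(y==0)
    'irq_L': [('store', 'x', 0), ('load', 'x', None)],            # assert(x==0)
    'irq_M': [('store', 'y', 1), ('store', 'x', 1),
              ('load', 'x', None)],                               # assert(x==1)
}
for name, h in handlers.items():
    interf = {}
    for other, h2 in handlers.items():
        if other != name:
            for v, vals in interf_of(h2).items():
                interf.setdefault(v, set()).update(vals)
    print(name, analyze_local(h, env, interf))
# irq_M's load of x sees {0, 1}: the store x=0 from the low-priority irq_L
# is (wrongly) allowed to interfere, so assert(x==1) cannot be proved.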
This is why, as shown in Table 3.1, techniques such as [98, 81] cannot obtain proofs for the programs in Fig. 3.3 and Fig. 3.6. In the context of interrupt handlers with priorities, it means that even store-to-load flows that are infeasible due to priorities may be included in the analysis, thus causing false alarms. Later in this chapter, we show how to introduce priorities into the propagation of data flows between interrupts during the analysis, thereby increasing the accuracy while retaining the efficiency.

irq_H() {
  ...
  assert(y == 0);
}
irq_L() {
  x = 0;
  assert(x == 0);
}
irq_M() {
  y = 1;
  x = 1;
  assert(x == 1);
}

Figure 3.3: An example program with three interrupt handlers and assertions.

3.1.3 Motivating Examples

Consider the example program in Fig. 3.3, which has three interrupts irq_H, irq_L and irq_M. The suffix H is used to denote high priority, L low priority, and M medium priority. Interrupts with higher priority levels may preempt interrupts with lower priority levels, but not vice versa. Inside the handlers, there are two shared variables x and y, which are set to 0 initially. Among the three assertions, the first two may fail, while the last one always holds.

3.1.3.1 Examples for Testing Approaches

Testing an interrupt-driven program requires the existence of interrupt sequences, which must be generated a priori. In Fig. 3.3, for example, since the interrupt handlers have different priority levels, we need to consider preemption while creating the test sequences. Since a high-priority interrupt handler may preempt, at any time, the execution of a medium- or low-priority interrupt handler, when irq_L is executing, irq_H may be interleaved in between its instructions.

Fig. 3.4 shows four of the possible interrupt sequences for the program. Specifically, case0 is the sequential execution of the three handler functions; case1 shows that irq_H preempts irq_M, followed by irq_L; case2 is similar, except that irq_L executes first and then is preempted by irq_H, followed by irq_M; and case3 is the nested case where irq_L is preempted by irq_M and then by irq_H.

[Figure 3.4: Some possible interrupt sequences for the program in Fig. 3.3 – case0: irq_H, irq_M, and irq_L run sequentially; case1: irq_M is preempted by irq_H, followed by irq_L; case2: irq_L is preempted by irq_H, followed by irq_M; case3: irq_L is preempted by irq_M, which in turn is preempted by irq_H.]

The main problem of testing is that there can be too many such interrupt sequences to explore. Even if we can somehow guarantee that each interrupt handler is executed only once, the total number of test sequences can be enormously large, even for small or medium-sized programs.

3.1.3.2 Examples for Model Checking Approaches

Model checking tools such as CBMC [23] may be used to search for erroneous interrupt sequences, e.g., those leading to assertion violations. For instance, in the running example, all assertions hold under the sequences case0 and case2 in Fig. 3.4. This is because, although irq_H preempts irq_L, the two handlers access different variables and thus do not affect each other's assertion conditions, while irq_M checks the value of x after assigning 1 to x. In case1, however, the execution order of the three interrupt handlers is different, leading to an assertion violation inside irq_H. More specifically, irq_M is preempted by irq_H at first; then, after both complete, irq_L is executed. So the change of y may affect the read of y in irq_H, leading to the violation. Finally, in case3, both of the first two assertions may be violated, because the check of x in irq_L and the check of y in irq_H can be affected by irq_M's own assignments of x and y.
Although bounded model checking can quickly find bugs, e.g., the assertion violations in Fig. 3.3, the depth of the execution is bounded, which means that, in practice, tools such as iCBMC [77] cannot prove the absence of bugs.

3.1.3.3 Examples for Modular Abstract Interpretation Approaches

Abstract interpretation is a technique designed for proving properties, e.g., that assertions always hold. Unfortunately, existing methods based on abstract interpretation are mostly designed for threads as opposed to interrupts. Since threads interact with each other more freely than interrupts, these methods are essentially over-approximated analyses for interrupts. As such, they may still be leveraged to prove properties in interrupt-driven programs, albeit in a less accurate fashion. That is, when they prove an assertion holds, the assertion indeed holds; but when they cannot prove an assertion, the result is inconclusive.

For the running example in Fig. 3.3, for instance, existing abstract interpretation techniques such as Miné's [98, 101], designed for analyzing threads, cannot prove any of the three assertions. To see why, let us first assume that the interrupt handlers are thread routines. During thread-modular abstract interpretation, the verification procedure would first gather all possible pairs of load and store instructions with respect to the global variables, as shown in Fig. 3.5, where the load in each assertion has two possible sources. Specifically, the load of y in irq_H corresponds to the initial value 0 and the store in irq_M. The load of x in irq_L corresponds to the stores in irq_L and irq_M. The load of x in irq_M corresponds to the stores in irq_L and irq_M.

[Figure 3.5: Some possible store-to-load data flows during abstract interpretation. The load in assert(y==0) may read from the initialized value y=0 or the store y=1 in irq_M; the load in assert(x==0) may read from x=0 in irq_L or x=1 in irq_M; the load in assert(x==1) may read from x=0 in irq_L or x=1 in irq_M. Two stores are marked by red boxes as infeasible sources: y=1 from irq_M for the load in irq_H, and x=0 from irq_L for the load in irq_M.]

Since these existing methods [98, 101] assume that all stores may affect all loads, they would incorrectly report that all three assertions may fail. For example, they report that the load of x in irq_M may (incorrectly) read from the store x=0 in irq_L, even though irq_L has a lower priority and thus cannot preempt irq_M. In contrast, our new method can successfully prove the third assertion. Specifically, we model the behavior of interrupts with different levels of priority. Due to the different priority levels, certain store-to-load data flows are no longer feasible, as shown by the stores marked by red boxes in Fig. 3.5: these two stores belong to handlers with lower priority than the handler containing the corresponding load.

irq_M() {
  if (...)
    y = 0;
  y = 1;
  assert(x == 1);
}
irq_L() {
  ...
  y = 1;
  ...
  assert(y == 1);
}
irq_H() {
  if (...)
    x = 0;
  x = 1;
  assert(y == 1);
}

Figure 3.6: An example program with three interrupt handlers, where the first two assertions always hold but the last assertion may fail.

3.1.3.4 Examples for Interrupt Semantic-aware Modular Abstract Interpretation

Modeling the priority levels alone, however, is not enough for proving all assertions, because even without preemption, a low-priority interrupt may affect a high-priority interrupt. Consider the first red box in Fig. 3.5. Although y=1 from irq_M cannot affect the load of y in irq_H through preemption, if irq_H is invoked after irq_M ends, y can still get the value 1, thus leading to the assertion violation. Therefore, our new verification procedure has to consider all possible sequential interleavings of the interrupt handlers as well.
Now, consider the program in Fig. 3.6, which has three interrupt handlers irq_M, irq_L and irq_H. In these handler functions, there are two global variables x and y, which are set to 0 initially. Among the three assertions, the first two always hold, whereas the last one may fail. For ease of comprehension, we assume the computer hardware running this program provides sequentially consistent memory [85, 159, 79]. Note that irq_M has two stores of y, one inside the conditional branch and the other outside, and irq_H has two stores of x, one inside the conditional branch and the other outside.

With prior thread-modular analyses [101, 98, 81], all three assertions may fail, because the store y=0 in irq_M may be interleaved right before the assertions in irq_L and irq_H; furthermore, the store x=0 in irq_H executed before irq_M may lead to the violation of the assertion in irq_M. In contrast, with our precise modeling of the interrupt behavior, the new method can prove that the first two assertions always hold. Specifically, the assertion in irq_L holds because, even if it is preempted by irq_M, the value of y remains 1. Similarly, the assertion in irq_M holds because, even if it is preempted by irq_H, the store x=1 post-dominates the store x=0, meaning the value of x remains 1 after irq_H returns.

In contrast, the assertion in irq_H may fail if irq_H preempts irq_M right after the conditional branch that sets y to 0. This particular preemption is feasible because irq_H has a higher priority than irq_M. Therefore, our new method has to consider not only the different priority levels of all interrupts, but also the domination and post-domination relations within each handler. It decides the feasibility of store-to-load data flows based on whether a load is dominated by a store, whether a store is post-dominated by another store, and whether a load-store pair is allowed by the priority levels of the interrupts. We present the details of this feasibility-checking algorithm in Section 3.2.

Table 3.1: Comparing IntAbs with testing and prior verification methods on the programs in Fig. 3.3 and Fig. 3.6.

Property | Testing [116, 55] | Model checking (bounded) [77, 150] | Abs. Int. for threads [98, 81]* | IntAbs for interrupts (new)
Assertion in irq_H (Figure 3.3) | violation | violation | warning | warning
Assertion in irq_L (Figure 3.3) | violation | violation | warning | warning
Assertion in irq_M (Figure 3.3) | – | – | warning (bogus) | proof
Assertion in irq_M (Figure 3.6) | – | – | warning (bogus) | proof
Assertion in irq_L (Figure 3.6) | – | – | warning (bogus) | proof
Assertion in irq_H (Figure 3.6) | violation | violation | warning | warning

3.1.3.5 Summary of the Examples

To sum up, the main advantages of IntAbs over state-of-the-art techniques are shown in Table 3.1. Specifically, testing and (bounded) model checking tools are good at detecting bugs (e.g., assertion violations) but cannot prove the absence of bugs, whereas thread-modular abstract interpretation tools are good at obtaining proofs but may report many false positives (i.e., bogus warnings). In contrast, our new abstract interpretation method is significantly more accurate: it can obtain more proofs than the prior techniques and, at the same time, significantly reduce the number of bogus warnings.

3.2 Constraint-based Checking of Infeasible Interferences

In this section, we present our method for precisely modeling the priority-based interleaving semantics of interrupts and for deciding the feasibility of store-to-load data flows between interrupts. If, for example, a certain store-to-load data flow is indeed not feasible, it will not be propagated across interrupts in Algorithm 2.
More formally, given a set of store-to-load pairs, we want to compute a new MustNotReadFrom relation such that MustNotReadFrom(l, s), for a load l and a store s, means that if we respect all the other existing store-to-load pairs, then it is infeasible for l to get the value written by s.

We have developed a Datalog-based declarative program analysis procedure for computing MustNotReadFrom. Toward this end, we first generate a set of Datalog facts from the program and the given store-to-load pairs. Then, we generate a set of Datalog rules, which infer the new MustNotReadFrom relation from the Datalog facts. Finally, we feed the facts together with the rules to an off-the-shelf Datalog engine, which computes the MustNotReadFrom relation. In our implementation, we used the μZ Datalog engine [56] to solve the Datalog constraints.

Before presenting the rules, we define some relations:

• Dom(a, b): statement a dominates b in the CFG of an interrupt handler function.
• PostDom(a, b): statement a post-dominates b in the CFG of an interrupt handler function.
• Pri(s, p): statement s has the priority level p.
• Load(l, v): l is a load of global variable v.
• Store(s, v): s is a store to global variable v.

Dominance and post-dominance are efficiently computable [32] within each interrupt handler (not across interrupt handlers). Priority information for each interrupt handler, and thus for all its statements, may be obtained directly from the program. Similarly, the Load and Store relations may be obtained directly from the program.
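As an illustration, the base facts extracted from the program in Fig. 3.3 would include roughly the following; the statement labels, the priority values (irq_H=3, irq_M=2, irq_L=1), and the tuple encoding are hypothetical, and the concrete fact syntax used by our implementation may differ.

# Hypothetical base facts for Fig. 3.3, written as Python tuples.
pri     = [("H_load_y", 3),                      # statement in irq_H
           ("L_store_x", 1), ("L_load_x", 1),    # statements in irq_L
           ("M_store_y", 2), ("M_store_x", 2), ("M_load_x", 2)]
load    = [("H_load_y", "y"), ("L_load_x", "x"), ("M_load_x", "x")]
store   = [("L_store_x", "x"), ("M_store_y", "y"), ("M_store_x", "x")]
dom     = [("L_store_x", "L_load_x"),   # x=0 dominates the load in irq_L
           ("M_store_x", "M_load_x")]   # x=1 dominates the load in irq_M
postdom = [("M_store_x", "M_store_y")]  # no store to the same variable is
                                        # post-dominated by another store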
Next, we present the rules for inferring three new relations: NoPreempt, CoveredLoad and InterceptedStore.

3.2.1 Rules for NoPreempt

The relation NoPreempt(s1, s2) means s1 cannot preempt s2, where s1 and s2 are instructions in separate interrupt handlers. From the interleaving semantics of interrupts, we know a handler may only be preempted by another handler with a higher priority. Thus,

NoPreempt(s1, s2) ← Pri(s1, p1) ∧ Pri(s2, p2) ∧ (p2 ≥ p1)

Here, Pri(s1, p1) means s1 belongs to a handler with priority p1, and Pri(s2, p2) means s2 belongs to a handler with priority p2. If p1 is not higher than p2, then s1 cannot preempt s2.

3.2.2 Rules for CoveredLoad

The relation means a load l of a variable v is covered by a store s to v inside the same interrupt handler; this is the case when s occurs before l along all program paths. This is captured by the dominance relation in the corresponding control flow graph:

CoveredLoad(l) ← Load(l, v) ∧ Store(s, v) ∧ Dom(s, l)

3.2.3 Rules for InterceptedStore

The relation is similar to CoveredLoad. We say a store s1 is intercepted by another store s2 if s2 occurs after s1 along all program paths in the same handler. Intuitively, the value written by s1 is always overwritten by s2 before the handler terminates. Formally,

InterceptedStore(s1) ← Store(s1, v) ∧ Store(s2, v) ∧ PostDom(s2, s1)

Finally, the MustNotReadFrom relation is deduced using all the aforementioned relations, including NoPreempt, CoveredLoad and InterceptedStore. It indicates that, under the current situation (defined by the set of existing store-to-load data flows), a load l cannot read from a store s in any feasible interleaving. There are several cases.

First, a load l covered by a store in a handler I cannot read from a store s intercepted by another store in a handler I′, because l cannot read from s via any preemption or by running I and I′ sequentially:

MustNotReadFrom(l, s) ← CoveredLoad(l) ∧ Load(l, v) ∧ Store(s, v) ∧ InterceptedStore(s)

Second, a load l covered by a store s′ in a handler I cannot read from any store s that cannot preempt I, because the value of s will always be overwritten by s′; that is, since s cannot preempt I, it cannot execute in between s′ and l:

MustNotReadFrom(l, s) ← CoveredLoad(l) ∧ Load(l, v) ∧ Store(s, v) ∧ NoPreempt(s, l)

Third, if a store s is intercepted in a handler I, then a load l of the same variable that cannot preempt s cannot read the value stored by s, because the store intercepting s will always overwrite the value:

MustNotReadFrom(l, s) ← InterceptedStore(s) ∧ Store(s, v) ∧ Load(l, v) ∧ NoPreempt(l, s)
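To make the inference concrete, here is a small self-contained sketch that evaluates the three rules on the Fig. 3.3 facts listed earlier; it is a hypothetical Python stand-in for the Datalog evaluation that the tool delegates to μZ, and the labels and priority values are again made up for illustration.

handler = {"H_load_y": "irq_H",
           "L_store_x": "irq_L", "L_load_x": "irq_L",
           "M_store_y": "irq_M", "M_store_x": "irq_M", "M_load_x": "irq_M"}
pri     = {"irq_H": 3, "irq_M": 2, "irq_L": 1}
load    = {("H_load_y", "y"), ("L_load_x", "x"), ("M_load_x", "x")}
store   = {("L_store_x", "x"), ("M_store_y", "y"), ("M_store_x", "x")}
dom     = {("L_store_x", "L_load_x"), ("M_store_x", "M_load_x")}
postdom = {("M_store_x", "M_store_y")}

def no_preempt(s1, s2):
    # s1 cannot preempt s2: s1's priority is not higher than s2's.
    return pri[handler[s2]] >= pri[handler[s1]]

covered = {l for (l, v) in load
           if any(sv == v and (s, l) in dom for (s, sv) in store)}
intercepted = {s1 for (s1, v) in store
               if any(sv == v and s2 != s1 and (s2, s1) in postdom
                      for (s2, sv) in store)}

must_not = set()
for (l, lv) in load:
    for (s, sv) in store:
        if lv != sv or handler[l] == handler[s]:
            continue  # the rules relate loads and stores in separate handlers
        if (l in covered and s in intercepted) \
           or (l in covered and no_preempt(s, l)) \
           or (s in intercepted and no_preempt(l, s)):
            must_not.add((l, s))

print(must_not)
# {('M_load_x', 'L_store_x')}: the store x=0 in irq_L can never reach the
# load of x in irq_M, which is why the third assertion can be proved.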
3.2.4 Rules on Examples

To help understand how MustNotReadFrom is deduced from the Datalog rules and facts, we provide a few examples. For ease of comprehension, we show in Table 3.2 how MustNotReadFrom may be deduced from InterceptedStore, CoveredLoad and NoPreempt. Since every store is either in or outside InterceptedStore, and every load is either in or outside CoveredLoad, our rules capture the MustNotReadFrom relation between all stores and loads.

Table 3.2: MustNotReadFrom rules based on InterceptedStore, CoveredLoad and priority.

  | InterceptedStore(s) | NotInterceptedStore(s)
CoveredLoad(l) | MustNotReadFrom(l, s) | 1) ¬NoPreempt(s, l) → PossiblyReadFrom(l, s); 2) NoPreempt(s, l) → MustNotReadFrom(l, s)
NotCoveredLoad(l) | 1) NoPreempt(l, s) → MustNotReadFrom(l, s); 2) ¬NoPreempt(l, s) → PossiblyReadFrom(l, s) | PossiblyReadFrom(l, s)

Specifically, if a store s is InterceptedStore and a load l is CoveredLoad, there is no way for the load to read from the store (Row 2, Column 2). If a load l is not CoveredLoad and a store s is not InterceptedStore, the load may read from the store by running sequentially or via preemption (Row 3, Column 3). If a load l is CoveredLoad, a store s is not InterceptedStore, and the handler of the store can preempt the handler of the load, the load may read from the store through preemption (the first case at Row 2, Column 3); however, if the handler of the store cannot preempt the handler of the load, it is impossible for the load to read from the store, and in this case the load always reads from the store in its own interrupt handler (the second case at Row 2, Column 3). Lastly, if a load l is not CoveredLoad, a store s is InterceptedStore, and the handler of the load cannot preempt the handler of the store, the load cannot read from the store, since the value of the store is always overwritten by another store in the same handler (the first case at Row 3, Column 2); however, if the handler of the load can preempt the handler of the store, then the load can read from the store through preemption in between the two stores (the second case at Row 3, Column 2).

(a) CoveredLoad & InterceptedStore:
    irq0() { store(x); load(x) }      irq1() { store(x); store(x) }
(b) CoveredLoad & NotInterceptedStore:
    irq0() { store(x); load(x) }      irq1() { ...; store(x) }
(c) NotCoveredLoad & InterceptedStore:
    irq0() { ...; load(x) }           irq1() { store(x); store(x) }
(d) NotCoveredLoad & NotInterceptedStore:
    irq0() { ...; load(x) }           irq1() { ...; store(x) }

Figure 3.7: Examples for each case in Table 3.2.

Fig. 3.7 shows concrete examples of the four cases presented in Table 3.2. Fig. 3.7(a) represents the case at Row 2, Column 2 in Table 3.2, and Fig. 3.7(b) represents the case at Row 2, Column 3. In each program, only the interference between the load in irq0 and the first (or only) store in irq1 is considered.

Fig. 3.7(a) shows an interference between a CoveredLoad and an InterceptedStore. Since the load in irq0 is always preceded by a store in the same handler (it is covered), and the value of the first store in irq1 is always overwritten by the second store (it is intercepted), the load cannot read a value from the store either by preemption or by running sequentially. Fig. 3.7(b) shows an interference between a CoveredLoad and a store that is not an InterceptedStore. In this case, if irq1 can preempt irq0, then the store from irq1 can occur between the store and the load in irq0; otherwise, the load in irq0 cannot read a value from the store in irq1. Fig. 3.7(c) shows an interference between a load that is not a CoveredLoad and an InterceptedStore. Similarly, if irq0 can preempt irq1, the load in irq0 can occur between the two stores in irq1, so it is possible for the load to read a value from the first store in irq1; otherwise, the load cannot read a value from the store either by preempting irq1 or by running sequentially. Fig. 3.7(d) shows an interference between a load that is not a CoveredLoad and a store that is not an InterceptedStore. Here, the load in irq0 can read a value from the store in irq1 by running sequentially or through preemption; therefore, it is possible for the load to read a value from the store, as described at Row 3, Column 3 in Table 3.2. To sum up, we can use the three inference rules to determine the infeasible store-to-load pairs in all these cases.

3.2.5 Soundness of the Analysis

By soundness, we mean that the MustNotReadFrom relation deduced from the Datalog facts and rules is an under-approximation. That is, any pair (l, s) of load and store in this relation is guaranteed to be infeasible. However, we do not attempt to identify all the infeasible pairs, because the goal here is to quickly identify some infeasible pairs and skip them during the more expensive abstract interpretation computation.

Theorem 2. Whenever MustNotReadFrom(l, s) holds, the load l cannot read from the store s on any concrete execution of the program.

The soundness of our analysis as stated above can be established in two steps. First, assuming that each individual rule is correct, their composition is also correct. Second, while presenting these rules, we have sketched the intuition behind the correctness of each rule. A more rigorous proof can be formulated via proof-by-contradiction in a straightforward fashion, which we omit for brevity.

3.3 Optimizing Modular Abstract Interpretation

3.3.1 Pruning Infeasible Interferences

We now explain how to integrate the feasibility checking technique into the overall iterative analysis procedure, which leverages the MustNotReadFrom relation to improve performance. Specifically, when analyzing each interrupt handler T, we filter out any interfering stores from other interrupt handlers that are deemed infeasible, thereby preventing their visibility to T. This can be implemented in Algorithm 2 by modifying the function Interf, as well as the function AnalyzeLocal defined in Algorithm 1.

Our modifications to Interf are shown in Algorithm 3. When computing the interferences of T, we create a set of store–state pairs instead of eagerly joining all these states. By delaying the join of these states, we obtain the opportunity to filter out infeasible store-to-load pairs individually. For this reason, we overload the definition of ⊎ to be the join (⊔) of sets on matching variables.
Algorithm 3 Analysis of the entire program (cf. Alg. 2).
10: function Interf(T = ⟨N, n0, δ⟩, T′ = ⟨N′, n0′, δ′⟩, S)
11:   I ← ⊥
12:   for all n ∈ N do
13:     if n is a shared memory write to variable v then
14:       I(v) ← I(v) ⊎ {(n, tfunc(n, S(n)))}
15:   return I

Next, we modify the abstract interpretation procedure for a single interrupt handler, as shown in Algorithm 4. The process remains the same as AnalyzeLocal of Algorithm 1 except that, when a load n is encountered (Line 6), we join the states from all interfering stores while removing any store that must not interfere with n, as determined by the MustNotReadFrom relation (Line 7). The remainder of the modular analysis remains the same as in Algorithm 2.

Algorithm 4 Analysis of a single interrupt (cf. Alg. 1).
 1: function AnalyzeLocal(T = ⟨N, n0, δ⟩, I)
      . . .
 6:   if n is a shared-memory read of variable v then
 7:     i ← ⊔ { s′ | (st, s′) ∈ I(v) ∧ ¬MustNotReadFrom(n, st) }
 8:     s ← tfunc(n, S(n) ⊔ i)
 9:   else
10:     s ← tfunc(n, S(n))
      . . .

3.3.2 The Running Example

In Fig. 3.7(a), existing thread-modular abstract interpretation methods would consider the two stores from irq1 for the load of x in irq0. In contrast, we use Algorithm 4 to remove the pairing of the load in irq0 and the first store in irq1, since the load and the store satisfy the MustNotReadFrom relation. Similarly, the pairing of the load of x in irq0 and the store of x in irq1 is filtered out when irq1's priority is not higher than irq0's priority, as shown in Fig. 3.7(b).

Our method can also handle programs with loops. Fig. 3.8 shows an example with two interrupt handlers, where irq1 has higher priority than irq0. Note that irq0 loads x and stores the value into b; since x is initialized to 0, the handler checks whether the value of b is 0. irq1 has a loop containing two stores of x: first it stores the value 1 and then the value 0.

irq0() {
  b = x;
  assert(b == 0);
}
irq1() {
  while (...) {
    x = 1;
    x = 0;
  }
}

Figure 3.8: A small example with a loop.

Using traditional thread-modular abstract interpretation, we would assume that x=1 and x=0 are both possible stores for the load of x in irq0. This would lead to a bogus violation of the assertion in irq0. In our analysis, this bogus violation is avoided by using the post-dominance relation between statements. Inside the while-loop of irq1, x=0 post-dominates x=1, meaning that x=0 always occurs after x=1. Therefore, using the Datalog inference rules presented in the previous section, we conclude that the store x=1 cannot reach the load of x in irq0. Thus, it is impossible for the value 1 to be stored in b and then cause the assertion violation.
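Continuing the earlier interference sketch, the following hypothetical fragment shows the only change Algorithm 4 makes: interferences are kept as (store, value) pairs and filtered through MustNotReadFrom before the join. The labels encode the Fig. 3.8 statements and are made up for illustration.

# Interference as a set of (store_label, value) pairs (Alg. 3), so that
# infeasible pairs can be removed individually (Alg. 4, Line 7).
interference = {"x": {("irq1_x_eq_1", 1), ("irq1_x_eq_0", 0)}}

# Deduced by the rules of Section 3.2: x=0 post-dominates x=1 inside the
# loop of irq1, so the load of x in irq0 can never observe x=1.
must_not_read_from = {("irq0_load_x", "irq1_x_eq_1")}

def read_of(load_label, var, local_values):
    # Join the local values with only the feasible interfering stores.
    feasible = {val for (st, val) in interference.get(var, set())
                if (load_label, st) not in must_not_read_from}
    return local_values | feasible

print(read_of("irq0_load_x", "x", {0}))  # {0}: b == 0 always, proof obtained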
3.4 Implementation and Experiments

3.4.1 Experimental Setup

We have implemented IntAbs, our new abstract interpretation framework, in a static verification tool for interrupt-driven C programs. It builds on a number of open-source tools, including Clang/LLVM [3] for implementing the C front-end, the Apron library [62] for implementing the abstract domains, and μZ [56] for solving the Datalog constraints. We experimentally compared IntAbs with both iCBMC [77], a model checker for interrupt-driven programs, and the state-of-the-art thread-modular abstract interpretation method by Miné [98, 101]. We conducted our experiments on a computer with an Intel Core i5-3337U CPU, 8 GB of RAM, and the Ubuntu 14.04 Linux operating system.

Our experiments were designed to answer the following research questions:

• Can IntAbs prove more properties (e.g., assertions) than state-of-the-art techniques such as iCBMC [77] and Miné's method [98, 101]?
• Can IntAbs achieve the aforementioned higher accuracy while maintaining a low computational overhead?
• Can IntAbs identify and prune away a large number of infeasible store-load pairs?

Toward this end, we evaluated IntAbs on 35 interrupt-driven C programs, many of which are from real applications such as control software, firmware, and device drivers. These benchmark programs, together with our software tool, have been made available online [58]. The detailed description of each benchmark group is shown in Table 3.3. In total, there are 22,541 lines of C code.

Table 3.3: Benchmark programs used in our experimental evaluation.

test: Small programs created to sanity-check IntAbs's handling of various interrupt semantics.
logger: Programs that model parts of the firmware of a temperature logging device from a major industrial enterprise. There are two major interrupt handlers: one for measurement and the other for communication.
blink: Programs that control LED lights connected to the MSP430 hardware, to check the timer values and change the LED blinking based on the timer values.
brake: Programs generated from the Matlab/Simulink model of a brake-by-wire system from Volvo Technology AB, consisting of a main interrupt handler and four other handlers for computing the braking torque based on the speed of each wheel.
usbmouse: USB mouse driver from the Linux kernel, consisting of the device open, probe, and disconnect tasks with interrupt handlers.
usbkbd: USB keyboard driver from the Linux kernel, consisting of the device open, probe, and disconnect tasks with interrupt handlers.
rgbled: USB RGB LED driver from the Linux kernel. We use the initialization of the led and rgb functions and the led probe function, and check the consistency of the led and rgb device values using interrupts.
rcmain: Linux device driver for a remote controller core, including operations such as device register and free, checking the device information, and updating protocol values. We check the consistency of the device information and protocol values using several interrupt handlers.
others: Programs collected from Linux kernel drivers supporting hardware such as ISA boards, the TCO timer for i8xx chipsets, and watchdogs.

3.4.2 Results: Infeasible Pairs

Since IntAbs removes infeasible store-load pairs during the iterative analysis of individual interrupt handlers, it spends extra time checking the feasibility of these data-flow pairs. Nevertheless, this is the main source of IntAbs's accuracy improvement. Thus, to understand the trade-off, we investigated, for each benchmark program, the total number of store-load pairs and the number of infeasible store-load pairs identified by our technique. Table 3.4 summarizes the results, where Column 3 shows the total number of store-load pairs, Column 4 shows the number of infeasible pairs, and Column 5 shows the percentage. Overall, our Datalog-based method for computing the MustNotReadFrom relation helped remove 69% of the store-load pairs, which means the subsequent abstract interpretation procedure only has to consider the remaining 31% of the pairs. This allows IntAbs to reach a fixed point not only more quickly but also with significantly more accurate results.
Table 3.4: Results of total and filtered store-load pairs using IntAbs.

Name | LOC | # of Pairs | # of Filtered Pairs | Filtered Ratio
test1 | 46 | 1 | 1 | 100%
test2 | 65 | 4 | 2 | 50%
test3 | 86 | 16 | 8 | 50%
test4 | 56 | 4 | 3 | 75%
test5 | 54 | 4 | 3 | 75%
logger1 | 161 | 18 | 2 | 11%
logger2 | 183 | 32 | 6 | 18%
logger3 | 195 | 34 | 6 | 17%
blink1 | 164 | 19 | 15 | 78%
blink2 | 174 | 56 | 32 | 57%
blink3 | 194 | 120 | 63 | 52%
brake1 | 819 | 34 | 24 | 70%
brake2 | 818 | 82 | 58 | 70%
brake3 | 833 | 164 | 128 | 78%
usbmouse1 | 426 | 12 | 8 | 66%
usbmouse2 | 442 | 168 | 136 | 80%
usbmouse3 | 449 | 288 | 208 | 72%
usbkbd1 | 504 | 40 | 20 | 50%
usbkbd2 | 512 | 120 | 80 | 66%
usbkbd3 | 531 | 400 | 280 | 70%
rgbled1 | 656 | 76 | 38 | 50%
rgbled2 | 679 | 228 | 152 | 66%
rgbled3 | 701 | 456 | 304 | 66%
rcmain1 | 2060 | 84 | 84 | 100%
rcmain2 | 2088 | 560 | 476 | 85%
rcmain3 | 2102 | 840 | 714 | 85%
i2c_pca_isa_1 | 321 | 33 | 33 | 100%
i2c_pca_isa_2 | 341 | 210 | 110 | 52%
i2c_pca_isa_3 | 363 | 434 | 240 | 55%
i8xx_tco_1 | 757 | 14 | 12 | 85%
i8xx_tco_2 | 949 | 28 | 20 | 74%
i8xx_tco_3 | 944 | 39 | 33 | 84%
wdt_pci_1 | 1239 | 60 | 40 | 66%
wdt_pci_2 | 1290 | 150 | 82 | 54%
wdt_pci_3 | 1339 | 288 | 139 | 48%
Total | 22,541 | 5,116 | 3,560 | 69%

3.4.3 Results: Optimizing Modular Abstract Interpretation

Table 3.5 shows the experimental results. Columns 1-4 show the name, the number of lines of code (LoC), the number of interrupt handlers, and the number of assertions used for each benchmark program. Columns 5-7 show the results of iCBMC [77], including the number of violations detected, the number of proofs obtained, and the total execution time. Columns 8-10 show the results of Miné's abstract interpretation method [98, 101]. Columns 11-13 show the results of IntAbs, our new abstract interpretation tool for interrupts.

Since iCBMC conducts a bounded analysis, when it detects a violation, the violation is guaranteed to be real; however, when it does not detect any violation, the property remains undetermined. Furthermore, since iCBMC by default stops as soon as it detects a violation, we evaluated it by repeatedly removing the violated property from the program until it could no longer detect any new violation. Also note that, since iCBMC requires the user to manually set up the interrupt-enabled points as described in [77], during the experiments we first ordered the interrupts by priority and then set the interrupt-enabled points at the beginning of the next interrupt handler. For example, given three interrupts irq_L, irq_M and irq_H, we would set the enabled point of irq_L in the main function, the enabled point of irq_M at the beginning of irq_L, and the enabled point of irq_H at the beginning of irq_M.
Table 3.5: Results of comparing IntAbs with state-of-the-art techniques on 35 interrupt-driven programs.

Name | LOC | Inter. | Ass. | iCBMC [77]: V. / P. / Time (s) | Miné [98, 101]: W.* / P. / Time (s) | IntAbs (new): W. / P. / Time (s)
test1 | 46 | 2 | 2 | 0 / 0 / 0.23 | 1 / 1 / 0.18 | 0 / 2 / 0.07
test2 | 65 | 3 | 3 | 1 / 0 / 0.55 | 3 / 0 / 0.05 | 1 / 2 / 0.06
test3 | 86 | 4 | 4 | 1 / 0 / 0.52 | 4 / 0 / 0.06 | 2 / 2 / 0.10
test4 | 56 | 2 | 2 | 1 / 0 / 0.52 | 2 / 0 / 0.04 | 1 / 1 / 0.05
test5 | 54 | 2 | 2 | 1 / 0 / 1.56 | 2 / 0 / 0.04 | 1 / 1 / 0.04
logger1 | 161 | 2 | 1 | 0 / 0 / 0.45 | 1 / 0 / 0.22 | 0 / 1 / 0.27
logger2 | 183 | 3 | 3 | 0 / 0 / 0.50 | 2 / 1 / 0.29 | 0 / 3 / 0.39
logger3 | 195 | 4 | 4 | 0 / 0 / 0.46 | 1 / 3 / 0.31 | 0 / 4 / 0.43
blink1 | 164 | 3 | 3 | 1 / 0 / 0.65 | 3 / 0 / 0.12 | 2 / 1 / 0.18
blink2 | 174 | 4 | 3 | 1 / 0 / 0.67 | 3 / 0 / 0.16 | 2 / 1 / 0.30
blink3 | 194 | 5 | 4 | 2 / 0 / 1.14 | 4 / 0 / 0.25 | 3 / 1 / 0.46
brake1 | 819 | 2 | 5 | 1 / 0 / 0.87 | 3 / 2 / 0.66 | 1 / 4 / 0.98
brake2 | 818 | 3 | 4 | 3 / 0 / 2.24 | 4 / 0 / 1.67 | 3 / 1 / 1.91
brake3 | 833 | 4 | 5 | 2 / 0 / 2.38 | 5 / 0 / 2.58 | 4 / 1 / 3.48
usbmouse1 | 426 | 2 | 8 | 2 / 0 / 0.79 | 7 / 1 / 0.11 | 2 / 6 / 0.13
usbmouse2 | 442 | 4 | 16 | 2 / 0 / 0.69 | 16 / 0 / 0.31 | 5 / 11 / 0.69
usbmouse3 | 449 | 5 | 20 | 11 / 0 / 4.00 | 20 / 0 / 0.52 | 11 / 9 / 1.28
usbkbd1 | 504 | 2 | 8 | 3 / 0 / 0.91 | 8 / 0 / 0.23 | 4 / 4 / 0.39
usbkbd2 | 512 | 3 | 12 | 2 / 0 / 1.20 | 12 / 0 / 0.51 | 4 / 8 / 1.09
usbkbd3 | 531 | 5 | 20 | 3 / 0 / 1.19 | 20 / 0 / 1.86 | 12 / 8 / 4.44
rgbled1 | 656 | 2 | 10 | 5 / 0 / 0.71 | 10 / 0 / 0.41 | 5 / 5 / 0.77
rgbled2 | 679 | 3 | 15 | 5 / 0 / 1.11 | 15 / 0 / 0.99 | 5 / 10 / 2.39
rgbled3 | 701 | 4 | 20 | 5 / 0 / 1.07 | 20 / 0 / 2.18 | 10 / 10 / 5.68
rcmain1 | 2060 | 3 | 9 | 0 / 0 / 5.36 | 9 / 0 / 1.58 | 0 / 9 / 1.80
rcmain2 | 2088 | 5 | 15 | 6 / 0 / 12.39 | 15 / 0 / 6.93 | 6 / 9 / 9.46
rcmain3 | 2102 | 6 | 18 | 9 / 0 / 3.95 | 18 / 0 / 12.20 | 9 / 9 / 16.35
i2c_pca_isa_1 | 321 | 4 | 6 | 0 / 0 / 0.41 | 6 / 0 / 0.14 | 0 / 6 / 0.29
i2c_pca_isa_2 | 341 | 6 | 10 | 8 / 0 / 2.24 | 10 / 0 / 0.36 | 8 / 2 / 1.05
i2c_pca_isa_3 | 363 | 8 | 14 | 12 / 0 / 4.98 | 14 / 0 / 0.85 | 12 / 2 / 2.48
i8xx_tco_1 | 757 | 3 | 2 | 0 / 0 / 0.30 | 2 / 0 / 0.28 | 0 / 2 / 0.35
i8xx_tco_2 | 949 | 4 | 2 | 1 / 0 / 0.96 | 2 / 0 / 0.43 | 1 / 1 / 0.54
i8xx_tco_3 | 944 | 6 | 3 | 0 / 0 / 0.52 | 3 / 0 / 0.81 | 0 / 3 / 1.04
wdt_pci_1 | 1239 | 4 | 2 | 0 / 0 / 0.41 | 2 / 0 / 0.40 | 0 / 2 / 0.61
wdt_pci_2 | 1290 | 6 | 3 | 0 / 0 / 0.43 | 3 / 0 / 0.78 | 1 / 2 / 1.45
wdt_pci_3 | 1339 | 8 | 4 | 0 / 0 / 0.39 | 4 / 0 / 1.41 | 3 / 1 / 3.21
Total | 22,541 | 136 | 262 | 88 / 0 / 56.75 | (254) / 8 / 39.92 | (118) / 144 / 64.21

Abbreviations - Inter.: interrupts, Ass.: assertions, V.: violations, P.: proofs, W.: warnings. The asterisk (*) indicates that the results contain bogus warnings, because the technique was designed for threads, not for interrupts.

Overall, iCBMC found 88 violations while obtaining 0 proofs. Miné's method, which was geared toward proving properties in threads, obtained 8 proofs while reporting 254 warnings, many of which turned out to be bogus. In contrast, our new method, IntAbs, obtained 144 proofs while reporting 118 warnings. This is significantly more accurate than the prior techniques. In terms of execution time, IntAbs took 64 seconds, which is slightly longer than the 39 seconds taken by Miné's method and the 56 seconds taken by iCBMC.

3.5 Summary

We have presented an abstract interpretation framework for the static verification of interrupt-driven software. It first analyzes each individual handler function in isolation and then propagates the results to the other handler functions. To filter out infeasible data flows, we have also developed a constraint-based analysis of the scheduling semantics of interrupts with priorities. It relies on constructing and solving a system of Datalog constraints to decide whether a set of data-flow pairs may co-exist. We have implemented our method in a software tool and evaluated it on a large set of interrupt-driven programs. Our experiments show that the new method not only is efficient but also significantly improves the accuracy of the results compared to existing techniques. More specifically, it outperformed both iCBMC, a bounded model checker, and the state-of-the-art abstract interpretation techniques.

Chapter 4
Scalable Semantic Diffing of Concurrent Programs

In this chapter, we present a scalable method for computing the semantic differences of two structurally similar concurrent programs.
When an evolving concurrent program is modified, oftentimes the sequential program logic is not changed; instead, the modification focuses on thread synchronization, e.g., to optimize performance or remove bugs such as data-races and atomicity violations. Since concurrency is hard, it is important to ensure the modification is correct and does not introduce unexpected behavior. However, manually comparing two programs to identify their semantic difference is difficult, and the situation is exacerbated in the presence of thread interactions: changing a single instruction in a thread may have a ripple effect on many instructions in other threads. Although techniques have been proposed to compute the synchronization difference, e.g., by leveraging model checkers [15], they are too expensive for practical use. For example, comparing two versions of a program with 578 lines of C code takes half an hour.

To fill the gap, we develop a fast and approximate static analysis to compute such differences, with the goal of reducing the analysis time from hours or minutes to seconds. We assume the two programs are closely related versions of an evolving software system where changes are made to address issues related to thread synchronization, as opposed to the sequential computation logic. Therefore, as in prior works [15, 130], we focus on synchronization differences. However, our method is orders of magnitude faster because, instead of model checking, we leverage a polynomial-time declarative program analysis framework that uses a set of Datalog rules to model and reason about thread interactions.

The reason why prior techniques are expensive is that they insist on being precise. Specifically, they either enumerate interleavings or use a model checker to ensure that a semantic difference, represented as a set of data-flow edges, is allowed by one of the programs but not by the other. However, this in general is equivalent to program verification, which is an undecidable problem [115]; even in cases where it is reduced to a decidable problem, the cost of model checking is too high. Our insight is that, in practice, it is relatively easy for developers to inspect a given difference to determine if it is feasible; what is not easy, and hence requires tool support, is a systematic exploration of the behaviors of the two programs to identify all possible differences in the first place. Unfortunately, developing such a tool is a non-trivial task; for example, the naive approach of comparing individual thread interleavings would not work due to the often exponential blowup in the number of interleavings.

Our method avoids the problem by being approximate, in that it does not enumerate interleavings. This also means infeasible behaviors are sometimes included. However, our approximation is carefully designed to take into consideration the program semantics most relevant to thread interaction. Furthermore, the approximation can be refined by iteratively increasing the number of data-flow edges used to characterize a synchronization difference. We shall show through experiments that our fast and approximate analysis method does not lead to overly inaccurate results. On the contrary, the synchronization differences reported by our method closely match those identified by humans. Compared to the prior technique based on model checking, which often takes minutes or even hours, our method can be 10x to 1000x faster. Figure 4.1 shows the overall flow of our method.
The input consists of two versions of a concurrent program: P1 is the original version, P2 is the changed version, and the patch info represents their syntactic difference, e.g., information about which instructions are added, removed or modified. The output consists of a set of differences, each of which is represented by a set of data-flow edges allowed in one of the programs but not in the other. When data-flow edges are allowed in P1 but not P2, for example, they represent a removed behavior. Conversely, when data-flow edges are allowed in P2 but not P1, they represent a new behavior introduced by the change.

[Figure 4.1: Overview of our semantic diffing method. In the EC-Diff framework, LLVM generates Datalog facts from P1 and P2; the facts, the patch info, the Datalog inference rules, and a query are fed to the μZ Datalog engine in Z3, which computes the differences Δ12 and Δ21.]

Our method first generates a set of Datalog facts that encode the structural information of the control flow graphs. These facts are then combined with inference rules that codify the analysis algorithm. When the combined program is fed to a Datalog solver, the resulting fixed point contains new relations (facts) that represent the analysis result. Specifically, it contains the data-flow edges that may occur in each program. By comparing the data-flow edges of the two programs, we can identify the semantic differences.

Since program verification is undecidable in general, and with concurrency it is undecidable even for Boolean programs [115], approximation is inevitable. Our method makes two types of approximations. The first one is in checking the feasibility of data-flow edges. The second one is related to the number of data-flow edges used to characterize a difference, also referred to as the rank of an analysis [15]. Although in the worst case a precise analysis means the rank needs to be as large as the length of the execution, we restrict it to a small number in our method, because prior research [106, 13] shows that concurrency bugs can often be exposed by executions with a bounded number of context switches.

Since our method is approximate in nature, its usefulness depends on how closely it approaches the ground truth. Ideally, we want to have few false positives and few false negatives. Toward this end, we choose to stay away from the tradition of insisting that the analysis be either sound or complete when one cannot have both. For a concurrent program, being sound often means existential abstraction: a data-flow edge is considered feasible (in all interleavings) if it is feasible in some interleaving; and being complete often means universal abstraction: a data-flow edge is considered feasible only if it is feasible in all interleavings. Both cases result in extremely coarse-grained approximations, which in turn lead to numerous false positives or false negatives. Instead, we want to minimize the difference between our analysis result and the ground truth.

We have implemented our method in a tool named EC-Diff, which uses LLVM [3] as the front-end and μZ [56] in Z3 as the Datalog solver. We evaluated EC-Diff on 47 multithreaded programs with 13,500 lines of C code in total. These are benchmarks widely used in prior research [12, 14, 142, 92, 156, 153, 155, 91, 34, 33, 36, 35, 37, 66, 54]: some illustrate real concurrency bug patterns [156] and the corresponding patches [72], while others are applications from public repositories. We applied EC-Diff to these benchmarks while comparing with the prior technique of Bouajjani et al. [15]. Our results show that EC-Diff can detect, often in seconds, the same differences identified by humans.
Furthermore, compared to the prior technique based on model checking, EC-Diff is 10x to 1000x faster. To summarize, this chapter makes the following contributions:

• We propose a fast and approximate analysis based on a polynomial-time declarative program analysis framework to compute synchronization differences.
• We show why our approximate analysis is reasonably accurate, due to the custom-designed inference rules and the iterative increase of the number of data-flow edges.
• We implement our method in a practical tool and evaluate it on a large number of benchmarks to confirm its high accuracy and low overhead.

The remainder of the chapter is organized as follows. First, we provide the technical background and examples that motivate conducting a differential analysis in Section 4.1. Then, we present our analysis method in Section 4.2. This is followed by our procedures for interpreting the analysis result and optimizing performance in Section 4.3. We present our experimental results in Section 4.4. Finally, we give our discussion and summary in Sections 4.5 and 4.6.

4.1 Background and Motivation

In this section, we present the background of partial trace comparison, which we use to characterize the semantic differences between two programs. Then, we use examples to motivate the need for conducting a differential analysis. The programs used in these examples illustrate common bug patterns (also used during our experiments in Section 4.4). In each example, there are two program versions: the original one may violate a hypothetical assertion and the changed one avoids it. These assertions are hypothetical (added for illustration purposes only) in the sense that our method does not need them to operate.

4.1.1 Partial Trace Comparison

To compare the synchronizations of two concurrent programs, we use the notion of partial trace introduced by Shasha and Snir [130] and extended by Bouajjani et al. [15]. Let P be a program and G be the set of global variables shared by the threads in P. For each x ∈ G, let W(x) denote a store instruction and R(x) denote a load instruction. Let I be the set of all instructions in the program. Any binary relation over these instructions is a subset of I × I.

For example, so ⊆ I × I is a relation that orders the store instructions; W1(x) < W2(x) means W1 ∈ I is executed before W2 ∈ I. Thus, in Fig. 4.2(a), (L1,L4), (L4,L1), (L1,L5), (L5,L1), and (L4,L5) belong to so, but (L5,L4) does not belong to so because it is not consistent with the program order. Similarly, rf is a relation between load and store instructions. In Fig. 4.2(a), we have (L4,L2) and (L5,L2) in rf, meaning the load at Line 2 may read from the values written at Lines 4 and 5.

Given so and rf, we define sets as a set of subsets of rf ∪ so, where each element ss ∈ sets has at most k edges. The edges in ss are from either rf or so; together they capture the abstract trace. The number k, which is called the rank [15], is bounded by the length of the trace.

Definition 1 (Abstract Trace with Rank k). An abstract trace with rank k is a tuple T = ⟨so, rf, sets, k⟩, where so ⊆ {W1(x) → W2(x) | W1 ∈ I, W2 ∈ I, and W1 < W2 in some execution trace}, rf ⊆ {W(x) → R(x) | W ∈ I and R ∈ I}, and sets ⊆ {ss ⊆ rf ∪ so | |ss| ≤ k}.

Given the abstract traces T1 and T2 of two programs P1 and P2, respectively, we define their difference as Δ = (Δ12, Δ21), where Δ12 = T1 \ T2 and Δ21 = T2 \ T1. Next, we define what it means for T1 to be a refinement of T2, denoted T1 ⊑ T2.

Definition 2 (Abstract Trace Refinement). Given two abstract traces T1 = ⟨so1, rf1, sets1, k⟩ and T2 = ⟨so2, rf2, sets2, k⟩, we say T1 is a refinement of T2, denoted T1 ⊑ T2, if and only if so1 ⊆ so2, rf1 ⊆ rf2, and sets1 ⊆ sets2.

That is, when T1 ⊑ T2, the abstract behavior of P1 is covered by that of P2, and the difference (T2 \ T1) is characterized by so2 \ so1, rf2 \ rf1, and sets2 \ sets1. Finally, if the abstract traces of P1 and P2 refine each other, we say they are rank-k equivalent.
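As a small illustration of Definitions 1 and 2, the following sketch represents abstract traces as edge sets and computes the refinement check and the difference Δ. The representation is hypothetical; for simplicity, sets is built from all subsets of up to k edges, whereas the real analysis keeps only the subsets that can co-occur in one execution. The edge values are the rank-1 edges of Fig. 4.2, discussed below.

from itertools import combinations

def abstract_trace(so, rf, k):
    # Definition 1: so/rf edge sets plus the subsets of up to k edges.
    edges = sorted(so | rf)
    sets = {frozenset(c) for n in range(1, k + 1)
            for c in combinations(edges, n)}   # simplification: every subset
    return (frozenset(so), frozenset(rf), frozenset(sets))

def refines(t1, t2):
    # Definition 2: T1 is a refinement of T2.
    return all(a <= b for a, b in zip(t1, t2))

def diff(t1, t2):
    # Delta_12: edges and edge sets allowed by T1 but not by T2.
    return tuple(a - b for a, b in zip(t1, t2))

so_edges = {("L1", "L4"), ("L4", "L1"), ("L1", "L5"),
            ("L5", "L1"), ("L4", "L5")}
t1 = abstract_trace(so=so_edges, rf={("L4", "L2"), ("L5", "L2")}, k=1)  # Fig. 4.2(a)
t2 = abstract_trace(so=so_edges, rf={("L4", "L2")}, k=1)                # Fig. 4.2(b)

print(refines(t2, t1))  # True: the patched program only removes behavior
print(diff(t1, t2)[1])  # frozenset({('L5', 'L2')}): the removed RF edge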
Although the comparison of abstract traces involves both so and rf, when reporting the differences, we focus on the rf edges only, because they directly affect the observable behaviors of the programs. In contrast, store-store orderings (so) may not be observable unless they also affect the read-from (rf) edges.

4.1.2 Motivating Examples

4.1.2.1 An Example with Lock-Unlock Changes

Fig. 4.2(a) shows a two-threaded program where the shared variable x is initialized to 0. The assertion at Line 3 may be violated, e.g., when thread1 executes the statement at Line 2 right after thread2 executes the statement at Line 5. The reason is that no synchronization operation is used to enforce any order. Assume the developer identifies the problem and patches it by adding locks (Fig. 4.2(b)); the assertion violation will then be avoided. To see why this is the case, consider the data-flow edge from Line 5 to Line 2: due to the critical sections enforced by the lock-unlock pairs, the load of x at Line 2 is not affected by the store of x at Line 5. For example, if the critical section containing Line 5 is executed first, the subsequent unlock(a) must be executed before the lock(a) in thread1, which in turn must be executed before Line 1 and Line 2. Since the store of x at Line 1 is the most recent, the load of x at Line 2 will get its value, not the value written at Line 5. Thus, the allowed data-flow edges are as follows: RF(L4,L2) and RF(L5,L2) for the original program, and RF(L4,L2) for the changed program.

This notion of comparing concurrent executions was introduced by Shasha and Snir [130] and extended by Bouajjani et al. [15], although in both cases, enumeration or model checking techniques were used. In our work, the goal is to avoid such heavyweight analyses while maintaining sufficient accuracy.

In addition to RF edges, there are other types of relations considered during our analysis, including program order, the inter-thread order imposed by thread create, join, and signal-wait, as well as store-store order. Nevertheless, when interpreting the final results, we focus on differences in the RF edges because they affect the externally observable behavior of a program, e.g., as characterized by assertions and other reachability properties.

(a) Before change:
thread1 {
1:  x = x + 1;
2:  if (x == 0)
3:    assert(0);
}
thread2 {
4:  x = 1;
    ...
5:  x = 0;
}

(b) After change:
thread1 {
  lock(a);
1:  x = x + 1;
2:  if (x == 0)
3:    assert(0);
  unlock(a);
}
thread2 {
4:  x = 1;
    ...
  lock(a);
5:  x = 0;
  unlock(a);
}

Figure 4.2: Example programs with synchronization differences (lock-unlock).

4.1.2.2 An Example with Signal-Wait Changes

Fig. 4.3 shows a more sophisticated example: the use of signal-wait, which is often difficult for static analyzers. Since the variable x is initialized to 0, when the critical section in thread1 is executed before thread2, the load of x at Line 1 will get the value 0, which leads to the assertion violation in Fig. 4.3(a).
If the intended behavior is for thread2 to complete first, an inter-thread execution order must be enforced, e.g., by using the signal-wait pair shown in Fig. 4.3(b). The assertion violation is then avoided because the load of x at Line 1 can only read from the store of x at Line 5.

To correctly deploy the signal-wait pair, a variable named cBool needs to be added. If the operating system happens to schedule thread2 first, thread1 needs to be aware of it – by checking the value of cBool – and then skip the execution of wait; otherwise, wait may get stuck because the corresponding signal has already been fired (and lost). But if thread1 is executed first, since cBool has not been set, it will invoke wait, which forces the corresponding signal to be sent.

As for the data-flow edges, we can see that RF(L5,L1) and RF(L3,L4) are allowed in the original program, but only RF(L5,L1) is allowed in the changed program. RF(L3,L4) is not allowed because Line 4 must happen before Line 5, Line 5 must happen before signal, and signal must happen before wait, which resides before Lines 1-3 in thread1. Thus, there would be a cycle (a contradiction).

thread1 {
   lock(a);
1: if (x == 0)
2:   assert(0);
3: y = foo(x);
   unlock(a);
}
thread2 {
   ...
   lock(a);
4: bar(y);
5: x = 4;
   unlock(a);
}
(a) Before change

thread1 {
   lock(a);
   if (!cBool) wait(cond);
1: if (x == 0)
2:   assert(0);
3: y = foo(x);
   unlock(a);
}
thread2 {
   ...
   lock(a);
4: bar(y);
5: x = 4;
   cBool = 1;
   signal(cond);
   unlock(a);
}
(b) After change

Figure 4.3: Example programs with synchronization differences (signal-wait).

Fig. 4.2(a): mustHB = {(1,2), (2,3), (1,3), (4,5)}
             mayHB  = mustHB ∪ {(1,4), (1,5), (2,4), (2,5), (3,4), (3,5), (4,1), (4,2), ...}
             MayRF  = {(4,1), (4,2), (5,1), (5,2)}
Fig. 4.2(b): mustHB = {(1,2), (2,3), (1,3), (4,5)}
             mayHB  = mustHB ∪ {(1,4), (1,5), (2,4), (2,5), (3,4), (3,5), (4,1), (4,2), ...}
             MayRF  = {(4,1), (4,2), (5,1)}
Fig. 4.3(a): mustHB = {(1,2), (2,3), (1,3), (4,5)}
             mayHB  = mustHB ∪ {(1,4), (1,5), (2,4), (2,5), (3,4), (3,5), (4,1), (4,2), ...}
             MayRF  = {(3,4), (5,1), (5,3)}
Fig. 4.3(b): mustHB = {(1,2), (2,3), (1,3), (4,5), (4,1), (4,2), (4,3), (5,1), (5,2), (5,3)}
             mayHB  = mustHB
             MayRF  = {(5,1), (5,3)}

Figure 4.4: Analysis steps for the programs in Figs. 4.2 and 4.3.

4.1.2.3 Applying Our Method to the Examples

Our method differs from prior techniques, which rely either on enumerating interleavings and conducting pairwise comparisons [130] or on model checking [15]. Both are computationally expensive. Instead, we use a lightweight static analysis. Our method represents the control and data dependencies of each program as a set of Datalog facts. We also design a set of Datalog inference rules, which capture our algorithm for deriving new facts from existing facts. Leveraging a Datalog solver, we can repeatedly apply the inference rules over the facts until a fixed point is reached. We explain the details of our Datalog facts and inference rules in Section 4.2.

For now, consider the steps of computing synchronization differences for the programs in Fig. 4.2 and Fig. 4.3, which are outlined by the tables in Fig. 4.4. First, our method computes must-happen-before (mustHB) edges, which represent the execution order of two instructions respected by all thread interleavings. From mustHB, our method computes may-happen-before (mayHB) edges, which represent the execution order respected by some interleavings, e.g., thread context switches not contradicting mustHB.
From mayHB, our method computes MayRF edges, which represent data flows (over shared variables) from store instructions to the corresponding load instructions. The MayRF edges are over-approximated in that, if an edge is included in MayRF, the corresponding data flow may occur in an execution; but if an edge is not included in MayRF, we know for sure that the corresponding data flow is infeasible. For example, in Fig. 4.4, MayRF has four edges for Fig. 4.2(a) but only three edges for Fig. 4.2(b). RF(L5,L2) is no longer allowed in the changed program, indicating that it is a difference between the two programs.

For the example in Fig. 4.3, we compute mustHB based on the sequential program order and, in Fig. 4.3(b), the inter-thread execution order imposed by signal-wait. Then, from mustHB we compute mayHB, which includes the edges in mustHB and more. For Fig. 4.3(a), since there is no restriction on the inter-thread execution order, all pairs of events are included, whereas for Fig. 4.3(b), there is only a one-way data flow. Finally, we compute MayRF based on mayHB. There are three edges for Fig. 4.3(a) but only two for Fig. 4.3(b).

The Rank of an Analysis. When comparing MayRF in these two examples, we identify the difference as edges allowed in only one of the two programs, such as RF(L5,L2) in Fig. 4.2 and RF(L3,L4) in Fig. 4.3. However, even if MayRF edges are allowed individually, they may not occur in the same execution. For example, RF(L5,L1) and RF(L3,L4) in Fig. 4.3(a) cannot occur together because, otherwise, they would form a cycle together with the program-order edges. Our method has inference rules designed to check whether two or more data-flow edges can occur together – this is referred to as the rank [15]. With the notion of rank, we can capture ordered sets of MayRF edges, as opposed to individual MayRF edges. Thus, even if the MayRF relation remains the same, there may be differences of higher ranks: two or more edges from MayRF may occur together in P1 but not in P2. We present our method for checking such differences in Section 4.3, following the baseline procedure in Section 4.2.

4.2 Constraint-based Synchronization Analysis

In this section, we present our method for computing the abstract traces of a single program. In the next section, we leverage the abstract traces of two programs to compute their differences. First, we define the elementary relations that can be constructed directly from the CFG of a program:

• St(s1, th1): statement s1 resides in thread th1
• Po(s1, s2): statement s1 is before s2 in a thread
• Dom(s1, s2): statement s1 dominates s2 in a thread
• PostDom(s1, s2): s1 post-dominates s2 in a thread
• ThrdCreate(th1, s1, th2): thread th1 creates th2 at s1
• ThrdJoin(th1, s1, th2): thread th1 joins back th2 at s1
• CondWait(s1, v1): s1 waits for condition variable v1
• CondSignal(s1, v1): s1 sends condition variable v1
• Load(s1, v1): statement s1 reads from variable v1
• Store(s1, v1): statement s1 writes to variable v1
• InCS(s1, l1): s1 resides in a critical section guarded by a lock(l1)–unlock(l1) pair
• SameCS(s1, s2, l1): s1 and s2 are in the same critical section guarded by l1
• DiffCS(s1, s2, l1): s1 and s2 are in different critical sections guarded by l1

While traversing the CFG to compute the Po, Dom, and PostDom relations, we take loops into consideration. For example, two instructions involved with the same loop may not have a Dom or PostDom relation, but an instruction outside the loop can have a Dom or PostDom relation with an instruction inside the loop.
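To make the encoding concrete, the sketch below shows the kind of facts our frontend could emit for the patched program in Fig. 4.2(b), written in Soufflé-style Datalog syntax with type declarations omitted; the statement IDs follow the line labels in the figure, and the exact fact format of EC-Diff may differ, since the tool drives the Datalog engine through its API.

    // Hypothetical facts for Fig. 4.2(b).
    St(1, "thread1").  St(2, "thread1").  St(3, "thread1").
    St(4, "thread2").  St(5, "thread2").
    Po(1, 2).  Po(2, 3).  Po(4, 5).
    Store(1, "x").  Load(1, "x").     // x = x + 1 both writes and reads x
    Load(2, "x").                     // if (x == 0)
    Store(4, "x").  Store(5, "x").
    InCS(1, "a").  InCS(2, "a").  InCS(3, "a").  InCS(5, "a").
    SameCS(1, 2, "a").  DiffCS(1, 5, "a").  DiffCS(2, 5, "a").

Note that Line 4 carries no InCS fact, since the store x = 1 in thread2 lies outside the critical section added by the patch.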
Next, we define inference rules for computing new relations such as MayHb, MustHb, and MayRf.

4.2.1 Rules for Intra-thread Dependency

To capture the execution order of instructions, we define the following relations: MayHb(s1, s2) means s1 may happen before s2 in some execution, and MustHb(s1, s2) means s1 happens before s2 in all executions in which both occur. Since the program order in each thread implies the execution order, we have the following rule:

MustHb(s1, s2) ← Po(s1, s2)

In this work, we assume sequential consistency, but Datalog is capable of handling weaker memory models [79] as well. By definition, MustHb implies MayHb:

MayHb(s1, s2) ← MustHb(s1, s2)

4.2.2 Rules for Inter-thread Dependency

When a parent thread th1 creates a child thread th2 at the statement s1, e.g., by invoking pthread_create, any statement s2 in the child thread must occur after s1:

MustHb(s1, s2) ← ThrdCreate(th1, s1, th2) ∧ St(s2, th2)

Similarly, when a parent thread th1 joins back a child thread th2 at s1, any statement s2 in th2 must occur before s1:

MustHb(s2, s1) ← ThrdJoin(th1, s1, th2) ∧ St(s2, th2)

4.2.3 Rules for Signal-Wait Dependency

When a condition variable c is used, e.g., through signal(c) and wait(c), it imposes an execution order:

MustHb(s1, s2) ← CondSignal(s1, v1) ∧ CondWait(s2, v1)

However, this rule needs to be used with caution. In practice, wait(c) is often wrapped in an if-condition, as shown in Fig. 4.3(b). To be conservative, our method analyzes the control flow of these threads and applies the above rule only after detecting the usage pattern. Since our method does not analyze the concrete values of any shared variables, it does not check whether the if-condition is valid. Also, developers may use condition variables in different ways. Thus, in our experiments (Section 4.4), we evaluated the impact of this conservative approach – assuming the if-condition is always valid – to confirm that it does not lead to a significant loss of accuracy.

4.2.4 Rules for Ad Hoc Synchronization

We handle ad hoc synchronization similarly to signal-wait. Fig. 4.5 shows an example where cond is a user-added flag initialized to 0. The busy-waiting in thread2 ensures that a=1 always occurs before x=a. By traversing the CFGs of these threads, we can identify the pattern; this is practical since the number of usage patterns is limited. After that, we add a MustHb edge from cond=true to while(!cond). This is similar to adding MustHb edges for CondWait and CondSignal. As a result, we can decide that the read-from edge between x=a and the initialization of a is infeasible.

thread1() {
  a = 1;
  cond = true;
}
thread2() {
  while (!cond) {}
  x = a;
}

Figure 4.5: Ad hoc synchronization (cond = false initially).

4.2.5 Rules for Transitive Closure

Since MustHb is transitive, we use the following rule to compute the transitive closure:

MustHb(s1, s3) ← MustHb(s1, s2) ∧ MustHb(s2, s3)

When instructions in concurrent threads are not ordered by MustHb, we assume they may occur in any order:

MayHb(s1, s2) ← St(s1, th1) ∧ St(s2, th2) ∧ ¬MustHb(s2, s1)

The MayHb relation is also transitive:

MayHb(s1, s3) ← MayHb(s1, s2) ∧ MayHb(s2, s3)
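Collecting the happens-before rules above into one executable fragment, a Soufflé-style rendering might look as follows; the explicit th1 != th2 guard and the stratified negation over MustHb reflect the prose ("instructions in concurrent threads") and are assumptions about the encoding rather than the tool's exact rule set.

    .decl St(s: number, th: symbol)
    .decl Po(s1: number, s2: number)
    .decl ThrdCreate(th1: symbol, s: number, th2: symbol)
    .decl ThrdJoin(th1: symbol, s: number, th2: symbol)
    .decl MustHb(s1: number, s2: number)
    .decl MayHb(s1: number, s2: number)

    MustHb(s1, s2) :- Po(s1, s2).
    MustHb(s1, s2) :- ThrdCreate(_, s1, th2), St(s2, th2).
    MustHb(s2, s1) :- ThrdJoin(_, s1, th2), St(s2, th2).
    MustHb(s1, s3) :- MustHb(s1, s2), MustHb(s2, s3).

    MayHb(s1, s2) :- MustHb(s1, s2).
    // Unordered statements in different threads may occur in any order.
    MayHb(s1, s2) :- St(s1, th1), St(s2, th2), th1 != th2, !MustHb(s2, s1).
    MayHb(s1, s3) :- MayHb(s1, s2), MayHb(s2, s3).

Because MayHb is computed from the already-saturated MustHb relation, the negation is stratified and the program still runs in polynomial time.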
4.2.6 Rules for Lock-enforced Critical Sections

For critical sections based on lock-unlock, we introduce rules based on access patterns. First, we compute CoveredStore(s1, v1, l1), meaning the store in s1 is overwritten by a subsequent store in the same critical section. Consider lk(a) → W1(v) → W2(v) → unlk(a), where W1(v) is a covered store and thus not visible to reads in other critical sections protected by the same lock:

CoveredStore(s1, v1, l1) ← Store(s1, v1) ∧ Store(s2, v1) ∧ PostDom(s2, s1) ∧ SameCS(s1, s2, l1)

Similarly, CoveredLoad(s2, v1, l1) means the load of v1 in s2 is covered and thus can only read from a preceding store in the same critical section:

CoveredLoad(s2, v1, l1) ← Store(s1, v1) ∧ Load(s2, v1) ∧ Dom(s1, s2) ∧ SameCS(s1, s2, l1)

Consider lk(a) → W(v) → R(v) → unlk(a) as an example: R(v) is covered by W(v) and thus cannot read from stores in other critical sections protected by the same lock.

4.2.7 Rules for the Read-from Relation

Finally, we compute NoRf(s1, s2), which means the read-from edge between s1 and s2 is infeasible:

NoRf(s1, s2) ← Store(s1, v1) ∧ Store(s3, v1) ∧ Load(s2, v1) ∧ MustHb(s1, s3) ∧ MustHb(s3, s2)

That is, in W(x) → W(x) → R(x), the first store cannot be read by the load. In addition to this generic rule, we have two more inference rules:

NoRf(s1, s2) ← Store(s1, v1) ∧ Load(s2, v1) ∧ MayHb(s1, s2) ∧ CoveredLoad(s2, v1, l1) ∧ DiffCS(s1, s2, l1)

This rule means that if a store may happen before a load, the load is covered, and the store is in a different critical section, then the load cannot read from the store, because another store will overwrite the value to be read.

NoRf(s1, s2) ← Store(s1, v1) ∧ Load(s2, v1) ∧ MayHb(s1, s2) ∧ CoveredStore(s1, v1, l1) ∧ DiffCS(s1, s2, l1)

This rule means that if a store is covered, i.e., overwritten by a subsequent store, the store cannot reach any load in other critical sections protected by the same lock.

We also compute MayRf(s1, s2), which means the load in s2 may read from the store in s1:

MayRf(s1, s2) ← Store(s1, v1) ∧ Load(s2, v1) ∧ MayHb(s1, s2) ∧ ¬NoRf(s1, s2)

4.2.8 Soundness of the Analysis

For each program, our Datalog rules infer the MayRf relation. As we will discuss for the symmetric difference in Section 4.3, we compare two over-approximated sets: each is a set of MayRf facts that over-approximates the actual read-from edges. Therefore, the differencing computation over the two sets may lose soundness. However, since our rules are specifically designed to check synchronization differences, the empirical results in Section 4.4 show that the results are accurate enough for the targeted synchronization changes.
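The read-from rules of Sections 4.2.6 and 4.2.7 translate almost verbatim into Datalog. The following minimal sketch, building on the MustHb/MayHb relations from the earlier fragment, uses the relation names from the text (declarations for Dom, PostDom, SameCS, and DiffCS omitted).

    .decl Store(s: number, v: symbol)
    .decl Load(s: number, v: symbol)
    .decl CoveredStore(s: number, v: symbol, l: symbol)
    .decl CoveredLoad(s: number, v: symbol, l: symbol)
    .decl NoRf(s1: number, s2: number)
    .decl MayRf(s1: number, s2: number)

    CoveredStore(s1, v, l) :- Store(s1, v), Store(s2, v),
                              PostDom(s2, s1), SameCS(s1, s2, l).
    CoveredLoad(s2, v, l)  :- Store(s1, v), Load(s2, v),
                              Dom(s1, s2), SameCS(s1, s2, l).

    // A store separated from the load by an intervening store is dead.
    NoRf(s1, s2) :- Store(s1, v), Store(s3, v), Load(s2, v),
                    MustHb(s1, s3), MustHb(s3, s2).
    // Lock-based rules: covered loads/stores cannot cross critical sections.
    NoRf(s1, s2) :- Store(s1, v), Load(s2, v), MayHb(s1, s2),
                    CoveredLoad(s2, v, l), DiffCS(s1, s2, l).
    NoRf(s1, s2) :- Store(s1, v), Load(s2, v), MayHb(s1, s2),
                    CoveredStore(s1, v, l), DiffCS(s1, s2, l).

    MayRf(s1, s2) :- Store(s1, v), Load(s2, v), MayHb(s1, s2), !NoRf(s1, s2).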
4.3 Optimizing the Semantic Diffing Computation

In this section, we show how to compare the abstract traces of the two programs to identify their differences.

4.3.1 Symmetric Difference

Fig. 4.6 shows the Venn diagram of our method for computing the differences when given the abstract traces of two programs. The actual behaviors of programs P1 and P2 are represented by the circles with solid lines, denoted T1 and T2. There are two approximate behaviors for each program: an over-approximated behavior, represented by the circles with blue dashed lines, and an under-approximated behavior, represented by the circles with red dashed lines. The over-approximated behaviors are denoted T1+ and T2+, and the under-approximated behaviors are denoted T1− and T2−, respectively. In this work, we take the over-approximated behaviors as the abstract traces (i.e., T̂1 and T̂2). Conceptually, the symmetric difference in this work is computed based on Δ12^{+\+} = T1+ \ T2+ and Δ21^{+\+} = T2+ \ T1+, each presented as a pink-colored region in Fig. 4.6 (left and right). The details are presented in the remainder of this section.

Figure 4.6: Differences of abstract traces: Δ12^{+\+} (left) and Δ21^{+\+} (right).

To compute the difference, we define two relations, DiffP1 and DiffP2, and rules for computing them:

DiffP1(s1, s2) ← MayRf(s1, s2, P1) ∧ ¬MayRf(s1, s2, P2)
DiffP2(s1, s2) ← MayRf(s1, s2, P2) ∧ ¬MayRf(s1, s2, P1)

DiffP1 represents edges that may happen in P1 but not in P2. Similarly, DiffP2 represents edges that may happen in P2 but not in P1. If DiffP1 is not empty, there are more behaviors in P1; and if DiffP2 is not empty, there are more behaviors in P2.

Since the Datalog solver may enumerate all possible MayHb edges (used to compute MayRf), and the number of MayHb edges increases rapidly with the program size, we need to reduce the computational overhead. Our insight is that, since we are only concerned with synchronization differences, as opposed to the behaviors of the sequential computation, we can restrict our analysis to instructions that access global variables. Toward this end, we define a new relation named Access(v1, s1), which means s1 accesses a global variable v1, and use it to guard the inference rules for MayHb (and hence MustHb). It forces the Datalog solver to consider only global accesses, which reduces the computational overhead without losing accuracy. We demonstrate the effectiveness of this optimization through the experiments in Section 4.4.
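A sketch of this optimization and of the differencing rules is shown below. Tagging the MayRf facts with a program label ("P1"/"P2") and assuming that corresponding statements in the two versions share IDs (i.e., that the versions have been matched beforehand) are illustrative assumptions about how the two fact bases would be combined into one Datalog program.

    .decl Access(v: symbol, s: number)
    Access(v, s) :- Load(s, v).
    Access(v, s) :- Store(s, v).

    // Guarded rule: only statements touching globals enter MayHb.
    MayHb(s1, s2) :- Access(_, s1), Access(_, s2),
                     St(s1, th1), St(s2, th2), th1 != th2, !MustHb(s2, s1).

    // Differencing over the two versions' tagged MayRf facts.
    .decl MayRfIn(p: symbol, s1: number, s2: number)
    .decl DiffP1(s1: number, s2: number)
    .decl DiffP2(s1: number, s2: number)
    DiffP1(s1, s2) :- MayRfIn("P1", s1, s2), !MayRfIn("P2", s1, s2).
    DiffP2(s1, s2) :- MayRfIn("P2", s1, s2), !MayRfIn("P1", s1, s2).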
Figure 4.7: Illustrating the first two rank-2 inference rules (left: a load s3 reading from two different stores s1 and s2; right: two read-from edges forming a cycle with MustHb edges).

4.3.2 Differences at Higher Ranks

The rules so far use individual read-from edges to characterize the differences, which is equivalent to a rank-1 analysis [15]; but some programs may have no rank-1 difference and yet have differences of higher ranks. To detect them, we need to compute ordered sets of data-flow edges allowed in one program but not in the other. Specifically, for rank 2, we extend the MayRf relation, which was defined over two instructions (an edge), to MayRfs, defined over four instructions, to represent an ordered set of (two) read-from edges. Similarly, we extend the NoRf relation to NoRfs, which is also defined over four instructions.

Previously, NoRf(s1, s2) meant there is no execution trace where the store s1 can be read by the load s2, whereas MayRf(s1, s2) meant there may exist some execution trace that allows the read-from edge (s1, s2). Similarly, NoRfs((s1, s2), (s3, s4)) means there is no execution trace where the two read-from edges (s1, s2) and (s3, s4) occur together and in that order; and MayRfs((s1, s2), (s3, s4)) means there may exist some execution trace that allows the two read-from edges to occur together and in that order.

First, we present our rules for computing NoRfs, which in turn is used to compute MayRfs. Since it is not possible to enumerate all scenarios due to theoretical limitations, we resort to the most common scenarios. Nevertheless, we guarantee that NoRfs is an under-approximation, and the corresponding MayRfs is an over-approximation.

NoRfs((s1, s3), (s2, s3)) ← MayRf(s1, s3) ∧ MayRf(s2, s3)

This rule is obvious because, as in Fig. 4.7 (left), in the same execution trace a load (s3) cannot read from two different stores (s1 and s2).

NoRfs((s1, s2), (s3, s4)) ← MayRf(s1, s2) ∧ MayRf(s3, s4) ∧ MustHb(s2, s3) ∧ MustHb(s4, s1)

This rule is also obvious because, as shown in Fig. 4.7 (right), if the two read-from edges form a cycle together with the must-happen-before edges, they lead to a contradiction.

NoRfs((s1, s2), (s3, s4)) ← SameCS(s1, s4, l1) ∧ SameCS(s2, s3, l1) ∧ DiffCS(s1, s2, l1)

This rule is related to lock-unlock pairs. The rationale behind it can be explained using the diagram in Fig. 4.8 (left). Due to the lock-unlock pairs, there are only two possible interleavings: (1) if s1 happens before s2, then s4 must happen before s3 and s2, which contradicts the read-from edge (s3, s4); (2) if s3 happens before s4, then s2 must happen before s1, which contradicts the read-from edge (s1, s2). Thus, the two read-from edges cannot occur in the same execution trace.

Next, we define another rule related to lock-unlock pairs. In this rule, we use PostDom(s3, s2) to mean that, after s2 is executed, s3 is guaranteed to be executed as well.

NoRfs((s1, s2), (s1, s4)) ← Store(s3, v1) ∧ PostDom(s3, s2) ∧ DiffCS(s2, s4, l1) ∧ SameCS(s2, s3, l1)

Figure 4.8: Illustrating the rank-2 rules related to lock-unlock (left: the SameCS/DiffCS rule; right: the PostDom rule).

The rationale behind this rule can be explained using the diagram in Fig. 4.8 (right). Here, the loads and stores access the same variable. If the read-from edge (s1, s2) is ahead of (s1, s4) in the same execution trace, the store in s3 contradicts the read-from edge (s1, s4).

Finally, we compute MayRfs based on NoRfs:

MayRfs((s1, s2), (s3, s4)) ← ¬NoRfs((s1, s2), (s3, s4))

It means the read-from edges (s1, s2) and (s3, s4) may occur together, in that order, in some execution trace. With MayRfs, we compute the differences (DiffP1 and DiffP2) by replacing MayRf with MayRfs. Our method for computing differences of rank 3 or higher is similar, and we omit the details for brevity.

4.3.3 The Running Example

Fig. 4.9 shows an example that illustrates the rank-2 analysis. Here, thread1 sets t to 0 and x to 1 before creating thread2. Due to the lock-unlock pairs, the assertion cannot be violated in Fig. 4.9(a). However, if the lock-unlock in thread1 is removed, as in Fig. 4.9(b), the assertion may be violated because, in between Lines 4 and 5, there may be a context switch that was not allowed previously.

However, this synchronization difference cannot be captured by any individual MayRF edge. In fact, the table in Fig. 4.10 shows that the two programs have the same set of MayRF edges. In particular, since there are two stores of x, the load at Line 2 may read from both Line 1 and Line 5. To capture the difference, we need the rank-2 analysis.

• Assume RF(L1,L4) occurs first, meaning thread2 acquires the lock and thus prevents thread1 from acquiring the same lock until thread2 exits the critical section. It means the store at Line 5 will set x to 2. Therefore, the load of x at Line 2 will have to read from Line 5, not from Line 1. In other words, RF(L1,L2) cannot occur after RF(L1,L4) in the same execution.

• Assume RF(L1,L2) occurs first and thread2 is not executed until thread1 finishes. In this case, RF(L1,L4) is allowed since there is no store of x in thread1.

As a result, the program in Fig. 4.9(a) allows the ordered set {RF(L1,L2), RF(L1,L4)} but not the ordered set {RF(L1,L4), RF(L1,L2)}. However, the program in Fig. 4.9(b) allows the ordered set {RF(L1,L4), RF(L1,L2)} as well, due to the removal of the lock-unlock pair in thread1.
Specifically, when RF(L1,L4) occurs at the start of an execution, thread1 may execute Line 2 before thread2 executes Line 5, which allows Line 2 to read the value of x from Line 1.

The steps of our rank-2 analysis, based on the inference rules presented so far, are shown in Fig. 4.10. There is no difference in the MayRF sets; however, when comparing the ordered sets of MayRF edges, we can still see the difference. To support this analysis, we apply the aforementioned rank-2 inference rules, which check the existence of (1,4) → (1,2).

thread1 {
   t = 0;
1: x = 1;
   create(t2);
   lock(a);
   ...
2: assert(x != t);
   unlock(a);
}
thread2 {
   ...
   lock(a);
4: t = x;
   ...
5: x = 2;
   unlock(a);
   ...
}
(a) Before change

thread1 {
   t = 0;
1: x = 1;
   create(t2);
   ...
2: assert(x != t);
}
thread2 {
   ...
   lock(a);
4: t = x;
   ...
5: x = 2;
   unlock(a);
   ...
}
(b) After change, with the lock-unlock pair removed from thread1

Figure 4.9: Example programs with rank-2 differences.

Fig. 4.9(a): mustHB = {(1,2), (1,4), (1,5), (4,5)}
             mayHB  = mustHB ∪ {(2,4), (2,5), (4,2), (5,2)}
             MayRF  = {(1,2), (1,4), (5,2)}
             Rank 2 = {[(1,2) → (1,4)], [(1,4) → (5,2)]}
Fig. 4.9(b): mustHB = {(1,2), (1,4), (1,5), (4,5)}
             mayHB  = mustHB ∪ {(2,4), (2,5), (4,2), (5,2)}
             MayRF  = {(1,2), (1,4), (5,2)}
             Rank 2 = {[(1,2) → (1,4)], [(1,4) → (5,2)], [(1,4) → (1,2)]}

Figure 4.10: Steps of our analysis for the programs in Fig. 4.9.

4.4 Implementation and Experiments

4.4.1 Experimental Setup

We have implemented the method in a tool named EC-Diff, which uses LLVM [3] as the frontend and μZ [56] in Z3 as the Datalog solver at the backend. Specifically, we use Clang/LLVM to parse the C/C++ code of the multithreaded programs and construct the LLVM intermediate representation (IR). Then, we traverse the LLVM IR to generate program-specific Datalog facts. These Datalog facts, when combined with a set of program-independent inference rules, form the entire Datalog program. Finally, the μZ Datalog solver is used to solve the program, repeatedly applying the rules to the facts until a fixed point is reached. By querying relations in the fixed point, we can retrieve the analysis result.

We used two sets of benchmarks in our experiments. The first set consists of 41 multithreaded programs, which have previously [72] been used to illustrate concurrency bug patterns found in real applications [12, 14, 92, 156, 155, 91, 34, 33, 36, 35, 37, 66, 54]. With these programs, our goal is to evaluate how well the various types of concurrency bugs are handled by our method, and how our results compare to those of the prior technique based on model checking [15]. For these benchmarks, the prior technique is not able to soundly instrument all applications; therefore, we manually insert assertions, which are later checked by the CBMC bounded model checker, for detecting only one differing edge.

The second set of benchmarks consists of 6 medium-sized applications from open-source repositories; they have also been used previously [152, 156] to evaluate testing and automated program repair tools. We are not able to apply the prior technique [15] to them, because it has limitations in instrumenting large programs and it is impractical for us to manually insert the assertions. Nevertheless, we can evaluate how efficient our new method EC-Diff is on these real applications. In total, our benchmarks have 13,500 lines of C code.

For each benchmark program, there are two versions: one is the original program and the other is the changed program.
These changed programs are patches collected from various sources: some are from benchmarks used in prior research on testing [156, 152] and repair [72], whereas others are from benchmarks used in differential analysis [15]. We also created four programs, case1-4, to illustrate the motivating examples used throughout this chapter. These benchmark programs, together with our experimental data, the LLVM-based tool, and the data obtained from applying the prior technique [15], have been made available online at https://github.com/ChunghaSung/EC-Diff.

Our experiments were designed specifically to answer the following research questions:

• Is our new method, based on a fast and approximate static analysis as opposed to heavyweight model checking techniques, accurate enough for identifying the actual synchronization differences in the benchmark programs?

• Is our new method significantly more efficient, measured in terms of the analysis time, than the prior technique based on model checking?

In all these experiments, we used a computer with an Intel Core i5-4440 CPU @ 3.10 GHz (4 cores) and 12 GB of RAM, running the Ubuntu 16.04 LTS operating system.

4.4.2 Results: Optimized Semantic Diffing on Small-sized Benchmarks

Table 4.1 shows our results on the first set of benchmarks, with 41 programs illustrating common bug patterns. Columns 1 and 2 show the name and the number of lines of C code. Column 3 shows the number of threads. Column 4 shows the type of bug illustrated by the program. Specifically, Sync. means the bug is due to misuse of locks, and thus, to repair it, some lock-unlock pairs have been added, removed, or modified; Cond. means the bug is due to misuse of condition variables, and thus, to repair it, some signal-wait pairs have been added, removed, or modified; Th.Order means the bug is related to thread creation and join and thus involves ThrdJoin or ThrdCreate; and Order means the bug is related to an ordering of instructions imposed by ad hoc synchronization. Note that, in each of these benchmarks, there is some synchronization difference.

The remaining columns show the statistics reported by EC-Diff as well as the prior technique [15]. Specifically, Column 5 shows whether EC-Diff detected the synchronization difference. Column 6 shows at which rank our analysis is conducted (Section 4.3): we iteratively increase the rank starting from 1 until a synchronization difference is detected. To be efficient, we bound the rank to 3 during our evaluation. Columns 7 and 8 show the number of differences in Δ12^{+\+} = T1+ \ T2+ and Δ21^{+\+} = T2+ \ T1+. For a rank-1 analysis, it is the number of read-from edges; for a rank-2 or rank-3 analysis, it is the number of ordered sets of read-from edges. The next two columns show the total number of MayHb edges (used to compute MayRf) in P1 and P2, respectively. The last two columns compare the analysis time of our method and the model checking time of the prior technique [15] to check one differing edge. For each benchmark, we limit the run time to one hour.

Our results show that EC-Diff often finishes each benchmark within a second, whereas the prior technique can take up to 2,384 seconds (rtl8169-2). In total, EC-Diff took less than 16 seconds whereas the prior technique took more than 3 hours. In terms of accuracy, except for one program, EC-Diff detected all the synchronization differences. This has been confirmed through manual inspection, where the reported differences were compared with the ground truth.
Since we have randomly labeled the original and changed programs as P1 and P2, some of the differences are in Δ12^{+\+} whereas the others are reported in Δ21^{+\+}. In total, EC-Diff found 251 differences in Δ12^{+\+} and 151 differences in Δ21^{+\+}. The missed difference resides in rtl8169-3: even after running the rank-3 analysis, our method could not find it. The reason is that the differentiating behavior involves a deadlock, which the patch removed. We explain why our method cannot detect it in Section 4.4.4.

Table 4.1: Experimental results on the first set of benchmark programs.

Name | LoC | Th. | Type | Diff. | Rank | #Δ12 | #Δ21 | mayHB(P1) | mayHB(P2) | EC-Diff (s) | Prior [15] (s)
case1 | 52 | 3 | Sync. | yes | 1 | 0 | 7 | 1,343 | 1,343 | 0.26 | 11.53
case2 | 53 | 3 | Cond. | yes | 1 | 0 | 3 | 1,357 | 1,474 | 0.26 | 4.80
case3 | 67 | 3 | Th.Order | yes | 1 | 2 | 0 | 546 | 482 | 0.19 | 46.64
case4 | 94 | 3 | Sync. | yes | 2 | 0 | 1 | 421 | 421 | 0.20 | 8.59
i2c-hid [15] | 76 | 3 | Sync. | yes | 1 | 1 | 0 | 2,570 | 2,570 | 0.28 | 27.28
i2c-hid-noa [15] | 70 | 3 | Sync. | yes | 1 | 1 | 0 | 1,573 | 1,573 | 0.26 | 7.48
r8169-1 [15] | 65 | 3 | Order | yes | 1 | 1 | 0 | 870 | 852 | 0.25 | 3.38
r8169-2 [15] | 80 | 3 | Order | yes | 1 | 1 | 0 | 873 | 839 | 0.25 | 2.17
r8169-3 [15] | 105 | 4 | Order | yes | 1 | 1 | 0 | 769 | 769 | 0.25 | 8.37
rtl8169-1 [15] | 578 | 8 | Order | yes | 1 | 1 | 0 | 60,741 | 60,691 | 0.89 | 1580.16
rtl8169-2 [15] | 578 | 8 | Order | yes | 1 | 1 | 0 | 60,741 | 60,741 | 0.89 | 2384.14
rtl8169-3 [15] | 578 | 8 | Order | no | 3 | 0 | 0 | 60,741 | 60,741 | 2.40 | 0.00
cherokee [156] | 150 | 3 | Sync. | yes | 1 | 0 | 2 | 1,148 | 1,148 | 0.31 | 7.59
transmission [156] | 91 | 3 | Cond. | yes | 1 | 1 | 0 | 690 | 613 | 0.29 | 6.89
apache-21287 [156] | 74 | 3 | Sync. | yes | 1 | 2 | 0 | 1,406 | 1,406 | 0.27 | 6.29
apache-25520 [156] | 181 | 3 | Sync. | yes | 2 | 8 | 0 | 3,206 | 3,206 | 0.33 | 23.81
account [12] | 82 | 4 | Cond. | yes | 1 | 0 | 2 | 3,701 | 3,881 | 0.30 | 13.46
barrier [12] | 138 | 4 | Cond. | yes | 1 | 3 | 0 | 7,289 | 6,655 | 0.26 | 150.54
boop [12] | 134 | 3 | Sync. | yes | 1 | 3 | 0 | 2,625 | 2,625 | 0.25 | 8.90
bbench [12] | 63 | 3 | Cond. | yes | 1 | 0 | 71 | 5,248 | 6,321 | 0.28 | 1483.33
lazy [12] | 76 | 4 | Cond. | yes | 2 | 0 | 6 | 3,409 | 3,549 | 0.24 | 32.16
reorder [12] | 170 | 5 | Cond. | yes | 1 | 3 | 0 | 9,493 | 8,737 | 0.40 | 12.79
threadRW [12] | 147 | 5 | Cond. | yes | 1 | 2 | 0 | 9,092 | 8,552 | 0.30 | 7.57
lineEq-2t [14] | 90 | 3 | Sync. | yes | 2 | 0 | 8 | 2,905 | 2,905 | 0.30 | 23.34
linux-iio [14] | 114 | 3 | Sync. | yes | 1 | 3 | 0 | 5,851 | 5,851 | 0.31 | 24.13
linux-tg3 [14] | 130 | 3 | Cond. | yes | 1 | 2 | 0 | 15,979 | 15,160 | 0.63 | 617.01
vectPrime [14] | 127 | 3 | Sync. | yes | 2 | 2 | 0 | 35,014 | 35,014 | 0.52 | 2.22
mozilla-61369 [92] | 84 | 3 | Cond. | yes | 1 | 0 | 1 | 473 | 565 | 0.25 | 3.57
mysql-3596 [92] | 92 | 3 | Cond. | yes | 1 | 1 | 0 | 773 | 733 | 0.25 | 3.82
mysql-644 [92] | 110 | 3 | Cond. | yes | 1 | 0 | 2 | 1,343 | 1,434 | 0.33 | 5.40
counter-seq [54] | 47 | 3 | Sync. | yes | 2 | 0 | 2 | 1,135 | 1,135 | 0.26 | 18.13
ms-queue [54] | 116 | 3 | Sync. | yes | 2 | 2 | 0 | 5,754 | 5,754 | 0.59 | 29.01
mysql5 [72] | 59 | 3 | Sync. | yes | 2 | 0 | 4 | 1,283 | 1,283 | 0.20 | 22.92
freebsd-a [155] | 176 | 4 | Cond. | yes | 1 | 0 | 22 | 7,910 | 10,109 | 0.33 | 25.40
llvm-8441 [91] | 127 | 3 | Cond. | yes | 1 | 0 | 10 | 3,042 | 3,118 | 0.41 | 16.36
gcc-25530 [34] | 87 | 3 | Sync. | yes | 2 | 2 | 0 | 806 | 806 | 0.20 | 12.15
gcc-3584 [35] | 83 | 3 | Sync. | yes | 2 | 2 | 0 | 1,843 | 1,843 | 0.24 | 17.23
gcc-21334 [33] | 136 | 3 | Sync. | yes | 2 | 8 | 0 | 5,290 | 5,290 | 0.35 | 195.20
gcc-40518 [36] | 102 | 3 | Sync. | yes | 1 | 0 | 8 | 3,027 | 3,027 | 0.25 | 14.31
glib-512624 [37] | 95 | 3 | Sync. | yes | 1 | 198 | 0 | 5,748 | 5,748 | 0.32 | >3600.00
jetty-1187 [66] | 69 | 3 | Sync. | yes | 2 | 0 | 2 | 885 | 885 | 0.22 | 19.34
Total | | | | | | 251 | 151 | 338,913 | 339,849 | 15.57 | >3h

Note: >3600.00 means verification of the edge in P1 succeeded, but verification of the edge in P2 timed out after an hour.
Table 4.2: Experimental results on the second set of benchmark programs.

Name | LoC | Th. | Type | Diff. | Rank | #Δ12 | #Δ21 | mayHB(P1) | mayHB(P2) | EC-Diff (s)
pbzip-1 [156, 152] | 1,143 | 5 | Th.Order | yes | 1 | 6 | 0 | 782,846 | 773,934 | 14.98
pbzip-2 [156, 152] | 1,143 | 7 | Th.Order | yes | 1 | 12 | 0 | 1,150,404 | 1,135,428 | 30.61
aget-1 [156, 152] | 1,523 | 4 | Cond. | yes | 1 | 4 | 0 | 1,099,047 | 1,078,695 | 9.41
aget-2 [156, 152] | 1,523 | 6 | Cond. | yes | 1 | 8 | 0 | 3,218,034 | 3,162,684 | 28.60
pfscan-1 [152] | 1,327 | 3 | Cond. | yes | 1 | 0 | 6 | 2,094,446 | 2,107,760 | 19.72
pfscan-2 [152] | 1,327 | 5 | Cond. | yes | 1 | 0 | 36 | 4,138,361 | 4,164,989 | 39.96
Total | | | | | | 30 | 42 | 12,483,138 | 12,423,490 | 140.28

4.4.3 Results: Optimized Semantic Diffing on Large-sized Benchmarks

Table 4.2 shows our results on the second set of benchmarks, consisting of six medium-sized programs. Note that these programs are already out of the reach of the prior technique [15] due to its requirement of manual code instrumentation; therefore, we only report the statistics of applying EC-Diff. Again, the original and modified programs are randomly labeled P1 and P2, respectively, to facilitate the evaluation.

In total, EC-Diff found 30 differences in Δ12^{+\+} and 42 differences in Δ21^{+\+}. Furthermore, all of them were found during the rank-1 analysis and confirmed by manual inspection. What is impressive is that these differences were identified by sifting through a combined total of 24 million MayHb edges, and yet the analysis of all programs took only 140 seconds. The efficiency is, in large part, due to the restriction of our analysis to instructions that access global variables, as opposed to all instructions in the program (see the last paragraph of Section 4.3.1). Otherwise, the number of MayHb edges would have been orders of magnitude larger.

4.4.4 Answering the Research Questions

Now, we answer the two research questions.

Q1: Is EC-Diff accurate enough for identifying synchronization differences? The answer is yes. As shown in our experimental results, EC-Diff produced a large number of differences, the majority of which are at rank 1, which means they are individual read-from edges allowed in only one of the two programs, while the rest are at rank 2. Although we do not guarantee that EC-Diff finds all differentiating behaviors, the detected ones have been confirmed by manual inspection. Given that these benchmarks contain real concurrency bug patterns reported and analyzed by many existing tools for testing and repair, the results of EC-Diff are sufficiently accurate. The success is, in large part, due to the nature of these programs, where the two versions behave almost the same except for the thread synchronization. In such cases, our approximate analysis can come very close to the ground truth.

Q2: Is EC-Diff more efficient than the prior technique based on model checking? The answer is yes. As shown in our results, EC-Diff was 10x to 1000x faster and, in total, completed the differential analysis of 13,500 lines of multithreaded C code in about 160 seconds. In contrast, the prior technique took far longer to analyze each program. Thus, we conclude that EC-Diff is effective in identifying synchronization differences in evolving programs. In practice, when developers update a program to fix concurrency bugs or remove performance bugs (e.g., by eliminating redundant locks), the differences in behavior are often reflected in (sets of) data-flow edges being feasible in one version but not in the other.
Thus, computing these (sets of) data-flow edges can be a fast way of checking whether the changes introduce unexpected behaviors.

The Missing Case: Although EC-Diff successfully detected most of the actual differences, it missed the one in rtl8169-3. Fig. 4.11 shows the code snippet of thread1 from the original program (P1) on the left-hand side and the changed program (P2) on the right-hand side.

thread1() {          thread1() {
  lock(a);             lock(b);
  lock(b);             lock(a);
  ...                  ...
  unlock(b);           unlock(a);
  unlock(a);           unlock(b);
}                    }

Figure 4.11: Code from rtl8169-3: the original (left) and changed (right) versions.

The purpose of this patch is to resolve a deadlock issue by changing the acquisition order of the locks. Since EC-Diff focuses solely on data-flow edges, it is not able to detect behavioral differences related to locking only. In some sense, this is a limitation shared by techniques relying on the notion of abstract traces [130, 15]: the two programs have no data-related semantic difference other than the fact that a deadlock exists in one program but not in the other.

4.5 Further Discussion

In this work, we compute symmetric differences using abstract partial traces over the over-approximated behaviors (i.e., MayRf). As each program has distinct abstract traces for its over- and under-approximated behaviors, there are various other ways to compute differences. We discuss the usefulness and practicality of these alternatives in this section.

We consider a set of partial abstract traces for differentiating semantic changes in a program. There are two types of partial traces: under-approximated partial traces and over-approximated partial traces. Under-approximated partial traces are partial traces that appear in all possible traces, whereas over-approximated partial traces are partial traces that may appear in at least one actual trace. With these two sets of traces, we might consider three more possible differences, as shown in Fig. 4.12. Specifically, Fig. 4.12(a) shows the difference between two under-approximated behaviors (i.e., Δ12^{−\−} = T1− \ T2−), Fig. 4.12(b) shows the difference of an under-approximated behavior from an over-approximated behavior (i.e., Δ12^{−\+} = T1− \ T2+), and Fig. 4.12(c) shows the difference of an over-approximated behavior from an under-approximated behavior (i.e., Δ12^{+\−} = T1+ \ T2−).

Figure 4.12: Various differences between two programs with over- and under-approximated behaviors: (a) difference between under-approximated behaviors (Δ12^{−\−}); (b) difference of an under-approximated behavior from an over-approximated behavior (Δ12^{−\+}); (c) difference of an over-approximated behavior from an under-approximated behavior (Δ12^{+\−}).

We do not consider the difference between two under-approximated behaviors (i.e., Fig. 4.12(a)) for practical reasons. That is, it is not easy to compute under-approximated behaviors with Datalog rules, since the rules only capture possible partial traces, not traces that must appear in all feasible traces, especially for concurrent programs.

Consider Fig. 4.13(a) as an example. There are three ReadFrom edges over the variables x and y: the edge from Line 1 to Line 2 over x, the edge from Line 4 to Line 2 over x, and the edge from Line 2 to Line 3 over y. The first two rows of Fig. 4.13(b) show the classification of the edges into under-approximated traces (i.e., T−) and over-approximated traces (i.e., T+). For T−, there is one edge, from 2 to 3. The edges (4,2) and (1,2) are not included in T− since they do not always appear in all possible traces.
For example, once (4,2) happens, (1,2) can never happen in the same trace, as two ReadFrom edges over the same variable with the same destination must not occur together. In practice, it is not trivial to differentiate a set of traces that must appear from a set of traces that may appear using Datalog rules, short of enumerating all possible interleavings between threads. Therefore, soundly retrieving the under-approximated behavior in a declarative way is practically difficult.

Furthermore, for Fig. 4.12(c), where the difference of the over-approximated behavior from the under-approximated behavior is computed, we could claim that there is no new behavior whenever the difference is empty. However, this often does not provide useful information. Assume we compare two identical copies of the program in Fig. 4.13(a) and perform the diffing based on Fig. 4.12(c). As shown by the last row of Fig. 4.13(b), the difference is still not empty even for two identical programs.

To sum up, the other ways of computing the differences between abstract partial traces do not give useful results, since it is not easy to accurately compute under-approximated behaviors in a Datalog-based analysis.

thread1 {
1: x = 1;
2: y = x;
3: k = y;
}
thread2 {
4: x = 2;
}
(a) A simple program having T+ and T−.

T−      = {(2,3)}
T+      = {(1,2), (4,2), (2,3)}
T+ \ T− = {(1,2), (4,2)}
(b) Classification of the edges into under-approximated traces (T−) and over-approximated traces (T+).

Figure 4.13: Example program with T+ and T−.

4.6 Summary

We have presented a fast and approximate static analysis method for computing the synchronization differences of two concurrent programs. The method uses Datalog to capture structural information of the programs, and uses a set of inference rules to codify the analysis algorithm. The analysis result, computed by an off-the-shelf Datalog solver, consists of sets of data-flow edges that are allowed by only one of the two programs. We implemented the proposed method and evaluated it on a large number of benchmark programs. Our results show that the method is orders of magnitude faster than the prior technique while being sufficiently accurate in identifying the actual differences.

Chapter 5

Efficient Testing for Web Applications with DOM Event Dependency Analysis

In this chapter, we propose an efficient testing method for web applications based on a static interference analysis between UI events in JavaScript.

Static analysis of client-side JavaScript web applications is difficult not only due to the language's dynamic features [120, 64] but also due to the subtle interactions between JavaScript code and the event-driven execution environment. At the center of this execution environment is the HTML Document Object Model (DOM). The DOM stores the buttons, images, text boxes, and other visible objects on the web page, together with a large number of event-handler functions attached to these DOM objects. Prior work on statically analyzing JavaScript focused primarily on modeling the language [22, 42, 43, 44, 97, 132, 6] as opposed to the language's interaction with the DOM. For example, existing methods do not robustly handle dependencies between DOM event handlers, e.g., the various functions responding to the user's actions, timers, AJAX requests, or their callbacks, despite the fact that such dependencies are crucial for reasoning about client-side web applications.

We propose a constraint-based static analysis method for computing dependencies both across event handlers and between HTML DOM elements. Such DOM event dependencies fundamentally differ from traditional control and data dependencies over program variables because they are tied to the event-driven execution environment.
Such DOM event dependencies fundamentally dier from traditional control and data dependencies over program variables because they are tied to the event-driven 80 Source Code (HTML, JavaScript) Normalized Code (CFG) Datalog Facts Datalog Rules 1. Alias Analysis 2. Control/Data Dependency 3. DOM Event Dependency µZ Datalog Engine in Z3 DOM Event Dependencies Automated Testing Tool (Artemis) Figure 5.1: Overall ow of DOM-event dependency analysis. execution environment. Specically, a modern JavaScript web application stores various data inside the DOM while simultaneously using JavaScript code to read and manipulate this data in response to various, often user-triggered, events such as onclick, onload, and timeout. If executing the handlerm A of eventA causes the handlerm B of eventB to be registered, triggered, or removed, we say that eventB depends on eventA, denotedA! DOM B. This diers from the traditional notion of control dependencies (! ctrl ) and data dependencies (! data ) over program variables. Furthermore, statically reasoning about DOM event dependencies is challenging: it requires proper handling of the aliasing between DOM elements, and modeling the eects of APIs provided by the browser and frameworks such as jQuery. 81 Figure 5.1 shows the ow of our DOM event dependency analysis, which follows the declarative program analysis framework [149, 107, 90, 84]. Given the HTML and JavaScript source le(s) of a client-side web application, we rst extract the JavaScript code and generate its control ow graph (CFG). We traverse the CFG to encode its control and data ows in a set of logical constraints called Datalogfacts. Next, we specify our static dependency analysis in a set of Datalog inference rules. Finally, we use an o-the-shelf Datalog engine [56] in Z3 [28] to solve the Datalog program. Internally, the Datalog engine repeatedly applies the set of inference rules to the set of facts until they reach a x-point. The x-point results in a new relation ! DOM over DOM events. This relation allows the user to query for dependency information through Z3’s API. Our method for statically computing DOM event dependencies diers from the prior work. First, it diers from the declarative methods [149, 107, 90, 84, 16] for analyzing programs written in standard programming languages such as Java: we analyze JavaScript web applications. Additionally, the static analysis of Guarnieri and Livshits [42], while targeting JavaScript, focused on type inference as opposed to inter-event-handler dependencies in the HTML DOM. Our method also diers from the dynamic change impact analysis of Alimadadi et al. [5], which analyzed concrete executions to identify the interplay between JavaScript code changes and the content of the DOM: since it is dynamic, their analysis is valid only for the given executions; ours, based on static analysis, is valid for all executions. Madsen et al. [94, 93] proposed several static analyses for JavaScript, but they targeted applications usingNode.js [94] or Windows 8 APIs [93]. The static analysis tool of Jensen et al. [65] modeled some aspects of the HTML DOM and browser APIs, but its focus was on type inference as opposed to a dependency analysis. We implemented our new method in a static analysis tool named JSdep, building upon Esprima for parsing the JavaScript source code,JS-WALA for generating the control-ow graph, and Z3 for solving the Datalog program. We evaluated JSdep on a large set of real-world web applications. 
Overall, we analyzed 21 programs totaling 18,559 lines of JavaScript code. Our experiments show that our static analysis method can quickly process the JavaScript code of these applications and compute the DOM event dependencies with reasonable accuracy.

To demonstrate our technique's usefulness, we leveraged its results to improve the performance of a popular automated web application testing tool named Artemis [8]. Artemis traverses the application's execution space by systematically triggering handlers of various DOM events. However, since Artemis cannot statically compute DOM event dependencies, it relies on heuristics for generating sequences of event-handler executions. We show empirically that these heuristics are largely random and introduce many redundant tests. Our DOM event dependency analysis, in contrast, can provably prune redundant test sequences and thus direct Artemis to explore truly useful tests. In particular, the default Artemis got stuck at 67% statement coverage even after running for 3.5 hours, whereas our new method enabled Artemis to quickly reach 80% coverage.

Besides Artemis, our static DOM event dependency analysis may benefit other dynamic analysis or symbolic execution tools such as Kudzu [126], SymJS [88], and Jalangi [129]. A problem common to these tools is that they lack the capability of conducting a whole-program static analysis; in this sense, our new method is complementary. In a broader sense, our dependency analysis method is useful in many other software engineering applications, e.g., to improve program understanding, software maintenance, automated debugging, and program repair.

In summary, the main contributions of this chapter are:

• We propose the first constraint-based static dependency analysis for client-side web applications, taking into consideration not only traditional control and data dependencies but also the new DOM event dependencies.

• We propose a new method for leveraging our static dependency analysis results in an automated web application testing tool, Artemis, to eliminate redundant tests and improve test coverage.

• We implement these new methods and evaluate them on a large set of web applications to demonstrate the efficiency of the static analysis method and its effectiveness in improving automated testing.

The remainder of this chapter is organized as follows. We first establish notation and motivate the main ideas of our methods through examples in Section 5.1. Then, we formalize our static dependency analysis in Section 5.2. We present the integration of our dependency analysis with Artemis in Section 5.3. We evaluate our approach empirically in Section 5.4. Finally, we summarize the chapter in Section 5.6.

5.1 Background and Motivation

In this section, we introduce the fundamental concepts and notations for our work, and we show what DOM event dependencies are and how they can improve the automated testing of web applications.

5.1.1 Web Applications

Client-side web applications are executed by the web browser, which loads and parses the HTML/JavaScript files, represents them as a DOM tree, and then executes the JavaScript code. Each node in the DOM tree represents an object on the web page, or a JavaScript code block to be executed immediately after parsing. Each object may also be associated with a set of events initiated either by the user or by the browser, such as onload and onclick. These events are responded to by a set of JavaScript functions called event handlers.
For example, when a user clicks a button, the callback function associated with the onclick event will be executed. Callback functions may be registered statically inside the HTML file or dynamically inside the JavaScript code. Although the browser ensures that each callback function is executed atomically, i.e., in a single-threaded fashion, the executions of multiple callback functions may interleave; this makes the execution of the entire web application nondeterministic.

5.1.2 JavaScript Statements

Let St be the set of JavaScript statements. Following the notation of Guarnieri and Livshits [42], we define the syntax of each statement st ∈ St as follows:

st ::= ε                                  [empty]
     | st1; st2                           [sequence]
     | v = new v0(v1, ..., vn)            [constructor]
     | v1 = v2                            [assignment]
     | v1 = v2.f                          [load]
     | v1.f = v2                          [store]
     | m = function (v1, ..., vn) {st;}   [functionDecl]
     | v = m(v1, ..., vn)                 [functionCall]
     | return v                           [return]

Each statement st is either empty, an elementary statement, or a sequence of statements of the form st1; st2. An elementary statement can be an object construction, where v0 is a constructor and v1, ..., vn are its arguments; an assignment; a load of the object field v2.f; a store to the object field v1.f; a definition of a function; a call to a function; or a return from a function. Other, more complex statements may be transformed into sequences of equivalent statements through preprocessing, prior to applying our analysis.

5.1.3 Points-to Analysis

Points-to analysis is the process of determining whether a reference variable v ∈ V can point to o ∈ O, a JavaScript object or an HTML DOM element. As in the literature [123], we use V to denote the set of all reference variables defined in the program, O to denote the set of objects created at the set L of allocation sites, and F to denote the set of object fields. For each site li ∈ L, we map all objects created at li to a single abstract object oi ∈ O.

The points-to relation, denoted T_ptsTo, consists of a set of pairs of the form (v, oi), meaning the reference variable v ∈ V points to the object oi ∈ O, and of the form (oi.f, oj), meaning the field f ∈ F of the object oi ∈ O points to the object oj ∈ O. We define an abstract transformer for each st ∈ St as a function f_ptsTo : T_ptsTo × St → T_ptsTo, which takes a points-to relation T ⊆ T_ptsTo as input and returns a new points-to relation T' ⊆ T_ptsTo as output. For brevity, we provide definitions only for the following statements:

• Allocation: l = new c
• Assignment: l = r
• Store: l.f = r
• Load: l = r.f

For each of the above statements, the new points-to relation T' is defined with respect to the old points-to relation T as follows:

• Allocation: T' = T ∪ {(l, oi)}
• Assignment: T' = T ∪ {(l, oi) | (r, oi) ∈ T}
• Store: T' = T ∪ {(oi.f, oj) | (l, oi) ∈ T and (r, oj) ∈ T}
• Load: T' = T ∪ {(l, oi) | (r, oj) ∈ T and (oj.f, oi) ∈ T}

For an allocation, we add (l, oi) to the points-to relation. For an assignment, if the pair (r, oi) is already in the points-to relation, we add (l, oi) as well. For a store and a load, the abstract transformers are defined similarly.
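These transformers have a natural declarative reading: read flow-insensitively, they are the classic Andersen-style inclusion rules. A minimal Soufflé-style sketch follows; the relation names (HeapAlloc, AssignTo, StoreF, LoadF, HeapPtsTo) are illustrative, not the exact ones used by JSdep.

    .decl HeapAlloc(l: symbol, o: symbol)          // l = new c, allocating abstract object o
    .decl AssignTo(l: symbol, r: symbol)           // l = r
    .decl StoreF(l: symbol, f: symbol, r: symbol)  // l.f = r
    .decl LoadF(l: symbol, r: symbol, f: symbol)   // l = r.f
    .decl PtsTo(v: symbol, o: symbol)
    .decl HeapPtsTo(o: symbol, f: symbol, o2: symbol)

    PtsTo(l, o)         :- HeapAlloc(l, o).
    PtsTo(l, o)         :- AssignTo(l, r), PtsTo(r, o).
    HeapPtsTo(o, f, o2) :- StoreF(l, f, r), PtsTo(l, o), PtsTo(r, o2).
    PtsTo(l, o)         :- LoadF(l, r, f), PtsTo(r, o2), HeapPtsTo(o2, f, o).

Running these four rules to a fixpoint yields the same result as iterating the set-based transformers above until T stops growing.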
5.1.4 Call-graph Construction

Although many of the function calls in JavaScript code can be resolved to a unique target function at the time of the static analysis, there are cases where the resolution has to be carried out at run time. In such cases, our analysis over-approximates the set of functions that may be called. We leverage the result of our points-to analysis to determine which function may be invoked.

Specifically, consider the statement l = v0.m(v1, ..., vn), where v0 ∈ V is a reference variable, m ∈ F is the field name, and v1, ..., vn ∈ V are the actual parameters of the function call. Let mi(p0, p1, ..., pn, retj) be a function that v0.m may point to, where p0 refers back to the object, retj refers to the return value, and p1, ..., pn are the formal parameters. For each object that v0 may point to, denoted (v0, o0) ∈ T, and for each function that o0.m may point to, denoted (o0.m, mi) ∈ T, we transform the function call into the following statements:

• p1 = v1; ...; pn = vn;
• executing the code in mi(); and
• l = retj.

The abstract transformer for the function call is defined as follows: T' = T ∪ {(p0, o0), (p1, o1), ..., (pn, on), (l, oj)}, such that (v1, o1) ∈ T, ..., (vn, on) ∈ T, and (retj, oj) ∈ T.

5.1.5 Dependency Relations

For each statement st ∈ St, let V_RD(st) be the set of memory locations read by st, and V_WR(st) be the set of memory locations written by st. We define the traditional control and data dependency relations [32] as follows. A data dependency, →data, exists between two statements st1, st2 ∈ St if st1 is a write to some variable x and st2 is a read of x; that is, (st1, st2) ∈ →data if V_WR(st1) ∩ V_RD(st2) ≠ ∅. A control dependency, →ctrl, exists between two statements st1, st2 ∈ St if st1 is a branch statement, st2 is another statement, and the evaluation of the predicate p in st1 determines the execution of st2.

Since each JavaScript code block is executed atomically, we are concerned with the dependency relations between code blocks as opposed to individual statements. Let m1 and m2 be two JavaScript functions. We say (m1, m2) ∈ →ctrl if executing m1 may affect the control flow of m2; that is, there exist st1 ∈ m1 and st2 ∈ m2 such that (st1, st2) ∈ →ctrl. Similarly, we say (m1, m2) ∈ →data if (st1, st2) ∈ →data.

The DOM event dependency relation, in contrast, is defined directly over events. Intuitively, if the execution of some callback function m1 of the event ev1 affects the execution of some callback function m2 of the event ev2, there is a DOM event dependency between ev1 and ev2. Specifically, m1 may affect m2 through control/data dependencies; or m1 may affect m2 by registering, removing, or modifying the callback functions of event ev2, which include m2. The latter effect is unique to the event-driven environment of client-side web applications. Formally:

Definition 3 Two events ev1, ev2 ∈ EV are in the DOM event dependency relation, (ev1, ev2) ∈ →DOM, if there exists a callback function m1 of ev1 and a callback function m2 of ev2 such that

• either (m1, m2) ∈ (→data ∪ →ctrl)*, or
• executing m1 registers, removes, or modifies the handler m2.

Here, T* denotes the transitive closure of a relation T.

Consider the code in Figure 5.2 as an example. There are two functions registered as the onclick event handlers of DomA and DomB; c is a global variable used in the two event handlers.

1 DomA.onclick(function() {
2   c = true;
3 });
4 DomB.onclick(function() {
5   if (c) {
6     statement1;
7   } else {
8     statement2;
9   }
10 });

Figure 5.2: Example: data/control dependencies in the DOM.

Inside the handler of DomA, there is an assignment to c. The value of c is used as the predicate of a branch in the event handler of DomB. So, clicking DomA affects the reachability of the statements guarded by the branch if (c). Thus, DomB is DOM-event dependent on DomA.
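Definition 3 also has a direct declarative rendering. In the sketch below, Handler, DepFn, and Registers are illustrative relations standing for, respectively, event-to-callback registration, the lifted →data ∪ →ctrl relation between functions, and the "m1 registers/removes/modifies m2" effect; the actual rule set is developed in Section 5.2.

    .decl Handler(ev: symbol, m: symbol)     // m is a callback of event ev
    .decl DepFn(m1: symbol, m2: symbol)      // (m1, m2) in ->data or ->ctrl
    .decl Registers(m1: symbol, m2: symbol)  // executing m1 (un)registers or modifies m2
    .decl DepFnStar(m1: symbol, m2: symbol)  // transitive closure of DepFn
    .decl DomDep(ev1: symbol, ev2: symbol)   // (ev1, ev2) in ->DOM

    DepFnStar(m1, m2) :- DepFn(m1, m2).
    DepFnStar(m1, m3) :- DepFnStar(m1, m2), DepFn(m2, m3).

    DomDep(ev1, ev2) :- Handler(ev1, m1), Handler(ev2, m2), DepFnStar(m1, m2).
    DomDep(ev1, ev2) :- Handler(ev1, m1), Handler(ev2, m2), Registers(m1, m2).

On the example of Figure 5.2, DepFn would relate the handler of DomA to the handler of DomB (write of c vs. branch on c), so the first DomDep rule derives that the onclick of DomB depends on the onclick of DomA.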
5.1.6 Motivating Example

5.1.6.1 DOM Event Dependency on the Example

Consider the example in Figure 5.3. An HTML file defines the DOM elements, including four buttons, and a JavaScript file defines the functions manipulating these elements.

1  <html>
2  <head>
3  <p> Click example of three buttons </p>
4  <script type="text/javascript" src="ex.js"></script>
5  </head>
6  <body>
7  <div id="content"> ... </div>
8  <div id="buttons">
9  <button id="test1" type="button"> b1 </button>
10 <button id="test2" type="button"> b2 </button>
11 <button id="test3" type="button"> b3 </button>
12 <button id="test4" type="button"> b4 </button>
13 </div>
14 </body>
15 </html>

1  var a = document.getElementById('test1');
2  var b = document.getElementById('test2');
3  var c = document.getElementById('test3');
4  var d = document.getElementById('test4');
5  var x = 0;
6  function makeSomeNoise() {
7    if (x<2) {console.log("x is lower than 2");}
8    else if (x<4) {console.log("x is lower than 4");}
9    else if (x<6) {console.log("x is lower than 6");}
10   else if (x<8) {console.log("x is lower than 8");}
11   else {console.log("x is higher than 8");
12     some error codes;}
13 }
14 a.addEventListener("click", function() {
15   c.onclick = makeSomeNoise; });
16 b.addEventListener("click", function() {
17   x = x + 1; });
18 d.addEventListener("click", function() {
19   console.log("test4 is clicked!"); });

Figure 5.3: Example HTML page and associated JavaScript file.

The four buttons, named test1–test4, are referenced in the JavaScript using the variables a, b, c, and d, respectively. The onclick event handler of a, i.e., the function executed when the button test1 is clicked, registers the onclick event handler of c to the function makeSomeNoise(). The onclick event handler of b increments the value of x. Since x is used in makeSomeNoise() to control the branch conditions, the handler of b affects, in some sense, the behavior of the event handler of c. Finally, the event handler of d prints a message to the console. From the JavaScript code, we identify the following dependencies:

• Clicking test1 registers an event handler to test3.
• Clicking test2 increments the value of x, which in turn affects the same handler of test2.
• Clicking test3 traverses the program paths of the handler function makeSomeNoise() based on the value of x.

We say that the onclick event of test3 depends on the onclick event of test1, since the handler of test3 is registered only when the handler of test1 is executed. Also, the onclick event of test3 depends on the onclick event of test2, since the handler of test2 modifies the value of x read by the handler of test3. Similarly, the onclick event of test2 depends on itself due to the reads and writes of x. In contrast,
In contrast, test3 depends on test1 because the handler of test1 installs the handler of test3 – this type of dependency arises only from the event-driven execution environment of the web browser; it cannot be expressed using the traditional control and data dependency relations.

To the best of our knowledge, the only work somewhat related to our new dependency analysis is the change impact analysis procedure developed by Alimadadi et al. [5]. It monitors the interplay between JavaScript code changes and their impact on the DOM. However, it relies on a trace-based dynamic analysis, and is therefore only valid for the given execution traces. Our method, in contrast, is solely static and valid over all possible executions. In addition, the modeling of dependencies between event handlers in Alimadadi et al. [5] is not as accurate as our method. In particular, they assume that function g depends on function f (Definition 9 in [5]) if f invokes g and either (1) the signature of g indicates that it takes parameters or (2) the definition of f includes a return value. This is a much coarser definition than ours: we model the actual impact of the statements in a function during our dependency analysis.

5.1.6.2 Improved Web Application Testing with DOM Event Dependency

Next we show how DOM event dependencies can help improve automated web application testing tools like Artemis. Such tools generate test sequences by systematically triggering user events up to a fixed depth. The search tree of our running example (Figure 5.3) up to depth three can be seen in Figure 5.5(a). Each edge represents the execution of an event handler, and each path represents a test sequence. The default algorithm in Artemis inefficiently explores the search space, since many of its randomly generated test sequences are actually redundant. For example, the onclick event of test4 does not have a DOM event dependency with any DOM event. Any permutation of sequences involving test4 is redundant; e.g., test1 → test4 → test3 leads to the same behavior as test1 → test3 → test4, and therefore only one needs to be tested.

Using the newly computed DOM event dependencies in Artemis allows many redundant test sequences to be pruned away. We will explain the detailed redundancy-pruning algorithm in Section 5.3, but for now, it suffices to say that permutations involving two independent event handlers can safely be ignored without affecting the exploration capability of the tool. After such reduction, the new search tree, shown in Figure 5.5(b), is significantly smaller. Here, grayed-out edges are those deemed redundant and therefore skipped. For example, the onclick event of test1 does not depend on itself, as seen in the dependency relation in Figure 5.4. So, executing the onclick event of test1 after another onclick of test1 does not alter the program's state and therefore can be skipped. Similarly, test1 → test4 → test3 is skipped because an equivalent sequence, test1 → test3 → test4, has already been tested.

Also note that exploring all test sequences up to depth 3 does not guarantee covering all statements in this program. Indeed, only the first branch of the function makeSomeNoise() in Figure 5.3 (Line 9) can be executed; sequences of only length three are not long enough to increment x above 2 while also registering and executing the handler associated with test3. Fully covering all the statements, in this case, requires at least a sequence of length 15: that is, test1 → test3 → test2 → test2 → test3 → test2 → test2 → test3 → test2 → test2 → test3 → test2 → test2 → test3 → test4.
Since we need to test up to depth 15 for full coverage, the default search algorithm may explore more than $3^1 + \cdots + 3^{15} = 21{,}523{,}359$ sequences. In contrast, with our new pruning technique, complete statement coverage can be achieved by exploring at most 60 sequences. We ran Artemis with our new improvement on this example and reached 100% coverage in only 0.37 seconds. The original version of Artemis could not reach 100% coverage after 10 minutes. In the remainder of this chapter, we present the detailed algorithm of our new DOM event dependency analysis.

[Figure 5.5: Event sequences explored by Artemis for Figure 5.3: (a) the default algorithm with no pruning; (b) with DOM event dependency based pruning.]

5.2 Constraint-based Dependency Analysis

In this section, we present our static analysis algorithm for computing DOM event dependencies. We first normalize the JavaScript code to break down complex statements into series of simpler statements by adding auxiliary variables; Figure 5.6 shows an example of this. Then, we traverse the control flow graph (CFG) of the simplified code and, for each statement, generate its Datalog facts. Later on, these Datalog facts are merged with a predefined set of Datalog rules that specify our dependency analysis algorithm. Finally, we use a Datalog engine to solve the program to obtain the analysis results.

Chained statement:                Normalized form:
var a = document.images.length;   var temp0 = document.images;
                                  var temp1 = temp0.length;
                                  var a = temp1;

Figure 5.6: Example of JavaScript code normalization.

The Datalog facts generated from the input program populate the relations shown in Figure 5.7. The domains used in these relations are: $V$, the set of variables; $St$, the set of statement IDs; $O$, the set of objects; $F$, the set of object fields; and $E = \{load, mouse, keyboard, timeout, ajax, other\}$, the set of event handler types. Next, we provide examples of this process.

Assign(v1:V, v2:V, st:St)          Variable assignment: v1 = v2 with ID st
Load(v1:V, v2:V, f:F, st:St)       Object field load: v1 = v2.f with ID st
Store(v1:V, f:F, v2:V, st:St)      Object field store: v1.f = v2 with ID st
FuncDecl(v:V, o:O)                 v assigned function o: v = function(){..}
Formal(o:O, n:N, v:V)              v is the n-th formal argument of function o
Actual(st:St, n:N, v:V)            v is the n-th argument in the call-site at st
MethodRet(o:O, v:V)                v is the return value of function o
CallRet(st:St, v:V)                Return into variable v at call-site st
Stmt(st:St, o:O)                   st is a statement in function o
Heap(v:V, o:O)                     Allocation of heap object o into variable v
PtsTo(v:V, o:O)                    Variable v points-to object o
DOM(o:O)                           o is a DOM object
DOM-Modify(o:O, e:E, f:O, st:St)   Attach function f to object o's event type e

Figure 5.7: Input relations defined to specify our analysis.

Largely, the relations in Figure 5.7 correspond to various statements in the program, e.g., assignments, loads, and stores. Each statement then, for the most part, generates a corresponding input fact. Specifically, every statement in the program is identified with a unique ID $st \in St$. The formal arguments of a function are those used within the function itself, e.g., for function f(a, b){...}, a and b are the formal arguments. Given this function declaration, if $a, b \in V$ represent the variables a and b and $f \in O$ represents function f(), then Formal(f, 1, a) and Formal(f, 2, b).
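For illustration, the following is a minimal JavaScript sketch of how such facts could be emitted from an Esprima-style AST (JSdep uses Esprima for parsing, as described in Section 5.4.1); the fact encoding and helper below are hypothetical, not the tool's actual implementation.

    const esprima = require('esprima');

    // Emit FuncDecl and Formal facts for top-level function declarations.
    // As a simplification, the declared name stands for both the variable
    // and the function object.
    function emitFunctionFacts(code) {
      const facts = [];
      const ast = esprima.parseScript(code);
      for (const node of ast.body) {
        if (node.type === 'FunctionDeclaration') {
          const o = node.id.name;
          facts.push(['FuncDecl', o, o]);          // v assigned function o
          node.params.forEach((p, i) => {
            facts.push(['Formal', o, i + 1, p.name]); // (i+1)-th formal of o
          });
        }
      }
      return facts;
    }

    console.log(emitFunctionFacts('function f(a, b) { return a + b; }'));
    // [['FuncDecl','f','f'], ['Formal','f',1,'a'], ['Formal','f',2,'b']]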
At a call-site v = f(a1, b2), a1 and b2 are the actual arguments, e.g., Actual(s, 1, a1) where s is the statement ID of the call-site and $a_1 \in V$ represents the variable a1. Continuing the example, assume the statement return r is the return statement of function f(). Let $r \in V$ represent r; then MethodRet(f, r). Each call-site similarly has its own return value from a function. Using the previous call-site, let $v \in V$ represent variable v; then we have CallRet(s, v). Next, we present examples for generating facts about DOM elements and operations.

DOM References. We model DOM elements as heap objects and operations modifying DOM elements as those on heap objects. For a DOM element $o_d \in O$, we add an implicit heap allocation and the corresponding fact Dom(o_d) indicating that $o_d$ is a DOM object. We treat alternate methods to access the same DOM element in a unified manner, e.g., both a = document.getElementById("model") and a = $("#model")[0] can be used to access the same element. Let $o_m \in O$ be the DOM element with ID "model" and $v_g \in V$ be an auxiliary variable representing the value returned from the getElementById() call at the call-site. Then, we have Dom(o_m) and PtsTo(v_g, o_m), indicating that the result of the call points-to DOM object $o_m$. Furthermore, let $a \in V$ be the variable storing the returned value; we have Assign(a, v_g, s) where s is the statement ID of the call. The use of $("#model")[0] can be handled similarly with an auxiliary variable.

We also handle the various ways of accessing attributes of DOM elements in the same way as reads/writes to objects. For example, using a.setAttribute("value", x); y = a.getAttribute("value") results in the facts Store(a, value, x) and Load(y, a, value).

DOM Listeners. Prior works on statically analyzing JavaScript often do not accurately model the dynamic registration, triggering, and removal of event handlers. For example, Jensen et al. [65] abstract away the information on where in the DOM tree an event handler is registered. Furthermore, they assumed that load handlers always execute before other kinds of handlers; this may not be true. In contrast, we model such information more accurately.

We distinguish the different categories of events: load, mouse, keyboard, timeout, ajax, and other. These correspond to event attributes such as onkeydown and onclick. We model both the static and the dynamic methods for registering and removing event handlers. For example, <obj id="a1" onclick="script"> statically installs the callback function script to the onclick event of the DOM object a1. In contrast, one may dynamically modify a callback function using an explicit store, tmp.onclick = script, or using an API, tmp.addEventListener("click", script). In all three cases above, we generate the same fact: DOM-Modify(o_e, mouse, o_s, st) where $o_e \in O$ is the DOM element a1, mouse $\in E$ the type of event, $o_s \in O$ the script function, and st the ID of the statement.

Timer-related DOM APIs call functions after durations of time, e.g., setTimeout(func, t) calls func after time t. We model timers with a DOM element $o_t \in O$ with event type timeout. The previous setTimeout() becomes DOM-Modify(o_t, timeout, o_f, st) where $o_f$ is the object representing func, and st the statement ID.

Removing DOM event handlers also uses DOM-Modify. Consider o.removeEventListener("click", f); this removes event handler f from o's "click" event. We model it as DOM-Modify(o, mouse, f, s) where $o \in O$ is the object representing o, f the object representing f, and s the statement ID.
Essentially, the act of removing an event handler f may affect any of the event handlers which f may affect; the effect is the same whether f is installed or removed. We provide more examples shortly in the next sub-section.

As seen in the previous examples, generating the input set of Datalog facts amounts to traversing each statement in the CFG and generating its corresponding fact. Thus, it is a linear-time process. Our modeling of the global objects, and the DOM elements in particular, is analogous to using a single global object and then modeling all reads/writes to JavaScript globals as loads and stores of fields of this global object.

DOM Aliasing. There are three ways of handling DOM node aliasing: over-approximation, under-approximation, and precise modeling. Since precise modeling is expensive, we omit it from the discussion. Below is a summary of the other two approaches:

• To over-approximate aliasing, the simplest approach is to treat all elements in the DOM as a single abstract object [42, 43]. That is, reading from or writing to one DOM element will be regarded as potentially reading from or writing to any DOM element. A more accurate over-approximation is to group all DOM elements of the same type as a single abstract object [65]. That is, reading from or writing to any integer variable will be regarded as potentially reading from or writing to any integer variable; however, it will be distinguished from a non-integer variable.

• To under-approximate aliasing, the simplest approach is to assume that each access (read/write) is on a separate object [5]. That is, one can pretend that dependencies through the DOM don't exist.

These approaches represent two extreme cases, and therefore may not be accurate enough, but have the advantage of being scalable in practice. In this work, we focus on a conservative static analysis that uses the over-approximation.

Next, we introduce the rules that specify our DOM dependency analysis. These use the existing facts to infer new relationships. For ease of understanding, we divide these rules into two subsets: the first subset is for the points-to analysis, and the second subset is for the dependency analysis. These analyses are interleaved in our implementation.

5.2.1 Rules for Points-To Analysis

Our rules for the points-to analysis are shown in Figure 5.8. They implement a flow-insensitive and context-insensitive analysis following Guarnieri and Livshits [42]. The main difference is that we encode the locations of assign, store, and load operations, which will be crucial in computing the DOM event dependencies.

PtsTo(v1, o1) ← Heap(v1, o1)
PtsTo(v1, o1) ← FuncDecl(v1, o1)
PtsTo(v1, o1) ← PtsTo(v2, o1), Assign(v1, v2, st)
HeapPtsTo(o1, f, o2) ← Store(v1, f, v2, st), PtsTo(v1, o1), PtsTo(v2, o2)
PtsTo(v1, o1) ← Load(v1, v2, f, st), PtsTo(v2, o2), HeapPtsTo(o2, f, o1)
Calls(o, st) ← Actual(st, 0, v1), PtsTo(v1, o)
Assign(v1, v2, st1) ← Calls(o1, st1), Formal(o1, n1, v1), Actual(st1, n1, v2)
Assign(v1, v2, st1) ← Calls(o1, st1), MethodRet(o1, v2), CallRet(st1, v1)

Figure 5.8: Datalog rules for the points-to analysis.

Based on the points-to analysis, we can proceed to compute the dependency relations between operations on the DOM elements, including DOM reference, DOM read, and DOM write operations.

5.2.2 Rules for DOM Event Dependency Analysis

First, we compute the traditional data dependency relation as in Figure 5.9.
Here, $v_1, v_2 \in V$ are reference variables, $st_1, st_2 \in St$ are statement IDs, and $o_1, o_2, o_3, f \in O$ are heap objects.

Write1(v1, st1) ← Assign(v1, v2, st1)
Write1(v1, st1) ← Load(v1, v2, f, st1)
Write2(v1, f, st1) ← Store(v1, f, v2, st1)
Read1(v2, st1) ← Assign(v1, v2, st1)
Read1(v2, st1) ← Store(v1, f, v2, st1)
Read2(v2, f, st1) ← Load(v1, v2, f, st1)
Data-Dep(st1, st2) ← Read1(v1, st2), Write1(v1, st1)
Data-Dep(st1, st2) ← Read2(v1, f, st2), Write1(v1, st1)
Data-Dep(st1, st2) ← Read1(v1, st2), Write2(v2, f, st1), PtsTo(v1, o1), PtsTo(v2, o1)
Data-Dep(st1, st2) ← Read2(v1, f, st2), Write2(v2, f, st1), PtsTo(v1, o1), PtsTo(v2, o1)
Data-Dep(st1, st3) ← Data-Dep(st1, st2), Data-Dep(st2, st3)
Call-Edge(o2, o1) ← Calls(o1, st1), Stmt(st1, o2)
Call-Edge(o1, o3) ← Call-Edge(o1, o2), Call-Edge(o2, o3)

Figure 5.9: Datalog rules for data dependency analysis.

To model the Data-Dep relation, we use the auxiliary relations Write1, Write2, Read1, and Read2 to represent the writes and reads of variables/fields of objects; they correspond to the first six rules of Figure 5.9. Given the auxiliary read and write relations, we consider two statements to be data dependent, Data-Dep(st1, st2), if there is a read at st2 and a write to the same variable(s) at st1. The first two rules are data dependencies through variables; the next two are data dependencies through aliasing objects; and the fifth rule is for the transitivity of data dependencies. For two function objects $o_1, o_2 \in O$, Call-Edge(o2, o1) if the function $o_1$ is called in function $o_2$; this is specified in the second-to-last rule in Figure 5.9. The last rule says that the relation is transitive; it represents edges in the call-graph.

To compute the control dependency relation, we implemented the algorithm of Cytron et al. [26] on the JavaScript CFG to generate the relation Ctrl-Dep(st1, st2), meaning that st2 is control-dependent on st1. The corresponding Datalog rules are omitted for brevity.

Prog-Dep(st1, st2) ← Data-Dep(st1, st2)
Prog-Dep(st1, st2) ← Ctrl-Dep(st1, st2)
Prog-Dep(st1, st3) ← Prog-Dep(st1, st2), Prog-Dep(st2, st3)
Func-Dep(m1, m2) ← Prog-Dep(st1, st2), Stmt(st1, m1), Stmt(st2, m2)
Dom-Prog-Dep(o1, e1, o2, e2) ← DOM-Modify(o1, e1, m1, st1), DOM-Modify(o2, e2, m2, st2), Func-Dep(m1, m2)
DOM-Modify-Dep(o1, e1, o2, e2) ← DOM-Modify(o1, e1, m1, st1), DOM-Modify(o2, e2, m2, st2), Stmt(st2, m1)
DOM-Modify-Dep(o1, e1, o2, e2) ← DOM-Modify(o1, e1, m1, st1), Call-Edge(m1, m3), DOM-Modify(o2, e2, m2, st2), Stmt(st2, m3)
Dom-Dep(o1, e1, o2, e2) ← DOM-Modify-Dep(o1, e1, o2, e2)
Dom-Dep(o1, e1, o2, e2) ← Dom-Prog-Dep(o1, e1, o2, e2)

Figure 5.10: Datalog rules for DOM dependency analysis.

Figure 5.10 shows the rules for computing the DOM event dependency relation. Here, $m_1, m_2, m_3 \in O$ are function objects, $o_1, o_2 \in O$ are DOM objects, $st_1, st_2 \in St$ are statement IDs, and $e_1, e_2 \in E$ are DOM event types. First, we create the program-dependence relation [32], Prog-Dep, i.e., the transitive closure of the control and data dependencies. Then, we leverage Prog-Dep to create the Func-Dep relation representing dependencies across functions. Finally, we consider the two types of dependencies across DOM events: those through program dependencies, and those involving event handler modifications.
The relation Dom-Prog-Dep captures the first case, where a program dependency exists between two functions called from DOM event handlers. Specifically, let two DOM objects $o_1, o_2 \in O$ have event handlers $m_1$ and $m_2$ attached to their events of type $e_1$ and $e_2$, respectively. If $m_2$ is dependent on $m_1$, then we say that Dom-Prog-Dep(o1, e1, o2, e2), i.e., there is a DOM event dependency between $o_2$'s handler of type $e_2$ and $o_1$'s handler of type $e_1$.

The relation DOM-Modify-Dep captures the second case, when the event handler of one DOM object installs/removes/modifies the event handler of another DOM object. The first DOM-Modify-Dep rule captures the simplest case: there is a function $m_1$ which is an associated event handler of DOM object $o_1$'s event of type $e_1$; also, there is a DOM event handler addition/removal/modification at statement $st_2$, where $st_2$ is in function $m_1$. Because there is a DOM modification of $o_2$'s event $e_2$ in $m_1$ (at statement $st_2$), we say that $o_2$'s event $e_2$ is dependent on $o_1$'s event $e_1$: DOM-Modify-Dep(o1, e1, o2, e2).

The next DOM-Modify-Dep rule is similar, but captures the case where a DOM event handler calls a function which modifies a DOM object's event handler. Specifically, there is a function $m_1$ registered to DOM object $o_1$'s event $e_1$, and there is a call from $m_1$ to some function $m_3$, which has a DOM modification of object $o_2$'s event $e_2$ at statement $st_2$. Since $m_1$ transitively affects DOM object $o_2$'s event $e_2$ through $m_3$, we say DOM-Modify-Dep(o1, e1, o2, e2). Recall that Call-Edge is defined as the transitive closure of function calls; it captures some DOM event handler calling an arbitrary sequence of function calls leading to a DOM modification.

Finally, the DOM event dependency relation, Dom-Dep, is the combination of Dom-Prog-Dep and DOM-Modify-Dep.

5.2.3 Soundness of the Analysis

Since we focus on the over-approximated analysis, we deal with event propagations (capturing and bubbling) and AJAX callbacks conservatively. Recall that how the web browser propagates events through the HTML DOM tree may affect the control flow of the application. When capturing is enabled, the parent element captures the event first and then passes it down to the children. In contrast, when bubbling is enabled, the target element captures the event first before passing it up to the parent elements. For efficiency reasons, distinguishing these two cases in a static analysis is difficult. Therefore, we conservatively assume that all JavaScript functions in the application may be executed in any order. This approximation also works for modeling the execution of asynchronous callbacks of AJAX requests.

Although our static dependency analysis is designed to be sound in the absence of JavaScript's reflexive features, there is no theoretical guarantee of its soundness. However, this is consistent with the norm of the research field. As the authors of [120, 64] have argued, due to the impact of JavaScript's dynamic features, it is impossible to develop a truly sound and, at the same time, practically useful static analysis framework. Thus, in practice, software tools strive for achieving soundiness [89] as opposed to achieving soundness. The goal is to be as sound as possible without significantly compromising precision and scalability. Since we focus on improving the test coverage of Artemis, as opposed to proving properties, achieving soundiness is sufficient.
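To illustrate how a Datalog engine evaluates such rules, the following is a toy JavaScript sketch of the transitive Data-Dep rule from Figure 5.9, computed as a naive fixpoint; the engine we actually use (see Section 5.4.1) relies on far more efficient semi-naive evaluation with indexed relations, so this sketch only illustrates the semantics.

    // Repeatedly apply Data-Dep(s1,s3) <- Data-Dep(s1,s2), Data-Dep(s2,s3)
    // until no new tuple is derived (the fixed point).
    function transitiveClosure(dataDep) {
      const rel = new Set(dataDep.map(([a, b]) => a + '->' + b));
      let changed = true;
      while (changed) {
        changed = false;
        for (const e1 of [...rel]) {
          const [a, b] = e1.split('->');
          for (const e2 of [...rel]) {
            const [c, d] = e2.split('->');
            if (b === c && !rel.has(a + '->' + d)) {
              rel.add(a + '->' + d);
              changed = true;
            }
          }
        }
      }
      return rel;
    }

    console.log(transitiveClosure([['s1', 's2'], ['s2', 's3']]));
    // Set { 's1->s2', 's2->s3', 's1->s3' }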
5.3 Improving Automated Testing

In this section, we leverage the static DOM event dependency analysis to improve Artemis [8], a popular automated tester of client-side web applications. Since Artemis generates event sequences randomly (like Randoop [111]), it often lacks the ability to reach high statement coverage. During our experiments, for example, the default algorithm in Artemis could not reach more than 65% coverage even after 500 iterations. In contrast, leveraging the result of our static DOM event dependency analysis enabled Artemis to quickly reach 80% coverage.

Algorithm 5 shows the default test input generation procedure in Artemis. It takes the initial test $\langle u_0, S_0, \sigma_0 \rangle$ as input and returns a set Results of explored tests as output. Here, a test input is defined as a tuple $\langle u, S_0, \sigma \rangle$ where $u$ is the URL of the web page, $S_0$ is the initial state of the application, and $\sigma = ev_1 \ldots ev_n$ is a sequence of events. An event $ev = \langle param, state, env \rangle$ captures not only activities performed by the user, but also timer responses and AJAX callbacks. Here, param denotes the values of the event parameters, state denotes the values of the HTML form fields, and env denotes the values of environment parameters, such as the window size and time of day.

Algorithm 5: Test sequence generation algorithm in Artemis. Initially: Worklist := {}; runArtemis(u0, S0, σ0).
 1: Artemis(URL u0, State S0, Sequence σ0) {
 2:   Results := {};
 3:   Worklist := { ⟨u0, S0, σ0⟩ };
 4:   while (Worklist ≠ ∅ and ¬timeout and ¬maxruns)
 5:     c = ⟨u, S, σ⟩ = Worklist.removeNext();
 6:     S' := ExecuteApplication(c);
 7:     Results := Results ∪ {(c, S')};
 8:     // make test inputs by modifying the last event in σ
 9:     foreach (variant ev'_n of ev_n in σ = ev1 ... evn) {
10:       σ' := ev1 ... ev_{n-1} ev'_n;
11:       Worklist := Worklist ∪ { ⟨u, S, σ'⟩ }
12:     }
13:     // make test inputs by extending σ with a new event
14:     if (S' ∉ VisitedStates) {
15:       VisitedStates.add(S');
16:       foreach (ev'_{n+1} enabled at S') {
17:         σ'' := σ · ev'_{n+1};
18:         if (¬IsRedundant(u, S, σ''))
19:           Worklist := Worklist ∪ { ⟨u, S, σ''⟩ }
20:       }
21:     }
22:   }
23:   return Results;
24: }

Line 18 shows our new pruning method: leveraging the DOM event dependencies to skip redundant sequences. Artemis starts with an empty set Results of tests and a work-list consisting of only $\langle u_0, S_0, \sigma_0 \rangle$. Then, it loads the web page from $u_0$ with initial state $S_0$ and executes the sequence $\sigma_0$ of events. Let $S'$ be the application state after applying these events. Next, it generates new event sequences using one of the following methods. The first method is to generate a variant $ev'_n$ of the last event $ev_n$ in the sequence $\sigma = ev_1 \ldots ev_n$; this creates a new sequence $\sigma' = ev_1 \ldots ev_{n-1}\, ev'_n$ (Lines 8–12). In this case, $ev'_n = \langle param', state', env' \rangle$ will have the same event type as $ev_n$ but different values for the event parameters, form fields, and environmental parameters, meaning $\sigma'$ may lead to a different program state. The second method for generating a new event sequence is to append a new event $ev_{n+1}$ to the end of $\sigma$ to create the new sequence $\sigma''$ (Lines 13–21).

Algorithm 6: Checking if the sequence σ is redundant.
 1: IsRedundant(URL u, State S, Sequence σ) {   // Default in Artemis
 2:   return false;
 3: }
 4: IsRedundant(URL u, State S, Sequence σ) {   // Our New Method
 5:   Let σ = ev1 ... evn ev_{n+1};
 6:   if (ev_n ↛_DOM ev_{n+1} ∧ ev_{n+1} ↛_DOM ev_n ∧ ¬(ev_n <_lex ev_{n+1}))
 7:     return true;
 8:   return false;
 9: }
In this case, the main problem is that the default algorithm in Artemis never checks whether $\sigma''$ is redundant, i.e., whether $\sigma''$ is equivalent to some event sequence(s) that have already been explored. In contrast, our new method performs such a check: as shown in Line 18, if $\sigma''$ is proved to be redundant by this newly added check, it will not be added to Worklist.

5.3.1 Pruning Redundant Event Sequences

Algorithm 6 shows the pseudocode of IsRedundant() used at Line 18 in Algorithm 5. Inside IsRedundant(), the theoretical foundation for deciding whether $\sigma''$ is redundant is partial order reduction (POR) [40, 80, 158]. We say that two sequences $\sigma_1$ and $\sigma_2$ are equivalent if we can transform one sequence into the other by repeatedly swapping adjacent and independent events. Two events $ev_1$ and $ev_2$ are adjacent if they occur consecutively. They are dependent if the two events access the same object and at least one of them is a write (modifying the content of the object); we say they are independent if the two events are not dependent on each other.

Consider $\sigma_1 = ev_1 \ldots ev_i\, ev_j \ldots ev_n$, where $ev_i$ and $ev_j$ are independent. Since swapping the order of $ev_i$ and $ev_j$ does not change the behavior of the application (they are commutative), we know $\sigma_2 = ev_1 \ldots ev_j\, ev_i \ldots ev_n$ triggers the same behavior as $\sigma_1$. Therefore, $\sigma_1$ and $\sigma_2$ are equivalent. During testing, if Artemis has already explored $\sigma_1$, then we can safely skip $\sigma_2$, since it suffices to test one representative from each equivalence class of sequences.

In Algorithm 6, the pruning of equivalent sequences is implemented using a special form of sleep-set based reduction [40, 144]. Toward this end, we assign the events of the application a lexical order, $<_{lex}$. When two adjacent events $ev_n$ and $ev_{n+1}$ satisfy the following conditions:

• (1) $(ev_n \not\rightarrow_{DOM} ev_{n+1}) \wedge (ev_{n+1} \not\rightarrow_{DOM} ev_n)$, meaning they are independent of each other, and
• (2) $ev_{n+1} <_{lex} ev_n$,

we choose to explore the sequence $\ldots ev_{n+1}\, ev_n \ldots$ while skipping the sequence $\ldots ev_n\, ev_{n+1} \ldots$. As shown in Line 6 of Algorithm 6, we use the result of our DOM event dependency analysis to check whether the two events are independent of each other.

5.3.2 The Running Example

Consider the application in Figure 5.3, whose DOM event dependencies are shown in Figure 5.4. Since the click event of test1 is independent of itself, we skip the test sequence test1 → test1 → ..., as shown by the gray path on the left side of Figure 5.5(b). Also, since the click event of test1 is independent of the click of test2, we explore test1 → test2 but skip test2 → test1; we also skip the subsequence test2 → test1 → test2. Similarly, we skip all the other gray sequences in Figure 5.5(b). Therefore, up to depth 3, our new method reduces the total number of test sequences generated by Artemis from 49 down to 14.

5.4 Implementation and Experiments

5.4.1 Experimental Setup

We implemented the new dependency analysis in a software tool named JSdep. It uses Esprima for parsing and normalizing the JavaScript code, JS-WALA for constructing the control flow graph, and the μZ fixed point engine [56] in Z3 [28] for solving the Datalog program. To demonstrate the usefulness of the analysis, we applied it to improve the performance of Artemis [8], a state-of-the-art web application testing tool. Our experiments were designed to answer two questions:

• Can JSdep compute the DOM event dependency relations with reasonable accuracy at negligible run-time cost?
• Can JSdep help Artemis reach a higher testing coverage than the default algorithm?
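Before turning to the results, the following is a minimal JavaScript sketch of the independence check at Line 6 of Algorithm 6; the event and dependency representations here are illustrative, not Artemis's internal data structures.

    // seq is the candidate sequence; domDep is the set of dependent event
    // pairs produced by the DOM-event dependency analysis, encoded as
    // 'id,id' strings. Event ids double as the lexical order <_lex.
    function isRedundant(seq, domDep) {
      const n = seq.length;
      if (n < 2) return false;
      const [ev1, ev2] = [seq[n - 2], seq[n - 1]];
      const dep = (a, b) => domDep.has(a.id + ',' + b.id);
      // Prune if the last two events are independent and in non-ascending
      // order: the swapped sequence covers the same equivalence class.
      return !dep(ev1, ev2) && !dep(ev2, ev1) && !(ev1.id < ev2.id);
    }

    // For Figure 5.3: test1 and test4 are independent (Figure 5.4), so a
    // sequence ending ...test4 -> test1 is pruned while test1 -> test4 is kept.
    const domDep = new Set(['1,3', '2,3', '2,2']);
    const t1 = { id: 1 }, t4 = { id: 4 };
    console.log(isRedundant([t4, t1], domDep)); // true  (pruned)
    console.log(isRedundant([t1, t4], domDep)); // false (explored)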
We evaluated JSdep on a number of client-side web applications. Our benchmarks fall into two groups. The first group consists of four variants of Figure 5.3, case1 to case4, with four to eight buttons. The second group consists of seventeen real web applications, ranging from hundreds to thousands of lines of code. Two are from Artemis's benchmarks [8] (ball_pool and 3dmodel); the rest are JavaScript games [1]. In total, there are 21 benchmark applications with 18,559 lines of code. We ran all experiments on an Intel Quad-Core i5-4440 3.10 GHz CPU with 12 GB of RAM running 64-bit Linux.

5.4.2 Results: Dependency Analysis

Table 5.1 shows the result of the static DOM event dependency analysis. Columns 1–2 show the name of the benchmark program and the number of lines of code. Column 3 shows the maximum number of possible DOM event dependencies, i.e., $N^2$ where $N$ is the number of DOM events in the application; conceptually, this is the dependency relation used by default in Artemis: every DOM event is dependent on every DOM event. Column 4, in contrast, shows the number of DOM event dependencies found by our analysis. Columns 5–6 show the statistics of our analysis: the size of the Datalog program, and the time spent on the analysis. The time includes parsing, normalizing, and transforming the code, generating the Datalog program, and calling μZ to solve the program.

Overall, our analysis can very quickly generate DOM event dependency results: the time ranges from 0.5 to 13 seconds, and the total time spent on analyzing the 21 benchmarks is less than 1 minute. Also, the results are much better than the theoretical worst case. Next, we show that our analysis results are useful: they can significantly improve the performance of Artemis.

Table 5.1: Results of the static DOM-event dependency analysis.

Name         LOC     Total Deps.  Calculated Deps.  Constraints  Time (s)
case1        59      16           2                 166          0.11
case2        72      16           3                 187          0.11
case3        165     36           6                 517          0.15
case4        196     64           8                 618          0.16
frog         567     361          264               2,398        4.34
cosmos       363     169          144               1,000        0.20
hanoi        246     576          324               1,026        0.23
flipflop     525     36           25                2,445        0.34
sokoban      3,056   361          256               2,116        0.35
wormy        570     81           64                3,683        0.42
chinabox     338     49           16                1,281        0.63
3dmodel      5,414   25           19                3,813        13.83
cubuild      1,014   36           25                5,684        6.83
pearlski     960     144          100               4,129        7.17
speedyeater  784     361          64                4,170        0.61
gallony      300     196          72                1,372        0.25
fullhouse    528     64           49                1,007        0.20
ball_pool    1,745   81           30                1,709        0.28
lady         820     121          81                4,564        7.88
harehound    468     529          168               1,976        1.53
match        369     576          400               6,385        4.49
Total        18,559  3,898        2,120             50,246       50.11

5.4.3 Results: Improving Artemis

Table 5.2 shows the results of running Artemis with and without leveraging JSdep. Column 1 shows the benchmark's name. Columns 2–4 show statement coverage after running Artemis and Artemis+JSdep for 100 iterations. Columns 5–7, 8–10, 11–13, and 14–16 show the statement coverage achieved by running them up to 200, 300, 400, and 500 iterations, respectively. Artemis is the algorithm as described by Artzi et al. [8] with their best prioritization technique enabled. Each iteration executes one test sequence, i.e., an iteration of the loop in Algorithm 5.

Table 5.2: Results of comparing Artemis and Artemis+JSdep with a fixed number of iterations. For each iteration bound, the pair of numbers gives the statement coverage (%) of Artemis and of Artemis+JSdep.

Name         100 iter.      200 iter.      300 iter.      400 iter.      500 iter.
case1        70.59 / 100    70.59 / 100    70.59 / 100    70.59 / 100    70.59 / 100
case2        43.90 / 100    43.90 / 100    43.90 / 100    43.90 / 100    43.90 / 100
case3        38.10 / 62.86  38.10 / 74.29  38.10 / 74.29  38.10 / 95.24  38.10 / 95.24
case4        47.12 / 59.62  47.12 / 72.12  47.12 / 72.12  47.12 / 72.12  47.12 / 72.12
frog         86.79 / 89.29  88.93 / 96.43  88.93 / 96.43  88.93 / 96.79  88.93 / 97.86
cosmos       57.48 / 72.44  57.48 / 77.17  57.48 / 77.17  57.48 / 77.95  57.48 / 77.95
hanoi        77.19 / 76.32  77.19 / 76.32  77.19 / 82.46  77.19 / 82.46  77.19 / 82.46
flipflop     97.05 / 95.94  97.05 / 97.05  97.05 / 97.05  97.05 / 97.05  97.05 / 97.05
sokoban      73.09 / 76.46  73.09 / 76.46  73.09 / 76.46  73.09 / 76.46  73.09 / 76.46
wormy        39.76 / 40.95  39.76 / 40.95  39.76 / 40.95  39.76 / 40.95  39.76 / 40.95
chinabox     79.88 / 82.32  79.88 / 83.54  79.88 / 84.15  79.88 / 84.15  79.88 / 84.15
3dmodel      64.01 / 71.50  64.01 / 71.50  64.01 / 71.50  64.01 / 71.98  64.01 / 71.98
cubuild      61.30 / 68.15  61.30 / 73.46  61.30 / 78.42  61.30 / 85.79  61.30 / 85.79
pearlski     52.52 / 52.72  52.52 / 53.72  52.52 / 53.72  52.52 / 53.92  52.52 / 56.54
speedyeater  45.93 / 46.41  45.93 / 53.11  45.93 / 53.35  45.93 / 53.35  45.93 / 54.78
gallony      69.86 / 93.15  69.86 / 94.52  69.86 / 94.52  69.86 / 94.52  69.86 / 94.52
fullhouse    77.38 / 83.33  77.38 / 83.33  77.38 / 83.33  77.38 / 87.50  77.38 / 87.50
ball_pool    71.43 / 89.75  73.16 / 91.24  73.16 / 93.09  73.16 / 93.43  73.16 / 93.43
lady         76.13 / 77.25  76.13 / 79.50  76.13 / 79.50  76.13 / 79.50  76.13 / 79.50
harehound    80.28 / 88.07  80.28 / 91.28  80.28 / 91.28  80.28 / 92.20  80.28 / 92.20
match        61.45 / 50.28  61.45 / 62.01  61.45 / 73.18  61.45 / 73.18  61.45 / 73.18
Average      65.29 / 75.08  65.48 / 78.47  65.48 / 79.66  65.48 / 81.35  65.48 / 81.60

Overall, the statement coverage of Artemis+JSdep is 10–16% higher than that of Artemis. For case1 and case2, in particular, the default Artemis algorithm cannot reach 100% coverage even after 500 iterations, whereas Artemis+JSdep reaches 100% coverage within only 100 iterations. Furthermore, as the number of iterations increases, the average coverage of Artemis remains stuck at 65%, while the average coverage of Artemis+JSdep keeps increasing. This is because the default algorithm of Artemis explores many redundant test sequences; our static analysis results are accurate enough to skip many of these sequences and focus on useful tests. There are some cases where Artemis+JSdep temporarily had lower coverage than Artemis (e.g., match at 100 iterations). We believe this is mainly due to the inherent randomness of selecting items from the worklist. However, as the number of iterations increases, Artemis+JSdep becomes much better.

5.4.4 Results: Redundancy Removal in Test Case Generation

Next, we investigated how many sequences Artemis+JSdep deemed redundant. Table 5.3 summarizes the results. We ran each benchmark for 500 iterations and counted both the number of sequences generated (Column 3) and the number of sequences we found redundant (Column 4). Without using our method, all the redundant sequences would have been added to the worklist. Overall, we reduce the number of sequences added to the worklist by 36% on average.
Examining the dependency results (Table 5.1), we can see that our analysis actually finds 46% of the DOM events independent on average. The difference between the number of reduced sequences and the actual number of independent DOM events comes from the sleep-set approach of Algorithm 6: it does not guarantee to test only one sequence from each equivalence class. This is a limitation of our POR implementation, not of the static analysis. Note that Column 3 counts the total number of sequences added to the worklist; only 500 of these were actually executed.

Table 5.3: Results of blocked sequence ratio (step 500).

Name         Iter.  Redund. Checked  Redund. Found  Ratio (%)
case1        500    1,001            499            49.85
case2        500    1,832            1,326          72.38
case3        500    4,436            3,232          72.86
case4        500    4,009            2,976          74.23
frog         500    9,501            1,895          19.95
cosmos       500    6,501            500            7.69
hanoi        500    11,501           2,015          17.52
flipflop     500    9,001            8,145          90.49
sokoban      500    9,501            1,830          19.26
wormy        500    3,033            443            14.61
chinabox     500    3,501            1,709          48.81
3dmodel      500    2,949            708            24.01
cubuild      500    3,001            768            25.59
pearlski     500    6,001            749            12.48
speedyeater  500    12,501           1,848          14.78
gallony      500    7,001            3,393          48.46
fullhouse    500    14,001           499            3.56
ball_pool    500    4,501            2,206          49.01
lady         500    5,001            290            5.80
harehound    500    11,501           4,771          41.48
match        500    12,384           5,502          44.43
Average             6,793            2,157          36.05

In addition to running Artemis and Artemis+JSdep for a fixed number of iterations, we also ran them for a fixed amount of time. Table 5.4 shows the result. Here, Columns 1–2 show the benchmark name and execution time. Columns 3–4 show the number of iterations and statement coverage obtained by Artemis. Columns 5–6 show the number of iterations and statement coverage obtained by Artemis+JSdep. The runtime here, and in all tests, includes the static analysis overhead and the overhead in Artemis to perform the dependency check. So, Artemis+JSdep explores slightly fewer iterations on average (92% of Artemis's) within the 10-minute bound. Also, the number of iterations explored within the bound depends on the length of the tested sequences, which in turn depends on the length of the sequences skipped. But we still see a significant increase in the average statement coverage: from 67% achieved by Artemis to 80% achieved by Artemis+JSdep. Overall, this indicates the default Artemis search strategy spends much time on redundant tests.

Table 5.4: Running Artemis and Artemis+JSdep for 10 minutes.

Name         Time (s)  Artemis Iter.  Artemis Cov. (%)  +JSdep Iter.  +JSdep Cov. (%)
case1        600       5,819          70.59             1,972         100
case2        600       5,018          43.90             4,208         100
case3        600       4,292          38.10             7,090         100
case4        600       3,995          47.12             4,532         72.12
frog         600       1,656          88.21             96            84.64
cosmos       600       1,663          57.48             1,123         78.74
hanoi        600       2,782          77.19             1,884         82.46
flipflop     600       771            97.05             459           97.05
sokoban      600       1,225          73.09             264           76.68
wormy        600       1,179          52.23             538           40.95
chinabox     600       736            79.88             174           84.15
3dmodel      600       137            64.01             132           71.98
cubuild      600       661            61.30             242           75.51
pearlski     600       1,257          53.32             322           53.72
speedyeater  600       2,688          77.27             2,735         78.47
gallony      600       3,756          69.86             4,596         94.52
fullhouse    600       2,372          77.38             1,107         88.10
ball_pool    600       36             71.43             34            74.19
lady         600       64             75.90             55            76.58
harehound    600       2,383          80.28             2,305         94.50
match        600       2,462          62.01             7,444         73.18
Average      600       2,140          67.50             1,967         80.83

5.5 Further Discussion

In this section, we present more experimental results to quantify the size of the worklist and discuss various factors that may affect the execution time.

5.5.1 The Size of the Worklist

In our experiment in Section 5.4, we confirmed from Table 5.2 that the code coverage can be improved by more than 20%, and that around 36% of sequences could be blocked when Artemis adds new sequences. In this section, we check how much our filtering reduces the size of the worklist.

[Figure 5.11: The worklist size compared to Artemis by the number of iterations: (a) small benchmarks (case1–case4); (b) large benchmarks.]

Figure 5.11 shows the relative size of the worklist compared to the baseline Artemis. The numbers 100, 200, 300, 400, and 500 are the numbers of iterations executed, and the percentage indicates the relative size of the worklist of Artemis+JSdep compared to Artemis; a smaller percentage means a smaller worklist.

Figure 5.11(a) shows the first group of our benchmarks, the four variants of Figure 5.3 (case1 to case4), which were designed to show the effectiveness of our filtering. With our filtering, Artemis+JSdep reduces the size of the worklist to 20% of the baseline Artemis over all iterations up to 500. Specifically, for case1 and case2 the worklist becomes extremely small (less than 2%). This is because the calculated dependencies are only 2 and 3 out of 16 dependencies, as shown in Table 5.1.

Figure 5.11(b) considers the real-world benchmarks, where the reduction of the worklist is less effective than in the first group: our filtering reduces the size of the worklist to between 90% and 30% of the existing worklist. We can see that the size of the worklist depends on the ratio of the calculated dependencies from Table 5.1. For example, for fullhouse and flipflop, the worklist sizes are around 90% while the ratios of their computed dependencies are high: 24/36 (69%) and 49/64 (76%). On the other hand, for gallony and ball_pool, the worklist sizes are around 30% while the ratios of their computed dependencies are low: 30/81 (37%) and 72/196 (36.7%).

Moreover, note that the ratio of the worklist sizes does not change much with the number of iterations: across all benchmarks, the ratio does not change drastically. A possible explanation is that, as we filter out a specific portion of dependencies, the worklist, which grows by adding new event sequences, tends to grow linearly. For example, if there are four events available, the worklist size grows by four after adding the four new events; if our method filters out two of the four events, the worklist grows by two. Therefore, the ratio of the worklist sizes does not change, as both worklists grow linearly.
Based on this observation, we hypothesize that if we reduced the worklist size further, shrinking it as the number of iterations increases, we would drastically increase the code coverage. While this is out of the scope of this dissertation work, one possible way is to use subsumption relations between event sequences in the worklist, which would help avoid executing the same sequence again.

5.5.2 Factors that May Affect the Execution Time

In this section, we focus on factors that may affect the execution time of Artemis and discuss what approaches can be used to improve the automated testing of web applications. As shown in Table 5.4, the number of iterations executed in 10 minutes varies between the baseline Artemis and Artemis augmented with our method. For example, in 10 minutes on the case1 benchmark, Artemis executed 5,819 iterations while Artemis+JSdep executed 1,972 iterations, which might affect the code coverage achieved in the same amount of running time.

Table 5.5 shows the execution time of Artemis and Artemis+JSdep after 500 iterations. Column 2 shows the execution time of Artemis, and Column 3 shows the execution time of Artemis+JSdep. As the execution times vary across benchmarks, we provide a normalized execution time of Artemis+JSdep compared to Artemis in Figure 5.12. In the figure, the red dashed line indicates where the execution time of Artemis+JSdep equals that of Artemis; a point above the line means Artemis+JSdep is slower than Artemis on that benchmark. For example, Artemis+JSdep is four times slower than Artemis on case1. Based on the figure, we cannot conclude that Artemis+JSdep's filtering overhead always makes testing slower than Artemis, since Artemis+JSdep is faster on some of the benchmarks. Also, the filtering overheads in our experiments are less than 0.01 seconds for 500 iterations, meaning Artemis+JSdep's filtering overhead does not significantly affect the execution time.

Table 5.5: Execution time of Artemis and Artemis+JSdep after 500 iterations.

Name         Artemis (s)  Artemis+JSdep (s)
case1        7.59         30.22
case2        8.39         7.98
case3        11.76        3.99
case4        11.88        5.24
frog         192.67       403.14
cosmos       77.12        95.45
hanoi        21.36        20.27
flipflop     273.61       338.66
sokoban      160.84       321.67
wormy        140.25       335.69
chinabox     173.1        1819.4
3dmodel      2163.7       2200.53
cubuild      183.51       1095.99
pearlski     185.69       239.38
speedyeater  29.32        19.33
gallony      16.59        11.37
fullhouse    35.24        25.98
ball_pool    8166.98      8479.86
lady         1188.81      9977.03
harehound    18.28        31.8
match        26.74        12.86

We consider two possible factors that might affect the execution time: the length of test sequences, and code coverage together with read/written properties. For the first factor, we accumulate the length of test sequences and the execution time after 500 iterations over all benchmark programs. For the second factor, we use statistical results from WebKit in Artemis after running 500 iterations.

Figure 5.13 shows a scatter plot of the execution time of Artemis by the length of the test sequences executed; the y-axis indicates the execution time and the x-axis indicates the length of the sequences executed. The plot shows that on some benchmarks, executing short test sequences takes much longer than executing long test sequences. Thus, we cannot conclude that there is a correlation between the length of test sequences and the execution time, since the plot does not show any tendency. Let us consider Figure 5.14, which shows scatter plots of the execution time against code coverage and read/written properties.
[Figure 5.12: Normalized execution time of Artemis+JSdep compared to Artemis after 500 iterations.]
[Figure 5.13: The execution time of Artemis by the length of test sequences executed.]
[Figure 5.14: The execution time of Artemis by covered lines of code and read/written properties: (a) by covered lines of code including overlaps; (b) by the number of read/written properties.]

Figure 5.14(a) shows the relation between execution time and code coverage. Here, the coverage numbers indicate the lines of code covered during tests, counting overlapping lines; higher code coverage therefore means more lines of code are executed by a test. As expected, the plot shows that the number of covered lines is proportional to the execution time, since a test takes more time to execute more lines of code. Figure 5.14(b), which shows a scatter plot of execution time against the number of read/written properties during testing, has a similar trend to Figure 5.14(a): since executing more lines of code directly contributes to reading and writing more variables, the two scatter plots in Figure 5.14 show similar trends in the execution time.

To sum up, based on the three scatter plots in Figure 5.13 and Figure 5.14, we conclude that the length of test sequences does not directly contribute to executing more lines of code, and thus has a weak correlation with the execution time. On the other hand, testing more lines of code, by reading and writing more variables, does affect the testing time. Therefore, instead of executing each test sequence from scratch every time, if we cached the program states corresponding to some of the common test subsequences in Artemis, the testing time could be drastically reduced.

5.6 Summary

We have presented a constraint-based method for statically computing DOM event dependencies in client-side web applications by formulating the static analysis as a Datalog program. We have also presented a method for leveraging the result of our dependency analysis to improve the performance of a popular web application testing tool named Artemis. We implemented our methods and evaluated them on real web applications. Our experiments show that the new methods can compute DOM event dependencies with reasonable accuracy and at negligible cost. Furthermore, they allow Artemis to significantly reduce test sequence redundancies and therefore improve the test coverage.

Chapter 6
Conclusion and Future Work

In this chapter, I summarize the main contributions of my dissertation work and then discuss directions for future work.

6.1 Summary

As concurrent software becomes ubiquitous in our daily life, the number of concurrent applications grows rapidly. However, writing correct and efficient concurrent software is notoriously hard. To address these limitations, the goal of this dissertation is to develop an analysis of interferences in concurrent programs for improving the performance of automated testing and verification techniques for concurrent software.
The hypothesis statement of the dissertation is:

Analyzing interferences between concurrent events with constraint-based program analysis helps to significantly improve the performance of automated testing and verification techniques for concurrent software.

To evaluate the hypothesis of this dissertation, I have implemented a constraint-based program analysis framework by designing rules to analyze feasible and infeasible interference between concurrent events. The goal is to leverage the analysis results to improve the performance of automated testing and verification techniques for concurrent software. To test the hypothesis, I focused on three representative concurrency reasoning techniques: modular abstract interpretation of interrupt-driven software, testing of web applications, and semantic diffing of multi-threaded programs. For each of the three techniques, I compared the baseline performance with the performance of the technique after it is augmented with our analysis results, in terms of scalability and accuracy.

The first application is modular abstract interpretation for verifying interrupt-driven programs. Without knowledge of interrupt-driven program semantics, the baseline method generates many false positives. To obtain a more accurate method, I developed a constraint-based program analysis for soundly and efficiently identifying and pruning infeasible interference between interrupts. Then, I integrated the method into modular abstract interpretation, which achieves more accurate verification results by reducing false positives. Through an experimental evaluation with a large number of interrupt-driven programs, I demonstrated the effectiveness of my method.

The second application is semantic diffing of concurrent programs. I proposed a fast and approximate analysis to compute semantic differences in multi-threaded programs. To improve scalability, I designed and implemented rules in a constraint-based program analysis to analyze infeasible interference between threads. Then, I leveraged the analysis results to compute semantic differences between two versions of multi-threaded programs. In this application, I confirmed that the method achieves high accuracy and low overhead in semantic diffing compared to a model-checking-based semantic diffing technique.

The third application is testing web applications. I proposed a method to eliminate redundant tests and improve test coverage with DOM-event dependencies in web application testing. I implemented a constraint-based analysis framework to statically analyze DOM-event dependencies, which capture feasible interferences between UI events. The analysis results are imported into a popular web application testing tool, which then soundly prunes redundant tests by utilizing partial order reduction. In this application, I confirmed the efficiency of the static analysis method and its effectiveness in improving automated testing.

Overall, the use of constraint-based program analysis for feasible and infeasible interference analysis, together with the three applications, has demonstrated that it is effective in improving automated testing and verification for concurrent programs while having negligible overhead. Specifically, the analysis helps to overcome the state explosion in concurrency reasoning by improving both accuracy and scalability.

6.2 Future Work

Although the main topic of the dissertation is improving concurrency reasoning techniques with constraint-based program analysis, the contributions made have a broader impact on various future research directions.
This section discusses the broader impact and possible future research directions.

6.2.1 Optimizing Regression Test Selection

One possible research direction in the future is to apply our interference analysis to test case selection. Regression test selection (RTS) has been widely used for testing evolving software. Since running all available tests upon each program revision is costly, RTS aims to identify only those tests that are affected by the code changes, in order to reduce the testing time. For example, Dejavu [121], a widely used RTS algorithm, selects a test only if a changed program point in the control flow graph, named the dangerous edge, is covered by executing the test. There have been many follow-up works that improve upon Dejavu for general-purpose software using better change-impact estimation techniques, from statement-level techniques with control flows [121, 122, 52], data flows [51, 110, 67, 136], and program slicing [10, 49, 48], to method- and class-level [20, 119, 78, 86, 95, 109, 29, 57, 131] and file-level dependencies [38, 39, 19].

Although there are many regression testing approaches, they do not focus on leveraging concurrency-related information, especially for testing event-driven applications such as mobile applications. It may be a fruitful endeavor to leverage interference analysis between events to optimize RTS for event-driven applications.

[Figure 6.1: Example test cases with events.
Test case 1: event1 (x = 1) → event2 (y = 1) → event3 (a = x) → event4 (b = y)
Test case 2: event1 (x = 1) → event3 (a = x) → event5 (z = a) → event6 (b = 3)]

Figure 6.1 shows two example test cases, each with four UI events. Each circle in the figure indicates an event, with a label below the circle and an instruction inside it. Specifically, test case 1 triggers event1, event2, event3, and event4 in order, meaning four instructions are executed in order: x = 1, y = 1, a = x, and b = y. Test case 2 triggers event1, event3, event5, and event6, and four instructions are executed in order: x = 1, a = x, z = a, and b = 3.

Now consider a case where a developer updates the program and changes the instruction from x = 1 to x = 2 in event1. Do we need both test cases to check that the change still offers correct functionality? Based on traditional test case selection, both test cases are likely to be selected, as they both exercise event1. On the other hand, when we consider interference, we are able to select test case 2 only.

[Figure 6.2: Impacted events based on interference after updating event1 in Figure 6.1.
Test case 1: event1 (x = 2) → event2 (y = 1) → event3 (a = x) → event4 (b = y); event1 and event3 interfere on x.
Test case 2: event1 (x = 2) → event3 (a = x) → event5 (z = a) → event6 (b = 3); event1 and event3 interfere on x, and event3 and event5 interfere on a.]

Figure 6.2 shows the two test cases from Figure 6.1 with the events impacted, based on interference, after the change in event1 (the impacted events and interferences are colored red in the figure). For test case 1, event3 is impacted by the change in event1, as it reads the shared variable x written by event1; the other events are not impacted. For test case 2, event3 and event5 are impacted by the change in event1: event3 is impacted as it reads the shared variable x written by event1, and event5 reads the impacted shared variable a written by event3. Based on the figure, we can skip test case 1 in regression testing once we have tested test case 2, since the shared-variable access on x between event1 and event3 is already tested by test case 2. A sketch of this selection criterion appears below.
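As a sketch of how such interference-aware selection could work, the following JavaScript computes, for each test, the set of change-impacted interference pairs; a test can then be skipped if its pairs are covered by another selected test. The event encoding below is illustrative, using the read/write sets of Figure 6.2.

    // Collect pairs (impacted writer -> reader, variable) reachable from the
    // changed event; lastWriter only records writes of impacted events.
    function impactedPairs(test, changed) {
      const pairs = new Set();
      const impacted = new Set([changed]);
      const lastWriter = new Map();         // variable -> last impacted writer
      for (const ev of test.events) {
        for (const v of ev.reads) {
          if (lastWriter.has(v)) {          // reads a change-impacted value
            pairs.add(lastWriter.get(v) + '->' + ev.name + ':' + v);
            impacted.add(ev.name);
          }
        }
        if (impacted.has(ev.name)) {
          for (const v of ev.writes) lastWriter.set(v, ev.name);
        }
      }
      return pairs;
    }

    // Test case 2 from Figure 6.2 covers every impacted pair of test case 1
    // ({event1->event3:x}), so test case 1 can be skipped.
    const test1 = { events: [
      { name: 'event1', reads: [], writes: ['x'] },
      { name: 'event2', reads: [], writes: ['y'] },
      { name: 'event3', reads: ['x'], writes: ['a'] },
      { name: 'event4', reads: ['y'], writes: ['b'] } ] };
    const test2 = { events: [
      { name: 'event1', reads: [], writes: ['x'] },
      { name: 'event3', reads: ['x'], writes: ['a'] },
      { name: 'event5', reads: ['a'], writes: ['z'] },
      { name: 'event6', reads: [], writes: ['b'] } ] };
    const p1 = impactedPairs(test1, 'event1');
    const p2 = impactedPairs(test2, 'event1');
    console.log([...p1].every(p => p2.has(p))); // true: test case 1 is subsumed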
Although the example shows only two test cases with relatively simple interference between events, we believe the interference analysis can help to optimize the regression test selection process further, especially for event-driven applications.

6.2.2 Query Optimization in the Datalog Solver

In this dissertation, the constraint-based analysis techniques were all implemented using Datalog. The Datalog solver, however, can be a bottleneck in an analysis due to a large number of facts and complex rules. Thus, there is a need to develop more efficient algorithms to support analyses of large programs. For example, computing the transitive closure between program points in an off-the-shelf Datalog solver produces many redundant relations during fixed point computation. Consider the rules computing reachability over program nodes, which are commonly used in program analysis:

flow(x, y) ← edge(x, y)
flow(x, y) ← edge(x, z) ∧ flow(z, y)

Assume a set of edge relations, S = {edge(i, i+1) | 0 ≤ i ≤ 99}, is given as facts, and we would like to check flow(1, _), which asks for all nodes reachable from node 1. Without the query information, the bottom-up evaluation typically used by Datalog solvers applies the rules to relations until it reaches the fixed point. Therefore, it eventually generates a number of reachability relations proportional to O(N²), where N is the number of nodes. However, to check whether flow(1, _) holds, we do not need any flow relation starting from a node other than 1 (e.g., flow(2, 5)); computing them might drastically blow up the number of iterations needed to reach the fixed point on large programs.

There have been many studies on optimizing query evaluation, especially for Datalog. One of the dominant strategies is the Magic Set Transformation (MST) [9, 11]. MST rewrites the code by inserting magic predicates which infer only the relevant arguments of predicates based on the target query. Since the initial work, many variants of MST have been proposed, such as supplementary magic sets [125], generalized counting [124], magic templates [114], variant demand transformation [138], and subsumptive demand transformation [137], to improve the time and space complexity of computing and generating magic predicates. Filtering [73, 75, 74] can also optimize some types of queries in fixed point computation. However, none of these works is generally applicable, as the performance of the rewritten rules heavily depends on the types of rules and queries. To generally support query optimizations, Soufflé [68] keeps updating its features, such as relation reordering. We therefore think that various static and dynamic analyses on rules can be applied to improve the internal fixed point computation of Datalog engines, which is directly related to the performance of constraint-based program analysis.

References

[1] 100 Online JavaScript Games. URL: http://www.lutanho.net/stroke/online.html.

[2] Martín Abadi and Leslie Lamport. “The Existence of Refinement Mappings”. In: Theoretical Computer Science 82.2 (May 1991), pp. 253–284. ISSN: 0304-3975.

[3] Vikram Adve, Chris Lattner, Michael Brukman, Anand Shukla, and Brian Gaeke. “LLVA: A Low-level Virtual Instruction Set Architecture”. In: ACM/IEEE International Symposium on Microarchitecture. Dec. 2003.

[4] Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. “Constraint-Based Synthesis of Datalog Programs”. In: International Conference on Principles and Practice of Constraint Programming. 2017, pp. 689–706.

[5] Saba Alimadadi, Ali Mesbah, and Karthik Pattabiraman.
References

[1] 100 Online JavaScript Games. URL: http://www.lutanho.net/stroke/online.html.
[2] Martín Abadi and Leslie Lamport. “The Existence of Refinement Mappings”. In: Theoretical Computer Science 82.2 (May 1991), pp. 253–284. ISSN: 0304-3975.
[3] Vikram Adve, Chris Lattner, Michael Brukman, Anand Shukla, and Brian Gaeke. “LLVA: A Low-level Virtual Instruction Set Architecture”. In: ACM/IEEE International Symposium on Microarchitecture. Dec. 2003.
[4] Aws Albarghouthi, Paraschos Koutris, Mayur Naik, and Calvin Smith. “Constraint-Based Synthesis of Datalog Programs”. In: International Conference on Principles and Practice of Constraint Programming. 2017, pp. 689–706.
[5] Saba Alimadadi, Ali Mesbah, and Karthik Pattabiraman. “Hybrid DOM-Sensitive Change Impact Analysis for JavaScript”. In: 29th European Conference on Object-Oriented Programming, ECOOP 2015, July 5-10, 2015, Prague, Czech Republic. 2015, pp. 321–345.
[6] Esben Andreasen and Anders Møller. “Determinacy in static analysis for jQuery”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2014, pp. 17–31.
[7] Stephan Arlt, Andreas Podelski, and Martin Wehrle. “Reducing GUI Test Suites via Program Slicing”. In: Proceedings of the 2014 International Symposium on Software Testing and Analysis. ISSTA 2014. San Jose, CA, USA: Association for Computing Machinery, 2014, pp. 270–281. ISBN: 9781450326452. DOI: 10.1145/2610384.2610391.
[8] Shay Artzi, Julian Dolby, Simon Holm Jensen, Anders Møller, and Frank Tip. “A framework for automated testing of JavaScript web applications”. In: International Conference on Software Engineering. 2011, pp. 571–580.
[9] François Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D. Ullman. “Magic Sets and Other Strange Ways to Implement Logic Programs (Extended Abstract)”. In: Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems. PODS ’86. Cambridge, Massachusetts, USA: Association for Computing Machinery, 1985, pp. 1–15. ISBN: 0897911792. DOI: 10.1145/6012.15399.
[10] Samuel Bates and Susan Horwitz. “Incremental program testing using program dependence graphs”. In: Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, 1993, pp. 384–396.
[11] Catriel Beeri and Raghu Ramakrishnan. “On the power of magic”. In: The Journal of Logic Programming 10.3 (1991). Special Issue: Database Logic Programming, pp. 255–299. ISSN: 0743-1066. DOI: 10.1016/0743-1066(91)90038-Q.
[12] Dirk Beyer. “Software Verification and Verifiable Witnesses”. In: International Conference on Tools and Algorithms for Construction and Analysis of Systems. 2015, pp. 401–416.
[13] Sandeep Bindal, Sorav Bansal, and Akash Lal. “Variable and Thread Bounding for Systematic Testing of Multithreaded Programs”. In: International Symposium on Software Testing and Analysis. 2013, pp. 145–155.
[14] Roderick Bloem, Georg Hofferek, Bettina Könighofer, Robert Könighofer, Simon Außerlechner, and Raphael Spörk. “Synthesis of Synchronization Using Uninterpreted Functions”. In: International Conference on Formal Methods in Computer-Aided Design. 2014, 11:35–11:42.
[15] Ahmed Bouajjani, Constantin Enea, and Shuvendu K. Lahiri. “Abstract Semantic Diffing of Evolving Concurrent Programs”. In: International Symposium on Static Analysis. Cham: Springer International Publishing, 2017, pp. 46–65.
[16] Martin Bravenboer and Yannis Smaragdakis. “Strictly Declarative Specification of Sophisticated Points-to Analyses”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2009, pp. 243–262.
[17] Doina Bucur and Marta Kwiatkowska. “On software verification for sensor nodes”. In: Journal of Systems and Software 84.10 (2011), pp. 1693–1707.
[18] Cristian Cadar, Daniel Dunbar, and Dawson Engler. “KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs”. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. OSDI’08. San Diego, California: USENIX Association, 2008, pp. 209–224.
[19] Ahmet Celik, Marko Vasic, Aleksandar Milicevic, and Milos Gligoric. “Regression Test Selection Across JVM Boundaries”. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ESEC/FSE 2017. Paderborn, Germany: ACM, 2017, pp. 809–820. ISBN: 978-1-4503-5105-8. DOI: 10.1145/3106237.3106297.
[20] Yih-Farn Chen, David S. Rosenblum, and Kiem-Phong Vo. “TestTube: A System for Selective Regression Testing”. In: Proceedings of the 16th International Conference on Software Engineering. ICSE ’94. Sorrento, Italy: IEEE Computer Society Press, 1994, pp. 211–220. ISBN: 0-8186-5855-X. URL: http://dl.acm.org/citation.cfm?id=257734.257769.
[21] Lin Cheng, Jialiang Chang, Zijiang Yang, and Chao Wang. “GUICat: GUI Testing as a Service”. In: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. ASE 2016. Singapore: Association for Computing Machinery, 2016, pp. 858–863. ISBN: 9781450338455. DOI: 10.1145/2970276.2970294.
[22] Ravi Chugh, Jeffrey A. Meister, Ranjit Jhala, and Sorin Lerner. “Staged information flow for JavaScript”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2009, pp. 50–62.
[23] E. Clarke, D. Kroening, and F. Lerda. “A Tool for Checking ANSI-C Programs”. In: International Conference on Tools and Algorithms for Construction and Analysis of Systems. 2004, pp. 168–176.
[24] Patrick Cousot and Radhia Cousot. “Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints”. In: ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 1977, pp. 238–252.
[25] Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. “The ASTRÉE Analyzer”. In: European Symposium on Programming Languages and Systems. 2005, pp. 21–30.
[26] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. “Efficiently Computing Static Single Assignment Form and the Control Dependence Graph”. In: ACM Trans. Program. Lang. Syst. 13.4 (1991), pp. 451–490.
[27] Steven Dawson, C. R. Ramakrishnan, and David S. Warren. “Practical Program Analysis Using General Purpose Logic Programming Systems - a Case Study”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 1996, pp. 117–126.
[28] Leonardo De Moura and Nikolaj Bjørner. “Z3: An Efficient SMT Solver”. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. 2008, pp. 337–340.
[29] Emelie Engström and Per Runeson. “A qualitative survey of regression testing practices”. In: International Conference on Product Focused Software Process Improvement. Springer, 2010, pp. 3–16.
[30] Azadeh Farzan and Zachary Kincaid. “Verification of parameterized concurrent programs by modular reasoning about data and control”. In: ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 2012, pp. 297–308.
[31] Asger Feldthaus, Max Schäfer, Manu Sridharan, Julian Dolby, and Frank Tip. “Efficient construction of approximate call graphs for JavaScript IDE services”. In: International Conference on Software Engineering. 2013, pp. 752–761.
[32] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. “The Program Dependence Graph and Its Use in Optimization”. In: ACM Transactions on Programming Languages and Systems 9.3 (July 1987), pp. 319–349.
[33] GCC bug 21334. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21334.
[34] GCC bug 24430. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25330.
[35] GCC bug 3584. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=3584.
[36] GCC bug 40518. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40518.
[37] GLib bug 512624. https://bugzilla.gnome.org/show_bug.cgi?id=512624.
[38] Milos Gligoric, Lamyaa Eloussi, and Darko Marinov. “Ekstazi: Lightweight Test Selection”. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering. Vol. 2. May 2015, pp. 713–716. DOI: 10.1109/ICSE.2015.230.
[39] Milos Gligoric, Lamyaa Eloussi, and Darko Marinov. “Practical Regression Test Selection with Dynamic File Dependencies”. In: Proceedings of the 2015 International Symposium on Software Testing and Analysis. ISSTA 2015. Baltimore, MD, USA: ACM, 2015, pp. 211–222. ISBN: 978-1-4503-3620-8. DOI: 10.1145/2771783.2771784.
[40] P. Godefroid. Partial-Order Methods for the Verification of Concurrent Systems - An Approach to the State-Explosion Problem. Springer, 1996. ISBN: 3-540-60761-7.
[41] Benny Godlin and Ofer Strichman. “Inference Rules for Proving the Equivalence of Recursive Procedures”. In: Time for Verification. Ed. by Zohar Manna and Doron A. Peled. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 167–184. ISBN: 3-642-13753-9, 978-3-642-13753-2.
[42] Salvatore Guarnieri and V. Benjamin Livshits. “GATEKEEPER: Mostly Static Enforcement of Security and Reliability Policies for JavaScript Code”. In: USENIX Security Symposium. 2009, pp. 151–168.
[43] Arjun Guha, Shriram Krishnamurthi, and Trevor Jim. “Using static analysis for Ajax intrusion detection”. In: International Conference on World Wide Web. 2009, pp. 561–570.
[44] Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi. “Typing Local Control and State Using Flow Analysis”. In: European Symposium on Programming. 2011, pp. 256–275.
[45] Shengjian Guo, Markus Kusano, and Chao Wang. “Conc-iSE: Incremental Symbolic Execution of Concurrent Software”. In: IEEE/ACM International Conference on Automated Software Engineering. 2016, pp. 531–542.
[46] Shengjian Guo, Markus Kusano, Chao Wang, Zijiang Yang, and Aarti Gupta. “Assertion Guided Symbolic Execution of Multithreaded Programs”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2015, pp. 854–865.
[47] Shengjian Guo, Meng Wu, and Chao Wang. “Symbolic execution of programmable logic controller code”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2017, pp. 326–336.
[48] Rajiv Gupta, Mary Jean Harrold, and Mary Lou Soffa. “An approach to regression testing using slicing”. In: Software Maintenance, 1992. Proceedings., Conference on. IEEE, 1992, pp. 299–308.
[49] Rajiv Gupta, Mary Jean Harrold, and Mary Lou Soffa. “Program slicing-based regression testing techniques”. In: Software Testing, Verification and Reliability 6.2 (1996), pp. 83–111.
[50] Elnar Hajiyev, Mathieu Verbaere, and Oege de Moor. “CodeQuest: Scalable Source Code Queries with Datalog”. In: European Conference on Object-Oriented Programming. 2006, pp. 2–27.
[51] M. J. Harrold and M. L. Soffa. “An incremental approach to unit testing during maintenance”. In: Proceedings. Conference on Software Maintenance, 1988. Oct. 1988, pp. 362–367. DOI: 10.1109/ICSM.1988.10188.
[52] Mary Jean Harrold, James A. Jones, Tongyu Li, Donglin Liang, Alessandro Orso, Maikel Pennings, Saurabh Sinha, S. Alexander Spoon, and Ashish Gujarathi. “Regression Test Selection for Java Software”. In: Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications. OOPSLA ’01. Tampa Bay, FL, USA: ACM, 2001, pp. 312–326. ISBN: 1-58113-335-9. DOI: 10.1145/504282.504305.
[53] Nevin Heintze and Olivier Tardieu. “Demand-driven Pointer Analysis”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2001, pp. 24–34.
[54] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2008. ISBN: 9780123705914.
[55] Makoto Higashi, Tetsuo Yamamoto, Yasuhiro Hayase, Takashi Ishio, and Katsuro Inoue. “An Effective Method to Control Interrupt Handler for Data Race Detection”. In: Proceedings of the 5th Workshop on Automation of Software Test. 2010, pp. 79–86.
[56] Krystof Hoder, Nikolaj Bjørner, and Leonardo de Moura. “µZ - An Efficient Engine for Fixed Points with Constraints”. In: International Conference on Computer Aided Verification. 2011, pp. 457–462.
[57] Pei Hsia, Xiaolin Li, David Chenho Kung, Chih-Tung Hsu, Liang Li, Yasufumi Toyoshima, and Cris Chen. “A technique for the selective revalidation of OO software”. In: Journal of Software Maintenance: Research and Practice 9.4 (1997), pp. 217–233.
[58] Benchmark programs for evaluating intAbs. URL: https://github.com/sch8906/intAbs.
[59] Franjo Ivančić, I. Shlyakhter, Aarti Gupta, M. K. Ganai, V. Kahlon, Chao Wang, and Z. Yang. “Model Checking C Programs Using F-Soft”. In: International Conference on Computer Design. 2005, pp. 297–308.
[60] Daniel Jackson and David A. Ladd. “Semantic Diff: A Tool for Summarizing the Effects of Modifications”. In: International Conference on Software Maintenance. 1994, pp. 243–252.
[61] Vilas Jagannath, Qingzhou Luo, and Darko Marinov. “Change-aware Preemption Prioritization”. In: International Symposium on Software Testing and Analysis. 2011, pp. 133–143.
[62] Bertrand Jeannet and Antoine Miné. “Apron: A Library of Numerical Abstract Domains for Static Analysis”. In: International Conference on Computer Aided Verification. 2009, pp. 661–667.
[63] Casper Svenning Jensen, Anders Møller, Veselin Raychev, Dimitar Dimitrov, and Martin T. Vechev. “Stateless model checking of event-driven applications”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2015, pp. 57–73.
[64] Simon Holm Jensen, Peter A. Jonsson, and Anders Møller. “Remedying the eval that men do”. In: International Symposium on Software Testing and Analysis. 2012, pp. 34–44.
[65] Simon Holm Jensen, Magnus Madsen, and Anders Møller. “Modeling the HTML DOM and browser API in static analysis of JavaScript web applications”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2011, pp. 59–69.
[66] Jetty bug 1187. https://jira.codehaus.org/browse/JETTY-1187.
[67] S. Ji, B. Li, and P. Zhang. “Test Case Selection for Data Flow Based Regression Testing of BPEL Composite Services”. In: 2016 IEEE International Conference on Services Computing (SCC). June 2016, pp. 547–554. DOI: 10.1109/SCC.2016.77.
[68] Herbert Jordan, Bernhard Scholz, and Pavle Subotić. “Soufflé: On synthesis of program analyzers”. In: International Conference on Computer Aided Verification. Springer, 2016, pp. 422–430.
[69] Saurabh Joshi, Shuvendu K. Lahiri, and Akash Lal. “Underspecified Harnesses and Interleaved Bugs”. In: ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 2012, pp. 19–30.
[70] Corey Kallenberg and Rafal Wojtczuk. “Speed racer: Exploiting an Intel flash protection race condition”. Bromium Labs, Jan. 2015. URL: https://bromiumlabs.files.wordpress.com/2015/01/speed_racer_whitepaper.pdf (accessed 07.01.2016).
[71] John B. Kam and Jeffrey D. Ullman. “Monotone Data Flow Analysis Frameworks”. In: Acta Inf. 7 (1977), pp. 305–317.
[72] Sepideh Khoshnood, Markus Kusano, and Chao Wang. “ConcBugAssist: Constraint Solving for Diagnosis and Repair of Concurrency Bugs”. In: International Symposium on Software Testing and Analysis. 2015, pp. 165–176.
[73] M. Kifer and E. L. Lozinskii. “Implementing logic programs as a database system”. In: 1987 IEEE Third International Conference on Data Engineering. 1987, pp. 375–385.
[74] Michael Kifer and Eliezer L. Lozinskii. “On Compile-Time Query Optimization in Deductive Databases by Means of Static Filtering”. In: ACM Trans. Database Syst. 15.3 (Sept. 1990), pp. 385–426. ISSN: 0362-5915. DOI: 10.1145/88636.87121.
[75] Michael Kifer and Eliezer L. Lozinskii. “SYGRAF: Implementing Logic Programs in a Database Style”. In: IEEE Trans. Softw. Eng. 14.7 (July 1988), pp. 922–935. ISSN: 0098-5589. DOI: 10.1109/32.42735.
[76] Jonathan Kotker, Dorsa Sadigh, and Sanjit A. Seshia. “Timing Analysis of Interrupt-driven Programs Under Context Bounds”. In: International Conference on Formal Methods in Computer-Aided Design. 2011, pp. 81–90.
[77] Daniel Kroening, Lihao Liang, Tom Melham, Peter Schrammel, and Michael Tautschnig. “Effective Verification of Low-level Software with Nested Interrupts”. In: Proceedings of the Design, Automation & Test in Europe Conference. 2015, pp. 229–234.
[78] David Chenho Kung, Jerry Gao, Pei Hsia, Jeremy Lin, and Yasufumi Toyoshima. “Class firewall, test order, and regression testing of object-oriented programs”. In: JOOP 8.2 (1995), pp. 51–65.
[79] Markus J. Kusano and Chao Wang. “Thread-modular static analysis for relaxed memory models”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2017.
[80] Markus Kusano and Chao Wang. “Assertion Guided Abstraction: A Cooperative Optimization for Dynamic Partial Order Reduction”. In: Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. ASE ’14. Vasteras, Sweden: Association for Computing Machinery, 2014, pp. 175–186. ISBN: 9781450330138. DOI: 10.1145/2642937.2642998.
[81] Markus Kusano and Chao Wang. “Flow-sensitive Composition of Thread-modular Abstract Interpretation”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2016, pp. 799–809.
[82] Shuvendu K. Lahiri, Chris Hawblitzel, Ming Kawaguchi, and Henrique Rebêlo. “SYMDIFF: A Language-agnostic Semantic Diff Tool for Imperative Programs”. In: International Conference on Computer Aided Verification. 2012, pp. 712–717.
[83] Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel. “Differential Assertion Checking”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2013, pp. 345–355.
[84] Monica S. Lam, John Whaley, V. Benjamin Livshits, Michael C. Martin, Dzintars Avots, Michael Carbin, and Christopher Unkel. “Context-sensitive Program Analysis As Database Queries”. In: ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 2005, pp. 1–12.
[85] Leslie Lamport. “Time, Clocks, and the Ordering of Events in a Distributed System”. In: Commun. ACM 21.7 (1978), pp. 558–565.
[86] Owolabi Legunsen, Farah Hariri, August Shi, Yafeng Lu, Lingming Zhang, and Darko Marinov. “An Extensive Study of Static Regression Test Selection in Modern Software Evolution”. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. FSE 2016. Seattle, WA, USA: ACM, 2016, pp. 583–594. ISBN: 978-1-4503-4218-6. DOI: 10.1145/2950290.2950361.
[87] Steffen Lehnert. “A Taxonomy for Software Change Impact Analysis”. In: International Workshop on Principles of Software Evolution and Annual ERCIM Workshop on Software Evolution. 2011, pp. 41–50.
[88] Guodong Li, Esben Andreasen, and Indradeep Ghosh. “SymJS: automatic symbolic testing of JavaScript web applications”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2014, pp. 449–459.
[89] Benjamin Livshits, Manu Sridharan, Yannis Smaragdakis, Ondřej Lhoták, J. Nelson Amaral, Bor-Yuh Evan Chang, Samuel Z. Guyer, Uday P. Khedker, Anders Møller, and Dimitrios Vardoulakis. “In Defense of Soundiness: A Manifesto”. In: Commun. ACM 58.2 (Jan. 2015), pp. 44–46. ISSN: 0001-0782. DOI: 10.1145/2644805.
[90] V. Benjamin Livshits and Monica S. Lam. “Finding Security Vulnerabilities in Java Applications with Static Analysis”. In: USENIX Security Symposium. 2005, pp. 18–18.
[91] LLVM bug 8441. http://llvm.org/bugs/show_bug.cgi?id=8441.
[92] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. “Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics”. In: International Conference on Architectural Support for Programming Languages and Operating Systems. 2008, pp. 329–339.
[93] Magnus Madsen, Benjamin Livshits, and Michael Fanning. “Practical static analysis of JavaScript applications in the presence of frameworks and libraries”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2013, pp. 499–509.
[94] Magnus Madsen, Frank Tip, and Ondřej Lhoták. “Static analysis of event-driven Node.js JavaScript applications”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2015, pp. 505–519.
[95] Nashat Mansour and Wael Statieh. “Regression Test Selection for C# Programs”. In: Adv. Soft. Eng. 2009 (Jan. 2009), 1:1–1:16. ISSN: 1687-8655. DOI: 10.1155/2009/535708.
[96] Paul Dan Marinescu and Cristian Cadar. “KATCH: High-coverage Testing of Software Patches”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2013, pp. 235–245.
[97] Leo A. Meyerovich and V. Benjamin Livshits. “ConScript: Specifying and Enforcing Fine-Grained Security Policies for JavaScript in the Browser”. In: IEEE Symposium on Security and Privacy. 2010, pp. 481–496.
[98] Antoine Miné. “Relational Thread-Modular Static Value Analysis by Abstract Interpretation”. In: International Conference on Verification, Model Checking, and Abstract Interpretation. 2014, pp. 39–58.
[99] Antoine Miné. “Static analysis by abstract interpretation of sequential and multi-thread programs”. In: Proc. of the 10th School of Modelling and Verifying Parallel Processes. 2012, pp. 35–48.
[100] Antoine Miné. “Static Analysis of Embedded Real-Time Concurrent Software with Dynamic Priorities”. In: Electr. Notes Theor. Comput. Sci. 331 (2017), pp. 3–39.
[101] Antoine Miné. “Static Analysis of Run-Time Errors in Embedded Critical Parallel C Programs”. In: Programming Languages and Systems. 2011, pp. 398–418.
[102] Antoine Miné. “Static Analysis of Run-Time Errors in Embedded Real-Time Parallel C Programs”. In: Logical Methods in Computer Science 8.1 (2012).
[103] Mozilla. Firefox bug 61369. https://bugzilla.mozilla.org/show_bug.cgi?id=61369.
[104] Mozilla. Firefox bug 756036. https://bugzilla.mozilla.org/show_bug.cgi?id=756036.
[105] Mozilla. Firefox bug 763013. https://bugzilla.mozilla.org/show_bug.cgi?id=763013.
[106] Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gérard Basler, Piramanayagam Arumuga Nainar, and Iulian Neamtiu. “Finding and Reproducing Heisenbugs in Concurrent Programs”. In: USENIX Symposium on Operating Systems Design and Implementation. 2008, pp. 267–280.
[107] Mayur Naik, Alex Aiken, and John Whaley. “Effective Static Race Detection for Java”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2006, pp. 308–319.
[108] Cuong Nguyen, Hiroaki Yoshida, Mukul R. Prasad, Indradeep Ghosh, and Koushik Sen. “Generating Succinct Test Cases Using Don’t Care Analysis”. In: IEEE International Conference on Software Testing, Verification and Validation. 2015, pp. 1–10.
[109] Alessandro Orso, Nanjuan Shi, and Mary Jean Harrold. “Scaling Regression Testing to Large Software Systems”. In: Proceedings of the 12th ACM SIGSOFT Twelfth International Symposium on Foundations of Software Engineering. SIGSOFT ’04/FSE-12. Newport Beach, CA, USA: ACM, 2004, pp. 241–251. ISBN: 1-58113-855-5. DOI: 10.1145/1029894.1029928.
[110] T. J. Ostrand and E. J. Weyuker. “Using data flow analysis for regression testing”. In: Proceedings of Sixth Annual Pacific Northwest Software Quality Conference. Sept. 1988, pp. 57–71.
[111] Carlos Pacheco and Michael D. Ernst. “Randoop: feedback-directed random testing for Java”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2007, pp. 815–816.
[112] Brandon Paulsen, Chungha Sung, Peter A. H. Peterson, and Chao Wang. “Debreach: Mitigating Compression Side Channels via Static Analysis and Transformation”. In: 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE, 2019, pp. 899–911. DOI: 10.1109/ASE.2019.00088.
[113] Suzette Person, Matthew B. Dwyer, Sebastian Elbaum, and Corina S. Păsăreanu. “Differential Symbolic Execution”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2008, pp. 226–237.
[114] Raghu Ramakrishnan. “Magic Templates: A Spellbinding Approach to Logic Programs”. In: J. Log. Program. 11.3-4 (Aug. 1991), pp. 189–216. ISSN: 0743-1066. DOI: 10.1016/0743-1066(91)90026-L.
[115] G. Ramalingam. “Context-sensitive synchronization-sensitive analysis is undecidable”. In: ACM Trans. Program. Lang. Syst. 22.2 (2000), pp. 416–430.
[116] John Regehr. “Random Testing of Interrupt-driven Software”. In: International Conference on Embedded Software. 2005, pp. 290–298.
[117] John Regehr and Nathan Cooprider. “Interrupt Verification via Thread Verification”. In: Electron. Notes Theor. Comput. Sci. 174.9 (June 2007), pp. 139–150.
[118] John Regehr, Alastair Reid, and Kirk Webb. “Eliminating stack overflow by abstract interpretation”. In: ACM Trans. Embedded Comput. Syst. 4.4 (2005), pp. 751–778.
[119] Xiaoxia Ren, Fenil Shah, Frank Tip, Barbara G. Ryder, and Ophelia Chesley. “Chianti: a tool for change impact analysis of Java programs”. In: ACM SIGPLAN Notices 39.10 (2004), pp. 432–448.
[120] Gregor Richards, Christian Hammer, Brian Burg, and Jan Vitek. “The Eval That Men Do - A Large-Scale Study of the Use of Eval in JavaScript Applications”. In: European Conference on Object-Oriented Programming. 2011, pp. 52–78.
[121] Gregg Rothermel and Mary Jean Harrold. “A Safe, Efficient Algorithm for Regression Test Selection”. In: Proceedings of the Conference on Software Maintenance. ICSM ’93. Washington, DC, USA: IEEE Computer Society, 1993, pp. 358–367. ISBN: 0-8186-4600-4. URL: http://dl.acm.org/citation.cfm?id=645542.658172.
[122] Gregg Rothermel, Mary Jean Harrold, and Jeinay Dedhia. “Regression test selection for C++ software”. In: Software Testing, Verification and Reliability 10.2 (2000), pp. 77–109.
[123] Atanas Rountev, Ana Milanova, and Barbara G. Ryder. “Points-To Analysis for Java using Annotated Constraints”. In: ACM SIGPLAN Conference on Object Oriented Programming, Systems, Languages, and Applications. 2001, pp. 43–55.
[124] Domenico Saccà and Carlo Zaniolo. “The Generalized Counting Method for Recursive Logic Queries”. In: Proceedings of the International Conference on Database Theory. Rome, Italy: Springer-Verlag, 1986, pp. 31–53. ISBN: 0387171878.
[125] Domenico Saccà and Carlo Zaniolo. “On the Implementation of a Simple Class of Logic Queries for Databases”. In: Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems. PODS ’86. Cambridge, Massachusetts, USA: Association for Computing Machinery, 1985, pp. 16–23. ISBN: 0897911792. DOI: 10.1145/6012.6013.
[126] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. “A symbolic execution framework for JavaScript”. In: IEEE Symposium on Security and Privacy. 2010, pp. 513–528.
[127] Bastian Schlich, Thomas Noll, Jörg Brauer, and Lucas Brutschy. “Reduction of interrupt handler executions for model checking embedded software”. In: Haifa Verification Conference. 2009, pp. 5–20.
[128] Martin D. Schwarz, Helmut Seidl, Vesal Vojdani, Peter Lammich, and Markus Müller-Olm. “Static Analysis of Interrupt-driven Programs Synchronized via the Priority Ceiling Protocol”. In: ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 2011, pp. 93–104.
[129] Koushik Sen, Swaroop Kalasapur, Tasneem G. Brutch, and Simon Gibbs. “Jalangi: a selective record-replay and dynamic analysis framework for JavaScript”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2013, pp. 488–498.
[130] Dennis Shasha and Marc Snir. “Efficient and Correct Execution of Parallel Programs that Share Memory”. In: ACM Trans. Program. Lang. Syst. 10.2 (1988), pp. 282–312.
[131] Mats Skoglund and Per Runeson. “Improving class firewall regression test selection by removing the class firewall”. In: International Journal of Software Engineering and Knowledge Engineering 17.03 (2007), pp. 359–378.
[132] Manu Sridharan, Julian Dolby, Satish Chandra, Max Schäfer, and Frank Tip. “Correlation tracking for points-to analysis of JavaScript”. In: European Conference on Object-Oriented Programming. 2012, pp. 435–458.
[133] Chungha Sung, Markus Kusano, Nishant Sinha, and Chao Wang. “Static DOM Event Dependency Analysis for Testing Web Applications”. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. FSE 2016. Seattle, WA, USA: ACM, 2016, pp. 447–459. ISBN: 9781450342186. DOI: 10.1145/2950290.2950292.
[134] Chungha Sung, Markus Kusano, and Chao Wang. “Modular Verification of Interrupt-Driven Software”. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. ASE 2017. Urbana-Champaign, IL, USA: IEEE Press, 2017, pp. 206–216. ISBN: 9781538626849.
[135] Chungha Sung, Shuvendu K. Lahiri, Constantin Enea, and Chao Wang. “Datalog-Based Scalable Semantic Diffing of Concurrent Programs”. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ASE 2018. Montpellier, France: ACM, 2018, pp. 656–666. ISBN: 9781450359375. DOI: 10.1145/3238147.3238211.
[136] A-B Taha, Stephen M. Thebaut, and S-S Liu. “An approach to software fault localization and revalidation based on incremental data flow analysis”. In: Computer Software and Applications Conference, 1989. COMPSAC 89., Proceedings of the 13th Annual International. IEEE, 1989, pp. 527–534.
[137] K. Tuncay Tekle and Yanhong A. Liu. “More Efficient Datalog Queries: Subsumptive Tabling Beats Magic Sets”. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. SIGMOD ’11. Athens, Greece: Association for Computing Machinery, 2011, pp. 661–672. ISBN: 9781450306614. DOI: 10.1145/1989323.1989393.
[138] K. Tuncay Tekle and Yanhong A. Liu. “Precise Complexity Analysis for Efficient Datalog Queries”. In: Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming. PPDP ’10. Hagenberg, Austria: Association for Computing Machinery, 2010, pp. 35–44. ISBN: 9781450301329. DOI: 10.1145/1836089.1836094.
[139] Therac-25. https://en.wikipedia.org/wiki/Therac-25.
[140] Thilo Vörtler, Benny Höckner, Petra Hofstedt, and Thomas Klotz. “Formal Verification of Software for the Contiki Operating System Considering Interrupts”. In: IEEE International Symposium on Design and Diagnostics of Electronic Circuits & Systems. 2015, pp. 295–298.
[141] Chao Wang and Kevin Hoang. “Precisely Deciding Control State Reachability in Concurrent Traces with Limited Observability”. In: International Conference on Verification, Model Checking, and Abstract Interpretation. 2014, pp. 376–394.
[142] Chao Wang, Yu Yang, Aarti Gupta, and Ganesh Gopalakrishnan. “Dynamic Model Checking with Property Driven Pruning to Detect Race Conditions”. In: International Symposium on Automated Technology for Verification and Analysis. 2008, pp. 126–140.
[143] Chao Wang, Z. Yang, Franjo Ivancic, and Aarti Gupta. “Disjunctive Image Computation for Embedded Software Verification”. In: Proceedings of the Design, Automation & Test in Europe Conference. 2006.
[144] Chao Wang, Zijiang Yang, Vineet Kahlon, and Aarti Gupta. “Peephole Partial Order Reduction”. In: International Conference on Tools and Algorithms for Construction and Analysis of Systems. 2008, pp. 382–396.
[145] Jingbo Wang, Chungha Sung, and Chao Wang. “Mitigating Power Side Channels during Compilation”. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2019. Tallinn, Estonia: ACM, 2019, pp. 590–601. ISBN: 9781450355728. DOI: 10.1145/3338906.3338913.
[146] Yu Wang, Junjing Shi, Linzhang Wang, Jianhua Zhao, and Xuandong Li. “Detecting Data Races in Interrupt-Driven Programs Based on Static Analysis and Dynamic Simulation”. In: Asia-Pacific Symposium on Internetware. 2015, pp. 199–202.
[147] Yu Wang, Linzhang Wang, Tingting Yu, Jianhua Zhao, and Xuandong Li. “Automatic detection and validation of race conditions in interrupt-driven embedded software”. In: International Symposium on Software Testing and Analysis. 2017, pp. 113–124.
[148] Shiyi Wei and Barbara G. Ryder. “State-sensitive points-to analysis for the dynamic behavior of JavaScript objects”. In: European Conference on Object-Oriented Programming. 2014, pp. 1–26.
[149] John Whaley and Monica S. Lam. “Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2004, pp. 131–144.
[150] Xueguang Wu, Liqian Chen, Antoine Miné, Wei Dong, and Ji Wang. “Static Analysis of Runtime Errors in Interrupt-Driven Programs via Sequentialization”. In: ACM Trans. Embedded Comput. Syst. 15.4 (2016), 70:1–70:26.
[151] Xueguang Wu, Yanjun Wen, Liqian Chen, Wei Dong, and Ji Wang. “Data Race Detection for Interrupt-Driven Programs via Bounded Model Checking”. In: International Conference on Software Security and Reliability. 2013, pp. 204–210.
[152] Yu Yang, Xiaofang Chen, and Ganesh Gopalakrishnan. Inspect: A Runtime Model Checker for Multithreaded C Programs. Tech. rep. University of Utah, 2008.
[153] Yu Yang, Xiaofang Chen, Ganesh Gopalakrishnan, and Chao Wang. “Automatic Discovery of Transition Symmetry in Multithreaded Programs Using Dynamic Analysis”. In: International SPIN Workshop on Model Checking Software. 2009, pp. 279–295.
[154] Z. Yang, Chao Wang, Franjo Ivančić, and Aarti Gupta. “Mixed Symbolic Representations for Model Checking Software Programs”. In: International Conference on Formal Methods and Models for Codesign. 2006, pp. 17–24.
[155] Zuoning Yin, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy, and Lakshmi Bairavasundaram. “How Do Fixes Become Bugs?” In: ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. 2011, pp. 26–36.
[156] Jie Yu and Satish Narayanasamy. “A Case for an Interleaving Constrained Shared-memory Multi-processor”. In: International Symposium on Computer Architecture. 2009, pp. 325–336.
[157] Tingting Yu, Witawas Srisa-an, and Gregg Rothermel. “SimRT: An Automated Framework to Support Regression Testing for Data Races”. In: International Conference on Software Engineering. 2014, pp. 48–59.
[158] Naling Zhang, Markus Kusano, and Chao Wang. “Dynamic Partial Order Reduction for Relaxed Memory Models”. In: SIGPLAN Not. 50.6 (June 2015), pp. 250–259. ISSN: 0362-1340. DOI: 10.1145/2813885.2737956.
[159] Naling Zhang, Markus Kusano, and Chao Wang. “Dynamic partial order reduction for relaxed memory models”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2015, pp. 250–259.
[160] Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. “On abstraction refinement for program analyses in Datalog”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2014, pp. 239–248.
[161] Yunhui Zheng, Tao Bao, and Xiangyu Zhang. “Statically Locating Web Application Bugs Caused by Asynchronous Calls”. In: Proceedings of the 20th International Conference on World Wide Web. WWW ’11. Hyderabad, India: Association for Computing Machinery, 2011, pp. 805–814. ISBN: 9781450306324. DOI: 10.1145/1963405.1963517.
Abstract
Concurrent software is ubiquitous in modern computer systems: it is used everywhere from small computing devices such as smartphones to large systems such as clouds. However, writing correct concurrent software is notoriously difficult, as the number of program states to reason about can be extremely large due to interleavings between concurrent events. Moreover, testing and verifying concurrent software by enumerating the enormous number of interleavings is unrealistic. Thus, automated reasoning techniques for concurrent software that rely on exhaustive enumeration of interleavings often lack scalability, and those that do not rely on exhaustive enumeration often lack accuracy. This dissertation shows that substantial improvements to existing testing and verification techniques for concurrent software can be made through constraint-based analysis of interleavings between concurrent events. The key insight is that an analysis of feasible/infeasible interleavings can be integrated into the existing techniques to achieve more efficient reasoning. Following this insight, I designed and implemented a constraint-based program analysis for the interleavings based on a declarative program analysis framework. With the analysis, I improved three testing and verification techniques. The first technique is a modular abstract interpretation for interrupt-driven software, whose accuracy is drastically improved by the analysis of infeasible interleavings between interrupt handlers. The second technique is a semantic diffing method for concurrent programs, whose scalability can be significantly enhanced with the help of the analysis of feasible interleavings. The third technique is a systematic web application testing method, for which the analysis of feasible interleavings between events can help achieve a significantly higher testing coverage. We conducted an experimental evaluation of all three techniques, and our results show that the proposed constraint-based analysis can improve their effectiveness in terms of both scalability and accuracy.