Side-Channel Security Enabled by Program Analysis and Synthesis

by

Jingbo Wang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2023

Copyright 2023 Jingbo Wang

Acknowledgements

Embarking on this PhD journey, I initially grappled with self-doubt and apprehension, questioning my ability to complete this academic voyage. As the journey unfolded, I realized it was one of the best decisions I have ever made, blending enlightenment, challenges, anxiety, exhilaration, and inspiration. Throughout, I have been immensely fortunate to benefit from the unwavering support of numerous great people, and without them I would not stand where I am today.

First of all, I would like to thank my advisor, Chao Wang, who has supported and guided me along this journey. In my first year, when I was not yet familiar with the technical concepts, Chao was very patient in teaching me and guiding me through both the high-level picture and the low-level details. Despite the hurdles I faced in comprehending the issues, delineating their root causes, or devising solutions, he never wavered in his faith in me. As time went by, I started to make progress on my own, and I truly enjoyed meeting with him. While he consistently posed challenging questions and critiqued my ideas, these rigorous discussions enhanced my grasp of the subjects and clarified murky areas. More importantly, I deeply appreciate his advising style, which emphasizes encouragement and fosters autonomy. Whenever my research reached an impasse, he never got discouraged. Instead, he kept working with me to reflect on the problems and encouraged me to explore from different perspectives. It seemed as if he always had some magic: after meeting with him, I became more confident and enthusiastic about the problems.
In addition, he always guided us to think and work independently, from which I benefited greatly. I cannot thank him enough for what he did for me, and I will carry this advising style through my future career.

I would also like to thank all my dissertation committee members: Nenad Medvidovic, Jyotirmoy Deshmukh, Mukund Raghothaman, and Pierluigi Nuzzo. I thank them for serving on my committee, providing insightful feedback on my thesis and presentations, and posing incisive and thoughtful questions, which have been invaluable in elevating the quality of my research. Here, I want to give special thanks to Neno, who not only provided suggestions from a comprehensive perspective, but also pointed out the gap between me and my readers. This thought-provoking feedback greatly helped me reshape my research statement and inspired me to think from a holistic view. Meanwhile, I would also like to thank Mukund for the insightful Zoom discussions, for guidance on potential research directions, and for invaluable advice during my job search. His expertise and great suggestions have helped me tremendously.

Next, I would especially like to thank my other research collaborators: Aarti Gupta, Michael Emmi, Liana Hadarean, and Peixuan Li. My collaboration with Aarti, whom I met at Princeton, was both delightful and immensely fruitful. She could quickly pinpoint what I had overlooked and challenge me to think from a different perspective. She provided a wealth of technical expertise and theoretical insights that greatly enriched our work together. I met Michael, Liana, and Peixuan during my internship at Amazon. Working with them was unforgettable, as they were extremely responsive and knowledgeable. That internship stands out as one of my most cherished summer experiences, a period during which I made lasting friendships and gained insight into how to scale a research prototype into practice.
Additionally, my gratitude extends to Antonio Filieri, Linghui Luo, Subarno Banerjee, Nicolás Rosner, Omer Tripp, Ben Liblit, and Martin Schäf for their invaluable mentorship and guidance during my internship.

Throughout my PhD journey at USC, I was fortunate to be in the company of exceptional peers who enriched my experience and added memorable moments to this academic voyage. I extend my sincere gratitude to my labmates, Brandon Paulsen, Chungha Sung, Daniel Guo, Meng Wu, Zunchen Huang, Yannan Li, Chenggang Li, Tianqin Zhao, and Brian Hyeongseok Kim, and to colleagues from our neighboring labs: Yixue Zhao, Duc Le, Adriana Sejfia, Paul Chiou, Nikola Lukic, Daye Nam, Marcelo Schmitt Laser, Yinjun Lyu, Mian Wan, Zhaoxu Zhang, Xin Qin, Yuan Xia, Amirmohammad Nazari, Yifei Huang, Qinyi Luo, Sulagna Mukherjee, Satyaki Das, and Ali Alotaibi. They helped make my PhD an enjoyable experience. Chungha provided endless support and guidance for both my research and my life, especially during my first few years. Brandon, Daniel, Meng, and Yannan were a delight to work with.

I would also like to thank many close friends outside USC. To Jie Wang and Jie Hu: it has been a decade since we met, and I always miss the old times we spent together; you two are like my "siblings," always providing unconditional caring and unwavering support. To Ningyu Ding and Hengjie Yang: your company has enriched my PhD life. I was also lucky to meet many wonderful friends who have provided enormous mentoring and encouragement throughout my PhD life: Suguman Bansal, Yu Huang, Zitong He, Hengda He, Soneya Binta Hossain, Haojian Jin, Lingjie Liu, Yi Li, Fan Lai, Shiqing Ma, Manuel Rigger, Haifeng Xu, Doris Xin, Wenxi Wang, Yanjun Wang, Shurui Zhou, Hengshuang Zhao, Tianyi Zhang, and Yizhou Zhang. I am especially grateful to Yizhou; the discussions with him have always been insightful and helped me grow.

Finally, I want to acknowledge my staunchest advocates: my parents and Hanhan.
I could not have persisted this far without their long-standing and unconditional support and company. Whenever I struggled with my research, they consistently stood by me, urging me to persevere. Rather than mere success, their true wish for me was to find joy in my endeavors, encouraging me to invest my efforts in pursuits I am passionate about. I am forever grateful to them.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Limitations of Current Program Analysis and Synthesis Tools
  1.2 Hypotheses and Insights
  1.3 Contributions
    1.3.1 Detecting Side-channel Leaks
    1.3.2 Mitigating Side-channel Leaks
    1.3.3 Verifying the Correctness of Mitigated Programs
    1.3.4 Quantifying Side-channel Leaks
  1.4 Overview

Chapter 2: Background
  2.1 The Leakage Model of Power Side-Channels
  2.2 Masking Countermeasure
  2.3 The Data Dependency
  2.4 The Type Hierarchy
  2.5 Related Work
    2.5.1 Detecting Power Side-channel Leaks
    2.5.2 Mitigating Power Side-channel Leaks
    2.5.3 Other Side-channels
    2.5.4 Secure Compilation for Side-channel Security
    2.5.5 Learning and Synthesizing Analyzers from Examples
    2.5.6 Optimizing Program Analyses

Chapter 3: Detecting Side-channel Leaks
  3.1 Preliminaries
    3.1.1 Type Systems
    3.1.2 Relations
  3.2 Motivation
    3.2.1 Problem Setting
    3.2.2 Feature Synthesis and Rule Learning
    3.2.3 Proving Soundness of Learned Analysis Rules
    3.2.4 Overall Architecture of the GPS System
  3.3 Learning the Type-inference Rules
    3.3.1 The Decision Tree Learning Algorithm
    3.3.2 The On-Demand Feature Synthesis Algorithm
  3.4 Proving the Type-inference Rules
    3.4.1 Representation of the Learned Rule
    3.4.2 Representation of the Knowledge Base (KB)
    3.4.3 Proving the Soundness of the Learned Rule Using KB
    3.4.4 Generating Abstract Counter-Examples
  3.5 Experiments
    3.5.1 Benchmarks
    3.5.2 Performance and Accuracy of the Learned Analyzer
    3.5.3 Effectiveness of Rule Induction and Soundness Verification
    3.5.4 Threats to Validity
  3.6 Summary

Chapter 4: Mitigating Side-channel Leaks
  4.1 Motivation
    4.1.1 The HW and HD Leaks
    4.1.2 Identifying the HD Leaks
    4.1.3 Mitigating the HD Leaks
    4.1.4 Leaks in High-order Masking
  4.2 Type-based Static Leak Detection
    4.2.1 The Type Hierarchy
    4.2.2 Datalog-based Analysis
    4.2.3 Basic Type Inference Rules
    4.2.4 Inference Rules to Improve Accuracy
    4.2.5 Detecting HD-sensitive Pairs
  4.3 Mitigation during Code Generation
    4.3.1 Handling SEN_HD_D Pairs
    4.3.2 Handling SEN_HD_S Pairs
  4.4 Domain-specific Optimizations
    4.4.1 Leveraging the Backend Information
    4.4.2 Pre-computing Datalog Facts
    4.4.3 Efficient Encoding of Datalog Relations
  4.5 Experiments
    4.5.1 Leak Detection Results
    4.5.2 Effectiveness of Optimizations
    4.5.3 Leak Mitigation Results
    4.5.4 Comparison to High-Order Masking
    4.5.5 Threats to Validity
  4.6 Summary

Chapter 5: Verifying the Correctness of Mitigated Programs
  5.1 Motivation
    5.1.1 Problem Statement
    5.1.2 Limitations of Existing Methods
    5.1.3 How Our Baseline Method Works
    5.1.4 Our Learning-based Optimizations
  5.2 Our Method
    5.2.1 Top-level Procedure
    5.2.2 Domain-Specific Language (DSL)
    5.2.3 Abstract Syntax Tree (AST)
      5.2.3.1 Constructing an AST
      5.2.3.2 Modifying an AST
    5.2.4 Generating Invariant Candidates
      5.2.4.1 Syntactic Filtering
      5.2.4.2 Verification
  5.3 LR-based Pruning
    5.3.1 Constructing the UNSAT Core
      5.3.1.1 The Mirror Formula
      5.3.1.2 The UNSAT Core Example
    5.3.2 Non-chronological Backtracking
    5.3.3 The Strengthening Predicate
  5.4 RL-based Prioritization
    5.4.1 The Policy P_RL
    5.4.2 The Reward Function
    5.4.3 Generating More Feedback
  5.5 Evaluation
    5.5.1 Benchmarks
    5.5.2 Results for Evaluating the Effectiveness
    5.5.3 Results for Evaluating the Optimizations
  5.6 Related Work
  5.7 Summary

Chapter 6: Quantifying Side-channel Leaks
  6.1 Motivation
    6.1.1 Examples
    6.1.2 Limitations of ProbLog
    6.1.3 Advantages of Our Method
    6.1.4 Another Example
  6.2 Preliminary
    6.2.1 Datalog
    6.2.2 Semi-naive Evaluation of Soufflé
  6.3 Our Method
    6.3.1 High-level Overview
    6.3.2 Probabilistic Interval Analysis
    6.3.3 Handling Cycles in the Derivation Graph
  6.4 Experiments
    6.4.1 Benchmarks
    6.4.2 Results
  6.5 Summary

Chapter 7: Conclusion
  7.1 Summary
  7.2 Future Work
    7.2.1 Side-channel security for cyber-physical systems
    7.2.2 Quantitative analysis of probabilistic programs

Bibliography

List of Tables

3.1 Statistics of the benchmark programs in D_test.
3.2 Comparing the learned analyzer with the tool from [210].
3.3 Comparing the learned analyzer with SCInfer [227].
3.4 Decision tree learning with feature synthesis (different iterations with #AST = 531).
4.1 Truth table showing that (1) there is no HW leak in t1, t2, t3 but (2) there is an HD leak when t1, t2 share a register.
4.2 Statistics of the benchmark programs.
4.3 Results of type-based HD leak detection.
4.4 Results of quantifying the impact of optimizations.
4.5 Results of our HD leak mitigation.
4.6 Comparison with order-d masking techniques [20].
5.1 The list of relational verification benchmarks.
5.2 Comparing the performance of our method (Code2RelInv) and the existing method Code2Inv [193].
5.3 Comparing the number of invariant candidates explored by our method with different optimizations.
6.1 Running time and results of QDatalog.

List of Figures

3.1 The overall flow of GPS, our data-driven synthesis method.
3.2 The program on the left is a perfectly masked function from MAC-Keccak. The decision tree on the right represents the static analyzer that the user would like to synthesize. Here, x is a program variable whose type is being computed; L and R are its left and right operands, respectively; and f(x) is a synthesized feature shown in Fig. 3.3a (represented by a recursive Datalog program).
3.3 Comparing the rules learned by GPS (Fig. 3.3a) to manually crafted rules from SCInfer (Fig. 3.3b). Observe that the learned rules are sound, i.e., every variable that potentially leaks information is assigned the distribution type UKD, while still managing to draw non-trivial conclusions such as RUD(b4). The learned rules (R2–R8) in Fig. 3.3a are used to define the new feature f(x) in Fig. 3.2(b).
3.4 The classifier of Fig. 3.4b is learned only using the features in Fig. 3.4a. Because of the limited expressive power of these features, the learned analysis necessarily misclassifies either b4 or n1. Fig. 3.4c denotes the candidate analyzer produced after one round of feature synthesis. The blue path corresponds to the rule RUD(x) ← XOR(x) ∧ XOR(R) ∧ RUD(L) ∧ ¬f(x) ∧ LC(x, L) ∧ RC(x, R). Unfortunately, even though this analysis (Fig. 3.4c) achieves 100% training accuracy, the leaf nodes highlighted in red correspond to unsound predictions.
3.5 The table denotes the abstract counter-examples produced during the soundness verification of the candidate analyzer shown in Fig. 3.4c.
3.6 The candidate analysis learned after one round of feedback from the soundness verifier. The leaves shown in green and red correspond to sound and unsound analysis rules, respectively.
3.7 Syntax of the DSL for synthesizing new features.
3.8 Proof rules for propositional logic, to simplify the logic formula and deduce Boolean constants (true and false).
3.9 Example AST from which the rule is learned.
3.10 Proof rules for distribution types, gathered from prior works [227, 77, 20]. Here, v denotes the type of variable x, which is one of the following types: UKD, SID, and RUD. NOUKD denotes the secure type (either RUD or SID). All the predefined relations in KB are the same as in the learned rule.
4.1 Overview of our secure compilation method.
4.2 Implementations of an XOR computation in the presence of HW and HD power side-channel leaks.
4.3 The assembly code before and after mitigation.
4.4 Second-order masking of multiplication in a finite field, and the LLVM-generated x86 assembly code of Line 3.
4.5 The remaining inference rules used in our type system (in addition to Rule 14).
4.6 Code snippet from the Byte Masked AES [225].
5.1 Code2RelInv: our invariant synthesis method.
5.2 Given two programs P1 and P2, we merge them into a single program P to execute the instructions in lockstep.
5.3 Constructing the next AST by modifying the current AST. The i-th candidate is shown in (a), with and without the conflict predicate; the (i+1)-th candidate is shown in (b), with and without conflict-predicate-based pruning.
5.4 The DSL for relational invariants, where c is a set of constants, var is a set of variables, and A is a set of arrays.
5.5 Step-by-step construction of an invariant candidate.
5.6 Illustrating the use of the policy P_RL in our method.

Abstract

The objective of my dissertation research is to develop rigorous methods and analysis tools for improving the security of software systems. My primary focus is on an emerging class of security threats termed side-channel attacks. During a side-channel attack, the adversary exploits statistical dependencies between the secret data (e.g., passwords or encryption keys) and seemingly unrelated non-functional properties (e.g., power consumption or execution time) of the computer. In particular, power side-channel leaks are caused by statistical dependencies rather than syntactic or semantic dependencies between sources and sinks; thus, existing techniques that focus primarily on classic information-flow security (e.g., taint analysis) do not work. I have designed and implemented an automated framework leveraging program analysis and synthesis to help detect and mitigate statistical dependencies. First, I developed a set of type inference rules to capture and detect these dependencies, and then a set of transformation-based methods to mitigate them. Second, to adapt these type inference rules to constantly evolving program characteristics, I proposed a data-driven method for learning provably sound side-channel analysis rules from annotated programs. Third, to ensure the correctness of the mitigation, I developed new methods to help prove the equivalence of the original and mitigated programs. Finally, I developed an extension of the side-channel analysis framework to add the capability of quantifying the security risk. Experimental evaluations demonstrated the efficiency and precision of these methods in detecting and eliminating side-channel-related statistical dependencies, which in turn leads to more secure software for critical applications.

Chapter 1: Introduction

The ubiquity of software across diverse facets of society has intensified the ramifications of information breaches. Research has indicated that the side-channel attack is one of the most covert and potent forms of information leakage. Consider cryptography, which underpins a multitude of security protocols employed by a plethora of applications.
While they provide robust theoretical assurances, real-world cryptosystems can be undermined by side-channel attacks. These attacks exploit statistical dependencies between the secret data and seemingly unrelated non-functional properties (e.g., power consumption or execution time) of the computer [121, 52, 145, 59, 169, 144, 44, 231, 201, 112]. Among different side-channel leaks, the power side channel is notably efficient at extracting confidential information. Recent research indicates that power consumption can be inferred from timing and frequency information, eliminating the need for power measurement instruments [215]. This evolution makes confidential information from power side channels even more attainable for malicious entities, even without direct access to the targeted device. For example, if the power consumption of an encryption device depends on the secret key, techniques such as differential power analysis (DPA) may be used to perform attacks reliably [120, 59, 44, 143, 146].

This motivates the question: how can we ensure side-channel security? Before answering this research question, I first define the threat model. I assume the attacker has access to the software code, but not the secret data, and the attacker's goal is to gain information about the secret data. The attacker may measure the power consumption of a device that executes the software, at the granularity of each machine instruction. A set of measurement traces is aggregated to perform statistical analysis, e.g., as in DPA attacks. In mitigation, my goal is to eliminate the statistical dependencies between secret data and the (aggregated) measurement data.

Let P be the program under attack and the triplet (x, k, r) be the input: the sets x, k, and r consist of public, secret, and random (mask) variables, respectively. Let x, k_1, k_2, and r be valuations of these input variables.
Then, $\nu_t(P, x, k_1, r)$ denotes, at time step t, the power consumption of a device executing program P under inputs x, k_1, and r. Similarly, $\nu_t(P, x, k_2, r)$ denotes the power consumption of the device executing P under inputs x, k_2, and r. Between steps t and t+1, one instruction in P is executed. P has a leak if there exist t, x, k_1, and k_2 such that the distribution of $\nu_t(P, x, k_1, r)$ differs from that of $\nu_t(P, x, k_2, r)$. Let the random variables in r be uniformly distributed over the domain R, and let the probability of each $r \in R$ be Pr(r). I expect:

  $\forall t, x, k_1, k_2:\;\; \sum_{r \in R} \nu_t(P, x, k_1, r) \cdot Pr(r) \;=\; \sum_{r \in R} \nu_t(P, x, k_2, r) \cdot Pr(r)$   (1.1)

Statistical dependencies exist between the (aggregated) measurement and the secret data if Formula 1.1 does not hold.

While program analysis and synthesis are promising techniques for detecting and mitigating power side-channel attacks, e.g., by checking and then automatically removing statistical dependencies in software code, existing methods and tools for program analysis and synthesis fall short in terms of both scalability and accuracy. That is, they are either fast but extremely inaccurate [20, 30, 148] or accurate but extremely slow [228, 78]. This dissertation aims to address these challenges by developing scalable, accurate algorithms that can detect and mitigate power side-channel vulnerabilities, and then verify the security and correctness of the mitigated programs.

1.1 Limitations of Current Program Analysis and Synthesis Tools

I now delve deeper into the limitations of current program analysis and synthesis tools, as understanding these limitations provides insights for my method. As previously mentioned, current tools fall short in ensuring side-channel security due to their lack of accuracy and scalability. In this section, I elucidate the reasons behind these shortcomings.

First, when it comes to detecting side-channel leaks, traditional approaches based on type inference lack accuracy.
Although these methods can quickly prove that a computation is free of side-channel leaks (specifically, when the result is syntactically independent of the secret data), they are predicated on the idea that if the result is syntactically independent of the secret, it is also statistically independent. Unfortunately, this syntactic type inference can be inaccurate, often leading to false positives. While recent research has introduced refinements to these syntactic inference rules [20, 35, 76], the refinements are either computationally expensive or fail to account for non-linear operators.

In contrast, another approach to detecting leaks utilizes model counting [228, 78, 79]. While these methods are accurate, they are considerably less scalable. To determine whether a computation is statistically independent of the secret, these methods count the satisfying solutions of logical formulas for varying secret keys. However, the size of the logical formulas they must construct is exponential in the number of random variables. This inherent complexity makes model counting exceptionally slow and unscalable for large applications.

Crafting static analyzers manually to strike a balance between accuracy and scalability is challenging. Even for domain experts, this process can be labor-intensive, error-prone, and may lead to suboptimal implementations. Additionally, while analysis rules might be finely tuned to an initial code corpus, they might not adapt to evolving characteristics of target programs, potentially becoming less effective over time.

Second, when it comes to mitigating leaks, manually repairing software code to eradicate side-channel leaks is both labor-intensive and susceptible to errors. Hence, there is significant interest in automating the mitigation of side-channel leaks [5, 94, 77, 21, 20].
Traditional strategies [77, 20] mitigate side-channel leaks by masking secret data with random variables, thereby severing statistical dependencies between secret data and side-channel emissions. However, these strategies impose too many security constraints and are not scalable in practice. Moreover, my work in Chapter 4 has shown that, despite being masked, these mitigated programs are still vulnerable to a type of power side-channel leak introduced by the code generation modules inside modern compilers.

1.2 Hypotheses and Insights

Given these identified limitations, I posit four hypotheses aimed at enhancing the accuracy and scalability of existing tools in detecting and mitigating side-channel vulnerabilities. I start with the hypotheses and the underlying insights specific to leak detection.

Hypothesis 1: Through data-driven techniques, program analyses can achieve better accuracy–scalability trade-offs in detecting leaks and dynamically adapt to the distribution of target programs.

Insight 1: Data-driven synthesis learns the unique characteristics of side-channel leaks inside the software code, optimizing the program analyses to address accuracy–scalability challenges.

Hypothesis 1 proposes a data-driven method for synthesizing static analyses to detect side-channel information leaks in cryptographic software. To the best of our knowledge, existing methods are either fast but extremely inaccurate (e.g., type systems) or accurate but extremely slow (e.g., model-counting SMT solvers). Our key observation is that, to overcome the accuracy–scalability bottlenecks, we must leverage the unique characteristics of side-channel information leaks inside the software code to optimize the program analysis and synthesis methods, as described by Insight 1. Thus, I propose to automate the optimization process using a synergistic combination of rigorous logical-reasoning and machine-learning techniques.
Hypothesis 2: By focusing mitigation efforts on the register allocation module, it is possible to eliminate the side-channel leaks introduced by compilation procedures, achieving scalability with little run-time overhead.

Insight 2.1: When two masked and hence desensitized values are put into the same register, the masking countermeasure may be removed accidentally, and thus leak the secret.

Insight 2.2: Mitigation can leverage the existing production-quality modules in LLVM to ensure that the compiled code is secure by construction.

Hypothesis 2 suggests a mitigation technique based on the compiler's backend to eliminate side-channel leaks. This suggestion stems from an observation: compilers must use a limited number of CPU registers to store a potentially large number of program variables, and they may introduce additional statistical dependencies by accident, as laid out by Insight 2.1. Importantly, such dependency cannot be eliminated by repairing the source code alone. In addition, the backend modules within LLVM maintain a production-quality standard, offering an opportunity to constrain register allocation modules and then propagate the effect to subsequent modules. This can be achieved without the need to implement any new backend module from scratch, as mentioned in Insight 2.2.

Hypothesis 3: Synthesizing high-quality relational invariants can assist in ensuring the correctness of the mitigated program.

Insight 3: During the mitigation of side-channel leaks, a vital yet demanding task is to affirm the functional equivalence of programs pre- and post-mitigation. Equivalence verification requires the use of invariants.

Both Hypothesis 3 and Insight 3 center on the correctness of mitigated programs, ensuring that the original and mitigated programs are functionally equivalent. For this equivalence verification, relational invariants are required. These are logical assertions made over multiple programs or program executions. Such invariants are essential for programs with loops.
As articulated in Insight 3, expediting invariant generation can streamline equivalence verification, subsequently accelerating the generation of the mitigated program.

Hypothesis 4: The use of quantification can enhance accuracy in detecting side-channel leaks.

Insight 4: Quantification computes the probability of leaking information, confirms actual leaky cases, and subsequently reduces approximation errors in program analyses.

To achieve high efficiency, program analyses do not directly compute and discern statistical dependencies. Instead, they utilize abstract interpretation [64] to define distribution types, aiming to approximate these statistical dependencies. While using over-approximation is sound, it is not always complete. Such approximations might inaccurately classify a secure variable as leaky. Hypothesis 4 suggests a new technique for further improving the accuracy of the program analyses highlighted in Hypothesis 1. The rationale behind this, as articulated in Insight 4, is that quantification can determine the probability of an information leak. It can more accurately discern truly leaky cases from the potentially leaky cases, reducing the unknown cases and improving accuracy.

1.3 Contributions

This dissertation evaluates four innovative methods for the detection and mitigation of side-channel leaks based on the aforementioned hypotheses. The first technique implements the idea suggested by Hypothesis 1, which introduces the first data-driven method for learning provably sound side-channel analysis rules from annotated programs. This allows for improved accuracy–scalability trade-offs and enables the rules to adapt to constantly evolving program characteristics. The second technique, drawn from Hypothesis 2, develops a mitigation approach for the compiler's backend, especially focusing on the register allocation modules. This ensures that leaky intermediate computation results are stored in different CPU registers or memory locations.
The third technique, based on Hypothesis 3, seeks to speed up the equivalence verification of the original and mitigated programs, enhancing its scalability. The fourth technique, in line with Hypothesis 4, refines side-channel analyses by introducing the ability to quantify security risks. Below we summarize the contributions and evaluation results of each technique.

1.3.1 Detecting Side-channel Leaks

I develop GPS, the first data-driven approach to learning a provably sound side-channel analyzer. The analyzer consists of type-inference rules learned from example code snippets annotated with ground truths. The key innovation in GPS is a synergistic combination of rigorous logical-reasoning and machine-learning techniques. The learning algorithm uses syntax-guided synthesis (SyGuS) to generate new features and uses decision-tree learning (DTL) to generate new analysis rules based on these features. Soundness is guaranteed by formally proving each learned rule before adding it to the learned analyzer. The analyzers learned by GPS have been evaluated on 568 programs that implement cryptographic protocols. Empirically, our learned analyzers are several hundred times faster than state-of-the-art, hand-crafted tools while maintaining the same empirical accuracy, thus confirming Hypothesis 1.

1.3.2 Mitigating Side-channel Leaks

I develop a program transformation based approach for mitigating side-channel leaks. Since compilers must use a limited number of CPU registers to store a potentially large number of program variables, they may introduce additional statistical dependencies by accident. My mitigation technique is implemented in the compiler's backend, more specifically the register allocation module, to ensure that leaky intermediate computation results are always stored in different CPU registers or memory locations.
I have implemented the technique in the popular LLVM compiler for the x86 instruction set architecture and evaluated it on a number of cryptographic software benchmarks, including implementations of well-known ciphers such as AES and MAC-Keccak. These benchmark programs are all protected by classic masking countermeasures but, still, I detected leaks related to the register reuse behavior in their LLVM compiled code. The code produced by my mitigation, also based on LLVM, is always leak-free, thus confirming Hypothesis 2.

1.3.3 Verifying the Correctness of Mitigated Programs

I develop a method for synthesizing high-quality relational invariants that can be used to speed up equivalence verification of the programs before and after mitigation. My method first generates invariant candidates using syntax-guided synthesis (SyGuS) and then filters them using an SMT solver based verifier. Two learning-based techniques are used to improve performance: a logical reasoning (LR) technique to prune the search space, and a reinforcement learning (RL) technique to prioritize the search. My experiments show that these learning based techniques can drastically reduce the search space and, as a result, produce invariants of a significantly higher quality than state-of-the-art invariant synthesis tools.

1.3.4 Quantifying Side-channel Leaks

I develop a method for quantifying side-channel leaks that computes the probability of leaking information. This technique rests on two principal intellectual contributions. First, I assign a probability interval to each variable, indicating the lower and upper bounds of its likelihood to leak information. I then enhance existing program analysis techniques, specifically Datalog-based declarative program analysis, to track the probability intervals of symbolic variables. This tracking process is executed through a comprehensive and sound interval analysis.
Second, I introduce novel techniques to compute the probability intervals of two predicates that are mutually dependent. This approach is a synergistic combination of symbolic solving and concrete computation. When compared with the existing probabilistic Datalog engine, this technique exhibits significant improvements in accuracy. Furthermore, my method introduces little performance overhead in contrast to the original Datalog engine Soufflé.

1.4 Overview

This dissertation is structured as follows. In Chapter 2, I present the technical background on power side-channel leaks and the works most closely related to ours. The next four chapters elaborate on the main contributions of this dissertation. Chapter 3 presents a new technique for automatically learning the program analyses to detect side-channel leaks, which has been published [209] in the main track of the IEEE/ACM International Conference on Software Engineering (ICSE'21). Chapter 4 presents our transformation based technique for mitigating side-channel leaks, which has been published [210] in the main track of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'19). Chapter 5 presents our synthesis approach for quickly producing relational invariants to prove the equivalence between the original and the mitigated programs, which has been published [211] in the main track of the IEEE/ACM International Conference on Automated Software Engineering (ASE'22). Chapter 6 presents a quantification approach for computing the probability of leaking information from side channels, which is under preparation for submission to the ACM International Conference on the Foundations of Software Engineering (FSE'24).

Chapter 2
Background

In this chapter, we first present the technical background of power side-channel leaks along with the classic masking countermeasures.
Then we review the related work on side-channel security and the general concepts of program analysis and synthesis.

2.1 The Leakage Model of Power Side-Channels

Prior works [113, 16, 61] show that variance in the power consumption of a computing device may leak secret information; for example, when a secret value is stored in a physical register, its number of logical-1 bits may affect the power consumption of the CPU. For each program variable, the leak is quantified using the well-known Hamming Weight (HW) and Hamming Distance (HD) leakage models [136, 135]. In the Hamming Weight (HW) model [136, 135], the leakage associated with a register value, which corresponds to an intermediate variable in the program, depends on the number of 1-bits. Let the value be D = Σ_{i=0}^{n−1} d_i · 2^i, where d_0 is the least significant bit, d_{n−1} is the most significant bit, and each bit d_i, where 0 ≤ i < n, is either 0 or 1. The Hamming Weight of D is HW(D) = Σ_{i=0}^{n−1} d_i. In the Hamming Distance (HD) model [136, 135], the leakage depends not only on the current register value D but also on a reference value D′. Let D′ = Σ_{i=0}^{n−1} d′_i · 2^i. We define the Hamming Distance between D and D′ as HD(D, D′) = Σ_{i=0}^{n−1} d_i ⊕ d′_i, which is equal to HW(D ⊕ D′), the Hamming Weight of the bit-wise XOR of D and D′. Another interpretation is to regard HW(D) as a special case of HD(D, D′), where all bits in the reference value D′ are set to 0. The HW/HD models have been validated on real devices [120, 59, 44, 143, 146]. The correlation between power variance and the number of 1-bits may be explained using the leakage current of a CMOS transistor, which is the foundation of modern computing devices. Broadly speaking, a CMOS transistor has two kinds of leakage currents: static and dynamic. Static leakage current exists all the time, but its volume depends on whether the transistor is on or off, i.e., a logical 1 or 0. Dynamic leakage current occurs only when a transistor is switched (a 0-to-1 or 1-to-0 flip).
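Both leakage models are straightforward to compute; this small Python sketch (the bit values are chosen arbitrarily for illustration) also checks the identity HD(D, D′) = HW(D ⊕ D′) and the special case HD(D, 0) = HW(D) stated above:

```python
def hw(d):
    """Hamming Weight: the number of logical-1 bits in d."""
    return bin(d).count("1")

def hd(d, d_ref):
    """Hamming Distance: the number of bit positions where d and d_ref differ."""
    n = max(d.bit_length(), d_ref.bit_length(), 1)
    return sum(((d >> i) & 1) ^ ((d_ref >> i) & 1) for i in range(n))

D, D_ref = 0b1100, 0b1010
assert hw(D) == 2
assert hd(D, D_ref) == hw(D ^ D_ref) == 2   # HD equals HW of the bit-wise XOR
assert hd(D, 0) == hw(D)                    # HW as a special case of HD
```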
While static leakage current is captured by the HW model, dynamic leakage current is captured by the HD model (for details, refer to Mangard [135]).

2.2 Masking Countermeasure

Such side-channel vulnerabilities are typically mitigated by masking [94, 5], e.g., using random bits (r_1, ..., r_d) to split a secret key bit into d+1 shares: key_1 = r_1, key_2 = r_2, ..., key_{d+1} = r_1 ⊕ r_2 ⊕ ... ⊕ r_d ⊕ key, where ⊕ denotes the logical operation exclusive-or (XOR). Since all d+1 shares are uniformly distributed in {0, 1}, in theory, this order-d masking scheme is provably secure in that any combination of fewer than d shares cannot reveal the secret, but combining all d+1 shares, key_1 ⊕ key_2 ⊕ ... ⊕ key_{d+1} = key, recovers the secret. In practice, masking countermeasures must also be implemented properly to avoid de-randomizing any of the secret shares accidentally. Consider t = t_L ⊕ t_R = (r_1 ⊕ key) ⊕ (r_1 ⊕ b) = key ⊕ b. While syntactically dependent on the two randomized values t_L and t_R, t is in fact leaky because, semantically, it does not depend on the random input r_1. In this work, we aim to learn a static analyzer that can soundly prove that all intermediate program variables are free of such leaks.

2.3 The Data Dependency

We consider two dependency relations: syntactic and statistical. Syntactic dependency is defined over the program structure: a function f(k, ...) syntactically depends on the variable k, denoted D_syn(f, k), if k appears in the expression of f; that is, k is in the support of f, denoted k ∈ supp(f). Statistical dependency is concerned with scenarios where random variables are involved. For example, when f(k, r) = k ⊕ r, the probability of f being logical 1 (always 50%) is not dependent on k. However, when f(k, r) = k ∨ r, where r is uniformly distributed in {0, 1}, the probability of f being logical 1 is 100% when k is 1, but 50% when k is 0. In the latter case, we say that f is statistically dependent on k, denoted D_sta(f, k).
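The order-d sharing and the de-randomization pitfall above can both be checked exhaustively for single bits; in this Python sketch the share count d = 2 and the variable names are my own choices for illustration:

```python
from functools import reduce
from itertools import product
from operator import xor

def mask(key, randoms):
    """Order-d masking of a secret bit: the shares are r_1, ..., r_d and
    r_1 xor ... xor r_d xor key, so XOR-ing all d+1 shares recovers key."""
    return randoms + [reduce(xor, randoms, key)]

# All d+1 shares recombine to the secret, for every choice of random bits.
for key, r1, r2 in product([0, 1], repeat=3):
    assert reduce(xor, mask(key, [r1, r2]), 0) == key

# The pitfall: t = (r1 xor key) xor (r1 xor b) cancels r1 and equals key xor b,
# so t is statistically dependent on key even though both operands are masked.
key, b = 1, 0
assert {(r1 ^ key) ^ (r1 ^ b) for r1 in [0, 1]} == {key ^ b}   # constant in r1
```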
The relative strengths of the dependency relations are as follows: ¬D_syn(f, k) ⟹ ¬D_sta(f, k); i.e., if f is syntactically independent of k, it is statistically independent of k. In this thesis, we rely on D_syn to infer D_sta during type inference, since the detection of HD leaks must be both fast and sound.

2.4 The Type Hierarchy

We use a type system that starts from the input annotations (IN_PUBLIC, IN_SECRET and IN_RANDOM) and computes a distribution type for all variables. The type indicates whether a variable may statistically depend on the secret input. The distribution type of variable v, denoted TYPE(v), may be one of the following kinds:

• RUD, which stands for random uniform distribution, means v is either a random input m ∈ IN_RANDOM or perfectly randomized [38] by m, e.g., v = k ⊕ m.
• SID, which stands for secret independent distribution, means that, while not RUD, v is statistically independent of the secret variables in IN_SECRET.
• UKD, which stands for unknown distribution, indicates that we are not able to prove that v is RUD or SID and thus have to assume that v may have a leak.

The three types form a hierarchy: UKD is the least desired because it means that a leak may exist. SID is better: although it may not be RUD, we can still prove that it is statistically independent of the secret, i.e., no leak. RUD is the most desired because the variable not only is statistically independent of the secret (same as in SID), but also can be used like a random input, e.g., to mask other (UKD) variables. For leak mitigation purposes, it is always sound to treat an RUD variable as SID, or an SID variable as UKD, although it may force instructions to be unnecessarily mitigated. In general, the sets of variables marked as the three types form a hierarchy: S_RUD ⊆ S_SID ⊆ S_UKD. In practice, we want to infer as many SID and RUD variables as possible. For example, if k ∈ IN_SECRET, m ∈ IN_RANDOM and k_m = k ⊕ m, then TYPE(k) = UKD and TYPE(k_m) = RUD.
If x ∈ IN_PUBLIC and x_km = x ∧ k_m, then TYPE(x_km) = SID because, although x may have any distribution, since k_m is RUD, x_km is statistically independent of the secret. We prefer RUD over SID, when both are applicable to a variable x_1, because if x_1 is XOR-ed with a UKD variable x_2, we can easily prove that x = x_1 ⊕ x_2 is RUD using local inference, as long as x_1 is RUD and x_2 is not randomized by the same input variable. However, if x_1 is labeled not as RUD but as SID, local inference rules may not be powerful enough to prove that x is RUD or even SID; as a result, we have to treat x as UKD (leak), which is less accurate.

2.5 Related Work

2.5.1 Detecting Power Side-channel Leaks

Existing methods for detecting power side channels fall into three categories: static analysis, formal verification, and hybrid approaches. Static analysis relies on compile-time information to check if masking is implemented correctly [20, 30, 158, 21, 36]. It is faster than formal verification, which often relies on model counting [78, 80, 88]. However, formal verification is more accurate than static analysis. The hybrid approach [228] aims to combine the two types of techniques to obtain the best of both worlds. However, none of these methods focused on the leaks caused by register reuse inside a compiler. Barthe et al. [20] proposed a relational analysis technique to check the correctness of high-order masking. When applied to a pair of variables, however, it has to consider all possible ways in which second-order leaks may occur, as opposed to the specific type involved in register reuse. Thus, mitigation has to be more expensive in terms of code size and execution speed. Furthermore, as we have shown in Chapter 4, it is not effective in preventing leaks caused by register reuse.

2.5.2 Mitigating Power Side-channel Leaks

The practical security against side-channel leakages via masking can be evaluated using the ISW model [114] and subsequent extensions [61, 16] with transitions.
However, they do not consider leaks that are specific to register use in modern compilers. They do not consider constraints imposed by the instruction set architecture either. Furthermore, they need to double the masking order [16] to deal with leaks with transitions, but still do not prevent leaks introduced by compilation.

2.5.3 Other Side-channels

Beyond power side channels, there are techniques for analyzing other side channels using logical reasoning [54, 200, 13, 202], abstract interpretation [73, 26, 220, 221, 212], symbolic execution [162, 17, 164, 42, 133, 98] and dynamic analysis [154, 213]. As for mitigation, there are techniques based on compilers [29, 148, 4, 220] or program synthesis tools [207, 77, 39]. However, these techniques focus on side-channel leaks in the input program. None of them focuses on leaks introduced by register reuse during compilation.

2.5.4 Secure Compilation for Side-channel Security

It is known that security guarantees of software countermeasures may become invalid after compilation [139, 161, 25, 95]. In this context, Barthe et al. [25] showed that the compilation process could maintain the constant-time property for timing side-channel leaks, while our work addresses potential leaks through power side channels. Marc [95] also investigated potential vulnerabilities in power side-channel countermeasures during compiler optimizations, but did not provide a systematic method for mitigating them.

2.5.5 Learning and Synthesizing Analyzers from Examples

While there are prior works on learning static analyzers [34, 226], they do not guarantee soundness. For example, the analyzer learned by Bielik et al. [34] is only sound with respect to the programs in the training set, not all programs written in the same programming language (JavaScript). They also need to manually modify the training programs to generate counter-examples.
There are several prior techniques using machine learning to conduct static program analyses [116, 170, 206, 100]. Such techniques focus on finding a suitable program-to-feature embedding. However, they require the user to perform feature engineering, which is known to be laborious. Some of these techniques [134, 170, 104, 102] do not even take advantage of new features that may be learned from data; instead, they build classifiers based solely on existing features. Syntax-Guided Synthesis has also been widely used for generating programs. While SyGuS has been used in various applications [178, 141, 83, 167, 1, 28, 222, 126, 111, 119, 128], none of them aims to synthesize a provably sound static analyzer from data. While some of these existing techniques can synthesize Datalog rules [192, 230, 194], the focus has been on efficiency, e.g., pruning the search space based on syntactic structures, instead of guaranteeing the soundness of the analyzer.

2.5.6 Optimizing Program Analyses

It is also possible to optimize an existing static analyzer [96, 103, 156, 105, 195, 116], which can be achieved by adjusting the level of abstraction [96, 103], learning heuristics and parameters [156], making soundness–accuracy trade-offs [105], or selecting sound transformers [195]. However, such techniques fundamentally differ from our method because they assume the analyzer is already given, and their focus is on optimizing its performance, whereas we focus on synthesizing a new analyzer.

Chapter 3
Detecting Side-channel Leaks

Static analyses are being increasingly used to detect security vulnerabilities such as side channels [210, 227, 55, 43]. However, manually crafting a static analyzer to balance accuracy and efficiency is not an easy task: even for domain experts, it can be labor-intensive, error-prone, and result in suboptimal implementations. For example, we may be tempted to add expensive analysis rules for specific sanitized patterns without realizing they are rare in target programs.
Even if the analysis rules are carefully tuned to a corpus of code initially, they are unresponsive to changing characteristics of the target programs and thus may become suboptimal over time; manually updating them to keep up with the new programs would be difficult. One way to make better accuracy–efficiency trade-offs and to dynamically respond to the distribution of target programs is to use data-driven approaches [34, 142] that automatically synthesize analyses from labeled examples provided by the user. However, checking soundness or compliance with user intent (generalization) has always formed a significant challenge for example-based synthesis techniques [97, 127, 197, 198, 199]. The lack of soundness guarantees, in particular, hinders the application of such learned analyzers in security-critical applications. While several existing works [10, 138, 196, 69] try to address this problem, rigorous soundness guarantees have remained elusive.

[Fig. 3.1 diagram: Training Programs and Type Annotations feed the Learner (Feature Synthesis via SyGuS, Decision Tree Learning); the Learner proposes rules R to the Prover (Query Containment Checking against the Knowledge Base (KB)); verified rules R join the Analyzer, while rejected rules R send a counter-example back to the Learner.]

Figure 3.1: The overall flow of GPS, our data-driven synthesis method.

To overcome this problem, we propose a learning-based method for synthesizing a provably-sound static analyzer that detects side channels in cryptographic software, by inferring a distribution type for each program variable that indicates if its value is statistically dependent on the secret. The overall flow of our method, named GPS, is shown in Fig. 3.1. The input is a set of training data and the output is a learned analyzer. The training data are small programs annotated with the ground truth, e.g., which program variables have leaks. Internally, GPS consists of a learner and a prover.
The learner uses syntax-guided synthesis (SyGuS) to generate recursive features and decision tree learning (DTL) to generate type-inference rules based on these features; it returns a set R of Datalog formulas that codify these rules. The prover checks the soundness of each learned rule, i.e., that it is not only consistent with the training examples but also valid for any other unseen programs. This is formulated by solving a query containment checking problem, i.e., each rule must be justified by existing proof rules called the knowledge base (KB). Since only proved rules are added to the analyzer, the analyzer is guaranteed to be sound. If a rule cannot be proved, we add a counter-example to prevent the learner from generating it again. We have implemented GPS in LLVM and evaluated it on 568 programs that implement cryptographic protocols and algorithms [20, 38, 30]. Together, they have 2,691K lines of C code. We compared our learned analyzer with two state-of-the-art, hand-crafted side-channel analysis tools [227, 210]. Our experiments show that the learned analyzer achieves the same empirical accuracy as the two state-of-the-art tools, while being several orders of magnitude faster. Specifically, GPS is, on average, 300× faster than the analyzer from [210] and 900× faster than the analyzer from [227]. To summarize, this paper makes the following contributions:

• We propose a data-driven method for learning a provably sound static analyzer using syntax-guided synthesis (SyGuS) and decision tree learning (DTL).
• We guarantee soundness by formulating and solving a Datalog query containment checking problem.
• We demonstrate the effectiveness of our method for detecting side channels in cryptographic software.

In the remainder of this paper, we begin by presenting the technical background in Section 3.1 and our motivating example in Section 3.2. We then describe the learner in Section 3.3 and the prover in Section 3.4, followed by the experimental results in Section 3.5.
Finally, we conclude in Section 3.6.

3.1 Preliminaries

3.1.1 Type Systems

Type systems prove to be effective in analyzing power side channels [227, 210], e.g., by certifying that all intermediate variables of a program are statistically independent of the secret. Typically, the program inputs are marked as public (INPUB), secret (INKEY) or random (INRAND), and then the types of all other program variables are inferred automatically. The type of a variable v, denoted TYPE(v), may be RUD, SID, or UKD. Here, RUD stands for random uniform distribution, meaning v is either a random bit or is masked by a random bit. SID stands for secret independent distribution, meaning v does not depend on the secret. UKD stands for unknown distribution, or potentially leaky; if the analyzer cannot prove v to be RUD or SID, then it is UKD. Type systems are sound but not necessarily complete. They are sound in that they never miss real leaks. For example, by default, they may safely assume that all variables are UKD, unless a variable is specifically elevated to SID or RUD by an analysis rule. Similarly, they may conservatively classify SID variables as UKD, or classify RUD variables as SID, without missing real leaks. In general, the sets of variables marked as the three types form a hierarchy: S_RUD ⊆ S_SID ⊆ S_UKD.

3.1.2 Relations

The program (in SSA format) is represented as an abstract syntax tree (AST). Static analyzers infer the type of each node x of the program's AST based on various features of x. In this work, pre-defined features are represented as relations.

• Unary relations INPUB(x), INKEY(x), and INRAND(x) denote the given security level of a program input x, which may be public, secret, or random.
• Unary relations RUD(x), SID(x), and UKD(x) denote the inferred type of a program variable x, which may be uniformly random, secret independent, or unknown.
• Unary relation OP(x) denotes the operator type of the AST node x, e.g., OP(x) := ANDOR(x) | XOR(x), where ANDOR(x) means that x's operator type is either logical and or logical or, and XOR(x) means that x's operator type is exclusive-or.
• Binary relations LC(x, L) and RC(x, R) indicate that the AST nodes L and R are the left and right operands of x, respectively.
• Binary relation supp(x, y) indicates that the AST node y is used in the computation of x syntactically, while dom(x, y) indicates that random variable y is used in the computation of x semantically.

3.2 Motivation

Consider the program in Fig. 3.2a, which computes the χ function from Keccak, a family of cryptographic primitives for the SHA-3 standard [155, 32]. It ultimately computes the function n1 = i1 ⊕ (¬i2 ∧ i3), where ⊕ means XOR. Unfortunately, a straightforward implementation could potentially leak knowledge
of the sensitive inputsi1,i2 andi3 if the attacker were able to guess the intermediate results: i2 and : i2 ^ i3 via the power side-channels [77, 22]. The masking countermeasures in the implementation therefore use three additional random bitsr1,r2 andr3 to prevent exposure of the private inputs while still computing the desired function. 3.2.1 ProblemSetting Given such masked programs, users want to determine if they succeed in eliminating side-channel vulner- abilities: in particular, if each intermediate result is uniformly distributed (RUD) or at least independent of the sensitive inputs (SID). The desired static analysis thus associates each variablex (e.g.,n 1 ) with the elements of a three-level abstract domain,RUD,SID orUKD, indicating thatx is uniformly distributed (RUD), secret independent (SID), or unknown (UKD) and therefore potentially vulnerable. The decision tree in Fig. 3.2 (b) represents the desired static analyzer, which accurately classies most variables of the training corpus, and is also sound when applied to new programs. Given variablex, the decision tree leverages the features ofx—such as the operator type ofx (OP(x) := ANDOR(x)jXOR(x)) and 21 R1 : RUD(x) XOR(x)^RC(x;R)^RUD(R)^:f(x) R2 : f(x) LC(x;L)^RC(x;R)^g1(L;rL)^ g2(R;rR)^rL =rR R3 : g1(r;r) INRAND(r) R4 : g1(x;r) LC(x;y)^g1(y;r) R5 : g1(x;r) RC(x;y)^g1(y;r) R6 : g2(r;r) INRAND(r) R7 : g2(x;r) LC(x;y)^g2(y;r)^XOR(x) R8 : g2(x;r) RC(x;y)^g2(y;r)^XOR(x) (a) Excerpt of rules learned by theGPS tool. M1 : RUD(x) XOR(x)^ dom(x; res)^ res6=; M2 : supp(x;x) INRAND(x)_INKEY(x)_INPUB(x) M3 : supp(x; res) LC(x;L)^RC(x;R)^ supp(L;SL)^ supp(R;SR)^ res =SL[SR M4 : dom(x;x) INRAND(x) M5 : dom(x;;) INKEY(x)_INPUB(x) M6 : dom(x; res) XOR(x)^LC(x;L)^RC(x;R)^ dom(L;SDL)^ dom(R;SDR)^ supp(L;SL)^ supp(R;SR)^ res = (SDL[SDR)n (SL[SR) (b) Corresponding expert written rules from SCInfer [227]. Figure 3.3: Comparing the rules learned byGPS (Fig. 3.3a) to manually crafted rules from SCInfer (Fig. 3.3b). 
Observe that the learned rules are sound, i.e., every variable which potentially leaks information is assigned the distribution type UKD, while still managing to draw non-trivial conclusions such as RUD(b4). The learned rules (R2–R8) in Fig. 3.3a are used to define the new feature f(x) in Fig. 3.2(b).

the types of x's operands (e.g., TYPE(L), TYPE(R))—and maps x to its corresponding distribution type. The white nodes of Fig. 3.2(b) represent pre-defined features, while the grey nodes represent output classes (associated types). Each path from the root to a leaf node corresponds to one analysis rule. The complete set of pre-defined features is shown in Fig. 3.4a.

Designing such side-channel analyses has been the focus of intense research, see for example [210, 227, 22, 208, 21, 20, 55, 205]. Unfortunately, it requires expert knowledge in both computer security and program analysis, and invariably involves delicate trade-offs between accuracy and scalability. Our goal in this work is to assist the analysis designer in automating the development. This problem has also been the subject of exciting research [65, 34]; however, these approaches typically either require computationally intensive deductive synthesis or cannot guarantee soundness, and thus produce errors in both directions, including false alarms and missed bugs.

(a) Pre-defined features: OP(v), for v ::= x | L | R, ranging over AND, OR, NOT, XOR, MUL, LEAF; TYPE(v), for v ::= L | R, ranging over RUD, SID, UKD, INRAND, INPUB, INKEY; and the output classes RUD(x), SID(x), UKD(x).

[Decision-tree diagrams (Fig. 3.4b and Fig. 3.4c) omitted from the transcript; both branch on OP(x), OP(R) and TYPE(L), and Fig. 3.4c additionally branches on the synthesized feature f(x).]

Figure 3.4: The classifier of Fig. 3.4b is learned only using the features in Fig. 3.4a.
Because of the limited expressive power of these features, the learned analysis necessarily misclassifies either b4 or n1. Fig. 3.4c denotes the candidate analyzer produced after one round of feature synthesis. The blue path corresponds to the rule RUD(x) ← XOR(x) ∧ XOR(R) ∧ RUD(L) ∧ ¬f(x) ∧ LC(x,L) ∧ RC(x,R).

Unfortunately, even though this analysis (Fig. 3.4c) achieves 100% training accuracy, the leaf nodes highlighted in red correspond to unsound predictions. In contrast, GPS combines inductive synthesis from user annotations with logical entailment checking against a more comprehensive, known-to-be-correct set of proof rules that form the knowledge base (KB). It takes as input training programs like the one in Fig. 3.2a, where the labels correspond to the types of program variables (RUD/SID/UKD for intermediate results and INRAND/INPUB/INKEY for inputs). The users are free to annotate as many or as few of these types as they wish: this affects only the precision of the learned analyzer and not its soundness. Second, GPS also takes as input the knowledge base KB, consisting of proof rules that describe axioms of propositional logic (Fig. 3.8) and properties of the distribution types (Fig. 3.10). In return, GPS produces as output a set of Datalog rules which simultaneously achieves high accuracy on the training data and is provably sound with respect to KB.

The proof rules for KB were collected from published papers on masking countermeasures [227, 210, 20]. We emphasize that KB is not necessarily an executable static analyzer, since repeated application of these proof rules need not reach a fixpoint and terminate in finite time; furthermore, even in cases where it does terminate, KB may be computationally expensive and infeasible for application to large programs. As a concrete example, we compare excerpts of the rules learned by GPS in Fig. 3.3a to the corresponding rules from SCInfer [227]—a sophisticated human-written analysis—in Fig. 3.3b.
LC(x,L) and RC(x,R) arise in both versions, indicating that the variables L and R are the left and right operands of x, respectively. Specifically, in Fig. 3.3b, supp(x,y) indicates that y is used in the computation of x syntactically, while dom(x,y) denotes that random variable y is used in the computation of x semantically. Compare the computationally expensive set-theoretic operations in the human-written version to the simpler rules learned by GPS, which come without loss of soundness or perceptible loss in accuracy. These points are also borne out in our experiments in Table 3.2, where SCInfer takes >45 minutes on some Keccak benchmarks, while our learned analysis takes <5 seconds.

GPS consists of two phases: First, it learns a set of type-inference rules—alternatively represented either as Datalog programs or as decision trees—that are consistent with the training data. Second, it proves these rules against the knowledge base. In the next two subsections, we will explain the rule learning and soundness proving processes, respectively.

3.2.2 Feature Synthesis and Rule Learning

The learned analyzer associates each node x of a program's abstract syntax tree (AST) with an element of the distribution type {UKD(x), SID(x), RUD(x)}. We may therefore interpret the analyzer as a decision tree that, by considering various features of an AST node, maps it to a type. With a pre-defined set of features, such as those shown in Fig. 3.4a, analyzers of this form can be learned with classical decision tree learning (DTL) algorithms. Fig. 3.4b shows such an analyzer, learned from the labeled program of Fig. 3.2a. Unfortunately, the pre-defined features may not be strong enough to distinguish between nodes with different training labels, e.g., b4 and n1 from the training program, which have distinct labels RUD(b4) and UKD(n1), but after being sifted into the node highlighted in red in Fig. 3.4b, cannot be separated by any of the features from Fig. 3.4a.
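The set-theoretic flavor of SCInfer's supp/dom rules (Fig. 3.3b) can be made concrete with a small sketch. This is our own toy encoding, assuming the common formulation in which an XOR node's masking set is the union of the operands' masking sets minus the randoms that occur syntactically under both operands; it is not SCInfer's actual implementation.

```python
# AST leaves: ('rand', n) / ('key', n) / ('pub', n); internal nodes: (op, left, right)
def supp(x):
    # syntactic support: every input occurring anywhere under x (rules M2/M3)
    if x[0] in ('rand', 'key', 'pub'):
        return {x[1]}
    return supp(x[1]) | supp(x[2])

def dom(x):
    # semantic masking set (in the spirit of rules M4-M6);
    # only the XOR case propagates masks, AND/OR are treated conservatively
    if x[0] == 'rand':
        return {x[1]}
    if x[0] in ('key', 'pub'):
        return set()
    if x[0] == 'xor':
        return (dom(x[1]) | dom(x[2])) - (supp(x[1]) & supp(x[2]))
    return set()

# x1 = k ^ r1 ^ r2 and x2 = b ^ r2: r2 occurs under both operands of
# y = x1 ^ x2, so it cancels and only r1 still masks y
x1 = ('xor', ('xor', ('key', 'k'), ('rand', 'r1')), ('rand', 'r2'))
x2 = ('xor', ('pub', 'b'), ('rand', 'r2'))
y = ('xor', x1, x2)
```

Here dom(y) evaluates to {r1}, so rule M1 would certify RUD(y); note that every recursive call recomputes whole sets, which is exactly the kind of cost the learned rules avoid.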
To ensure soundness, the learner would be forced to conservatively assign the label UKD(x), which sacrifices accuracy.

       OP(x)   OP(L)   OP(R)   TYPE(L)   TYPE(R)   f(x)
CE1    ANDOR   -1      -1      -1        -1        -1
CE2    XOR     -1      LEAF    -1        -1        -1
CE3    XOR     -1      XOR     SID       -1        -1
CE4    XOR     -1      ANDOR   RUD       -1        -1
CE5    XOR     -1      ANDOR   SID       -1        -1

Figure 3.5: The table denotes the abstract counter-examples produced during the soundness verification of the candidate analyzer shown in Fig. 3.4c.

GPS thus includes a feature synthesis engine, triggered whenever the learner fails to distinguish between two differently labeled variables. In tandem with recursive feature synthesis, GPS overcomes the limited expressiveness of DTL by enriching the syntax space to capture more desired patterns. Observe that paths of a decision tree can be represented as Datalog rules; e.g., the red path in Fig. 3.4b is equivalent to

UKD(x) ← XOR(x) ∧ XOR(R) ∧ RUD(L) ∧ LC(x,L) ∧ RC(x,R).

Viewing this in Datalog also allows us to conveniently describe recursive features, and reduce feature synthesis to an instance of syntax-guided synthesis (SyGuS). Syntactically, the analysis rules corresponding to new features are instances of a pre-defined set of meta-rules, and the target specification is to produce a Datalog program for a relation f(x) that has strictly positive information gain for the variables under consideration (see Section 3.3 for details). In our running example, the synthesizer produces the feature f(x) shown in Fig. 3.3a, which intuitively indicates that some random input r is used to compute both operands of x. With this new feature, the learner can distinguish between b4 and n1, and produce the rule shown in Fig. 3.4c, which correctly classifies all variables of the training program. Observe that the rules defining f(x) in Fig. 3.3a involve a newly introduced predicate g(x,r) and a recursive structure that can classify variables based on arbitrarily deep properties of the abstract syntax tree.
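The intuition behind f(x), that some random input is used to compute both operands of x, can be sketched directly on the AST of Fig. 3.2a. This simplified version (our own; it deliberately ignores the XOR-only restriction that rules R7/R8 place on the right operand) already separates b4 from n1.

```python
def randoms(t):
    # set of random inputs occurring anywhere under AST node t
    if t[0] == 'rand':
        return {t[1]}
    if t[0] in ('key', 'pub'):
        return set()
    return randoms(t[1]) | randoms(t[2])

def f(x):
    # f(x): some random input occurs under BOTH operands of x
    return bool(randoms(x[1]) & randoms(x[2]))

# rebuild the AST of the program in Fig. 3.2a
i1, i2, i3 = ('key', 'i1'), ('key', 'i2'), ('key', 'i3')
r1, r2, r3 = ('rand', 'r1'), ('rand', 'r2'), ('rand', 'r3')
b1 = ('xor', i1, r1); b2 = ('xor', i2, r2); b3 = ('xor', i3, r3)
b4 = ('xor', b2, b3)
n9 = ('and', b3, r2); n8 = ('and', r3, r2); n7 = ('or', b2, r3)
n6 = ('xor', r1, n9); n5 = ('xor', n7, n8); n4 = ('or', b2, b3)
n3 = ('xor', n5, n6); n2 = ('xor', n4, b1); n1 = ('xor', n2, n3)
```

The operands of b4 use the disjoint random sets {r2} and {r3}, so f(b4) is false and the RUD path applies, while both operands of n1 use r1, so f(n1) is true and n1 falls into the UKD leaf.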
[Decision-tree diagram omitted from the transcript; it branches on TYPE(L), OP(x), OP(R) and f(x).]

Figure 3.6: It shows the candidate analysis learned after one round of feedback from the soundness verifier. The leaves shown in green and red correspond to sound and unsound analysis rules, respectively.

3.2.3 Proving Soundness of Learned Analysis Rules

While the learned analysis rules are correct by construction for the training examples, they may still be unsound when applied to unseen programs. We observe this, for example, in the leaves highlighted in red in Fig. 3.4c. Thus, GPS tries to confirm their soundness against the domain-specific knowledge base KB. In the context of our running example, confirming soundness means proving that every variable x that is assigned type RUD(x) (resp. SID(x)) by the learned analysis rule is also certified RUD(x) (resp. SID(x)) by KB. We formalize the soundness proof as a Datalog query containment problem, and propose an algorithm based on bounded unrolling and k-induction to check it.

When applied to the candidate analysis of Fig. 3.4c, the check results in the five counter-examples CE1, ..., CE5 with distribution type UKD(CEi) shown in Fig. 3.5. Each counter-example indicates the unsoundness of one path from the root of the decision tree to a classification node. These counter-examples contain missing features and consequently do not define concrete ASTs. Thus, each of these abstract counter-examples is a feature valuation Π = {f1 ↦ v1, f2 ↦ v2, ..., fk ↦ vk} that the current candidate analysis misclassifies, and feeding them back to the learner prohibits subsequent candidate analyses from classifying variables that satisfy Π. With these new constraints from abstract counter-examples, the learner learns the new candidate analysis shown in Fig. 3.6. This new candidate analysis still has four unsound candidate rules, which results in additional abstract counter-examples when it is subjected to the soundness check.
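An abstract counter-example is just a partial feature valuation, with -1 standing for the unknown value as in Fig. 3.5. The helpers below are illustrative (the names are ours): one pads a failing rule's valuation with unknowns, the other tests whether a concrete node's features fall under the abstract counter-example.

```python
UNKNOWN = -1  # value for features missing from an abstract counter-example

def make_ce(valuation, all_features):
    # pad the failing rule's feature valuation with unknowns (cf. Fig. 3.5)
    return {f: valuation.get(f, UNKNOWN) for f in all_features}

def matches(ce, node_features):
    # a concrete node is covered if it agrees on every constrained feature
    return all(v == UNKNOWN or node_features[f] == v for f, v in ce.items())

FEATURES = ['OP(x)', 'OP(L)', 'OP(R)', 'TYPE(L)', 'TYPE(R)', 'f(x)']
# mirrors CE2 of Fig. 3.5: OP(x) = XOR and OP(R) = LEAF, everything else unknown
ce2 = make_ce({'OP(x)': 'XOR', 'OP(R)': 'LEAF'}, FEATURES)
```

Every node covered by `ce2` is then fed back with label UKD, so the next round of learning can no longer classify such nodes as RUD or SID.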
We repeat this back-and-forth between the rule learner and the soundness prover: after 11 iterations and after processing 27 counter-examples in all, GPS learns the rules initially presented in Fig. 3.2(b), all of which have been certified to be sound.

3.2.4 Overall Architecture of the GPS System

We summarize the architecture of GPS in Fig. 4.1. The learner repeatedly applies DTL and SyGuS to learn candidate analyses that correctly classify the training samples and are consistent with newly-added abstract counter-examples. Next, the prover checks the soundness of the learned analysis. Each subsequent counter-example is fed back to the learner, which restarts the rule learning process on the augmented dataset, until either all synthesized rules are sound or a time bound is exhausted.

3.3 Learning the Type-inference Rules

We formally describe the analysis rule learner in Algorithm 1. The input consists of a set of labeled examples, E, and a set of pre-defined features, F, and the output T is a set of type-inference rules consistent with the training examples. Each training example (x, TYPE(x)) ∈ E consists of an AST node x in a program and its distribution type TYPE(x). At the top level, the learner uses the standard decision tree learning (DTL) algorithm [177] as the baseline. However, if it finds that the current set F of classification features is insufficient, it invokes a syntax-guided synthesis (SyGuS) algorithm to synthesize a new feature f with strictly positive information gain to augment F. This allows the learner to combine the efficiency of techniques that learn decision trees with the expressiveness of syntax-guided synthesis; similar ideas have been fruitfully used in other applications of program synthesis, see for example [9]. While the top-level classifier (e.g., Fig. 3.2(b), 3.4b, 3.4c and 3.6) has a bounded number of decision points, the synthesized features (e.g., Fig. 3.3a) may be recursive. Furthermore, the newly synthesized

Algorithm 1: DTL(E, F) — Decision Tree Learning.
Input: Examples, E = {(x1, TYPE(x1)), ..., (xn, TYPE(xn))}
Input: Pre-defined features, F = {f1, f2, ..., fk}
Output: Classifier T which is consistent with the provided examples
1:  if all examples (x, TYPE(x)) ∈ E have the same label TYPE(x) = t then
2:      return T = LeafNode(t)
3:  end if
4:  if ∄ f ∈ F such that H(E|f) < H(E) then
5:      F := F ∪ FeatureSyn(E)
6:  end if
7:  T = DecisionNode(f*), where f* = argmin_{f ∈ F} H(E|f)
8:  for each valuation i of feature f* do
9:      Ti = DTL(E|_{f*(x)=i}, F \ {f*})
10:     Add edge from T to Ti with label f*(x) = i
11: end for
12: return T

features f are inducted as first-class citizens of F, and can subsequently be used at any level of the decision tree (see, for example, Fig. 3.2(b) and 3.6). Next, we present the DTL and SyGuS subroutines, respectively.

3.3.1 The Decision Tree Learning Algorithm

Recall that our pre-defined features (Fig. 3.4a) include properties of the AST node, such as OP(x), and properties referring to its left and right children, such as OP(L) ∧ LC(x,L). The choice requires some care: having very few features will cause the learning algorithm to fail, while having too many features will increase the risk of overfitting. Our synergistic combination of DTL with SyGuS-based on-demand feature synthesis can be seen as a compromise between these extremes. DTL(E, F) is an entropy-guided greedy learner [177]. The entropy of a set is a measure of the diversity of its labels:

H(E) = −Σ_{t ∈ TYPE} Pr(TYPE(x) = t) · log(Pr(TYPE(x) = t))
H(E|f) = Σ_{i ∈ Range(f)} Pr(f(x) = i) · H(E | f(x) = i)

Algorithm 1 thus divides the set of training examples E using the feature f = f* that minimizes the conditional entropy H(E|f) (lines 7–12), and recursively invokes the learning algorithm on each subset, DTL(E|_{f*(x)=i}, F \ {f*}).

Algorithm 2: FeatureSyn(E).
Input: Examples, E = {(x1, TYPE(x1)), ..., (xn, TYPE(xn))}
Output: Feature f with positive information gain, or ⊥ to indicate failure
1:  Let S be the meta-rules defined in Figure 3.7, i.e.,
    the hypothesis space
2:  for each relation schema r defined in S do
3:      for each subset S* of meta-rules corresponding to the schema do
4:          for each choice of p_in, q_in, and nested relational predicates do
5:              Let f be the corresponding instantiation of the meta-rules in S*
6:              if H(E|f) < H(E) then
7:                  return f
8:              end if
9:          end for
10:     end for
11: end for
12: return ⊥

R_f = {  f(x) ← p_in(x);
         f(x) ← q_in(x,y);
         f(x) ← p_in(x,y) ∧ q_in(x,y);
         f(x) ← q_in(x,y) ∧ f(y);
         f(x) ← q_in(x,y) ∧ p_in(x) ∧ f(y)  }

R_g = {  g(x,y) ← q_in(x,y);
         g(x,y) ← p_in(x) ∧ q_in(x,y);
         g(x,y) ← q_in(x,z) ∧ g(z,y);
         g(x,y) ← q_in(x,z) ∧ p_in(x) ∧ g(z,y)  }

R_h = {  h(x) ← f(x) ∧ p_in(x) ∧ q_in(x,y);
         h(x) ← g(x,y) ∧ p_in(x) ∧ q_in(x,y);
         h(x) ← f(x) ∧ g(x,y) ∧ p_in(x) ∧ q_in(x,y)  }

p_in(x) ::= AND(x) | OR(x) | NOT(x) | XOR(x) | MUL(x) | LEAF(x)
          | INRAND(x) | INKEY(x) | INPUB(x)
          | p_in ∧ p_in | p_in ∨ p_in | ¬p_in
q_in(x,y) ::= LC(x,y) | RC(x,y) | x = y
          | q_in(x,y) ∧ q_in(x,y) | q_in(x,y) ∨ q_in(x,y)
          | ¬q_in(x,y)

Figure 3.7: Syntax of the DSL for synthesizing new features.

Observe that H(E) = 0 if Pr(TYPE(x) = t) = 100%, meaning purity, i.e., all examples x ∈ E share the same type TYPE(x) = t. The difference between H(E) and H(E|f) is also referred to as the information gain. If the learner cannot find a feature with strictly positive information gain (line 4), it will invoke the feature synthesis algorithm on line 5.

3.3.2 The On-Demand Feature Synthesis Algorithm

Representing new features. We represent the newly synthesized features as Datalog programs.
Datalog is an increasingly popular medium to express complex static analyses [172, 218, 115, 40], and its recursive nature enables the newly learned features to represent arbitrarily deep properties of AST nodes.

b ∨ ¬b ≡ true (B1)            b ∧ ¬b ≡ false (B2)           ¬¬b ≡ b (B3)
¬a ∨ ¬b ≡ ¬(a ∧ b) (B4)       ¬a ∧ ¬b ≡ ¬(a ∨ b) (B5)       b ∨ false ≡ b (B6)
b ∨ true ≡ true (B7)          b ∧ true ≡ b (B8)             b ∧ b ≡ b (B9)
b ∧ false ≡ false (Ba)        b ∨ b ≡ b (Bb)                a ∧ (a ∨ b) ≡ a (Bc)
a ∨ (a ∧ b) ≡ a (Bd)          a ⊕ b ≡ (a ∧ ¬b) ∨ (¬a ∧ b) (Be)
(a ∨ b) ∨ c ≡ a ∨ c ∨ b (Bf)  (a ∧ b) ∧ c ≡ a ∧ c ∧ b (B10)
a ∨ (b ∨ c) ≡ a ∨ b ∨ c (B11) a ∧ (b ∧ c) ≡ a ∧ b ∧ c (B12)

Figure 3.8: Proof rules for propositional logic, used to simplify logic formulas and deduce the Boolean constants (true and false).

A Datalog rule is a constraint of the form

h(x) ← b1(y1) ∧ b2(y2) ∧ ... ∧ bn(yn),    (3.1)

where h, b1, ..., bn are relations with pre-specified arities and schemas, and where x, y1, ..., yn are vectors of typed variables. Each rule can be interpreted as a logical implication: if b1, ..., bn are true, then so is h. The semantics of a Datalog program is defined as the least fixed point of rule application [3]: the solver starts with empty output relations, and repeatedly derives new output tuples until no new tuples can be derived.

Program synthesis commonly restricts the space of target concepts and biases the search to speed up computation and improve generalization. One form of bias has been to constrain the syntax: this has been formalized as the SyGuS problem [8] and as meta-rules in inductive logic programming [150, 192]. A meta-rule is a construct of the form

X_h(x) ← X_1(y_1) ∧ X_2(y_2) ∧ ... ∧ X_n(y_n)    (3.2)

Here, X_h, X_1, X_2, ..., X_n are relation variables whose instantiation yields a concrete rule. Fig. 3.7 shows the meta-rules used in our work. For example, instantiating the meta-rule f(x) ← q_in(x,y) ∧ p_in(x) ∧ f(y) with q_in(x,y) = RC(x,y) and p_in(x) = XOR(x) yields f(x) ← RC(x,y) ∧ XOR(x) ∧ f(y). There are three variations of the final target relation schema, f(x), g(x,y) and h(x), where x and y denote AST nodes.
We formalize the synthesis problem as that of choosing a relation R ∈ {f(x), g(x,y), h(x)} and finding a subset P_D of its instantiated meta-rules from Fig. 3.7 such that the resulting Datalog program P_D has strictly positive information gain on the provided training examples E. Algorithm 2 shows the procedure, which repeatedly instantiates the meta-rules from Figure 3.7 and computes their information gain. It successfully terminates when it discovers a feature that can improve classification. Otherwise, it returns failure (upon timeout), and DTL(E, F) conservatively classifies the decision tree node as being of type UKD.

Example 3.3.1. Given E = {(b4, RUD), (n1, UKD)} shown in Fig. 3.2a, the synthesizer may alternatively learn the rules in Equations 3.3, 3.4 and 3.5.

f(x) ← INRAND(x);                                                   (3.3)
f(y) ← LC(y,x) ∧ f(x);
f(y) ← RC(y,x) ∧ f(x);
RUD(x) ← XOR(x) ∧ LC(x,L) ∧ RC(x,R) ∧ RUD(L) ∧ f(R).

g(x,x) ← INRAND(x);                                                 (3.4)
g(y,z) ← LC(y,x) ∧ g(x,z);
g(y,z) ← RC(y,x) ∧ g(x,z);
h(x) ← LC(x,L) ∧ RC(x,R) ∧ g(L,xL) ∧ g(R,xR) ∧ xL = xR;
RUD(x) ← XOR(x) ∧ RUD(L) ∧ RUD(R) ∧ LC(x,L) ∧ RC(x,R) ∧ ¬h(x).

g(x,x) ← INKEY(x);                                                  (3.5)
g(y,z) ← LC(y,x) ∧ g(x,z);
g(y,z) ← RC(y,x) ∧ g(x,z);
h(x) ← LC(x,L) ∧ RC(x,R) ∧ g(L,xL) ∧ g(R,xR) ∧ xL = xR;
RUD(x) ← XOR(x) ∧ RUD(L) ∧ RUD(R) ∧ ¬h(x).

[AST diagram omitted from the transcript: an OR node x with a left operand L matched by g1(L,k1) and a negated right operand R matched by g2(R,k2), where k1 = k2.]

Figure 3.9: Example AST from which the rule below is learned.

Since the information gain of Rule 3.3 applied to {b4, n1} is zero, it gets discarded (Line 6 in Algorithm 2). In contrast, the information gains of Rules 3.4 and 3.5 are both positive. Rule 3.4 intuitively requires that both the left and right operands of x are of type RUD, and that they do not share any random inputs in their computation (i.e., ¬h(x)). Rule 3.5 requires that the same secret key be used in the computation of both operands. While Rule 3.4 is sound when applied to arbitrary programs, Rule 3.5 is unsound. In the next section, we will present an algorithm that can check the soundness of these learned rules.
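Rules such as Equation 3.3 can be executed with the naive bottom-up (least-fixpoint) loop described earlier. The sketch below is our own toy encoding, with each Datalog rule written as a Python function from a fact set to derived tuples; it derives f for the fragment b2 = i2 ⊕ r2, b4 = b2 ⊕ b3 of the running example.

```python
def fixpoint(edb, rules):
    # naive bottom-up Datalog evaluation: apply all rules until saturation
    facts = set(edb)
    while True:
        derived = {c for rule in rules for c in rule(facts)} - facts
        if not derived:
            return facts
        facts |= derived

def r_base(F):  # f(x) <- INRAND(x)
    return {('f', t[1]) for t in F if t[0] == 'INRAND'}

def r_lc(F):    # f(y) <- LC(y,x) ^ f(x)
    return {('f', t[1]) for t in F if t[0] == 'LC' and ('f', t[2]) in F}

def r_rc(F):    # f(y) <- RC(y,x) ^ f(x)
    return {('f', t[1]) for t in F if t[0] == 'RC' and ('f', t[2]) in F}

# input facts (EDB): b2 = i2 xor r2, and b2 is the left operand of b4
edb = {('INRAND', 'r2'), ('LC', 'b2', 'i2'), ('RC', 'b2', 'r2'),
       ('LC', 'b4', 'b2')}
facts = fixpoint(edb, [r_base, r_lc, r_rc])
```

The loop derives f(r2), then f(b2) through the RC rule, then f(b4) through the LC rule, and stops; nothing is derived for the key input i2.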
3.4 Proving the Type-inference Rules

We wish to prove that a learned rule, denoted Φ, never reaches unsound conclusions when applied to any program, by showing that it can be deduced from a known-to-be-correct knowledge base (KB). More specifically, we wish to confirm that every AST node x marked as RUD (or SID) by Φ can be certified to be RUD (or SID) by KB. When both Φ and KB are expressed in Datalog, the problem reduces to one of determining query containment, e.g., for every valuation of the input relations, RUD_Φ ⊆ RUD_KB (or SID_Φ ⊆ SID_KB). We will now describe a semi-decision procedure to verify the soundness of the learned rules, which forms the second phase of the synthesis loop in GPS.

3.4.1 Representation of the Learned Rule (Φ)

Let Φ be a set of Datalog rules, each of which has a head relation and a body of the following form:

φ(x) ← φ1(x1) ∧ φ2(x2) ∧ ... ∧ φn(xn)    (3.6)

It means φ holds only when all of φ1, ..., φn hold. Here, φ may be a distribution type, e.g., SID(x), or a recursive feature g(x,y), e.g., representing that x depends on y.

3.4.2 Representation of the Knowledge Base (KB)

Our KB consists of two sets of proof rules, one for propositional logic and the other for distribution types.

Proof Rules for Propositional Logic. Fig. 3.8 shows the proof rules that represent axioms of propositional logic [181]; they can be used to reduce any valid (resp. invalid) Boolean formula to the constant true (resp. false). Thus, they are useful in showing results such as that true ∨ P and false ∧ Q are secret-independent (SID), for arbitrary logical sentences P and Q. Consider the example rule below, where g1 and g2 are synthesized features shown as dashed boxes in Fig. 3.9:

SID(x) ← OR(x) ∧ LC(x,L) ∧ RC(x,R) ∧ OR(L) ∧ NOT(R) ∧
         g1(L,k1) ∧ g2(R,k2) ∧ EQ(k1,k2)
g1(L,k1) ← INKEY(k1) ∧ INRAND(r1) ∧ LC(L,k1) ∧ RC(L,r1)
g2(R,k2) ← INKEY(k2) ∧ LC(R,k2)

Since k1 = k2, we transform the rule into an equivalent logic formula:

SID(x) ← EQ(x, (k1 ∨ r1) ∨ (¬k1))

Rules B1, B7 and Bf in Fig. 3.8 show that (k1 ∨ r1) ∨ (¬k1) is always true. Thus, x is always true.
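The validity of (k1 ∨ r1) ∨ ¬k1 can also be double-checked by exhaustive truth-table enumeration; the helper below is a toy illustration, not part of the KB machinery, which applies rewrite rules rather than enumerating.

```python
from itertools import product

def is_tautology(formula, nvars):
    # evaluate the Boolean formula under every assignment of its variables
    return all(formula(*bs) for bs in product([False, True], repeat=nvars))

# (k1 or r1) or not k1 -- the formula simplified by rules B1, B7 and Bf
always_true = is_tautology(lambda k1, r1: (k1 or r1) or (not k1), 2)
```

By contrast, a formula such as k1 ∨ r1 alone fails the check, since the all-false assignment falsifies it.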
Since x is a constant, we have SID(x), meaning x is secret-independent. Such seemingly simple SID rules, learned by our method automatically and yet overlooked by state-of-the-art (hand-crafted) analyzers [227, 210], can significantly improve the accuracy of side-channel analysis on many programs.

Proof Rules for Distribution Types. Fig. 3.10 shows the proof rules that represent properties of the distribution types. They were collected from published papers [227, 77, 20] that focus on verifying masking

[Figure 3.10 spans this page; its typing-rule derivations are omitted from the transcript. Representative content: rules D1.1–D1.3 give supp(x,{x}) for inputs x labeled INRAND, INKEY or INPUB; D2.1 gives dom(x,{x}) for INRAND inputs, while D2.2/D2.3 give dom(x,∅) for INKEY/INPUB inputs; D1.4 propagates supp(y, S1 ∪ S2) from a node's two operands; D2.4 gives an XOR node the masking set (S1 ∪ S2) \ (S1 ∩ S2) computed from its operands' dom sets; D3 concludes x : RUD whenever dom(x,Sx) with Sx ≠ ∅; D5 concludes x : SID when dom(x) is empty and supp(x) contains no secret keys; D6–Dc conclude y : SID for AND/OR nodes from the operands' types together with supp/dom disjointness side conditions; Dd and De state that RUD and SID each imply NOUKD; Df states that a NOT node inherits its operand's type.]
[Figure 3.10, continued: D10/D11 conclude x : SID when x is the constant true or false; D12 concludes x : NOUKD when supp(x) contains no secret keys; D13–D15 conclude y : SID for MUL nodes from the operands' dom/supp sets.]

Figure 3.10: Proof rules for distribution types, gathered from prior works [227, 77, 20]. Here, v denotes the type of variable x and ranges over UKD, SID and RUD. NOUKD denotes the secure types (either RUD or SID). All the predefined relations in KB are the same as in Φ.

countermeasures, which also provided the soundness proofs of these rules. For brevity, we omit the detailed explanation. Instead, we use Rule D2.4 as an example to illustrate the rationale behind these proof rules. In Rule D2.4, the dom(x,S) relation means x is masked by some input from the set S of random inputs. For example, in y = x1 ⊕ x2, where x1 = k ⊕ r1 ⊕ r2 and x2 = b ⊕ r2, we say that x2 is masked by r2, and x1 is masked by both r1 and r2. However, since r2 ⊕ r2 = false, y is masked only by r1. Thus, dom(y, {r1}) holds, but dom(y, {r2}) does not hold. In this sense, Rule D2.4 defines a masking set. For y, it is Sy = ({r1, r2} ∪ {r2}) \ ({r1, r2} ∩ {r2}) = {r1}, which contains r1 only. The masking set defined by D2.4 is useful in that, as long as the set is not empty, the corresponding variable is guaranteed to be of the RUD type.

3.4.3 Proving the Soundness of Φ Using KB

To prove that every AST node x marked as RUD_Φ(x) (resp. SID_Φ(x)) by Φ is also marked as RUD_KB(x) (resp. SID_KB(x)) by KB, we show that the following relation Ind(x) is empty for any valuation of the input relations:

Ind(x) ← φ_Φ(x) ∧ ¬φ_KB(x),    (3.7)

where the relation φ may be instantiated to either RUD or SID.
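For a rule as small as Rule 3.5, non-emptiness of Ind(x) can also be witnessed by brute force, with exhaustive simulation standing in for KB. The sketch below (our own toy encoding) finds the classic counter-example: two RUD operands that share no key but share a random bit, which cancels under XOR.

```python
from itertools import product

# tiny XOR-only ASTs: ('rand', n) / ('key', n) / ('pub', n) / ('xor', l, r)
def eval_ast(t, env):
    if t[0] == 'xor':
        return eval_ast(t[1], env) ^ eval_ast(t[2], env)
    return env[t[1]]

def leaves(t, kind):
    if len(t) == 2:
        return {t[1]} if t[0] == kind else set()
    return leaves(t[1], kind) | leaves(t[2], kind)

def is_rud(t):
    # ground truth: uniform over the randoms for every fixing of key/public bits
    rs = sorted(leaves(t, 'rand'))
    ks = sorted(leaves(t, 'key') | leaves(t, 'pub'))
    for kv in product([0, 1], repeat=len(ks)):
        fixed = dict(zip(ks, kv))
        vals = [eval_ast(t, {**fixed, **dict(zip(rs, rv))})
                for rv in product([0, 1], repeat=len(rs))]
        if 2 * vals.count(1) != len(vals):
            return False
    return True

def rule35_says_rud(t):
    # unsound Rule 3.5: XOR node, both operands RUD, no key shared by both sides
    return (t[0] == 'xor' and is_rud(t[1]) and is_rud(t[2])
            and not (leaves(t[1], 'key') & leaves(t[2], 'key')))

# witness: x = (k ^ r) ^ (b ^ r); both operands are RUD and share no key,
# yet r cancels, so x = k ^ b leaks the key
x = ('xor', ('xor', ('key', 'k'), ('rand', 'r')),
            ('xor', ('pub', 'b'), ('rand', 'r')))
```

Rule 3.5 marks x as RUD while the ground truth does not, i.e., x inhabits Ind(x); Rule 3.4 would reject this AST because its ¬h(x) premise fails on the shared random r.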
In theory, this amounts to proving query containment, which is undecidable for Datalog in general [48, 46], but there are Datalog fragments, such as UC2RPQ, for which it is decidable [48, 47, 19], and our meta-rules in Fig. 3.7 were designed specifically to produce rules within this fragment. In the remainder of this section, we present a semi-decision procedure to prove the emptiness of Ind(x).

Derivation trees, and unrolling a Datalog program. First, we observe that every tuple t = φ(x) produced by a Datalog program is associated with one or more derivation trees. The heights of these derivation trees correspond to the depth of rule inlining at which the program discovers t. In particular, for each inlining depth k ∈ ℕ, each rule

φh(xh) ← φ1(x1) ∧ φ2(x2) ∧ ... ∧ φn(xn)

can be transformed to

φh^(k+1)(xh) ← φ1^(k)(x1) ∧ φ2^(k)(x2) ∧ ... ∧ φn^(k)(xn),    (3.8)

where each φ^(k) contains exactly those tuples φ^(k)(x) which have a derivation tree of depth k. Observe that φ = ∪_{k∈ℕ} φ^(k).

Proving entailment at each unrolling depth. Our insight is to prove that at each unrolling depth k, φ_Φ^(k) ⊆ φ_KB^(k). In other words, we define the relation Ind^(k) as follows:

Ind^(k)(x) ← φ_Φ^(k)(x) ∧ ¬φ_KB^(k)(x),    (3.9)

and extrapolate from the emptiness of Ind^(k) for each k:

Proposition 3.4.1. If Ind^(k)(x) is an empty relation for each inlining depth k ∈ ℕ, then Ind(x) is also an empty relation.

Proof. Assume otherwise, so the Ind relation contains some AST node x. By definition then, the candidate analysis derives φ_Φ(x), say at proof tree depth l, while KB does not derive φ_KB(x). It follows that φ_Φ^(l) also contains x, while x is absent from φ_KB^(l). We conclude that x is an element of Ind^(l), which contradicts our hypothesis that Ind^(l) was empty.

Note 3.4.2. The converse of the above proposition need not hold. In particular, it may be the case that Ind(x) is empty, even though Ind^(k) is inhabited.
This is a curious consequence of connecting the inlining depths of Φ and KB in Equation 3.9: there may be a tuple φ_Φ^(k)(x) which is derived by KB at some other inlining depth k′ ≠ k. In that case, since x would then be absent from φ_KB^(k)(x), it would inhabit Ind^(k)(x), but not occur in Ind(x).

Specifically, we prove the emptiness of Ind(x) by k-induction [189, 72, 87]. Observe that unrolling the rules of a Datalog program to any specific depth yields a formula which can be interpreted within propositional logic. For example, unrolling f(x) from Equation 3.3 at depths 1 and 2 gives us

f^(1)(x) = INRAND(x), and
f^(2)(y) = (LC(y,x) ∧ f^(1)(x)) ∨ (RC(y,x) ∧ f^(1)(x)).

For any specific value of k, we can therefore use an SMT solver to verify the emptiness of Ind^(k) for all k = 1, 2, .... For the inductive step, in particular, we ask the SMT solver to determine if Ind^(i+k) can be non-empty while the k preceding relations Ind^(i), Ind^(i+1), Ind^(i+2), ..., Ind^(i+k−1) are all empty. For this purpose, let V^(i) be the free variables introduced by unrolling the rules at depth i. The following formula asserts the non-emptiness of Ind^(i):

ψ^(i) = ⋁_{x ∈ V^(i)} Ind^(i)(x).    (3.10)

Thus, we formalize the inductive step of the proof by constructing the following formula:

Ψ^(k) = ¬ψ^(i) ∧ ¬ψ^(i+1) ∧ ... ∧ ¬ψ^(i+k−1) ∧ ψ^(i+k)    (3.11)

The following proposition formalizes our intuition:

Proposition 3.4.3. If for some k ∈ ℕ, the relations Ind^(1), Ind^(2), ..., Ind^(k) are all empty, and the formula Ψ^(k) is unsatisfiable, then Ind^(i) is empty for all i ∈ ℕ.

Starting from k = 1, we use the SMT solver to check Proposition 3.4.3 for increasingly larger k until a timeout is reached. If the SMT solver is ever successful in proving the proposition, it follows that the learned rule is sound.

3.4.4 Generating Abstract Counter-Examples

When the proof fails, however, we need to prevent the same rule from being learned again. Let Π = {f1 = v1, f2 = v2, ..., fk = vk} be the feature valuation in the failing rule R_Π.
We then construct the counter-example

CE_Π = {f ↦ v | (f, v) ∈ Π} ∪ {f ↦ −1 | f ∈ F \ Π},    (3.12)

with label UKD(CE_Π). Recall that F is the set of all features currently under consideration. Therefore, the feedback CE_Π provided to DTL(E, F) is an abstract counter-example, with the missing features f ∈ F \ Π set to the unknown value −1.

Consider the subsequent iteration of the decision tree learner, DTL(E ∪ {CE_Π}, F). Observe that whenever it is in a decision context which is also a prefix of the counter-example CE_Π, the information gain of each feature f ∈ Π is strictly less than that encountered in the previous invocation. Therefore, at some level of the decision tree, it will either choose a different feature, or invoke the feature synthesis algorithm to grow F. By formalizing this argument, we say that:

Proposition 3.4.4. Given a counter-example CE_Π to a learned rule R_Π, the subsequent invocation of the learner DTL(E ∪ {CE_Π}, F) is guaranteed to no longer produce R_Π.

We stress that the proof rules in KB should not be confused with the analysis rules used in the learned analyzer, since they are far more computationally expensive. Consider Rule D1.4, whose Datalog encoding size for supp(x,S) would be |V| · 2^|IN|. The benchmark named B19 in Table 4.2 has 1,250 input
The rst step, which is SyGuS-guided decision tree learning, is implemented in 4,603 lines of C++ code. The second step, which proves the learned inference rules, is implemented using the Z3 [66] SMT solver. Furthermore, the learned analyzer (for detecting power side channels in cryptographic software) is implemented in LLVM as an optimization (opt) pass. We ran all experiments on a computer with 2.9 GHz Intel Core i5 CPU and 8 GB RAM. Our artifacts, includingGPS and all benchmarks, will be published with the paper. 3.5.1 Benchmarks Our benchmarks are 568 programs with 2,691K lines of C code in total. They implement well-known cryptographic algorithms such as AES and SHA-3. Some of these programs are hardened by countermeasures, such as reordered MAC-Keccak computation [32], masked AES [38, 20], masked S-box calculation [63] and masked multiplication [174], to eliminate side-channel information leaks. 39 Table 3.1: Statistics of the benchmark programs inD test . Name LoC I pub I priv I rand Name LoC I pub I priv I rand B1 11 0 2 2 B2 12 0 2 2 B3 12 0 1 2 B4 25 1 1 3 B5 25 1 1 3 B6 32 1 1 3 B7 81 1 1 7 B8 84 1 1 7 B9 104 1 1 7 B10 964 1 16 32 B11 1,130 1 16 32 B12 1,256 0 25 75 B13 2,506 0 25 125 B14 3,764 0 25 175 B15 8,810 0 25 349 B16 13,810 0 25 575 B17 18,858 0 25 775 B18 23,912 0 25 975 B19 30,228 0 25 1,225 B20 34,359 16 16 1,232 B21 79 0 16 16 B22 67 0 8 16 B23 21 0 2 2 B24 23 0 2 2 B25 27 0 1 2 B26 32 0 2 2 B27 40 0 2 3 B28 59 0 3 4 B29 60 0 3 4 B30 66 0 3 4 B31 66 0 3 4 B32 426k 288 288 3205 B33 426k 288 288 3205 B34 426k 288 288 3205 B35 429k 288 288 3205 B36 426k 288 288 3205 B37 442k 288 288 3205 We partition the benchmarks into two sets:D train forGPS, andD test for the learned analyzer. The training setD train consists of 531 small programs gathered from various public sources, including byte- masked AES [225], random reduction of S-box [229], common shares [62], and leak examples [77]. 
Each benchmark is a pair consisting of a program AST and its distribution type, i.e., the ground truth annotated by developers. The testing set D_test consists of 37 large programs, whose statistics (the number of lines of code and inputs labeled public, private, and random) are shown in Table 4.2. Since these programs are large, it is no longer practical to manually annotate the ground truth; instead, we rely on the results of published tools: a (manually-crafted) static analyzer [210] for B1-B20 and a formal verification tool [227] for B21-B37.

3.5.2 Performance and Accuracy of the Learned Analyzer

To demonstrate the advantage of our learned analyzer (answering RQ1), we compared it with the two existing tools [227, 210] on the programs in D_test. Only our analyzer can handle all of the 37 programs (we encountered errors while running the existing tools on benchmarks from other sources); therefore, we compared the results of our analyzer with the tool from [210] on B1-B20, and with the tool from [227] on B21-B37. The results are shown in Table 3.2 and Table 3.3, respectively.

Table 3.2: Comparing the learned analyzer with the tool from [210].
              Manually Designed Analyzer [210]       Our Learned Analyzer
Name   #AST   Time (s)  UKD  SID     RUD            Time (s)  UKD  SID     RUD
B1     7      0.061     4    0       22             0.001     4    0       22
B2     6      0.105     7    0       20             0.001     6    1       20
B3     8      0.099     1    3       31             0.001     1    3       31
B4     11     0.208     6    12      31             0.001     17   12      20
B5     11     0.216     1    10      29             0.001     11   10      19
B6     14     0.276     1    15      48             0.001     8    15      41
B7     39     0.213     2    25      151            0.002     2    25      151
B8     39     0.147     4    42      249            0.002     4    42      249
B9     47     0.266     2    61      153            0.001     2    61      153
B10    522    0.550     31   12      2334           0.008     31   12      2334
B11    522    0.447     31   0       2334           0.029     31   0       2334
B12    426    0.619     52   300     2062           0.001     52   300     2062
B13    827    1.102     49   600     4030           0.006     49   600     4030
B14    1,228  1.998     49   900     5995           0.065     49   900     5995
B15    2,832  16.999    49   2,100   13861          0.107     49   2,100   13861
B16    4,436  24.801    49   3,300   21,723         2.663     49   3,300   21,723
B17    6,040  59.120    49   4,500   29,587         1.956     49   4,500   29,587
B18    7,644  121.000   47   5,700   37,449         3.258     47   5,700   37,449
B19    9,649  202.000   49   7200    47,280         5.381     49   7200    47,280
B20    13,826 972.000   127  26,330  38,070         3.650     127  26,330  38,070

In both tables, Columns 1-2 show the benchmark name and the number of AST nodes. Columns 3-6 show the existing tool's analysis time and result, including a breakdown into the three types. Similarly, Columns 7-10 show our learned analyzer's time and result. Note that in [210], the UKD/SID/RUD numbers they collected were the numbers of variables of the LLVM IR, and thus larger than the number of variables in the original programs. To be consistent, we compared with their results in the same manner in Table 3.2. Table 3.2 and Table 3.3 contain the results of comparing our tool with the state-of-the-art tools [210] and [227], respectively, where our learned analyzer is much faster, especially on larger programs such as B20 (3.6 seconds versus 16 minutes). Our analyzer is faster because the manually-crafted analyses [210, 227] rely on evaluating set relations (e.g.,
difference and intersection of sets of random variables), whereas our DSL syntax is designed without set operations to infer the same types, thus leading to a faster analysis. Although in general the set operation-based algorithm is more accurate, it has an extremely large computational overhead. Moreover, in practice, it does not always improve precision. Furthermore, the method in [227] uses an SMT solver based model counting technique to infer leak-free variables, which is significantly more expensive than type inference.

Table 3.3: Comparing the learned analyzer with SCInfer [227].

              The SCInfer Verification Tool [227]   Our Learned Analyzer
Name   #AST   Time (s)   UKD   SID   RUD            Time (s)  UKD   SID   RUD
B21    32     0.390      16    0     16             0.005     16    0     16
B22    24     0.570      8     0     16             0.002     8     0     16
B23    6      0.010      0     0     6              0.001     0     0     6
B24    6      0.060      0     0     6              0.001     0     0     6
B25    8      0.250      0     2     6              0.001     0     2     6
B26    9      0.160      2     3     4              0.002     2     3     4
B27    11     0.260      1     5     5              0.001     1     5     5
B28    18     0.290      3     4     11             0.003     3     4     11
B29    18     0.230      2     4     11             0.002     2     4     12
B30    28     0.340      2     6     20             0.001     8     0     20
B31    28     0.500      2     7     19             0.001     2     7     19
B32    197k   3.800      0     6.4k  190.4k         3.180     0     6.4k  190.4k
B33    197k   2,828.000  4.8k  6.4k  185.6k         3.260     4.8k  6.4k  185.6k
B34    197k   2,828.000  3.2k  6.4k  187.2k         3.170     3.2k  6.4k  187.2k
B35    198k   2,828.000  1.6k  8k    188.8k         3.140     3.2k  8k    187.2k
B36    197k   2,828.000  4.8k  6.4k  185.6k         3.150     4.8k  6.4k  185.6k
B37    205k   2,828.000  17.6k 1.6k  185.6k         3.820     17.6k 1.6k  185.6k

As shown in Table 3.2 and Table 3.3, by learning inference rules from data, we can achieve almost the same accuracy as manual analysis [210, 227] while avoiding the huge overhead. Given the same definitions of distribution types (UKD, SID and RUD), both our learned rules and the manual analysis rules [210, 227] can infer the non-leaky patterns, thus recognizing the variable types correctly on most benchmarks in Table 3.2 and Table 3.3, except for B4-B6 and B30, where set operations are required to prove the leak-freedom of some variables.
Recall that losing accuracy here indicates that our learned rules infer the types more conservatively, without losing soundness. Nevertheless, our analyzer also increased accuracy in some other cases (e.g., B2), due to its deeper constant propagation (which led to the proof of more SID variables) while the existing tool [210] failed to do so, and conservatively marked them as UKD variables.

3.5.3 Effectiveness of Rule Induction and Soundness Verification

To answer RQ2 and RQ3, we collected statistics while applying GPS to the 531 small programs in D_train, as shown in Table 3.4. In total, GPS took 30 iterations to complete the entire learning process. Column 1 shows the iteration number and Column 2 shows the time taken by the learner and the prover together.

Table 3.4: Decision Tree Learning with Feature Synthesis (Different Iterations with #AST = 531).

Iter   Time (s)  #Rules Learned          #Rules Verified         #Tree_learn  #AST_CEX  #Feature_syn
                 Total  UKD  SID  RUD    Total  UKD  SID  RUD
1      1.316     9      2    2    5      5      2    1    2      23           4         5
2      0.775     8      2    2    4      4      2    1    1      17           9         7
3      1.115     8      2    2    4      5      2    2    1      24           13        9
4      0.511     8      2    2    4      5      2    1    2      18           18        10
5      0.513     8      2    2    4      7      2    2    3      27           21        11
6      0.537     8      2    2    4      6      2    2    2      24           25        12
7      0.510     8      2    2    4      6      2    2    2      26           29        13
8      0.512     8      2    2    4      6      2    2    2      28           33        14
9      0.511     8      2    2    4      6      2    2    2      30           37        15
10     0.524     8      2    2    4      5      2    2    1      32           41        16
11     0.546     8      2    2    4      4      2    2    0      34           45        17
12     0.556     8      2    2    4      4      2    2    0      36           49        18
13     0.550     8      2    2    4      5      2    2    1      38           53        19
14     0.540     8      2    2    4      6      2    2    2      40           57        20
15     0.542     8      2    2    4      4      2    2    0      42           61        21
16     0.552     8      2    2    4      6      2    2    2      44           65        22
17     0.577     8      2    2    4      5      2    2    1      46           69        23
18     0.598     8      2    2    4      6      2    2    2      48           73        24
19     0.571     8      2    2    4      6      2    2    2      50           77        25
20     0.673     8      2    2    4      5      1    2    2      52           82        26
21     0.526     8      2    2    4      3      1    2    0      54           87        27
22     0.525     8      3    2    3      6      3    2    1      35           91        27
23     0.697     9      3    2    4      7      2    2    3      37           93        27
24     0.700     9      3    2    4      8      2    2    4      38           95        28
25     0.691     7      2    2    3      6      1    2    3      36           97        29
26     0.707     7      2    2    3      6      1    2    3      37           99        30
27     0.716     7      2    2    3      6      1    2    3      38           101       31
28     0.540     7      2    2    3      6      1    2    3      39           102       32
29     0.534     7      2    2    3      6      1    2    3      39           103       32
30     0.528     7      2    2    3      7      2    2    3      39           104       32
TOTAL  18.693    237    63   60   114    167    54   57   56     1071         1833      622

Columns 3-6 show the number of inference rules learned during the iteration, together with their types (UKD, SID, and RUD). Similarly, Columns 7-10 show the number of verified inference rules. The next two columns show the following statistics: (1) the size of the learned decision tree (#Tree_learn) in terms of the number of decision nodes; (2) the number of counter-examples (CEX) added by the prover (#AST_CEX); they are added to the 531 original training programs before the next iteration starts. The last column shows the number of features generated by SyGuS; these features are added to the original feature set and then used by the learner during the next iteration. The results demonstrate the efficiency of both the learner and the prover. Within the learner, the number of rules produced in each iteration remains modest (8 on average), indicating it has successfully avoided overfitting. This is because the SyGuS solver is biased towards producing small features which, by Occam's razor, are likely to generalize well. Furthermore, any learned analysis rules have to pass the soundness check, and this provides additional assurance against overfitting to the training data. The prover either quickly verifies a rule, or quickly drops it after adding a counter-example to prevent it from being learned again. In early iterations, about half of all learned rules can be proved, but as more counter-examples are added, the quality of the learned rules improves, and thus the percentage of proved rules also increases.

3.5.4 Threats to Validity

Our experimental evaluation focused on cryptographic software, which is structurally simple and, unlike general-purpose software, does not exercise complicated language constructs (data-dependent loop bounds, recursion, non-termination, etc.). It is an interesting direction of future work to extend our techniques to these more general classes of software code.
A notable limitation in our work is the assumption of the knowledge base (KB). While the KB is readily available for our application domain (side-channel analysis), for other applications, it might be non-trivial to construct. Furthermore, an incorrect KB might compromise the soundness of the learned rules, although in this work, we have carefully mitigated this threat by curating the proof rules from previous papers [227, 77, 20] that have themselves formally verified the validity of these proof rules.

3.6 Summary

I have presented a data-driven method for learning a provably sound static analyzer to detect power side channels in cryptographic software. It relies on SyGuS to generate new features and DTL to generate type-inference rules based on the synthesized features. It verifies the soundness of the learned analysis rules by solving a Datalog query containment checking problem using an SMT solver. I have implemented and evaluated our method on C programs that implement well-known cryptographic protocols and algorithms. The experimental results show that the learning algorithm is efficient and the learned analyzer can achieve the same empirical accuracy as state-of-the-art, hand-crafted analysis tools while being several orders-of-magnitude faster.

Chapter 4

Mitigating Side-channel Leaks

Cryptography is an integral part of many security protocols, which in turn are used by numerous applications. However, despite the strong theoretical guarantees, cryptosystems in practice are vulnerable to side-channel attacks when non-functional properties such as timing, power and electromagnetic radiation are exploited to gain information about sensitive data [121, 52, 145, 59, 169, 144, 44, 231, 201, 112]. For example, if the power consumption of an encryption device depends on the secret key, techniques such as differential power analysis (DPA) may be used to perform attacks reliably [120, 59, 44, 143, 146].
Although there are methods for mitigating power side channels [228, 77, 78, 33, 6, 30, 5], they focus exclusively on the Boolean level, e.g., by targeting circuits or software code converted to a bit-level representation. This limits their usage; as a result, none of them is able to fit into modern compilers such as GCC and LLVM to directly handle the word-level intermediate representation (IR). In addition, code transformations in compilers may add new side channels, even if the input program is equipped with state-of-the-art countermeasures. Specifically, compilers use a limited number of the CPU's registers to store a potentially-large number of intermediate computation results of a program. When two masked and hence de-sensitized values are put into the same register, the masking countermeasure may be removed accidentally. We will show, as part of this work, that even provably-secure techniques such as high-order masking [20, 21, 16] are vulnerable to such leaks. Indeed, we have found leaks in the compiled code produced by LLVM for both x86 and MIPS/ARM platforms, regardless of whether the input program is equipped with high-order masking.

Figure 4.1: Overview of our secure compilation method. (The figure shows a program P as LLVM bitcode, together with user-specified input annotations and a variable-to-register map, flowing through Datalog-based type checking with domain-specific optimizations for detection, and through a modified LLVM backend's register allocation for mitigation, producing leakage-free assembly.)

To solve the problem, we propose a secure compilation method with two main contributions. First, we introduce a type-inference system to soundly and quickly detect power side-channel leaks. Second, we propose a mitigation technique for the compiler's backend to ensure that, for each pair of intermediate variables that may cause side-channel leaks, they are always stored in different registers or memory locations. Figure 4.1 illustrates our method, which takes a program P as input and returns the mitigated code as output. It has two steps.
First, type inference is used to detect leaks by assigning each variable a distribution type. Based on the inferred types, we check each pair (v1, v2) of variables to see if they may cause leaks when stored in the same register. If the answer is yes, we constrain the compiler's register allocation modules to ensure that v1 and v2 are assigned to different registers or memory locations. Our method differs from existing approaches in several aspects. First, it specifically targets power side-channel leaks caused by reuse of CPU registers in compilers, which have been largely overlooked by prior work. Second, it leverages Datalog, together with a number of domain-specific optimizations, to achieve high efficiency and accuracy during leak detection. Third, mitigation leverages the existing production-quality modules in LLVM to ensure that the compiled code is secure by construction. Unlike existing techniques that translate the input program to a Boolean representation, our method works directly on the word-level IR and thus fits naturally into modern compilers. For each program variable, the leak is quantified using the well-known Hamming Weight (HW) and Hamming Distance (HD) leakage models [136, 135]. Correlation between these models and leaks on real devices has been confirmed in prior work (see Section 4.1). We also show, via experiments, that leaks targeted by our method exist even in programs equipped with high-order masking [20, 21, 16]. To detect leaks quickly, we rely on type inference, which models the input program using a set of Datalog facts and codifies the type inference algorithm in a set of Datalog rules. Then, an off-the-shelf Datalog solver is used to deduce new facts. Here, a domain-specific optimization, for example, is to leverage the compiler's backend modules to extract a map from variables to registers and utilize the map to reduce the computational overhead, e.g., by checking pairs of some (instead of all) variables for leaks.
Our mitigation in the compiler's backend is systematic: it ensures that all leaks detected by type inference are eliminated. This is accomplished by constraining register allocation modules and then propagating the effect to subsequent modules, without having to implement any new backend module from scratch. Our mitigation is also efficient in that we add a number of optimizations to ensure that the mitigated code is compact and has low runtime overhead. While our implementation focuses on x86, the technique itself is general enough that it may be applied to other instruction set architectures (ISAs) such as ARM and MIPS as well. We have evaluated our method on a number of cryptographic programs [20, 30], including well-known ciphers such as AES and MAC-Keccak. These programs are protected by masking countermeasures but, still, we have detected leaks in the LLVM compiled code. In contrast, the compiled code produced by our mitigation technique, also based on LLVM, is always leak free. In terms of runtime overhead, our method outperformed existing approaches such as high-order masking: our mitigated code not only is more secure and compact but also runs faster than code mitigated by high-order masking techniques [20, 21]. To summarize, this paper makes the following contributions:
• We show that register reuse implemented in modern compilers introduces new side-channel leaks even in software code already protected by masking.
• We propose a Datalog based type inference system to soundly and quickly detect these side-channel leaks.
• We propose a mitigation technique for the compiler's backend modules to systematically remove the leaks.
• We implement the method in LLVM and show its effectiveness on a set of cryptographic software programs.
The remainder of this paper is organized as follows. First, we illustrate the problem and the technical challenges for solving it in Section 4.1.
Next, we present our method for leak detection in Section 4.2 and leak mitigation in Section 4.3, followed by domain-specific optimizations in Section 4.4. We present our experimental results in Section 4.5 and give our conclusions in Section 4.6.

4.1 Motivation

We use examples to illustrate why register reuse may lead to side-channel leaks and the challenges for removing them.

4.1.1 The HW and HD Leaks

Consider the program Xor() in Figure 4.2, which takes the public txt and the secret key as input and returns the Exclusive-OR of them as output. Since logical 1 and 0 bits in a CMOS circuit correspond to different leakage currents, they affect the power consumption of the device [135]; such leaks were confirmed by prior works [146, 44] and summarized in the Hamming Weight (HW) model. In program Xor(), variable t has a power side-channel leak because its register value depends on the secret key.

    //'txt': PUBLIC, 'key': SECRET and 't' is HW-sensitive
    uint32 Xor(uint32 txt, uint32 key) { uint32 t = txt ^ key; return t; }

    //random variable 'mask1' splits 'key' to secure shares {mask1,mk}
    uint64 SecXor(uint32 txt, uint32 key, uint32 mask1) {
      uint32 mk = mask1 ^ key;  // mask1^key
      uint32 t = txt ^ mk;      // txt^(mask1^key)
      return (mask1, t);
    }

    //'mask1' splits 'key' to shares {mask1,mk} a priori
    //'mask2' splits the result to shares {mask2,t3} before return
    uint64 SecXor2(uint32 txt, uint32 mk, uint32 mask1, uint32 mask2) {
      uint32 t1 = txt ^ mk;     // txt^(mask1^key)
      uint32 t2 = t1 ^ mask2;   // (txt^mask1^key)^mask2
      uint32 t3 = t2 ^ mask1;   // (txt^mask1^key^mask2)^mask1
      return {mask2, t3};
    }

    Name     Approach                        HW-Sensitive  HD-Sensitive
    Xor      No Masking                      yes           yes
    SecXor   First Order Masking             no            yes
    SecXor2  Specialized Hardware & Masking  no            yes

Figure 4.2: Implementations of an XOR computation in the presence of HW and HD power side-channel leaks.

The leak may be mitigated by masking [94, 5] as shown in program SecXor().
The idea is to split a secret into n randomized shares before using them; unless the attacker has all n shares, it is theoretically impossible to deduce the secret. In first-order masking, the secret key may be split into {mask1, mk}, where mask1 is a random variable, mk = mask1 ^ key is the bit-wise Exclusive-OR of mask1 and key, and thus mask1 ^ mk = key. We say that mk is masked and thus leak free because it is statistically independent of the value of key: if mask1 is a uniform random number then so is mk. Therefore, when mk is aggregated over time, as in side-channel attacks, the result reveals no information about key. Unfortunately, there can be leaks in SecXor() when the variables share a register and thus create second-order correlation. For example, the x86 assembly code of mk = mask1 ^ key is MOV mask1 %edx; XOR key %edx, meaning the values stored in %edx are mask1 and mask1 ^ key, respectively. Since bit-flips in the register also affect the leakage current, they lead to side-channel leaks. This is captured by the Hamming Distance (HD) power model [44]: HD(mask1, mask1 ^ key) = HW(mask1 ^ (mask1 ^ key)) = HW(key), which reveals key. Consider, for example, where key is 0001b and mask1 is 1111b in binary. If a register stores mask1 (= 1111b) first and updates its value to mask1 ^ key (= 1110b), the transition of the register (bit-flip) is 0001b, which is the same as the key value. In embedded systems, specialized hardware [179, 132, 11] such as a physically unclonable function (PUF) or a true random number generator (TRNG) may produce key and mask1 and map them to the memory address space; thus, these variables are considered leak free. Specialized hardware may also directly produce the masked shares {mask1, mk} without producing the unmasked key in the first place. This more secure approach is shown in program SecXor2(), where masked shares are used to compute the result (txt ^ key), which is also masked, but by mask2 instead of mask1.
Inside SecXor2(), care should be taken to randomize the intermediate results by mask2 first, before de-randomizing them by mask1. Thus, the CPU's registers never hold any unmasked result. However, there can still be HD leaks, for example, when the same register holds the following pairs at consecutive time steps: (mask1, mk), (mask1, t1), or (mask2, t3).

4.1.2 Identifying the HD Leaks

To identify these leaks, we need to develop a scalable method. While there are techniques for detecting flaws in various masking implementations [63, 110, 21, 24, 74, 37, 94, 38, 184, 49, 174, 168, 77, 171], none of them was scalable enough for use in real compilers, or targeted the HD leaks caused by register reuse. First, we check if there are sensitive, unmasked values stored in the CPU registers. Here, masked means a value is made statistically independent of the secret using randomization. We say a value is HW-sensitive if, statistically, it depends on the secret. For example, in Figure 4.2, key is HW-sensitive whereas mk = mask1 ^ key is masked. If there were nk = mask1 ∨ key, it would be HW-sensitive because the masking is not perfect. Second, we check if there is any pair of values (v1, v2) that, when stored in the same register, may cause an HD leak. That is, HD(v1, v2) = HW(v1 ^ v2) may statistically depend on the secret. For example, in Figure 4.2, mk and mask1 form an HD-sensitive pair.

Formal Verification. Deciding if a variable is HW-sensitive, or two variables are HD-sensitive, is hard in general, since it corresponds to model counting [228, 78]. This can be illustrated by the truth table in Table 4.1 for functions t1, t2 and t3 over secret bit k and random bits m1, m2 and m3. First, there is no HW leak because, regardless of whether k = 0 or 1, there is a 50% chance of t1 and t2 being 1 and a 25% chance of t3 being 1. This can be confirmed by counting the number of 1's in the top and bottom halves of the table.
When two values (t1, t2) are stored in the same register, however, the bit-flip may depend on the secret. As shown in the column HD(t1, t2) of the table, when k = 0, the bit is never flipped; whereas when k = 1, the bit is always flipped. The existence of an HD leak for (t1, t2) can be decided by model counting over the function f = t1 ^ t2 over (k, m1, m2, m3): the number of solutions is 0/8 for k = 0 but 8/8 for k = 1. In contrast, there is no HD leak for (t2, t3) because the number of satisfying assignments (solutions) is always 2/8 regardless of whether k = 0 or k = 1.

Type Inference. Since model counting is expensive, we develop a fast, sound, static type system to identify the HD-sensitive pairs in a program. Following Zhang et al. [228], we assign each variable one of three types: RUD, SID or UKD (details in Section 2). Briefly, RUD means random uniform distribution, SID means secret independent distribution, and UKD means unknown distribution. Therefore, a variable may have a leak only if it is the UKD type. In Table 4.1, for example, given t1 ← m1 ^ m2, where m1 and m2 are random (RUD), it is easy to see that t1 is also random (RUD). For t3 ← t2 ∧ m3, where t2 and m3 are RUD, however, t3 may not always be random, but we can still prove that t3 is SID; that is, t3 is statistically independent of k. This type of syntactic inference is fast because it does not rely on any semantic information, although in general, it is not as accurate as the model counting based approach. Nevertheless, such inaccuracy does not affect the soundness of our mitigation. Furthermore, we rely on a Datalog based declarative analysis framework [217, 230, 124, 219, 41] to implement and refine the type inference rules, which can infer HD(t2, t3) as SID instead of UKD. We also

Table 4.1: Truth table showing that (1) there is no HW leak in t1, t2, t3 but (2) there is an HD leak when t1, t2 share a register.
k    m1   m2   m3   t1=m1^m2  t2=t1^k  t3=t2∧m3  HD(t1,t2)=t1^t2  HD(t2,t3)=t2^t3
0    0    0    0    0         0        0         0                0
0    0    0    1    0         0        0         0                0
0    0    1    0    1         1        0         0                1
0    0    1    1    1         1        1         0                0
0    1    0    0    1         1        0         0                1
0    1    0    1    1         1        1         0                0
0    1    1    0    0         0        0         0                0
0    1    1    1    0         0        0         0                0
1    0    0    0    0         1        0         1                1
1    0    0    1    0         1        1         1                0
1    0    1    0    1         0        0         1                0
1    0    1    1    1         0        0         1                0
1    1    0    0    1         0        0         1                0
1    1    0    1    1         0        0         1                0
1    1    1    0    0         1        0         1                1
1    1    1    1    0         1        1         1                0
UKD  RUD  RUD  RUD  RUD       RUD      SID       UKD              SID*

* Our Datalog based type inference rules can infer it as SID instead of UKD.

leverage domain-specific optimizations, such as precomputing certain Datalog facts and using the compiler's backend information, to reduce cost and improve accuracy.

4.1.3 Mitigating the HD Leaks

To remove the leaks, we constrain the register allocation algorithm using our inferred types. We focus on LLVM and x86, but the method is applicable to MIPS and ARM as well. To confirm this, we inspected the assembly code produced by LLVM for the example (t1, t2, t3) in Table 4.1 and found HD leaks on all three architectures. For x86, in particular, the assembly code is shown in Figure 4.3a, which uses %eax to store all intermediate variables and thus has a leak in HD(t1, t2). Figure 4.3b shows our mitigated code, where the HD-sensitive variables t1 and t2 are stored in different registers. Here, t1 resides in %eax and memory -20(%rbp) whereas t2 resides in %ecx and memory -16(%rbp). The stack and a value of %eax are shown in Figure 4.3c, both before and after mitigation, when the leak may occur at lines 8-9. Since the value of k is used only once in the example, i.e., for computing t2, overwriting its value stored in the original memory location -16(%rbp) does not affect subsequent execution. If k were to be used later, our method would have made a copy in memory and directed uses of k to that memory location.
     1  // assembly for Table 1          1  // assembly for Table 1
     2  movl %edi, -4(%rbp)              2  movl %edi, -4(%rbp)
     3  movl %esi, -8(%rbp)              3  movl %esi, -8(%rbp)
     4  movl %edx, -12(%rbp)             4  movl %edx, -12(%rbp)
     5  movl %ecx, -16(%rbp)             5  movl %ecx, -16(%rbp)
     6  movl -4(%rbp), %eax              6  movl -4(%rbp), %eax
     7  xorl -8(%rbp), %eax              7  xorl -8(%rbp), %eax
     8  movl %eax, -20(%rbp)             8  movl %eax, -20(%rbp)
     9  xorl -16(%rbp), %eax             9  xorl %eax, -16(%rbp)
    10  movl %eax, -24(%rbp)            10  movl -16(%rbp), %ecx
    11  andl -12(%rbp), %eax            11  andl -12(%rbp), %ecx
    12  movl %eax, -28(%rbp)            12  movl %ecx, -28(%rbp)
    13                                  13  movl -28(%rbp), %eax
    14  popq %rbp                       14  popq %rbp
        (a) Before Mitigation               (b) After Mitigation

(c) Diagram for stack and register %eax: after executing line 8, the stack holds m1, m2, m3, key and m1^m2 at -4(%rbp) through -20(%rbp), and %eax holds m1^m2. Before mitigation (after executing line 9), %eax holds m1^m2^key while -16(%rbp) still holds key, so the register transition is HD = key (leak). After mitigation (after executing line 9), -16(%rbp) holds m1^m2^key while %eax still holds m1^m2, so HD = 0.

Figure 4.3: The assembly code before and after mitigation.

Register allocation in real compilers is a highly optimized process. Thus, care should be taken to maintain correctness and performance. For example, the naive approach of assigning all HD-sensitive variables to different registers does not work because the number of registers is small (x86 has 4 general-purpose registers while MIPS has 24) while the number of sensitive variables is often large, meaning many variables must be spilled to memory. The instruction set architecture also adds constraints. In x86, for example, %eax is related to %ah and %al and thus cannot be assigned independently. Furthermore, binary operations such as Xor may require that the result and one operand share the same register or memory location.
Therefore, for mk = mask1 ^ key, it means that either mk and mask1 share a register, which causes a leak in HD(mk, mask1) = HW(key), or mk and key share a register, which causes a leak in HW(key) itself. Thus, while modifying the backend, multiple submodules must be constrained together to ensure the desired register and memory isolations (see Section 4.3).

4.1.4 Leaks in High-order Masking

Here, a question is whether the HD leak can be handled by second-order masking (which involves two variables). The answer is no, because even with high-order masking techniques such as Barthe et al. [20, 21, 24], the compiled code may still have HD leaks introduced by register reuse. We confirmed this through experiments, where the code compiled by LLVM for high-order masked programs from [20] was found to contain HD leaks. Figure 4.4 illustrates this problem on a second-order arithmetic masking of the multiplication of txt (public) and key (secret) in a finite field. Here, the symbol * denotes multiplication. While there are a lot of details, at a high level, the program relies on the same idea of secret sharing: random variables are used to split the secret key into three shares, before these shares participate in the computation. The result is a masked triplet (res0, res1, res2) such that res0 ^ res1 ^ res2 = key * txt. The x86 assembly code in Figure 4.4 has leaks because the same register %edx stores both mask0 ^ mask1 and mask0 ^ mask1 ^ key. Let the two values be denoted %edx_1 and %edx_2; we have HD(%edx_1, %edx_2) = HW(key). Similar leaks exist in the LLVM-generated assembly code of this program for ARM and MIPS as well, but we omit them for brevity.

4.2 Type-based Static Leak Detection

We use a type system that starts from the input annotation (IN_PUBLIC, IN_SECRET and IN_RANDOM) and computes a distribution type for all variables. The type indicates whether a variable may statistically depend on the secret input.
     1  uint8 SecondOrderMaskingMultiply(uint8 txt, uint8 key) {
     2    int mask0, mask1, mask2, mask3, mask4, mask5, mask6; // random
     3    int t1 = mask0 ^ mask1 ^ key;
     4    int t2 = mask2 ^ mask3 ^ txt;
     5    int t3 = (mask4 ^ mask0 * mask3) ^ mask1 * mask2;
     6    int t4 = (mask5 ^ mask0 * t2) ^ t1 * mask2;
     7    int t5 = (mask6 ^ mask1 * t2) ^ t1 * mask3;
     8    res0 = (mask0 * mask2 ^ mask4) ^ mask5;
     9    res1 = (mask1 * mask3 ^ t3) ^ mask6;
    10    res2 = (t1 * t2 ^ t4) ^ t5;
    11    return {res0, res1, res2};
    12  }

    movzbl -41(%rbp), %edx  // mask0 is loaded to %edx
    movzbl -43(%rbp), %esi  // mask1 is loaded to %esi
    xorl %esi, %edx         // mask0^mask1 is stored to %edx (%edx_1)
    movzbl -44(%rbp), %esi  // key is loaded to %esi
    xorl %esi, %edx         // mask0^mask1^key is stored to %edx (%edx_2)
    movb %dl, %al
    movb %al, -50(%rbp)

Figure 4.4: Second-order masking of multiplication in a finite field, and the LLVM-generated x86 assembly code of Line 3.

4.2.1 The Type Hierarchy

The distribution type of variable v, denoted TYPE(v), may be one of the following kinds:
• RUD, which stands for random uniform distribution, means v is either a random input m ∈ IN_RANDOM or perfectly randomized [38] by m, e.g., v = k ^ m.
• SID, which stands for secret independent distribution, means that, while not RUD, v is statistically independent of the secret variables in IN_SECRET.
• UKD, which stands for unknown distribution, indicates that we are not able to prove that v is RUD or SID and thus have to assume that v may have a leak.

The three types form a hierarchy: UKD is the least desired because it means that a leak may exist. SID is better: although it may not be RUD, we can still prove that it is statistically independent of the secret, i.e., no leak. RUD is the most desired because the variable not only is statistically independent of the secret (same as in SID), but also can be used like a random input, e.g., to mask other (UKD) variables.
For leak mitigation purposes, it is always sound to treat an RUD variable as SID, or an SID variable as UKD, although it may force instructions to be unnecessarily mitigated. In practice, we want to infer as many SID and RUD variables as possible. For example, if k ∈ IN_SECRET, m ∈ IN_RANDOM and k_m = k ^ m, then TYPE(k) = UKD and TYPE(k_m) = RUD. If x ∈ IN_PUBLIC and x_km = x ∧ k_m, then TYPE(x_km) = SID because, although x may have any distribution, since k_m is RUD, x_km is statistically independent of the secret. We prefer RUD over SID, when both are applicable to a variable x1, because if x1 is XOR-ed with a UKD variable x2, we can easily prove that x = x1 ^ x2 is RUD using local inference, as long as x1 is RUD and x2 is not randomized by the same input variable. However, if x1 is labeled not as RUD but as SID, local inference rules may not be powerful enough to prove that x is RUD or even SID; as a result, we have to treat x as UKD (leak), which is less accurate.

4.2.2 Datalog based Analysis

In the remainder of this section, we present type inference for individual variables first, and then for HD-sensitive pairs. We use Datalog to implement the type inference. Here, program information is captured by a set of relations called the facts, which include the annotation of inputs in IN_PUBLIC (SID), IN_SECRET (UKD) and IN_RANDOM (RUD). The inference algorithm is codified in a set of relations called the rules, which are steps for deducing types. For example, when z = x ^ m and m is RUD, z is also RUD regardless of the actual expression that defines x, as long as m ∉ supp(x). This can be expressed as an inference rule. After generating both the facts and the rules, we combine them to form a Datalog program, and solve it using an off-the-shelf Datalog engine. Inside the engine, the rules are applied to the facts to generate new facts (types); the iterative procedure continues until the set of facts reaches a fixed point.
Since our type inference is performed on the LLVM IR, there are only a few instruction types to consider. For ease of presentation, we assume that a variable v is defined by either a unary operator or a binary operator (an n-ary operator may be handled similarly).

• v ← Uop(v1), where Uop is a unary operator such as the Boolean (or bit-wise) negation ¬.
• v ← Bop(v1, v2), where Bop is a binary operator such as the Boolean (or bit-wise) ⊕, ∧, ∨ and ⊙ (finite-field multiplication).

For v ← Uop(v1), we have TYPE(v) = TYPE(v1), meaning v and v1 have the same type. For v ← Bop(v1, v2), the type depends on (1) whether Bop is Xor, (2) whether TYPE(v1) and TYPE(v2) are SID or RUD, and (3) the sets of input variables upon which v1 and v2 depend.

4.2.3 Basic Type Inference Rules

Prior to defining the rules for Bop, we define two related functions, unq and dom, in addition to supp(v), which is the set of input variables upon which v depends syntactically.

Definition 4.2.1. unq : V → 2^IN_RANDOM is a function that returns, for each variable v ∈ V, a subset of mask variables defined as follows:
• if v ∈ IN_RANDOM, unq(v) = {v}; but if v ∈ IN \ IN_RANDOM, unq(v) = {};
• if v ← Uop(v1), unq(v) = unq(v1); and
• if v ← Bop(v1, v2), unq(v) = (unq(v1) ∪ unq(v2)) \ (supp(v1) ∩ supp(v2)).

Given the data-flow graph of all instructions involved in computing v and an input variable m ∈ unq(v), there must exist a unique path from m to v in the graph. If there were more paths (or no path), m would not have appeared in unq(v).

Definition 4.2.2. dom : V → 2^IN_RANDOM is a function that returns, for each variable v ∈ V, a subset of mask variables defined as follows:
• if v ∈ IN_RANDOM, dom(v) = {v}; but if v ∈ IN \ IN_RANDOM, then dom(v) = {};
• if v ← Uop(v1), dom(v) = dom(v1); and
• if v ← Bop(v1, v2), where operator Bop = Xor, then dom(v) = (dom(v1) ∪ dom(v2)) ∩ unq(v); else dom(v) = {}.
Given the data-flow graph of all instructions involved in computing v and an input m ∈ dom(v), there must exist a unique path from m to v, along which all binary operators are Xor; if there were more such paths (or no path), m would not have appeared in dom(v).

Following the definitions of supp, unq and dom, it is straightforward to arrive at the basic inference rules [158, 21, 228]:

Rule 1:  dom(v) ≠ ∅  ⟹  TYPE(v) = RUD
Rule 2:  supp(v) ∩ IN_SECRET = ∅ ∧ TYPE(v) ≠ RUD  ⟹  TYPE(v) = SID

Here, Rule 1 says that if v = m ⊕ expr, where m is a random input and expr is not masked by m, then v has a random uniform distribution. This is due to the property of XOR. Rule 2 says that if v is syntactically independent of variables in IN_SECRET, it has a secret independent distribution, provided that it is not RUD.

4.2.4 Inference Rules to Improve Accuracy

With the two basic rules only, any variable not assigned RUD or SID will be treated as UKD, which is too conservative. For example, v = (k ⊕ m) ∧ x, where k ∈ IN_SECRET, m ∈ IN_RANDOM and x ∈ IN_PUBLIC, is actually SID. This is because k ⊕ m is random and the other operand, x, is secret independent. Unfortunately, the two basic rules cannot infer that v is SID. The following rules are added to solve this problem.

Rule 3a:  v ← Bop(v1, v2) ∧ supp(v1) ∩ supp(v2) = ∅ ∧ Bop ∉ {Xor, GMul} ∧ TYPE(v1) = RUD ∧ TYPE(v2) = SID  ⟹  TYPE(v) = SID
Rule 3b:  v ← Bop(v1, v2) ∧ supp(v1) ∩ supp(v2) = ∅ ∧ Bop ∉ {Xor, GMul} ∧ TYPE(v1) = SID ∧ TYPE(v2) = RUD  ⟹  TYPE(v) = SID

These rules mean that, for any Bop ∈ {∧, ∨}, if one operand is RUD, the other operand is SID, and they share no input, then v has a secret independent distribution (SID). GMul denotes multiplication in a finite field. Here, supp(v1) ∩ supp(v2) = ∅ is needed; otherwise, the common input may cause a problem. For example, if v1 ← m ⊕ k and v2 ← m ∧ x, then v = (v1 ∧ v2) = (m ∧ ¬k) ∧ x has a leak because if k = 1, v = 0; but if k = 0, v = m ∧ x.
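To make Definitions 4.2.1 and 4.2.2 and the two basic rules concrete, here is a small Python sketch that computes supp/unq/dom structurally over an expression tree and applies Rule 1 and Rule 2. It is illustrative only; the variable names and input classes are invented for the example.

```python
RANDOM, SECRET, PUBLIC = "random", "secret", "public"

def analyze(node, kinds):
    """node: an input name, or a tuple (op, left, right);
    kinds: input name -> class. Returns (supp, unq, dom) as sets."""
    if isinstance(node, str):                 # an input variable
        s = {node}
        r = {node} if kinds[node] == RANDOM else set()
        return s, set(r), set(r)              # supp, unq, dom
    op, a, b = node
    sa, ua, da = analyze(a, kinds)
    sb, ub, db = analyze(b, kinds)
    supp = sa | sb
    unq = (ua | ub) - (sa & sb)               # Definition 4.2.1
    dom = ((da | db) & unq) if op == "xor" else set()   # Definition 4.2.2
    return supp, unq, dom

def basic_type(node, kinds):
    supp, unq, dom = analyze(node, kinds)
    if dom:                                   # Rule 1
        return "RUD"
    if not any(kinds[i] == SECRET for i in supp):   # Rule 2
        return "SID"
    return "UKD"

kinds = {"k": SECRET, "m": RANDOM, "x": PUBLIC}
# k ^ m is RUD; (k ^ m) ^ m cancels the mask and is UKD.
```

For instance, basic_type(("xor", "k", "m"), kinds) yields RUD, while re-using the mask in ("xor", ("xor", "k", "m"), "m") empties unq and dom, leaving the expression UKD.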
Rule 4:  v ← Bop(v1, v2) ∧ supp(v1) ∩ supp(v2) = ∅ ∧ TYPE(v1) = SID ∧ TYPE(v2) = SID  ⟹  TYPE(v) = SID

Similarly, Rule 4 may elevate a variable v from UKD to SID, e.g., as in v ← ((k ⊕ m) ∧ x1) ∧ (¬x2) where x1 and x2 are both SID. Again, the condition supp(v1) ∩ supp(v2) = ∅ in Rule 4 is needed because, otherwise, there may be cases such as v ← ((k ⊕ m) ∧ x1) ∧ (x2 ∧ m), which is equivalent to v = ¬k ∧ (m ∧ x1 ∧ x2) and thus has a leak.

Figure 4.5 shows the other inference rules used in our system. Since these rules are self-explanatory, we omit the proofs.

Rule 5a:  v ← Bop(v1, v2) ∧ dom(v1) \ supp(v2) = ∅ ∧ TYPE(v1) = RUD ∧ dom(v1) = dom(v2) ∧ supp(v1) = supp(v2)  ⟹  TYPE(v) = SID
Rule 5b:  v ← Bop(v1, v2) ∧ dom(v2) \ supp(v1) = ∅ ∧ TYPE(v2) = RUD ∧ dom(v1) = dom(v2) ∧ supp(v1) = supp(v2)  ⟹  TYPE(v) = SID
Rule 6:   v ← Bop(v1, v2) ∧ Bop ∉ {Xor, GMul} ∧ TYPE(v1) = RUD ∧ TYPE(v2) = RUD ∧ (dom(v1) \ supp(v2) ≠ ∅ ∨ dom(v2) \ supp(v1) ≠ ∅)  ⟹  TYPE(v) = SID
Rule 7a:  v ← Bop(v1, v2) ∧ Bop = GMul ∧ TYPE(v1) = RUD ∧ TYPE(v2) = SID ∧ dom(v1) \ supp(v2) ≠ ∅  ⟹  TYPE(v) = SID
Rule 7b:  v ← Bop(v1, v2) ∧ Bop = GMul ∧ TYPE(v1) = SID ∧ TYPE(v2) = RUD ∧ dom(v2) \ supp(v1) ≠ ∅  ⟹  TYPE(v) = SID
Rule 8:   v ← Bop(v1, v2) ∧ Bop = GMul ∧ (dom(v1) \ dom(v2) ≠ ∅ ∨ dom(v2) \ dom(v1) ≠ ∅) ∧ TYPE(v1) = RUD ∧ TYPE(v2) = RUD  ⟹  TYPE(v) = SID

Figure 4.5: The remaining inference rules used in our type system (in addition to Rules 1-4).

4.2.5 Detecting HD-sensitive Pairs

Based on the variable types, we compute HD-sensitive pairs. For each pair (v1, v2), we check if HD(v1, v2) results in a leak when v1 and v2 share a register. There are two scenarios:
• v1 ← expr1; v2 ← expr2, meaning v1 and v2 are defined in two instructions.
• v1 ← Bop(v2, v3), where the result v1 and one operand (v2) are stored in the same register.

In the two-instruction case, we check HW(expr1 ⊕ expr2) using Xor-related inference rules. For example, if v1 ← k ⊕ m and v2 ← m, since m appears in the supports of both expressions, (k ⊕ m) ⊕ m is UKD. Such a leak will be denoted SEN_HD_D(v1, v2), where D stands for "Double".
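The two-instruction leak just described can be confirmed by brute force: if a register transitions from v1 = k ⊕ m to v2 = m, the observable Hamming distance equals HW(k), no matter what the mask is. A small illustrative Python check:

```python
def hw(x):
    """Hamming weight: the number of 1-bits in x."""
    return bin(x).count("1")

def hd(a, b):
    """Hamming distance between two successive register values."""
    return hw(a ^ b)

# For every 8-bit secret k and mask m, the register transition from
# k ^ m to m exposes exactly HW(k): the mask cancels out in the XOR.
leak_is_hw_k = all(hd(k ^ m, m) == hw(k)
                   for k in range(256) for m in range(256))
```

The check passes for all 65,536 combinations, which is exactly why the pair (k ⊕ m, m) must not share a register.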
In the single-instruction case, we check HW(Bop(v2, v3) ⊕ v2) based on the operator type. When Bop = ∧, we have (v2 ∧ v3) ⊕ v2 = v2 ∧ ¬v3; when Bop = ∨, we have (v2 ∨ v3) ⊕ v2 = ¬v2 ∧ v3; when Bop = ⊕ (Xor), we have (v2 ⊕ v3) ⊕ v2 = v3; and when Bop = ⊙ (GMul), the result of (v2 ⊙ v3) ⊕ v2 depends on both v2 and v3 if v3 ≠ 0x01, and is (v2 ⊙ 0x01) ⊕ v2 = 0 otherwise. Since the type inference procedure is agnostic to the result of (v2 ⊙ v3), the type of (v2 ⊙ v3) ⊕ v2 depends on the types of v3 and v2; that is, TYPE(v2) = UKD ∨ TYPE(v3) = UKD ⟹ TYPE((v2 ⊙ v3) ⊕ v2) = UKD. If there is a leak, it will be denoted SEN_HD_S(v1, v2).

The reason why HD leaks are divided into SEN_HD_D and SEN_HD_S is that they have to be mitigated differently. When the leak involves two instructions, it may be mitigated by constraining the register allocation algorithm such that v1 and v2 can no longer share a register. In contrast, when the leak involves a single instruction, it cannot be mitigated in this manner because in x86, for example, all binary instructions require the result to share the same register or memory location with one of the operands. Thus, mitigating SEN_HD_S leaks requires that we rewrite the instruction itself.

We also define a relation Share(v1, v2), meaning v1 and v2 indeed may share a register, and use it to filter the HD-sensitive pairs, as shown in the two rules below:

Share(v1, v2) ∧ TYPE(v1 ⊕ v2) = UKD ∧ v1 ← expr1 ∧ v2 ← expr2  ⟹  SEN_HD_D(v1, v2)
Share(v1, v2) ∧ TYPE(v1 ⊕ v2) = UKD ∧ v1 ← Bop(v2, v3)  ⟹  SEN_HD_S(v1, v2)

Backend information (Section 4.4.1) is required to define the relation; for now, we assume ∀v1, v2 : Share(v1, v2) = true.

4.3 Mitigation during Code Generation

We mitigate leaks by using the two types of HD-sensitive pairs as constraints during register allocation.

Register Allocation. The classic approach, especially for static compilation, is based on graph coloring [51, 93], whereas dynamic compilation may use faster algorithms such as lossy graph coloring [60] or linear scan [166].
We apply mitigation on both graph coloring and LLVM's basic register allocation algorithms. For ease of comprehension, we use graph coloring to illustrate our constraints.

In graph coloring, each variable corresponds to a node and each edge corresponds to an interference between two variables, i.e., they may be in use at the same time and thus cannot occupy the same register. Assigning variables to k registers is similar to coloring the graph with k colors. To be efficient, variables may be grouped into clusters, or virtual registers, before they are assigned to physical registers (colors). In this case, each virtual register (vreg), as opposed to each variable, corresponds to a node in the graph, and multiple virtual registers may be mapped to one physical register.

4.3.1 Handling SEN_HD_D Pairs

For each SEN_HD_D(v1, v2), where v1 and v2 are defined in two instructions, we add the following constraints. First, v1 and v2 are not to be mapped to the same virtual register. Second, the virtual registers vreg1 and vreg2 (for v1 and v2) are not to be mapped to the same physical register. Toward this end, we constrain the behavior of two backend modules: Register Coalescer and Register Allocator.

Our constraint on Register Coalescer states that vreg1 and vreg2, which correspond to v1 and v2, must never coalesce, although each of them may still coalesce with other virtual registers. As for Register Allocator, our constraint is on the formulation of the graph. For each HD-sensitive pair, we add a new interference edge to indicate that vreg1 and vreg2 must be assigned different colors. During graph coloring, these new edges are treated the same as all other edges. Therefore, our constraints are added to the register allocator and their impact is propagated automatically to all subsequent modules, regardless of the architecture (x86, MIPS or ARM). When variables cannot fit in the registers, some will be spilled to memory, and all references to them will be directed to memory.
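The effect of the added interference edge can be illustrated with a minimal greedy coloring allocator. This is a Python sketch, not LLVM's actual algorithm, and the virtual register names are hypothetical.

```python
def color(nodes, edges, k):
    """Greedy graph coloring. edges is a set of frozenset({u, v}) pairs;
    returns a node -> color map, or None if some node cannot be colored
    (a real allocator would handle that case by spilling)."""
    assign = {}
    for n in nodes:
        used = {assign[m] for m in assign if frozenset((n, m)) in edges}
        free = [c for c in range(k) if c not in used]
        if not free:
            return None
        assign[n] = free[0]
    return assign

nodes = ["vreg1", "vreg2", "vreg3"]
edges = {frozenset(("vreg1", "vreg3"))}          # ordinary interference
# Without an extra edge, vreg1 and vreg2 may share a physical register.
base = color(nodes, edges, 2)
# Adding the interference edge for the HD-sensitive pair (vreg1, vreg2)
# forces them into different physical registers.
hardened = color(nodes, edges | {frozenset(("vreg1", "vreg2"))}, 2)
```

With only the ordinary edge, vreg1 and vreg2 receive the same color; with the extra edge they are separated, at the possible cost of more spills when registers run out.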
Due to the constraints we added, there may be more spilled variables, but spilling is handled transparently by the existing algorithms in LLVM. This is an advantage of our approach: it identifies a way to constrain the behavior of existing modules in LLVM, without the need to reimplement any module from scratch.

4.3.2 Handling SEN_HD_S Pairs

For each SEN_HD_S(v1, v2) pair, where v1 and v2 appear in the same instruction, we additionally constrain the DAGCombiner module to rewrite the instruction before constraining the register allocation modules. To see why, consider the computation mk = m ⊕ k, which compiles to

MOVL -4(%rbp), %ecx  // -4(%rbp) = m (random)
XORL -8(%rbp), %ecx  // -8(%rbp) = k (secret)

Here, -4(%rbp) and -8(%rbp) are memory locations for m and k, respectively. Although m and m ⊕ k are RUD (no leak) when stored in %ecx, the transition from m to m ⊕ k, with HW(m ⊕ (m ⊕ k)) = HW(k), has a leak. To remove the leak, we must rewrite the instruction:

MOVL -4(%rbp), %ecx  // -4(%rbp) = m
XORL %ecx, -8(%rbp)  // -8(%rbp) = k, and then m ⊕ k

While m still resides in %ecx, both k and m ⊕ k reside in the memory location -8(%rbp). There is no leak because %ecx only stores m (RUD) and HW(m ⊕ m) = 0. Furthermore, the solution is efficient in that no additional memory is needed. If k were to be used subsequently, we would copy k to another memory location and re-direct uses of k to that location.

Example. Figure 4.6 shows a real program [225], where s is an array storing sensitive data while m1-m8 are random masks. The compiled code (left) has leaks, whereas the mitigated code (right) is leak free. The reason why the original code (left) has leaks is that, prior to Line 8, %eax stores m1 ⊕ m5, whereas after Line 8, %eax stores s[0+i*4] ⊕ m1 ⊕ m5; thus, the bit-flips in %eax are reflected in HW(%eax_1 ⊕ %eax_2) = HW(s[0+i*4]), which is the sensitive data. During register allocation, a virtual register vreg1 would correspond to m1 ⊕ m5 while vreg2 would correspond to s[0+i*4] ⊕ m1 ⊕ m5.
Due to a constraint from this SEN_HD_S pair, our method would prevent vreg1 and vreg2 from coalescing, or sharing a physical register. After rewriting, vreg2 shares the same memory location as s[0+i*4] while vreg1 remains unchanged. Thus, m1 ⊕ m5 is stored in %al and s[0+i*4] ⊕ m1 ⊕ m5 is spilled to memory, which removes the leak.

1 void remask(uint8_t s[16], uint8_t m1, uint8_t m2, uint8_t m3, uint8_t m4, uint8_t m5, uint8_t m6, uint8_t m7, uint8_t m8) {
2   int i;
3   for (i = 0; i < 4; i++) {
4     s[0+i*4] = s[0+i*4] ^ (m1^m5);
5     s[1+i*4] = s[1+i*4] ^ (m2^m6);
6     s[2+i*4] = s[2+i*4] ^ (m3^m7);
7     s[3+i*4] = s[3+i*4] ^ (m4^m8);
8   }
9 }

1 // Before mitigation          1 // After mitigation
2 movslq -28(%rbp), %rdx        2 movslq -28(%rbp), %rdx
3 movq   -16(%rbp), %rcx        3 movq   -16(%rbp), %rcx
4 movzbl (%rcx,%rdx,4), %edi    4
5 movzbl -17(%rbp), %esi        5 movzbl -17(%rbp), %esi
6 movzbl -21(%rbp), %eax        6 movzbl -21(%rbp), %eax
7 xorl   %esi, %eax             7 xorl   %esi, %eax
8 xorl   %edi, %eax             8
9 movb   %al, (%rcx,%rdx,4)     9 xorb   %al, (%rcx,%rdx,4)

Figure 4.6: Code snippet from the Byte Masked AES [225].

4.4 Domain-specific Optimizations

While the method presented so far has all the functionality, it can be made faster by domain-specific optimizations.

4.4.1 Leveraging the Backend Information

To detect HD leaks that are likely to occur, we focus on pairs of variables that may share a register, as opposed to arbitrary pairs of variables. For example, if the live ranges of two variables overlap, they will never share a register, and we should not check them for HD leaks. Such information is readily available in the compiler's backend modules; e.g., in graph-coloring based register allocation, variables associated with any interference edge cannot share a register. Thus, we define Share(v1, v2), meaning v1 and v2 may share a register. After inferring the variable types as RUD, SID, or UKD, we use Share(v1, v2) to filter the variable pairs subjected to checking for SEN_HD_D
We will show in experiments that such backend information allows us to dramatically reduce the number of HD-sensitive pairs. 4.4.2 Pre-computingDatalogFacts By default, only input annotation and basic data-ow (def-use) are encoded as Datalog facts, whereas the rest has to be deduced by inference rules. However, Datalog is not the most ecient way of computing sets, such assupp(v),unq(v) anddom(v), or performing set operations such asm 1 2 supp(v). In contrast, it is linear time [158] to compute sets such as supp(v), unq(v) and dom(v) explicitly. Thus, we choose to precompute them in advance and encode the results as Datalog facts. In this case, precomputation results are used to jump start Datalog based type inference. We will show, through experiments, that the optimization can lead to faster type inference than the default implementation. 4.4.3 EcientEncodingofDatalogRelations There are dierent encoding schemes for Datalog. For example, if IN =fi 0 ;:::;i 3 g and supp(v 1 ) = fi 1 ;i 2 g and supp(v 2 ) =fi 0 ;i 1 ;i 3 g. One way is to encode the sets is using a relation Supp : VIN , whereV are variables andIN are supporting inputs: Supp(v 1 ;i 1 )^Supp(v 1 ;i 2 ) = supp(v 1 ) Supp(v 2 ;i 0 )^Supp(v 2 ;i 1 )^Supp(v 2 ;i 3 ) = supp(v 2 ) While the size ofSupp isjVjjINj, each set needs up tojINj predicates, and set operation needsjINj 2 predicates. 66 Another way is to encode the sets is using a relationSupp :V 2 IN , where 2 IN is the power-set (set of all subsets ofIN ): Supp(v 1 ;b0110) = supp(v 1 ) Supp(v 2 ;b1011) = supp(v 2 ) While the size ofSupp isjVj 2 jINj , each set needs one predicate, and set operation needs 2 predicates (a bit-wise operation). WhenjINj is small, the second approach is more compact; but asjINj increases, the table size of Supp increases exponentially. Therefore, we propose an encoding, called segmented bitset representation (idx,bitset), where idx=i refers to thei-th segment and bitset i denotes the bits in thei-th segment. 
Supp(v1, 1, b01) ∧ Supp(v1, 0, b10) = supp(v1)
Supp(v2, 1, b10) ∧ Supp(v2, 0, b11) = supp(v2)

In practice, when the bitset size is bounded, e.g., to 4, the table size remains small while the number of predicates increases only moderately. This encoding scheme is actually a generalization of the previous two. When the size of the bitset decreases to 1 and the number of segments increases to |IN|, it degenerates to the first approach. When the size of the bitset increases to |IN| and the number of segments decreases to 1, it degenerates to the second approach.

4.5 Experiments

We have implemented our method in LLVM 3.6 [125]. We used the μZ [108] Datalog engine in Z3 [67] to infer types. While the mitigation part targeted x86, it may be extended to other platforms. We conducted experiments on a number of cryptographic programs. Table 4.2 shows the statistics, including the name, a description, the number of lines of code, and the number of variables, which are divided further into input and internal variables. All benchmarks are masked. P1-P3, in particular, are protected by Boolean masking

Table 4.2: Statistics of the benchmark programs.
Name | Description              | LoC    | IN_PUBLIC | IN_SECRET | IN_RANDOM | Internal
P1   | AES Shift Rows [30]      | 11     | 0  | 2  | 2     | 22
P2   | Messerges Boolean [30]   | 12     | 0  | 2  | 2     | 23
P3   | Goubin Boolean [30]      | 12     | 0  | 1  | 2     | 32
P4   | SecMultOpt_wires_1 [174] | 25     | 1  | 1  | 3     | 44
P5   | SecMult_wires_1 [174]    | 25     | 1  | 1  | 3     | 35
P6   | SecMultLinear_wires_1 [174] | 32  | 1  | 1  | 3     | 59
P7   | CPRR13-lut_wires_1 [63]  | 81     | 1  | 1  | 7     | 169
P8   | CPRR13-OptLUT_wires_1 [63] | 84   | 1  | 1  | 7     | 286
P9   | CPRR13-1_wires_1 [63]    | 104    | 1  | 1  | 7     | 207
P10  | KS_transitions_1 [20]    | 964    | 1  | 16 | 32    | 2,329
P11  | KS_wires [20]            | 1,130  | 1  | 16 | 32    | 2,316
P12  | keccakf_1turn [20]       | 1,256  | 0  | 25 | 75    | 2,314
P13  | keccakf_2turn [20]       | 2,506  | 0  | 25 | 125   | 4,529
P14  | keccakf_3turn [20]       | 3,764  | 0  | 25 | 175   | 6,744
P15  | keccakf_7turn [20]       | 8,810  | 0  | 25 | 349   | 15,636
P16  | keccakf_11turn [20]      | 13,810 | 0  | 25 | 575   | 24,472
P17  | keccakf_15turn [20]      | 18,858 | 0  | 25 | 775   | 33,336
P18  | keccakf_19turn [20]      | 23,912 | 0  | 25 | 975   | 42,196
P19  | keccakf_24turn [20]      | 30,228 | 0  | 25 | 1,225 | 53,279
P20  | AES_wires_1 [63]         | 34,358 | 16 | 16 | 1,232 | 63,263

that was previously verified [30, 78, 228]. The other programs, from [20], are masked multiplication [174], masked S-box [63], masked AES [63] and various masked MAC-Keccak functions [20].

Our experiments were designed to answer three questions: (1) Is our type system effective in detecting HD leaks? (2) Are the domain-specific optimizations effective in reducing the computational overhead? (3) Does the mitigated code have good performance after compilation, in terms of both the code size and the execution speed? In all the experiments, we used a computer with a 2.9 GHz CPU and 8 GB RAM, and set the timeout (T/O) to 120 minutes.

4.5.1 Leak Detection Results

Table 4.3 shows the results, where Columns 1-2 show the benchmark name and detection time, and Columns 3-4 show the number of HD leaks detected. The leaks are further divided into SEN_HD_D (two-instruction) and SEN_HD_S (single-instruction). Columns 5-7 show more details of the type inference, including the number of RUD, SID and UKD variables, respectively.
While the time taken to complete type inference is not negligible, e.g., minutes for the larger programs, it is reasonable because we perform a much deeper program analysis than mere compilation. To put it into perspective, the heavy-weight formal verification approaches often take hours [78, 228].

Table 4.3: Results of type-based HD leak detection.

Name | Detection Time | SEN_HD_D | SEN_HD_S | RUD    | SID    | UKD | UKD [78]
P1   | 0.061s      | NONE | NONE | 22     | 0      | 4   | 4
P2   | 0.105s      | NONE | NONE | 20     | 0      | 7   | 6
P3   | 0.099s      | NONE | 2    | 31     | 3      | 1   | 1
P4   | 0.208s      | NONE | 2    | 31     | 12     | 6   | 5
P5   | 0.216s      | NONE | 2    | 29     | 10     | 1   | 1
P6   | 0.276s      | 4    | 2    | 48     | 15     | 1   | 1
P7   | 0.213s      | 10   | 2    | 151    | 25     | 2   | 2
P8   | 0.147s      | 12   | 2    | 249    | 42     | 4   | 4
P9   | 0.266s      | 6    | 2    | 153    | 61     | 2   | 2
P10  | 0.550s      | NONE | NONE | 2,334  | 12     | 31  | -*
P11  | 0.447s      | 4    | 16   | 2,334  | 0      | 31  | -
P12  | 0.619s      | NONE | 7    | 2,062  | 300    | 52  | -
P13  | 1.102s      | NONE | 5    | 4,030  | 600    | 49  | -
P14  | 1.998s      | NONE | 5    | 5,995  | 900    | 49  | -
P15  | 16.999s     | NONE | 25   | 13,861 | 2,100  | 49  | -
P16  | 24.801s     | NONE | 5    | 21,723 | 3,300  | 49  | -
P17  | 59.120s     | NONE | 5    | 29,587 | 4,500  | 49  | -
P18  | 2m1.540s    | NONE | 4    | 37,449 | 5,700  | 47  | -
P19  | 3m22.415s   | NONE | 5    | 47,280 | 7,200  | 49  | -
P20  | 16m12.320s  | 29   | 33   | 38,070 | 26,330 | 127 | -
*Model counting cannot finish on P10-P20 due to its limited scalability.

As for the number of leaks detected, although the benchmark programs are all masked, during normal compilation, new HD leaks were still introduced as a result of register reuse. For example, in P20, which is a masked AES [20], we detected 33 SEN_HD_S leaks after analyzing more than 60K intermediate variables. Overall, we detected HD leaks in 17 out of the 20 programs. Furthermore, 6 of these 17 programs have both SEN_HD_D and SEN_HD_S leaks, while the remaining 11 have only SEN_HD_S leaks.

Results in Columns 5-7 of Table 4.3 indicate the inferred types of program variables.
Despite the large number of variables in a program, our type inference method does a good job in proving that the vast majority of them are RUD or SID (no leak); even for the few UKD variables, after the backend information is used, the number of actual HD leaks detected by our method is small. The last column of Table 4.3 shows the UKD variables detected by model counting [228, 78]. In comparison, our type system reports only 5% false positives (i.e., our inference rules are conservative).

Table 4.4: Results of quantifying the impact of optimizations.

Name | Time (w/o opt.) | Time (w/ opt.) | SEN_HD_D (w/o backend) | SEN_HD_S (w/o backend) | SEN_HD_D (w/ backend) | SEN_HD_S (w/ backend)
P1   | 0.865s     | 0.061s     | 0   | 18    | 0  | 0
P2   | 0.782s     | 0.105s     | 0   | 9     | 0  | 0
P3   | 0.721s     | 0.099s     | 0   | 15    | 0  | 2
P4   | 1.102s     | 0.208s     | 0   | 32    | 0  | 2
P5   | 1.206s     | 0.216s     | 0   | 32    | 0  | 2
P6   | 1.113s     | 0.276s     | 8   | 40    | 4  | 2
P7   | 5.832s     | 0.213s     | 44  | 144   | 10 | 2
P8   | 4.306s     | 0.147s     | 68  | 323   | 12 | 2
P9   | 5.053s     | 0.266s     | 43  | 160   | 6  | 2
P10  | 10m1.513s  | 0.550s     | 12  | 180   | 0  | 0
P11  | 15m51.969s | 0.447s     | 12  | 180   | 4  | 16
P12  | T/O        | 0.619s     | 473 | 1,820 | 0  | 7
P13  | T/O        | 1.102s     | 492 | 1,884 | 0  | 5
P14  | T/O        | 1.998s     | 492 | 1,884 | 0  | 5
P15  | T/O        | 16.999s    | 492 | 1,884 | 0  | 25
P16  | T/O        | 24.801s    | 492 | 1,884 | 0  | 5
P17  | T/O        | 59.120s    | 492 | 1,884 | 0  | 5
P18  | T/O        | 2m1s       | 468 | 1,800 | 0  | 4
P19  | T/O        | 3m22s      | 492 | 1,884 | 0  | 5
P20  | T/O        | 16m13s     | 620 | 1,944 | 29 | 33

4.5.2 Effectiveness of Optimizations

To quantify the impact of our optimizations, we measured the performance of our method with and without them. Table 4.4 shows the significant differences in analysis time (Columns 2-3) and detected HD leaks (Columns 4-7). Overall, the optimized version completed all benchmarks whereas the unoptimized version completed only half. For P12, in particular, the optimized version was 11,631X faster because the unoptimized version ran out of memory and started using virtual memory, which resulted in the slow-down. Leveraging the backend information also drastically reduced the number of detected leaks.
This is because, otherwise, we have to be conservative and assume that any two variables may share a register, which results in many false leaks in x86. In P12, for example, using the backend information resulted in 260X fewer leaks.

4.5.3 Leak Mitigation Results

We compared the size and execution speed of the LLVM compiled code, with and without our mitigation. The results are shown in Table 4.5, including the number of bytes in the assembly code and the execution time. Columns 8-9 show more details: the number of virtual registers marked as sensitive and non-sensitive, respectively.

Table 4.5: Results of our HD leak mitigation.

Name | Code size original (bytes) | mitigated | % | Runtime original (us) | mitigated | % | Sensitive vregs | Non-sensitive vregs
P3   | 858       | 855       | 0.3   | -       | -       | -     | 2  | 4
P4   | 1,198     | 1,174     | 2     | 0.23    | 0.20    | -13   | 2  | 13
P5   | 1,132     | 1,108     | 2.12  | 0.30    | 0.37    | 2.3   | 2  | 9
P6   | 1,346     | 1,339     | 0.52  | 0.30    | 0.27    | -10   | 5  | 8
P7   | 3,277     | 3,223     | 1.64  | 0.29    | 0.30    | 3.4   | 10 | 27
P8   | 3,295     | 3,267     | 0.85  | 0.20    | 0.22    | 10    | 11 | 83
P9   | 3,725     | 3,699     | 0.69  | 0.7     | 0.78    | 11    | 10 | 29
P11  | 44,829    | 44,735    | 0.21  | 5.60    | 6.00    | 7.1   | 18 | 680
P12  | 46,805    | 46,787    | 0.03  | 6.20    | 6.50    | 4.83  | 7  | 726
P13  | 90,417    | 90,288    | 0.14  | 13.60   | 13.00   | -4.41 | 5  | 1,384
P14  | 134,060   | 133,931   | 0.09  | 23.00   | 21.00   | -8.69 | 5  | 2,040
P15  | 313,454   | 312,930   | 0.16  | 52.00   | 58.00   | 11.5  | 25 | 4,637
P16  | 496,087   | 495,943   | 0.03  | 91.00   | 96.00   | 5.49  | 5  | 7,288
P17  | 677,594   | 677,450   | 0.02  | 129.00  | 136.00  | 5.42  | 5  | 9,912
P18  | 859,150   | 859,070   | 0.009 | 178.00  | 183.00  | 2.80  | 4  | 12,537
P19  | 1,086,041 | 1,085,897 | 0.047 | 237.000 | 250.000 | 5.48  | 5  | 15,816
P20  | 957,372   | 957,319   | 0.005 | 228.600 | 248.300 | 8.75  | 56 | 9,035

The results show that our mitigation has little performance overhead. First, the code sizes are almost the same. For P8, the mitigated code is even smaller because, while switching the storage from register to memory during our handling of the SEN_HD_S pairs, subsequent memory stores may be avoided. Second, the execution speeds are also similar.
Overall, the mitigated code is 8%-11% slower, but in some cases, e.g., P4 and P6, the mitigated code is even faster because of our memory-related rewriting. The main reason why our mitigation has little performance overhead is that, as shown in the last two columns of Table 4.5, compared to the total number of virtual registers, the number of sensitive ones is extremely small. P17 (keccakf_15turn), for example, has only 5 sensitive virtual registers out of 9,917 in total. Thus, our mitigation only has to modify a small percentage of the instructions, which does not lead to significant overhead.

4.5.4 Comparison to High-Order Masking

On the surface, HD leaks seem to be a type of second-order leaks, which involve two values. For people familiar with high-order masking [20], a natural question is whether the HD leaks can be mitigated using high-order masking techniques. To answer the question, we conducted two experiments. First, we checked if HD leaks exist in programs equipped with high-order masking. Second, we compared the size and execution speed of the code protected by either high-order masking or our mitigation.

Table 4.6: Comparison with order-d masking techniques [20].
Name       | Code size (bytes) | Run time (us) | HW-leak | HD-leak | SEN_HD_D | SEN_HD_S
P4 (ours)  | 1,171  | 0.20 | No | No  | NONE | NONE
P4 (d=2)   | 2,207  | 0.75 | No | Yes | NONE | 2
P4 (d=3)   | 4,009  | 0.28 | No | Yes | NONE | 2
P4 (d=4)   | 5,578  | 0.75 | No | Yes | NONE | 2
P4 (d=5)   | 7,950  | 1.00 | No | Yes | NONE | 2
P5 (ours)  | 1,108  | 0.37 | No | No  | NONE | NONE
P5 (d=2)   | 2,074  | 0.70 | No | Yes | NONE | 2
P5 (d=3)   | 3,733  | 0.60 | No | Yes | NONE | 2
P5 (d=4)   | 5,120  | 0.75 | No | Yes | NONE | 2
P5 (d=5)   | 7,197  | 0.67 | No | Yes | NONE | 2
P6 (ours)  | 1,339  | 0.27 | No | No  | NONE | NONE
P6 (d=2)   | 3,404  | 0.83 | No | Yes | NONE | 2
P6 (d=3)   | 6,089  | 0.57 | No | Yes | NONE | 2
P6 (d=4)   | 9,640  | 0.80 | No | Yes | NONE | 2
P6 (d=5)   | 14,092 | 1.60 | No | Yes | NONE | 2
P7 (ours)  | 3,223  | 0.30 | No | No  | NONE | NONE
P7 (d=2)   | 8,456  | 1.41 | No | Yes | NONE | 2
P7 (d=3)   | 15,881 | 3.20 | No | Yes | NONE | 2
P7 (d=4)   | 25,521 | 4.20 | No | Yes | NONE | 2
P7 (d=5)   | 37,578 | 7.80 | No | Yes | NONE | 2
P8 (ours)  | 3,267  | 0.25 | No | No  | NONE | NONE
P8 (d=2)   | 8,782  | 1.30 | No | Yes | NONE | 2
P8 (d=3)   | 16,420 | 2.00 | No | Yes | NONE | 2
P8 (d=4)   | 26,431 | 4.00 | No | Yes | NONE | 2
P8 (d=5)   | 38,996 | 8.00 | No | Yes | NONE | 2
P9 (ours)  | 3,699  | 0.45 | No | No  | NONE | NONE
P9 (d=2)   | 9,258  | 1.15 | No | Yes | NONE | 2
P9 (d=3)   | 17,565 | 3.00 | No | Yes | NONE | 2
P9 (d=4)   | 28,189 | 5.11 | No | Yes | NONE | 2
P9 (d=5)   | 41,383 | 8.40 | No | Yes | NONE | 2

Table 4.6 shows the results on P4-P9, which come from [20] and have versions protected by order-d masking, where d = 2 to 5. While we initially expected to see no HD leaks in these versions, the results surprised us. As shown in the last two columns, HD leaks were detected in all these high-order-masking protected programs. A closer look shows that these leaks are all of the SEN_HD_S type, meaning they are due to a restriction of the x86 ISA: any binary operation has to store the result and one of the operands in the same place, and by default, that place is a general-purpose register.

Measured by the code size and speed, our method is more efficient. In P9, for example, our mitigated code has 3K bytes in size and runs in 0.45us, whereas the high-order-masking protected code has 9K to 41K bytes (for d = 2 to 5) and runs in 1.15us to 8.40us.
4.5.5 Threat to Validity

We rely on the HW/HD models [136, 135], and thus our results are valid only when these models are valid. We assume the attacker can only measure the power consumption but not other information such as data-bus or timing information. If such information becomes available, our mitigation may no longer be secure. Since we focus on cryptographic software, which has simple program structure and language constructs, there is no need for more sophisticated analysis than what is already available in LLVM. Our analysis is intra-procedural: for cryptographic benchmarks, we can actually inline all functions before conducting the analysis. Nevertheless, some of these issues need to be addressed to broaden the scope of our tool.

4.6 Summary

I have presented a method for mitigating power side-channel leaks caused by register reuse. The method relies on type inference to detect leaks, and leverages the type information to constrain the compiler's backend to guarantee that register allocation is secure. I have implemented the method in LLVM for x86 and evaluated it on cryptographic software. My experiments demonstrate that the method is effective in mitigating leaks and that the mitigated program has low runtime overhead. Specifically, it outperforms state-of-the-art high-order masking techniques in terms of both the code size and the execution speed.

Chapter 5
Verifying the Correctness of Mitigated Programs

Invariant generation is a fundamental problem in program analysis and verification, e.g., to prove that assertions always hold during program execution. Loop invariants [85, 106], for example, are conditions that must be true at the beginning and the end of every iteration of a loop. Since the problem is undecidable in general, all practical techniques must search for invariants heuristically in a potentially-infinite space of candidates.
While there is a large body of work on making the search efficient, e.g., using guided search [186, 101], data-driven sampling [159, 232], supervised learning [191, 193], continuous logic networks [224, 182], and decision trees with templates [90, 89], these techniques target a single program, as opposed to the relational invariants that are the main focus of this paper.

Relational invariants are logical assertions defined over multiple programs or program executions. They are useful for reasoning about the relationships between these programs or program executions. One example application is to prove functional equivalence, i.e., two programs always behave the same when given the same input [58, 188]. Another example application is to check a security property called non-interference, i.e., executing a program using two different values of a secret input does not lead to observable differences in the public output [55, 50, 21]. A third example application is to verify the continuity property, i.e., a program remains robust with respect to infinitesimal changes to the input [53]. However, to the best of our knowledge, there is still a lack of techniques and tools for efficiently synthesizing relational invariants.

State-of-the-art invariant synthesis tools, which were designed primarily for a single program, cannot be easily adapted to generate relational invariants. To confirm this, we have experimented with two state-of-the-art tools: Code2Inv [193] and LinearArbitrary [232]. In this experiment, we took two structurally-different but functionally-equivalent programs, P1 and P2, and created a merged program P that executes instructions from P1 and P2 in lockstep; then we specified the equivalence relation as a Hoare triple [106] {Pre} P {Post}, saying that, if P1 and P2 start from the same state (Pre), after the lockstep execution, they must end at the same state (Post). Unfortunately, neither tool can generate invariants that are strong enough to help verify the equivalence relation.
Code2Inv [193] generated an invariant in which none of the predicates was relational, while LinearArbitrary [232] generated an over-fitted solution that unnecessarily depends on some arbitrary constants appearing in the sampled data. More details of this experiment can be found in Section 5.1.

To overcome the limitations, we have developed a new method named Code2RelInv, whose input consists of a merged program P and a specification in the form of a Hoare triple ϕ = ⟨Pre, Post⟩, where Pre is the precondition and Post is the postcondition. The output of Code2RelInv, which is a relational invariant I, is guaranteed to be both inductive (i.e., a true invariant) and sufficient (i.e., strong enough to prove the property at hand).

Figure 5.1 shows the overall flow of Code2RelInv, which uses a standard syntax-guided synthesis (SyGuS) [7] component to generate invariant candidates (I), one at a time, from a hypothesis space defined by a domain-specific language (DSL). Then, it uses an SMT-solver based program verifier to check if I is both inductive and sufficient. Candidates that are not inductive, or not sufficient, are removed. The iterative process continues until a desired invariant is found, or a predetermined time limit is reached. The novel part of our method is the component that leverages learning based techniques to reduce the search space.

Figure 5.1: Code2RelInv: Our invariant synthesis method. (The merged program P and specification ϕ are fed to a SyGuS-based invariant generator; logic reasoning (LR) prunes the search space and reinforcement learning (RL) prioritizes the search; each candidate I is checked for inductiveness and then sufficiency, and candidates failing either check are discarded before the loop repeats.)

We propose two learning based techniques to make the synthesis procedure efficient. The first one is logical reasoning (LR) based search space pruning: as soon as the verifier declares an invariant candidate I as invalid, we analyze the reason why it is invalid and, based on the reason, skip all other invariant candidates that share the same reason.
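As a cartoon of the generate-and-verify loop in Figure 5.1, the following Python sketch enumerates candidate relational invariants for a toy lockstep program and keeps the first candidate that is both inductive and sufficient. Brute-force checks over small integers stand in for the SMT-based verifier, and the fixed candidate list stands in for the SyGuS enumerator; everything here is invented for illustration.

```python
# Toy merged program: P1 and P2 both start at s = 0 and execute
# s += i in lockstep; the postcondition is s1 == s2.
candidates = [
    ("s1 == 0",  lambda s1, s2: s1 == 0),
    ("s1 <= s2", lambda s1, s2: s1 <= s2),
    ("s1 == s2", lambda s1, s2: s1 == s2),
]

def step(s1, s2, i):
    return s1 + i, s2 + i          # one lockstep iteration

def inductive(inv):
    # If inv holds before a step, it must hold after (brute-force check).
    return all(inv(*step(s1, s2, i))
               for s1 in range(-5, 6) for s2 in range(-5, 6)
               for i in range(5) if inv(s1, s2))

def sufficient(inv):
    # inv must imply the postcondition s1 == s2.
    return all(s1 == s2
               for s1 in range(-5, 6) for s2 in range(-5, 6) if inv(s1, s2))

def synthesize():
    for name, inv in candidates:
        if inv(0, 0) and inductive(inv) and sufficient(inv):
            return name
    return None
```

Here "s1 == 0" fails the inductiveness check, "s1 <= s2" is inductive but not sufficient, and only the relational candidate "s1 == s2" passes both checks.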
In this sense, our method has the ability to learn from past mistakes. Specifically, our LR based pruning relies on the SMT solver's ability to generate unsatisfiability (UNSAT) cores. The second technique is reinforcement learning (RL) based search prioritization: the idea is to identify the part of the candidate space that is more promising and explore it first. This is accomplished by treating the invariant synthesis process as a Markov Decision Process (MDP) and using the verifier's results as positive and negative rewards to compute an exploration policy. Details of our LR and RL based techniques can be found in Sections 5.3 and 5.4, respectively.

While learning has been used to generate invariants before, e.g., in [193] and [232], these techniques do not target relational invariants. The difference is important, for two reasons. First, the predicates must be relational, i.e., consisting of variables from different programs. By definition, these variables have no standard control/data flow dependencies in the merged program. Thus, any prior technique relying on the standard program dependencies would not work. Second, while prior techniques may check whether a generated invariant is inductive, they do not check whether it is sufficient. Thus, in many cases, it remains unclear how useful the generated invariants are in proving the property at hand.

Our method overcomes the above two challenges. At a high level, it can be understood as a way to intelligently aggregate and learn from past mistakes. Whenever the verifier declares an invariant candidate I as either not inductive or not sufficient, we use the information to avoid generating invariant candidates from the same equivalence class as I in the future. We also use the information to learn an exploration policy to identify invariant candidates that may have a higher chance of passing the verification.
We have evaluated the proposed method on a diverse set of relational verification benchmarks, consisting of a set of C programs and three types of relational properties: equivalence over various loop optimizations [23], non-interference for DARPA STAC programs [14], and continuity of a number of sorting algorithms [53]. We experimentally compared our method with a state-of-the-art invariant synthesizer, Code2Inv [193]. The experimental results show that, for all benchmarks, our method was able to generate the desired invariants quickly, whereas Code2Inv failed in most cases. Furthermore, both of our learning-based techniques (LR and RL) are effective in reducing the search space: with these techniques, the number of invariant candidates explored by our method can be reduced by as much as 96%.

To summarize, this paper makes the following contributions:
• We propose a new method for synthesizing relational invariants, which uses both syntax-guided synthesis (SyGuS) and an SMT solver based program verifier to guarantee that the invariants are both inductive and sufficient.
• We propose a logical reasoning (LR) based technique, which leverages the SMT solver's ability to compute unsatisfiability cores to prune the search space.
• We propose a reinforcement learning (RL) based technique, which leverages the verifier's results as positive and negative rewards to prioritize the search.
• We conduct experimental evaluation on a diverse set of relational verification benchmarks to demonstrate the effectiveness of our method.
(a) int P1(int x, int n)
      int i, k = 0;
      for (i = 0; i != n; ++i)
        x += k*5;
        k += 1;
        if (i >= 5) k += 3;
      return x;

(b) int P2(int x̄, int n̄)
      int ī, k̄ = 0;
      for (ī = 0; ī != n̄; ++ī)
        x̄ += k̄;
        k̄ += 5;
        if (ī >= 5) k̄ += 15;
      return x̄;

(c) α: {x̄ = x, n̄ = n}
    int P′(int x, int n, int x̄, int n̄)
      int i, k = 0;  int ī, k̄ = 0;
      while (i != n && ī != n̄)
        x += k*5;   x̄ += k̄;
        k += 1;     k̄ += 5;
        if (i >= 5) k += 3;   if (ī >= 5) k̄ += 15;
        i++;  ī++;
      return x, x̄;
    β: {x̄ = x}

Figure 5.2: Given two programs P1 and P2, we merge them into a single program P to execute the instructions in lockstep.

5.1 Motivation

Consider the two programs in Figure 5.2, taken from [188], where P2 is obtained from P1 using a loop optimization called strength reduction [204]: if variable k is incremented in each loop iteration, the expression k·c can be safely rewritten as k̄, given that c is a constant and the increments to k̄ at each iteration are scaled by c. Since variables in the two programs may have different values, for each variable x in P1, we use x̄ to denote the corresponding variable in P2. To prove the equivalence, an invariant must be provided to show how program states in P1 and P2 are related to each other.

5.1.1 Problem Statement

In relational verification, it is a common practice to construct a merged program P, shown in Figure 5.2 (c), that executes instructions from P1 and P2 in lockstep. Statements from P1 and P2 are carefully aligned, e.g., by adding auxiliary statements or even unrolling some loop iterations if needed. While techniques for loop alignment are important, they are not the focus of this work; for more information please refer to [23, 58]. The property under verification is expressed as φ := {α} P {β}, meaning that, from a state where the precondition α holds, executing P leads to a state where the postcondition β holds. Since loops are the most challenging part in program verification, without loss of generality, we denote the merged program as P := while g do S.
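The two programs of Figure 5.2 can be transcribed directly into executable form; running them on a grid of inputs illustrates (though of course does not prove) the equivalence that the relational invariant is meant to establish. This sketch is in Python rather than C purely for brevity:

```python
# Executable transcriptions of P1 and P2 from Figure 5.2.

def P1(x, n):
    i, k = 0, 0
    while i != n:
        x += k * 5
        k += 1
        if i >= 5:
            k += 3
        i += 1
    return x

def P2(x, n):          # uses increments of 5 and 15 after strength reduction
    i, k = 0, 0
    while i != n:
        x += k
        k += 5
        if i >= 5:
            k += 15
        i += 1
    return x

# Testing on a grid of inputs; equality holds because 5k in P1 always
# tracks k̄ in P2 (the relational invariant 5k = k̄).
assert all(P1(x, n) == P2(x, n) for x in range(-5, 6) for n in range(0, 12))
print(P1(0, 3), P2(0, 3))  # P1(0, 3) == P2(0, 3) == 15
```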
In this context, we want an invariant I of the program P with respect to the property φ to satisfy three conditions:

(a) the precondition implies I at the beginning of the loop, denoted α → I;
(b) I being true at the beginning of a loop iteration implies I being true at the end of the iteration, denoted {I ∧ g} S {I}; and
(c) I being true at the end of the loop implies the postcondition β, denoted I ∧ ¬g → β.

Conditions (a) and (b) imply that I is inductive, and Condition (c) implies that I is sufficient for proving the property φ.

5.1.2 Limitations of Existing Methods

Feeding the merged program P to state-of-the-art invariant synthesizers such as Code2Inv and LinearArbitrary does not produce the desired invariants. For the example in Figure 5.2, Code2Inv [193] produces ((i ≤ (0 − 1) || n ≥ (n̄ + k)) ∧ (n == n̄ || i ≤ (k + 0))), which is neither inductive nor sufficient. Furthermore, since Code2Inv relies on the standard program dependency information to decide whether two variables should be put into the same predicate, and pairs of variables from P1 and P2 (such as k and k̄) do not have control/data dependencies at all, such pairs never show up in the same predicate. LinearArbitrary [232] produces (x − x̄ ≤ 0 ∧ x̄ − x ≤ 0 ∧ (¬(i ≤ 1) ∨ i < 2) ∧ (¬(i ≤ 2) ∨ ¬(i ≥ 3)) ∧ …), which is over-fitted in the sense that some of the predicates unnecessarily depend on constant values appearing in the sampled data. This is an undesired consequence of using techniques that learn from sampled data.

5.1.3 How Our Baseline Method Works

In contrast, our method is able to generate the desired relational invariant: I := {x = x̄ ∧ k·5 = k̄ ∧ i = ī}. Note that the invariant is both inductive and sufficient. Furthermore, the invariant is relational in that each predicate refers to a pair of program variables from P1 and P2, respectively.

Figure 5.3: Constructing the next AST by modifying the current AST. The i-th candidate is shown in (a), with and without the conflict predicate φ_c. The (i+1)-th candidate is shown in (b), with and without φ_c-based pruning.

Our method works as follows. First, we capture the space of invariant candidates using the domain-specific language (DSL) shown in Figure 5.4. Then, we use the syntax-guided synthesis (SyGuS) framework [7] to enumerate invariant candidates from the hypothesis space, one at a time, and use a verifier to check if they are both inductive and sufficient.

The first invariant candidate may be I := {k = k̄ ∧ x = x̄}, whose abstract syntax tree (AST) is shown in Figure 5.3 (a) as AST_i. Here, the label ∅ means the node is not-in-use (NULL). For I to be inductive, the formula below must hold:

F_I := (α → I) ∧ ({I ∧ g} S {I})    (5.1)

This is a classic program verification problem [85, 106], which can be solved by constructing a set of verification conditions (VCs) and then discharging these VCs using an SMT solver. In our method, we use Z3 [149] as the SMT solver. For I to be sufficient, the formula below must hold:

F_S := (I ∧ ¬g → β)    (5.2)

We check this formula also using the Z3 SMT solver.

Since the first invariant candidate is not inductive, it will fail the check by F_I. Therefore, our method generates a new invariant candidate. Without our learning-based optimizations, however, the baseline SyGuS procedure would have produced the candidate shown on the left of Figure 5.3 (b). This is not efficient, because the new candidate would not only fail the check by F_I, but also fail for the same reason as the initial candidate.
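Conditions (a)-(c) can be illustrated by brute force on the running example. The sketch below checks them for the invariant I := (x = x̄ ∧ 5k = k̄ ∧ i = ī) over small sampled states; the actual method discharges these conditions symbolically with Z3 rather than by enumeration:

```python
# Brute-force sanity check of conditions (a)-(c) on the merged loop of
# Figure 5.2 (c). States are tuples (x, k, i, x̄, k̄, ī).

def I(s):
    x, k, i, xb, kb, ib = s
    return x == xb and 5 * k == kb and i == ib

def body(s):
    """One lockstep iteration of the merged loop body S."""
    x, k, i, xb, kb, ib = s
    x += k * 5; k += 1
    if i >= 5: k += 3
    xb += kb; kb += 5
    if ib >= 5: kb += 15
    return (x, k, i + 1, xb, kb, ib + 1)

for x in range(-2, 3):
    assert I((x, 0, 0, x, 0, 0))      # (a): pre α (x = x̄, n = n̄) implies I
for x in range(-2, 3):
    for k in range(-2, 6):
        for i in range(0, 8):         # (b): any I-state stepped through S stays in I
            assert I(body((x, k, i, x, 5 * k, i)))
# (c): I ∧ ¬g implies β (x = x̄) immediately, since I already contains x = x̄
print("conditions (a)-(c) hold on all sampled states")
```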
5.1.4 Our Learning-based Optimizations

With a logical reasoning (LR) based technique, our method is able to identify the reason why the first candidate fails to be inductive. As shown by the red dashed box in Figure 5.3 (a), it is because the first candidate contains the conflict predicate φ_c := (k = k̄). In other words, φ_c contradicts the program semantics. Thus, as long as a candidate contains φ_c, it will fail to be inductive. Since the second candidate on the left of Figure 5.3 (b) also contains φ_c, it would fail to be inductive for the same reason. Thus, our method avoids generating this candidate in the first place. Instead, it generates the candidate on the right of Figure 5.3 (b).

In addition to the LR based optimization, our method also uses a reinforcement learning (RL) based optimization to prioritize the search. While invariant candidates are being analyzed, the RL agent uses the verifier's results as positive and negative rewards to compute an exploration policy. The exploration policy defines, for each AST node shown in Figure 5.3, a probability distribution over its possible values, which can be used by the synthesizer to pick values so as to maximize the expected reward. In the running example, assuming that the next AST node to fill is Node 5 and the node type is an arithmetic expression with grammar set G[a₀] = {c·var, var}, we need to choose one of the two elements. By using the exploration policy computed by the RL agent, we can pick an element with a higher probability to generate the next candidate.

5.2 Our Method

In this section, we present the baseline method, while leaving the LR and RL based optimizations to Sections 5.3 and 5.4, respectively.

Algorithm 3: Our method for synthesizing relational invariants.
Input: Merged program P, relational property φ
Output: Relational invariant I
 1: I ← ∅, dl ← 1, and 𝒢 ← {(dl, G) | 1 ≤ dl ≤ 2^H − 1}
 2: C ← ∅ and P_RL ← undef
 3: while running time < threshold do
 4:   I, T ← Gen_Next_InvCandidate(I, dl, 𝒢, C, P_RL)
 5:   if Proved_Inductive(P, φ, I) then
 6:     if Proved_Sufficient(P, φ, I) then return I
 7:     end if
 8:   end if
 9:   S_C, φ_c ← Prune_by_LR(P, φ, I, dl, C)    ▷ update dl, C
10:   Prioritize_by_RL(I, T, S_C, φ_c, P_RL)    ▷ update P_RL
11: end while

5.2.1 Top-level Procedure

Algorithm 3 shows the top-level procedure, which takes a merged program P and a property φ = ⟨α, β⟩ as input, and returns the invariant I as output. It first initializes the data structures: I, dl, 𝒢, P_RL and C. Here, I is the AST of the invariant candidate, which is initialized to NULL. The decision level, dl, is the index of the AST node in I that will be modified to generate the next invariant candidate. While modifying the AST, we follow the depth-first-search (DFS) order. Therefore, dl refers to the backtracking point during DFS. 𝒢 is a data structure that maps each backtracking point dl to its unvisited grammar set G; this is elaborated in Section 5.2.2. We ignore C and P_RL for now since they implement the learning-based optimizations to be presented in Sections 5.3 and 5.4.

After initializing the data structures, our method uses syntax-guided synthesis (SyGuS) to generate an invariant candidate I in the hypothesis space defined by a domain-specific language (DSL). If I is both inductive and sufficient, it will be returned as the output. Otherwise, subroutines Prune_by_LR and Prioritize_by_RL are invoked to reduce the search space, before our method generates another invariant candidate. In the remainder of this section, we focus on the baseline version of Algorithm 3 without the LR and RL based optimizations.
Boolean      b  := p | ¬b | b ∨ b | b ∧ b
AtomicPred   p  := a ⊕ a | ā ⊕ ā | A′ ⊕ A′
ArrayExpr    ā  := getValue(A, i) | ā ⊎ ā | Ā.F
ArrayIndex   i  := a | c
ArithExpr    a  := a₀ | a ⊎ a | a ⊎ c
ArithExpr0   a₀ := c·var | var
Comparator   ⊕  := = | < | ≤ | > | ≥ | ≠
Operator     ⊎  := + | −
ArrayFunc    F  := sum(A, i_l, i_h) | min(A, i_l, i_h) | max(A, i_l, i_h)
Array        A′ := A | getSubset(A, i_l, i_h) | getSubset1(A, i ⊕ c)

Figure 5.4: The DSL for relational invariants, where c is a set of constants, var is a set of variables, and A is a set of arrays.

5.2.2 Domain-Specific Language (DSL)

Figure 5.4 shows the context-free grammar G of the DSL for expressing the invariants. G maps a type (i.e., the left-hand side of ":=") to a set of compatible values (i.e., the right-hand side of ":="). For instance, the feasible values for representing an atomic predicate are G[p] = {a ⊕ a, ā ⊕ ā, A′ ⊕ A′}. The DSL is designed such that invariants in the DSL can be analyzed by any SMT solver that supports the popular linear integer arithmetic (LIA) and array theories.

Let var be the set of variables from programs P1 and P2, A be the set of arrays, and c be the set of constants. A linear integer arithmetic expression, a, is defined over var and c, while an array expression ā is defined over A. Function getValue(A, i) returns the i-th element of the array A, while Ā.F denotes applying function F to array A, which returns a single value. Here, function F may be sum, min or max, which are frequently used in programs that manipulate arrays. Function getSubset(A, i_l, i_h) returns another array A[i_l, i_h], which has a subset of the elements. Similarly, getSubset1(A, i ⊕ c) returns a subset of the elements satisfying the condition (i ⊕ c). For instance, getSubset1(A, i ≠ 2) returns a new array S = {A[i] | i ≠ 2 ∧ 0 ≤ i ≤ |A|}.

As an example, consider the expression (i = j + 1) ∧ (d[1, j] = d̄[1, j]) ∧ (b[j] = ā[j]) ∧ (a = ā). In our DSL, it is (i = j + 1) ∧ (getSubset(d, 1, j) = getSubset(d̄, 1, j)) ∧ (getValue(b, j) = getValue(ā, j)) ∧ (a = ā).
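The scalar fragment of this grammar is small enough to enumerate exhaustively. The sketch below encodes the a₀ := var | c·var and p := a ⊕ a productions over two hypothetical variables (k and its counterpart k_bar) with two sample constants standing in for c; it only illustrates the size of the hypothesis space, not the thesis tool:

```python
from itertools import product

VARS = ["k", "k_bar"]
# a0 := var | c*var, with two sample constants standing in for the set c
ARITH = VARS + [f"{c}*{v}" for c in (2, 5) for v in VARS]
CMP = ["==", "<", "<=", ">", ">=", "!="]          # the comparator set ⊕

# p := a ⊕ a, restricted here to the a0 level for brevity
atoms = [f"{l} {op} {r}" for l, op, r in product(ARITH, CMP, ARITH)]
print(len(atoms))               # 6 arith exprs × 6 comparators × 6 = 216
assert "5*k == k_bar" in atoms  # the predicate needed for Figure 5.2
```

Even this tiny fragment yields hundreds of atomic predicates, and conjunctions of them grow combinatorially, which is why the LR and RL optimizations matter.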
5.2.3 Abstract Syntax Tree (AST)

We use a complete binary tree to represent the ASTs of invariant candidates. Let H be the height of the tree; the total number of nodes is then 2^H − 1. Figure 5.5 shows an example tree whose height is H = 3. Each node has a unique index N ∈ {1, …, 2^H − 1}. The index of the root node is 1. Given any node with index N, its two child nodes have indices 2N and 2N + 1, respectively.

Each node N has a type τ_N, which may be var, c, A, or any element in the set {b, p, ā, i, a, a₀, ⊕, ⊎, …}, which corresponds to the set of grammar rules in Figure 5.4. If the type τ_N is var, c, or A, the node N corresponds to a scalar variable in var, a constant in c, or an array in A. Otherwise, the node corresponds to a set of production rules defined by the grammar in Figure 5.4. For example, if τ_N = p, the set of production rules, G[τ_N], is {a ⊕ a, ā ⊕ ā, A′ ⊕ A′}. Assuming that ā ⊕ ā is chosen, the two child nodes have the type τ_2N = τ_2N+1 = ā. Thus, an invariant I can be represented by a set of node (N) and value (v) pairs:

I := {(N, v) | 1 ≤ N ≤ 2^H − 1, v ∈ G[τ_N]}    (5.3)

In Figure 5.5, for example, we have an incomplete invariant under construction I₃ = {(1, &&), (2, =), (4, var₁)}.

5.2.3.1 Constructing an AST

Our baseline method systematically traverses all ASTs that can be represented by the binary tree. To simplify implementation, the traversal strictly follows the DFS order. For the example in Figure 5.5, the DFS order is L = [1, 2, 4, 5, 3, 6, 7]. Similarly, for the example in Figure 5.3, the DFS order is L = [1, 2, 4, 8, 9, 5, 10, 11, 3, 6, 12, 13, 7, 14, 15].

Figure 5.5: Step-by-step construction of an invariant candidate.
Figure 5.5 illustrates the construction of an AST rooted at Node 1. Assume that all nodes have the initial value ∅, meaning they are not yet part of the AST. Furthermore, assume the root node has the type b, meaning it is a Boolean expression. Our method starts with Node 1. If it assigns the operator "&&" to Node 1, the tree maps to I₁ := {(1, &&)}. According to the DSL in Figure 5.4, the child node types must be τ₂ = τ₃ = b. Our method continues with Node 2. If it assigns the operator "=" to Node 2, the tree maps to I₂ := {(1, &&), (2, =)}. According to the DSL, the child node types are τ₄ = τ₅ = a. By following the DFS order, our method fills the entire tree to obtain the invariant candidate. Some nodes may remain ∅, meaning they are still not part of the AST.

5.2.3.2 Modifying an AST

Figure 5.3 illustrates the construction of the next AST by modifying the current AST. For now, let us focus on the two ASTs on the left-hand side, since they correspond to the baseline. Here, AST_i is the current AST, and AST_{i+1} is the next AST that our method generates. According to the DFS order, if the backtracking point (N_dl) is 7, we should modify Node 7. Since Node 7 is of the type τ₇ = a₀, which may be either var or c·var, we change Node 7 from var to c·var. This results in assigning the operator "·" to Node 7 and then assigning values to the child nodes accordingly. The new backtracking point is set to N_dl = 15.
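The indexing scheme and traversal order are easy to reproduce: node N's children are 2N and 2N+1 (1-indexed), and the DFS order for H = 3 is exactly the list L quoted above. A minimal sketch:

```python
# Complete binary tree over array indices, as in Section 5.2.3:
# node N has children 2N and 2N+1, and the root is node 1.

def dfs_order(H):
    """Return the DFS visiting order of a complete binary tree of height H."""
    order, total = [], 2 ** H - 1
    def visit(n):
        if n > total:
            return
        order.append(n)
        visit(2 * n)        # left child
        visit(2 * n + 1)    # right child
    visit(1)
    return order

print(dfs_order(3))  # [1, 2, 4, 5, 3, 6, 7]
```

For H = 4 the same function reproduces the order quoted for Figure 5.3, i.e., [1, 2, 4, 8, 9, 5, 10, 11, 3, 6, 12, 13, 7, 14, 15].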
Then, starting fromL[dl], it creates an AST rooted atL[dl] by recursively applying the production rules in Figure 5.4. At the end, it computes the new backtracking leveldl. If there are still unvisited values inG[ dl ], where G = G[dl], the backtracking level remains unchanged. Otherwise, it becomes the lastdl 0 whereG[ dl 0] contains unvisited values. After the new backtracking level is found, our method also resets the grammar set asunvisited for all levels in between. Whenever the RL based optimization is enabled, the brown colored statements will be executed. There are two main dierences between this version and the baseline. First, instead of traversing the ASTs in a strict DFS order, it picks the valuev by sampling according to a probability distribution given byP RL (computed by the RL agent), and documents the historyhI;v; 0i in a traceT . Second, at the end of the procedure, instead of backtracking based on the strict DFS order, it always backtracks all the way todl = 1. 86 Algorithm4 SubroutineGen_Next_InvCandidate. 1: InputI old ,dl,G,C, andP RL . 2: OutputI, TraceT 3:I values ofL[0 :dl 1] inI old 4: while :AllNodeAssigned(L;I) do 5: G G[dl];N dl L[dl]; dl type[N dl ] 6: if P RL =undef then 7: v Pick the rstunvisited value in grammarG[ dl ] 8: Labelv asvisited inG[ dl ] inG 9: else 10: vP RL [; dl ] 11: endif 12: SetChildrenType(N dl ;v); 13: I I[f(N dl ;v)g; . Adding to partial AST 14: T T [ ; 15: dl dl + 1 16: endwhile 17: if all values inG[ dl ] are visitedthen 18: dl 0 dl . Backtracking 19: while all values inG[ dl 0] are visiteddo 20: dl 0 dl 0 1; 21: G G[dl 0 ];N dl 0 L[dl 0 ]; dl 0 type[N dl 0] 22: endwhile 23: LabelG[k] asunvisitedforalldlk>dl 0 24: dl dl 0 25: endif 26: ifP RL =undef then 27: dl 1 28: endif 29: returnI,T 5.2.4.1 SyntacticFiltering We have implemented several syntactic ltering techniques to optimize the baseline method, to get rid of the obviously bad invariant candidates. 
First, we perform a light-weight check of I before delivering it to the Proved_Inductive subroutine. For instance, with the set of conflict predicates stored in C (which are computed by our LR based optimization), we first check I against C; if I contains a conflict predicate, we reject it without further verification. Second, we enforce a lexicographic ordering over operands under commutative operators (e.g., ∧), to rule out semantically-equivalent but syntactically-different invariant candidates. For instance, if pred₁ ∧ pred₂ has been explored before, then pred₂ ∧ pred₁ will not be explored in the future.

5.2.4.2 Verification

After generating the invariant candidate I, we use an SMT solver based verifier to check whether I is inductive and sufficient, using the subroutines Proved_Inductive and Proved_Sufficient. They are based on the three conditions at the end of Section 5.1.1. Recall that Conditions (a) and (b) imply that I is inductive and Condition (c) implies that I is sufficient. They can be checked by first constructing two formulas, F_I and F_S, based on Eq. 5.1 and Eq. 5.2 in Section 5.1.3, and then solving the formulas using an off-the-shelf SMT solver.

5.3 LR based Pruning

In this section, we present our logical reasoning (LR) based optimization implemented in the subroutine Prune_by_LR, which is used by Algorithm 3. At this moment, the invariant candidate I has been rejected by the verifier. Our goal is to analyze the reason why I fails and learn from it.

Algorithm 5 shows the pseudo code. Here, the input consists of the merged program P, the relational property φ, and the failed invariant I. The output consists of S_C and φ_c, where S_C is an unsatisfiability (UNSAT) core and φ_c is a conflict predicate. Together, they illustrate the reason why I fails the verification. Besides the output, the procedure also updates two global data structures, C and dl, where C is the accumulative set of all conflict predicates generated so far, and dl is the backtracking level.
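The set C of conflict predicates is exactly what the light-weight syntactic filter of Section 5.2.4.1 consults. Both filters, the conflict check against C and the commutative-operand canonicalization, can be sketched in a few lines; predicates are plain strings here purely for illustration:

```python
# Sketch of the two syntactic filters of Section 5.2.4.1: (1) reject any
# candidate containing a known conflict predicate, and (2) canonicalize
# operands of commutative operators so that pred1 ∧ pred2 and pred2 ∧ pred1
# are explored only once.

def passes_filters(conjuncts, conflict_set, seen):
    if any(p in conflict_set for p in conjuncts):
        return False                   # shares a known conflict predicate
    key = tuple(sorted(conjuncts))     # lexicographic canonical form under ∧
    if key in seen:
        return False                   # syntactic variant already explored
    seen.add(key)
    return True

seen, conflicts = set(), {"k == k_bar"}
assert passes_filters(["x == x_bar", "5*k == k_bar"], conflicts, seen)
assert not passes_filters(["5*k == k_bar", "x == x_bar"], conflicts, seen)  # reordered
assert not passes_filters(["k == k_bar", "i == i_bar"], conflicts, seen)    # conflict
```

Both checks are cheap string/set operations, so they run before the expensive SMT-based verification.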
In the remainder of this section, we present our method for constructing the UNSAT core S_C, computing the conflict predicate φ_c, performing non-chronological backtracking (by changing dl), and computing the strengthening predicate φ_s.

5.3.1 Constructing the UNSAT Core

We take the inductive part of F_I for demonstration: F_I := ∀v. {I(v) ∧ g(v)} S(v, v′) {I(v′)}. In the loop body S, v stands for the old variables (incoming to the loop) and v′ for the new ones (outgoing from the loop); e.g., a statement x = x + 1 in the loop body is encoded as x′ = x + 1 in the SMT formula.

To identify the reason why a candidate fails, we leverage the SMT solver's capability of extracting UNSAT cores from an unsatisfiable formula. However, this is not straightforward, because the formulas F_I and F_S, which are used for verification, contain a universal quantifier (∀), and when they fail verification, the SMT solver returns satisfying solutions for the negated formulas ¬F_I and ¬F_S. However, to generate UNSAT cores, there must be unsatisfiable formulas to start with. Thus, the question is how to construct unsatisfiable formulas from these two satisfiable formulas.

Counterexamples. Consider ¬F_I. When it is evaluated as SAT, the solver returns a model, consisting of values assigned to the variables, that makes I fail the verification. While it may be tempting to infer the root cause of the failure from this specific model, the result would be unsound in general, and most likely would not make sense in practice. This is because the model may be inconsistent with the precondition in the relational specification. In fact, checking if the model can be derived from the precondition would require the construction of a long series of recursion-free unwindings [232].

In this work, we propose a novel technique to overcome the aforementioned challenges. Our method relies on constructing a so-called mirror formula, whose negation ¬M_F being unsatisfiable implies that F_I does not hold and the candidate invariant I is invalid.
Therefore, we can use the formula ¬M_F to extract the UNSAT core. However, it is worth noting that the reverse does not have to be true: the invalidity of I does not imply the unsatisfiability of ¬M_F.

Algorithm 5: Our LR based search space pruning.
 1: procedure Prune_by_LR(P, φ, I)
 2:   if CheckUnsat(¬M_F) then
 3:     S_C ← ObtainUnsatCore(P, φ, I)
 4:     φ_c ← UpdateTraverseOrder(S_C, I, C)
 5:   else
 6:     S_C, φ_c ← ∅, ∅
 7:     φ_s ← ObtainAbductPred(P, φ, I)
 8:     if CheckFeasible(φ_s, I) then
 9:       I ← I ∧ φ_s
10:       if Proved_Inductive(P, φ, I) then I_i ← I
11:       end if
12:     end if
13:   end if
14:   return S_C, φ_c
15: end procedure
16: procedure UpdateTraverseOrder(S_C, I, C)
17:   φ_c ← S_C ∩ {p_i | p_i ∈ I}    ▷ conflict predicate
18:   C ← C ∪ φ_c
19:   M_I ← {⟨n_i, p_i⟩ | 1 ≤ n_i ≤ 2^H − 1, p_i ⊑ AST_I}
20:   n_c ← GetValueByKey(M_I, φ_c)    ▷ M_I[n_c] = φ_c
21:   dl ← n_c
22:   return φ_c
23: end procedure

5.3.1.1 The Mirror Formula

Definition 5.3.1 (Mirror Formula). Assuming the verification problem requires the validity of F_I, defined as the Hoare triple ∀v. {I(v) ∧ g} S(v, v′) {I(v′)}, the mirror formula M_F is defined as ∀v. {I(v) ∧ g} S(v, v′) {¬I(v′)}.

Whenever F_I fails to be verified, we check if ¬M_F is unsatisfiable. If ¬M_F is indeed unsatisfiable, we use the UNSAT core extracted from ¬M_F to identify the root cause, which in turn can guide us to prune the search space. Using the mirror formula to explain the root cause of an invalid formula F_I is sound in that, as long as an explanation can be found in this way, it is guaranteed to be the root cause. This is stated in the following theorem.

Theorem 5.3.2 (Soundness of ¬M_F). Given an invariant candidate I and the corresponding F_I, the unsatisfiability of the negated mirror formula, ¬M_F, implies the invalidity of I.

We provide the following proof sketch to describe the intuition behind Theorem 5.3.2, which illustrates the relationship between I and ¬M_F. Our key insight is to come up with a negated formula such that, when it is UNSAT, it implies that the invariant is invalid.
According to Definition 5.3.1, M_F := ∀v. {I(v) ∧ g} S(v, v′) {¬I(v′)}; if M_F is satisfiable, then all its conjuncts, including ¬I(v′), evaluate to true. Consequently, if M_F is satisfiable then I(v′) is false, i.e., the invariant I is invalid. Since M_F is universally quantified, the solver evaluates its negated form ¬M_F, and when ¬M_F is UNSAT, the non-negated formula is SAT and, hence, I is invalid. Now, UNSAT cores can be extracted from ¬M_F to prune the search space.

This approach catches only some of the cases in which I is invalid, namely when I does not hold over the fresh variables in I(v′). There could be cases where F_I fails on its other conjuncts, e.g., on I(v), but the mirror formula will not be able to detect those cases. Specifically, I(v) may not be strong enough to imply I(v′). We elaborate on how we handle this scenario in Section 5.3.3.

5.3.1.2 The UNSAT Core Example

For the motivating example in Figure 5.2, when the inductive condition F_I fails to be verified for the first invariant candidate shown in Section 5.1.3, our method constructs the mirror formula and then computes the UNSAT core:

a13: kN = k + 1
a14: k̄N = k̄ + 5
a15: (i < 5 ∧ kNN = kN) ∨ (i ≥ 5 ∧ kNN = kN + 3)
a17: (ī < 5 ∧ k̄NN = k̄N) ∨ (ī ≥ 5 ∧ k̄NN = k̄N + 15)
a21: kNN = k̄NN

For ease of presentation, the program variables are shown in static single assignment (SSA) form: kN represents the updated version of k, and kNN represents the updated version of kN (and similarly for k̄N and k̄NN). Inside this UNSAT core, only a21 is from the invariant candidate I, while the rest of the constraints in the UNSAT core encode the program semantics. Therefore, our method labels a21 as the conflict predicate φ_c, highlighted by the red dashed box on the right side of Figure 5.3 (a). In other words, any invariant candidate that contains φ_c is guaranteed to fail verification for the exact same reason.

5.3.2 Non-chronological Backtracking

We now discuss how the UpdateTraverseOrder procedure in Algorithm 5 leverages the UNSAT core S_C to update the backtracking level dl.
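The core-extraction step can be mimicked in miniature with a brute-force "solver" and deletion-based shrinking: starting from an inconsistent constraint set, drop any constraint whose removal keeps the set unsatisfiable, then intersect what remains with the candidate's own predicates. Real UNSAT cores come from Z3 and are computed very differently; the constraint set below is also a hand-simplified version of the example above (a single k stands for both k and k̄, since I supplies k = k̄ on entry):

```python
from itertools import product

def sat(constraints, dom=range(0, 8)):
    """Brute-force satisfiability over small values of (k, kN, kbN)."""
    return any(all(fn(*v) for _, fn in constraints)
               for v in product(dom, repeat=3))

def min_unsat_core(constraints):
    """Deletion-based shrinking to a minimal unsatisfiable subset."""
    core = list(constraints)
    for item in list(core):
        rest = [c for c in core if c is not item]
        if not sat(rest):              # still UNSAT without it: drop it
            core = rest
    return core

constraints = [
    ("a13: kN = k + 1",  lambda k, kN, kbN: kN == k + 1),
    ("a14: kbN = k + 5", lambda k, kN, kbN: kbN == k + 5),
    ("a21: kN = kbN",    lambda k, kN, kbN: kN == kbN),   # from candidate I
    ("extra: k >= 0",    lambda k, kN, kbN: k >= 0),      # irrelevant constraint
]
core = [name for name, _ in min_unsat_core(constraints)]
conflict = [n for n in core if n.startswith("a21")]       # core ∩ candidate
print(core, conflict)
```

The irrelevant constraint is dropped, the minimal core keeps a13, a14, and a21, and intersecting the core with the candidate's predicates isolates a21 as the conflict predicate.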
Since S_C contains both constraints that encode the program semantics and constraints from the invariant candidate I, by intersecting I with S_C (Line 17 of Algorithm 5), we are able to extract the conflict predicate φ_c that falsifies F_I. We leverage the conflict predicate φ_c to prune the search space, by forcing the baseline DFS traversal procedure to perform a non-chronological backtracking. Technically, this is accomplished by changing the value of the backtracking level (dl), which is a global variable. This allows our method to skip any redundant invariant candidates that share the same conflict predicate φ_c.

For the running example in Figure 5.3, without the help of φ_c, the baseline DFS traversal would have changed the value of Node 7 of AST_i to the c·var type and obtained AST_{i+1}, shown on the left of Figure 5.3 (b). Unfortunately, since the new invariant candidate still contains φ_c, it would fail verification again. Furthermore, if the DFS traversal continued along this subtree, it might generate many other ASTs, all of which contain φ_c and thus would fail for the exact same reason.

In contrast, our LR based optimization forces the DFS traversal to backtrack to Node 5 of the current AST_i, by changing N_dl to 5, as shown on the right of Figure 5.3 (a). As a result, it avoids generating the large number of redundant ASTs. Instead, the new AST_{i+1} is the one shown on the right of Figure 5.3 (b), where the conflict predicate {k̄ = k} is now replaced by {k̄ = k·5}.

As shown in Algorithm 5, with the conflict predicate φ_c, our method conducts two types of optimizations: clause memorization and non-chronological backtracking.

For clause memorization, we compute a forbidden set, C, which is the union of all conflict predicate sets (φ_c). To avoid growing the forbidden set infinitely, we bound the size of C to a constant by removing the less frequently used predicates, following the popular least recently used (LRU) policy for cache replacement.
In this context, however, the frequency refers to the number of invariant candidates that have conicts with the predicate. For non-chronological backtracking, we compute a mapM I which, given a node index, returns the corresponding subtree of the invariant candidateI. Here, i AST I means the AST representing i is a subtree of the AST representingI. UsingM I , we can locate the noden c corresponding to the conict predicate c , as shown in Line 16 of Algorithm 5. Based on noden c , we can modify the backtracking level dl accordingly. 5.3.3 TheStrengtheningPredicate It is worth noting that, if the mirror formula:M F isSAT, it does not imply the validity or invalidity ofI. Furthermore, there is no conict predicate that falsiesF I . AlthoughI does not yield conicts in this case, it still fails the inductive part of verication. The reason is thatI(v) is not strong enough to implyI(v 0 ). In other words, the failure is due to the inherent weakness ofI, rather than the conict predicate ofI. In such a case, we try to strengthenI to make it inductive (Lines 6-10 of Algorithm 5). In general, there can be two reasons why a candidate fails the verication. One reason is that it is overly constrained, e.g., by a conict predicate, and the other reason is that it is under constrained. In the latter case, we try to strengthen it by conjoining with an additional predicate. In the running example, for the invariant represented byI i+1 on the right of Figure 5.5 (b), the strengthening predicate would be S =fi = ig. The conjoined formulaI i+1 ^ (i = i) is able to pass the checkF I . 93 This is known as abductive reasoning in the literature [71, 70, 75, 173], and such techniques have been implemented in many existing tools. Our method relies on the built-inget-abduct function of the CVC5 solver to implement a subroutine namedObtainAbductPred, which starts with a true but not inductive invariantI, and iteratively strengthens it. 
In Lines 6-10 of Algorithm 5, we invoke the subroutine when the current candidate I is consistent with the program semantics but not yet inductive. It is worth noting that not all solutions returned by CVC5 are feasible and useful. That is why, in Lines 8 and 10, we check the feasibility of φ_S and make sure it can make I inductive. The inductive candidate I_i (Line 10) is subsequently used for further lightweight checks similar to Section 5.2.4.1.

5.4 RL based Prioritization

In this section, we present our RL based optimization implemented in the subroutine Prioritize_by_RL, which is used by Algorithm 3. At this moment, the UNSAT core S_C, the conflict predicate φ_c, and the rollout trace T have all been computed for the failed candidate I. The rollout trace T, in particular, represents a sequence of values chosen during the construction of I. Internally, our method first computes the available information to compute the reward, and then relies on the reinforcement learning (RL) agent to compute a policy gradient. Finally, the policy gradient is used to update the data structure P_RL used by Algorithm 4. In the remainder of this section, we present our method in detail.

5.4.1 The Policy P_RL

Inside Algorithm 4, if there are multiple values that can be used to fill the current node N_dl, the invariant synthesizer picks a value for N_dl based on a probability distribution of these values provided by P_RL. For instance, if the type of the current node N_dl is an operator type, which may have values "¬", "∨" and "∧", and if the probabilities for these values are 0.12, 0.16 and 0.47, respectively, the likelihood of picking the operator "∧" will be the highest.

[Figure 5.6: Illustrating the use of P_RL in our method. The GRU network P_RL takes as input an embedding of the state (the partial invariant I and the node N_dl) and outputs a probability for each candidate value, e.g., p[v = ¬] = 0.12, p[v = ∨] = 0.16, p[v = ∧] = 0.47.]
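The sampling step above can be sketched as a minimal pure-Python stand-in for P_RL. The fixed logits replace the GRU network's output, and all names are illustrative:

```python
import math, random

def softmax(logits):
    """Turn per-production scores into a probability distribution."""
    m = max(logits.values())
    exps = {a: math.exp(v - m) for a, v in logits.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def pick_value(probs, rng):
    """Sample a grammar production for node N_dl according to the policy."""
    r, acc = rng.random(), 0.0
    for action, p in probs.items():
        acc += p
        if r <= acc:
            return action
    return action  # guard against floating-point round-off

# Illustrative scores for the boolean-operator node type; a real P_RL would
# compute these from an embedding of the partial invariant and N_dl.
logits = {"not": -1.0, "or": -0.7, "and": 0.4}
probs = softmax(logits)
assert abs(sum(probs.values()) - 1.0) < 1e-9
assert max(probs, key=probs.get) == "and"   # "and" is the most likely choice
rng = random.Random(0)
assert pick_value(probs, rng) in logits
```

Sampling (rather than always taking the argmax) keeps the search stochastic, so lower-probability productions are still explored occasionally.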
Our RL based optimization ensures that P_RL represents a policy that maximizes the chance of generating good invariants. Toward this end, we model the search for invariants in the hypothesis space as a Markov Decision Process (MDP), where a state is represented by the partial invariant I together with N_dl, the node whose value will be filled next, and an action represents a possible value for N_dl. We use standard reinforcement learning techniques over the MDP to compute the policy P_RL, which is then represented as a GRU network shown in Figure 5.6. P_RL takes a state as input, and outputs the probability of each syntactic construct associated with N_dl, such as negation "¬" and conjunction "∧".

5.4.2 The Reward Function

The main difference between our method and prior works on using RL techniques [191, 45, 57] is that, while they merely incorporate the negative feedback (i.e., the candidate I has been rejected by the verifier), we are able to extract a richer set of positive and negative feedbacks from the verifier. Thus, we can more effectively aggregate and amplify feedbacks from the verifier.

At the center of our method is the reward function constructed using results of our logical reasoning (LR) subroutine. It aims to penalize bad candidates and reward good candidates. Here, bad candidates are the ones that contradict the semantics of the merged program P. Whenever a candidate I contradicts the program semantics, we can construct an unsatisfiable formula and leverage the SMT solver to compute an UNSAT core S_C. In contrast, good candidates are the ones that are consistent with the program semantics. Recall that whenever I is consistent with the program semantics, it satisfies the mirror formula ¬M_F, although I is still not inductive.
Based on this observation, we define the reward function

    r(I) =  0      if I is incomplete;
            +1     if I is complete and ¬F_I is UNSAT;
            +0.5   if I is complete, ¬F_I is SAT, and S_C = ∅ (i.e., ¬M_F is SAT);
            -1     if I is complete, ¬F_I is SAT, and S_C ≠ ∅ (i.e., ¬M_F is UNSAT).     (5.4)

The function assigns the reward only when I is a complete AST, i.e., each AST node has been assigned either a concrete value or NULL. If I is a good candidate (i.e., ¬F_I is unsatisfiable), it assigns +1 as the reward. Otherwise, in the second fall-through case of Eq. (5.4), we check if ¬M_F is satisfiable (i.e., S_C = ∅), which means I is consistent with the program semantics but not yet inductive. Thus, we assign I a positive reward of +0.5, to bias the exploration toward this direction. However, if ¬M_F is unsatisfiable (i.e., S_C ≠ ∅), based on Theorem 5.3.2, predicates in I must contradict the program semantics. Thus, we assign I a negative reward of -1, to bias the exploration against this direction.

Thus, our design aims to provide potentially valid (good) candidates with positive rewards and always-conflicting (bad) candidates with negative rewards. Furthermore, we extract more candidates from a failed candidate and use the derived candidates to provide fine-grained feedback to the RL agent. In contrast, prior works such as Chen et al. [57] only give negative rewards to the failed candidates. Their sparse reward design makes it more difficult for reinforcement learning to converge.

5.4.3 Generating More Feedback

To amplify the feedback from a failed candidate, we propose techniques for deriving other bad candidates (I') from a bad candidate I, such that I' fails the verification for a similar reason. In other words, we can use I' to update the policy without exploring it in the first place. Recall that in Section 5.3, we compute the UNSAT core S_C for a bad candidate I, together with the set of conflict predicates φ_c, which is a subset of the constraints of I.
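Eq. (5.4) translates directly into code. In this illustrative transcription the SMT checks are replaced by boolean inputs, so it only shows the case analysis, not the actual solver calls:

```python
def reward(complete, neg_F_I_unsat, unsat_core):
    """Eq. (5.4) as code: a complete AST is required for any feedback; +1 for
    verified candidates, +0.5 for semantics-consistent but non-inductive ones,
    and -1 for candidates that contradict the program semantics."""
    if not complete:
        return 0.0          # partial AST: no feedback yet
    if neg_F_I_unsat:
        return 1.0          # neg(F_I) is UNSAT: I is a valid invariant
    if not unsat_core:
        return 0.5          # S_C empty (neg(M_F) is SAT): consistent, not inductive
    return -1.0             # S_C non-empty: I contradicts the program semantics

assert reward(False, False, set()) == 0.0
assert reward(True, True, set()) == 1.0
assert reward(True, False, set()) == 0.5
assert reward(True, False, {"k == k'"}) == -1.0
```

The +0.5 case is what distinguishes this design from a sparse pass/fail reward: it biases exploration toward candidates that are already consistent with the program semantics.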
Taking S_C and φ_c as input, we obtain I' by mutating operators or operands in I such that I' ∧ P (i.e., I' ∧ (S_C \ φ_c)) remains UNSAT. This ensures that I' fails verification due to the same UNSAT core. As an example, consider the failed I = {x = x' ∧ k = k'} in the motivating example, from which we can obtain the UNSAT core S_C = {kN = k + 1 ∧ kN' = k' + 5 ∧ k = k' ∧ kN = kN'} and the conflict predicate φ_c = {k = k' ∧ kN = kN'}. Assume that the difference between the two, S_C \ φ_c = {kN = k + 1 ∧ kN' = k' + 5}, encodes part of the program semantics P. In this case, we may mutate I to obtain I' = {x = x' ∧ k + 1 = k'}. Since I' ∧ P remains UNSAT, the newly created I' is guaranteed to fail verification for the same reason.

Given the reward function, policy gradient methods [203] can be used to update the policy P_RL. Recall that, in Algorithm 4, each invariant candidate I corresponds to a rollout trajectory T = {(s_1, a_1, r_1), (s_2, a_2, r_2), ..., (s_|T|, a_|T|, r_|T|)}, which is a sequence of state-action-reward tuples, obtained by picking the actions using the current policy P_RL. In the final state, s_|T| = I. Each candidate I corresponds to a trace T. A set of new traces T' is obtained from the newly generated I'. To amplify the solver feedback, the policy gradient is computed based on the set of traces T' rather than a single trace T. The objective of policy gradient methods is to update the policy P_RL such that it maximizes the expected cumulative reward.

Table 5.1: The list of relational verification benchmarks.
  Name  Description of the verification problem    Name  Description of the verification problem
  E1    Strength Reduction [23]                    N1    Array Safe [14]
  E2    Loop Simple Optimization [23]              N2    Array Unsafe [14]
  E3    Loop Align [23]                            N3    LoopAndBranchSafe [14]
  E4    Loop Pipelining [23]                       N4    LoopAndBranchUnSafe [14]
  E5    Loop Sinking [23]                          N5    NoSecret Safe [14]
  E6    Loop Unswitching [23]                      N6    NoTaint Unsafe [14]
  E7    Loop Var Reduction [23]                    N7    Sanity Safe [14]
  E8    Static Caching [23]                        N8    Sanity Unsafe [14]
                                                   N11   modPow1 Safe (DARPA STAC)
  C1    BubbleSort [53]                            N12   modPow1 Unsafe (DARPA STAC)
  C2    Insertion-sort (Inner) [53]                N13   modPow2 Safe (DARPA STAC)
  C3    Insertion-sort (Outer) [53]                N14   modPow2 Unsafe (DARPA STAC)
  C4    Selection-sort (Inner) [53]                N15   k96 Safe [92, 163]
  C5    Selection-sort (Outer) [53]                N16   k96 Unsafe [92, 163]
  C6    Bellman-Ford [53]                          N17   gpt14 Safe [92, 163]
  C7    Floyd-Warshall [53]                        N18   gpt14 Unsafe [92, 163]

5.5 Evaluation

We have implemented our method in a software tool (Code2RelInv), which relies on LLVM 3.6 to parse the merged C programs and construct the internal representation (IR) for the programs. It considers three types of relational properties: equivalence, continuity and non-interference, by encoding them uniformly at the IR level as a set of logical constraints. For equivalence, the encoding (e.g., x = x') is straightforward. For continuity, the encoding is guided by the set of continuity analysis rules from Chaudhuri et al. [53]. For non-interference, it adopts the instrumentation-based technique of Chen et al. [55] to account for secret-induced resource usage. Our baseline SyGuS search procedure is implemented in C++. Our LR based optimization is implemented using Z3 as the SMT solver to compute conflict predicates. It also uses CVC5 to compute abductive predicates. Our RL based optimization is implemented using PyTorch, which has RL agents for computing the exploration policy. For evaluation purposes, we have compared our method with the state-of-the-art tool Code2Inv [193] on all benchmarks.
5.5.1 Benchmarks

Our benchmarks consist of 33 relational verification problems from three sources, as shown in Table 5.1. The equivalence verification problems, E1 to E8, are from Barthe et al. [23]. The goal is to prove the correctness of various types of loop optimizations [123, 27, 15, 2]. The continuity verification problems, C1 to C7, are from Chaudhuri et al. [53]. The non-interference verification problems, N1 to N18, are from the DARPA STAC program and other side-channel security examples [92, 163]. Here we omit N9 and N10 since they are straight-line code without loops. While some of the programs were in Java [14], we translated them to C before applying our tool. Our underlying program verifier supports linear integer arithmetic (LIA) and array theories.

Our experiments were designed to answer two questions:

RQ.1 How effective is Code2RelInv in generating the desired relational invariants?
RQ.2 How effective are the new LR-based and RL-based techniques in reducing the search space?

We ran all experiments on CloudLab with an Intel Xeon Silver CPU at 2.20 GHz along with an NVIDIA 12 GB PCIe P100 GPU.

5.5.2 Results for Evaluating the Effectiveness

To answer RQ.1, we applied our method to all benchmarks, and compared it with Code2Inv [193]. The results are shown in Table 5.2. For each benchmark, Column 1 shows the name, and Columns 2-5 show the running time of our method in seconds and the quality of the invariant. Here, T_base denotes the baseline, T_+RL denotes the baseline plus RL-based optimization, and T_+RL+LR denotes the baseline plus RL- and LR-based optimizations. Columns 6-7 show the running time of Code2Inv [193] and the quality of its invariant. Here, T/O means timed out after 4 hours, ✓ means the invariant is both inductive and sufficient,

Table 5.2: Comparing the performance of our method (Code2RelInv) and the existing method Code2Inv [193].
                     Code2RelInv                           Code2Inv [193]
  Benchmark  T_base (s)  T_+RL (s)  T_+RL+LR (s)  Qual     T (s)     Qual
  Equivalence
  E1         46.20       33.88      9.83          ✓        1623.07   ✓
  E2         49.87       29.01      15.33         ✓        1899.45   ✓
  E3         49.26       40.41      14.12         ✓        -         -
  E4         T/O         1225.41    809.65        ✓        -         -
  E5         42.35       37.49      17.29         ✓        -         -
  E6         48.25       38.36      15.14         ✓        -         -
  E7         T/O         218.37     24.50         ✓        -         -
  E8         T/O         1642.55    1140.72       ✓        -         -
  Continuity
  C1         45.82       34.95      15.86         ✓        -         -
  C2         44.95       35.12      13.88         ✓        -         -
  C3         45.27       32.02      15.22         ✓        -         -
  C4         45.04       34.85      20.67         ✓        -         -
  C5         44.97       35.83      18.97         ✓        -         -
  C6         43.24       35.09      17.61         ✓        -         -
  C7         45.04       32.40      17.38         ✓        -         -
  Non-interference
  N1         44.91       36.20      17.44         ✓        -         -
  N2         18.54       29.31      10.19         ✓        -         -
  N3         46.36       37.21      12.72         ✓        2556.73   ✓
  N4         35.08       30.60      10.05         ✓        -         -
  N5         33.08       29.13      12.08         ✓        2177.92   ✓
  N6         35.19       34.29      9.88          ✓        -         -
  N7         44.47       39.92      16.34         ✓        T/O       ✗
  N8         42.52       38.83      15.17         ✓        -         -
  N11        46.47       32.34      13.05         ✓        T/O       ✗
  N12        45.86       34.22      13.18         ✓        -         -
  N13        67.02       35.25      15.50         ✓        T/O       ✗
  N14        67.52       43.34      17.46         ✓        -         -
  N15        96.62       46.22      29.18         ✓        T/O       ✗
  N16        102.32      49.15      27.44         ✓        -         -
  N17        91.94       46.24      18.54         ✓        T/O       ✗
  N18        145.69      47.96      14.48         ✓        -         -

✗ means the invariant is not sufficient, and - means the method fails to generate any invariant (due to assertion failure or exception).

The results in Table 5.2 show that our method is significantly more effective in generating relational invariants. In fact, it succeeds on all benchmarks, whereas Code2Inv [193] only succeeds on four of them. In the four cases where it succeeds, it runs more than 1000x slower than our method. It also has 4 T/O cases.

5.5.3 Results for Evaluating the Optimizations

To answer RQ.2, we compared the running time of our method, with and without the learning based optimizations, also in Table 5.2. The running time of the baseline with syntactic filtering (T_base) is the largest, including three T/O cases (E4, E7 and E8).
The reason why E4, E7, and E8 are difficult is that the invariants needed to prove these properties are more complex and the depths of their corresponding ASTs are 5-8. As a result, the baseline version has to explore an extremely large candidate space. With the RL based optimization, the running time (T_+RL) is significantly reduced; all benchmarks are completed within 0.5 hour. With both the RL and the LR based optimizations, the running time (T_+RL+LR) becomes the shortest.

To better understand why our RL and LR based optimizations are effective, we also collected the number of invariant candidates explored by our method. The results are shown in Table 5.3. Here, #_base is the number of ASTs (of invariant candidates) explored by the baseline SyGuS search, #_+RL is the number of ASTs explored after adding the RL-based optimization, and #_+RL+LR is the number of ASTs explored after adding both optimizations. For each benchmark, the minimal number is in bold font.

Table 5.3: Comparing the number of invariant candidates explored by our method with different optimizations.

  Benchmark  #_base  #_+RL  #_+RL+LR     Benchmark  #_base  #_+RL  #_+RL+LR
  Equivalence                            Non-interference
  E1         1389    147    5            N1         1244    284    101
  E2         1627    139    22           N2         276     117    38
  E3         1953    245    18           N3         1381    340    56
  E4         -       8735   5086         N4         323     112    29
  E5         1538    175    31           N5         148     104    21
  E6         921     131    124          N6         317     159    23
  E7         -       2612   172          N7         1265    379    108
  E8         -       10483  7241         N8         1025    356    93
  Continuity                             N11        1362    203    57
  C1         1402    397    114          N12        1395    268    52
  C2         1360    445    85           N13        1686    313    64
  C3         1397    352    123          N14        1601    328    49
  C4         1528    392    216          N15        2857    335    127
  C5         1245    461    176          N16        2914    409    156
  C6         1169    417    183          N17        2673    381    114
  C7         1443    305    161          N18        3225    376    68

The results show that, among the three versions, #_+RL+LR is always the smallest. Furthermore, in many cases, such as E1, the reduction is drastic (from 1389 candidates to 5 candidates). On average, our method is able to skip 89.4% of invariant candidates.
We also investigated the two individual components in LR based pruning, for computing conflict predicates and abductive predicates, respectively. We found that, in general, the time to compute conflict predicates is short, and yet non-chronological backtracking based on these conflict predicates is almost always effective in speeding up our method. In contrast, the time to compute abductive predicates may be significantly longer, and may not always speed up our method. In E4, E7, C1, C3 and C5, it took an extremely long time. This is because the get-abduct function of CVC5, which is the abductive reasoning routine used in our method, may diverge in an infinite chain of speculations. Thus, unlike conflict predicates, abductive predicates must be used more judiciously. Nevertheless, our results show that, by using abductive predicates and conflict predicates in the same procedure, we can improve the overall performance consistently.

5.6 Related Work

Invariant Synthesis. We have already mentioned the two most closely related invariant synthesis techniques: Code2Inv [193] and LinearArbitrary [232]. LinearArbitrary is a data-driven technique, which uses sampled data to generate linear classifiers. Other examples in this category include ICE-DT [90, 89], LoopInvGen [159, 160], Guess-and-Check [188, 187], and [185, 91]. A problem with these techniques is that, while the synthesized predicates are consistent with sampled data, they may over-fit and thus produce unnecessarily complicated invariants. We have shown an example of this problem in Section 5.1. Our method does not have this problem, because it focuses on the program semantics instead of the sampled data. Code2Inv [193] is a neural network based technique, which utilizes graph neural networks to encode the program dependency and TreeLSTM to embed the partial invariant [191]. Other techniques in this category include Cln2Inv [182] and G-CLN [224].
The main problem with these techniques is that, since the neural networks focus on encoding program dependency information, they are often ineffective in synthesizing relational predicates. This has been confirmed by our experiments in Section 5.5. There are also other techniques for synthesizing polynomial invariants using program analysis techniques such as symbolic execution [151, 153, 152], abstract interpretation [175, 176], or compositional recurrence analysis [81, 117, 118]. However, they all target a single program, whereas our method aims to verify relational properties.

There is also a class of constrained Horn clause (CHC) solvers, developed for generating loop invariants but in principle usable to verify relational properties as well. We have evaluated a state-of-the-art CHC solver, Spacer [122]. Unfortunately, it returns unknown for most of our benchmarks. Given a set of CHC constraints with unknown predicate symbols, a CHC solver aims to produce a definition of the unknown predicate symbols such that all the constraints are satisfied. This is accomplished by first checking if all bounded unrollings of the CHC system satisfy the constraints, and then increasing the bound gradually until the proof no longer depends on the bound. Some CHC solvers [180, 140, 232, 12, 109] focus on developing new unwinding techniques while other solvers [122, 190, 107, 147] implicitly unwind the system. Specifically, Shemer et al. [190] refine the property directed inference technique to support relational verification, which is orthogonal to our learning-based method for producing relational invariants.

Program Synthesis. Besides invariant synthesis, learning based techniques have been used to improve program synthesis [45, 130, 191, 137]. Most of them utilize on-policy learning and often take the verifier's result as-is. An exception is Chen et al. [57], who perform off-policy learning and incorporate some additional feedback from the verifier.
However, it does not use fine-grained feedback such as the ones computed by our method, from both the conflict predicates and the abductive predicates. Furthermore, the enumerative search procedure in [57] may produce ill-formed candidates, which do not occur in our method. Besides learning, other types of information have also been used for pruning the search space [82, 83, 84, 167, 214, 99]. Some of them leverage semantic information of the DSL to check the feasibility of partial programs [82, 83], while others, such as Blaze [214], use abstract interpretation to build the space of feasible programs. There are also type-directed pruning techniques to avoid infeasible programs [84, 167, 99, 86, 157]. However, our LR based pruning goes further by also pruning well-formed but semantically-weak program candidates.

Relational Verification. In relational verification, one widely used approach is to carefully craft a domain-specific proof logic [200, 223, 55, 31] or a set of domain-specific proof rules [53]. Another approach is to construct and leverage a merged program via syntactic or semantic alignment [23, 165, 58, 56, 190]. While the two approaches differ, both require high-quality relational invariants to make the proof go through. While some prior works in this domain [55, 216] also involve invariant synthesis, they focus on simple equalities which are too weak for most of the benchmarks used in this paper, including the motivating example
in Section 2. Furthermore, unlike our method, which requires the invariants to be both inductive and sufficient, they do not guarantee that the generated invariants are sufficient [55].

5.7 Summary

I have presented a method for synthesizing relational invariants that are guaranteed to be both inductive and sufficient. This method leverages both syntax guided synthesis (SyGuS) and learning based techniques to prune the search space and prioritize the search. I have evaluated our method on a diverse set of relational verification benchmarks where the properties include equivalence, continuity, and non-interference. The experimental results show that this method can generate high-quality invariants for all cases whereas a state-of-the-art invariant synthesis tool fails most of the time. Furthermore, the learning based optimizations drastically reduce the search space.

Chapter 6: Quantifying Side-channel Leaks

Datalog has been widely used for program analysis, including data-race detection, side-channel analysis, points-to analysis, and taint analysis. These analyses are all qualitative; for instance, taint analysis either returns UKD (i.e., the type of a variable is Unknown) or SID (i.e., a variable is Statistically Independent of the Secret and thus secure). While the qualitative analysis techniques presented in Chapters 3 and 4 work well in practice, there are instances when developers want to estimate the likelihood of information leakage. This necessitates expanding qualitative analysis to incorporate quantitative information. However, techniques for computing such quantitative information are still severely lacking.

To capture the quantitative information, the Datalog based analysis needs to track probability propagation, which can be used to answer "How likely is a certain variable to leak information?". In this context, the predicates and the rules in Datalog are not perpetually true; they are instead associated with specific probabilities. Existing works, such as ProbLog [68], referenced in studies [68, 129], allow users to assign probabilities to predicates and rules in Datalog. They then transform the Datalog program, which includes probabilities, into a weighted Boolean formula, thereby converting probabilistic inference tasks into weighted model counting. However, ProbLog does not accurately compute the joint probability when individual predicates have dependencies.
For instance, given a rule of the form H :- B_1, B_2, ..., B_n., the inference inside ProbLog merely multiplies the probabilities of the individual predicates, i.e., P(B_1) × P(B_2) × ... × P(B_n), neglecting the potential dependencies among B_1, B_2, ..., B_n. This issue is further elaborated in Section 6.1.

One possible solution to make ProbLog cognizant of the dependencies among B_1, B_2, ..., B_n involves explicitly encoding the conditional probability (e.g., P(B_1 | B_2)) as auxiliary rules. To define these auxiliary rules, one would need a specific value for the conditional probability, such as p_c, to encode P(B_1 | B_2) = p_c as p_c::B_1 :- B_2. However, this solution has two main limitations. First, even though we use this auxiliary rule to encode the conditional probability p_c, the results returned by ProbLog are incorrect, deviating from the ground truth as calculated by Bayes' theorem. This discrepancy is further discussed in Section 6.1. Second, obtaining p_c is not always practical. Often, developers might only be aware of the general dependency relationship between distinct events like B_1 and B_2, rather than possessing the precise conditional probability value p_c.

To address this challenge, we propose a new Datalog inference framework, QDatalog. Instead of associating each fact and rule with a single probability value, we associate them with an interval, i.e., [lb, ub]. This interval denotes the lower and upper bounds of the probability of a particular relation or predicate being true. This approach provides users with a means to approximate conditional probability, especially when the dependency relationship between B_1 and B_2 is unclear. Our method is sound, ensuring that the actual value always falls within the intervals determined by QDatalog.

The remainder of this chapter is organized as follows. First, we motivate our work in Section 6.1 using examples.
Then, we present the technical background in Section 6.2, followed by the high-level procedure of our method and the detailed algorithms and subroutines in Section 6.3. We present the experimental results in Section 6.4, and finally give our summary in Section 6.5.

6.1 Motivation

In this section, we use examples to illustrate (i) what we want to achieve, (ii) limitations of existing techniques, and (iii) advantages of our new method.

6.1.1 Examples

0.3::A(x).
0.5::B(y).
R(x,y).
S(z).
0.9::T(z) :- A(x), B(y), R(x,y), S(z).
query(T).

Listing 6.2: Predicates A(x) and B(y) do not have dependency.

0.3::A(x).
0.5::B(y).
R(x,y).
S(z).
0.45::A(x) :- B(y).
0.55::\+A(x) :- B(y).
0.9::T(z) :- A(x), B(y), R(x,y), S(z).
query(T).

Listing 6.3: Predicates A(x) and B(y) have dependency. If B(y) happens, the probability of A(x) increases. For instance, A(x) denotes that person x has Alzheimer's disease while B(y) denotes that y (e.g., x's parent) also has Alzheimer's disease. P(A(x)|B(y)) = 0.45.

We present our examples in Listings 6.2 and 6.3, respectively. In Listing 6.2, A(x) represents a relation A that takes x as its parameter. B(y), R(x,y) and S(z) also represent relations, taking the parameters x, y and z. The notation 0.3::A(x) indicates that the probability of the relation A(x) being true is 30%; in general, the construct p::B states that the probability of B being true equals p. Similar semantics applies to 0.5::B(y). The fact R(x,y). conveys that R(x,y) is always true, and likewise for S(z). Within the Datalog rule 0.9::T(z) :- A(x), B(y), R(x,y), S(z)., everything to the right of :- is the body of the rule while the left side is its head. The comma denotes logical conjunction, and the period marks the end of a Datalog rule. This particular rule asserts that if all the relations in the body A(x), B(y), R(x,y), S(z) are true, then the head relation T(z) holds with 90% probability.
The final command, query(T), is used to ascertain the probability of the event T being true. The only difference between Listings 6.2 and 6.3 is the dependency relation between A(x) and B(y). In Listing 6.2, relations A(x) and B(y) are independent. In contrast, in Listing 6.3, relations A(x) and B(y) are dependent. More specifically, the rules 0.45::A(x) :- B(y). and 0.55::\+A(x) :- B(y). express this dependency. The rule 0.45::A(x) :- B(y). states that if relation B(y) is true, then the probability of relation A(x) being true is 0.45, which encodes the conditional probability P(A(x)|B(y)) = 0.45. In the other rule, 0.55::\+A(x) :- B(y)., the symbol "\+" denotes logical negation: if relation B(y) is true, the probability of relation A(x) being false is 0.55. Such conditional probabilities between relations are widely used in practice. For instance, A(x) could denote that person x has Alzheimer's disease while B(y) denotes that y (e.g., x's parent) also has Alzheimer's disease. The probability of A(x) being true is 0.3. However, given that x's parent has Alzheimer's disease, the probability of A(x) being true increases to 0.45. This is reasonable, as empirical evidence shows that those who have a parent or sibling with Alzheimer's are more likely to develop the disease than those who do not have a first-degree relative with Alzheimer's.

6.1.2 Limitations of ProbLog

In the first case, when relations A(x) and B(y) are independent, ProbLog can directly take Listing 6.2 as input, and it correctly computes P(T) = 0.9 × P(A) × P(B) × P(R) × P(S) = 0.135. However, in the second case, when relations A(x) and B(y) are dependent, ProbLog computes an incorrect result. In Listing 6.3, P(A(x)|B(y)) = 0.45. By Bayes' theorem, P(A(x)|B(y)) × P(B(y)) = P(B(y)|A(x)) × P(A(x)). Given that P(A(x)) = 0.3 and P(B(y)) = 0.5, we can compute P(B(y)|A(x)) = 0.75. Given this conditional probability, P(T) should be calculated as 0.9 × P(A) × P(B|A) × P(R) × P(S) = 0.2025.
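The numbers above can be checked with a few lines of arithmetic; the 0.9 factor is the probability annotation on the rule for T(z):

```python
# Ground-truth joint probability for Listing 6.3, checked with Bayes' theorem.
P_A, P_B = 0.3, 0.5
P_A_given_B = 0.45
P_rule = 0.9                                # annotation on the rule for T(z)

# Bayes: P(B|A) = P(A|B) * P(B) / P(A)
P_B_given_A = P_A_given_B * P_B / P_A
assert abs(P_B_given_A - 0.75) < 1e-9

# P(T) = 0.9 * P(A and B), with R(x,y) and S(z) always true.
P_T = P_rule * P_A * P_B_given_A
assert abs(P_T - 0.2025) < 1e-9             # ProbLog instead reports 0.1245
```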
However, when we feed Listing 6.3 into ProbLog, it returns 0.1245 as the probability P(T), which is inconsistent with the ground truth 0.2025. Based on our observation, although ProbLog allows users to encode the conditional probability, it fails to use this conditional probability to compute the correct joint probability, e.g., P(A ∧ B).

In addition to the aforementioned limitation, ProbLog has another limitation: it requires the precise value of the conditional dependency. In practice, it is not always feasible to obtain the precise value of the conditional probability, e.g., P(A(x)|B(y)). For instance, the users may only know that P(A(x)) and P(B(y)) are dependent. Meanwhile, it may also be expensive to compute the precise joint or union probability, especially when multiple events are dependent in various ways. For instance, to compute the probability of the union of three possibly dependent events, we need to evaluate P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C). This complexity suggests the usefulness of approximating dependencies during inference. Consequently, we propose using intervals to represent joint, conditional, or marginal probabilities when the dependency information is either undefined or loosely specified.

6.1.3 Advantages of Our Method

We introduce intervals to represent the approximated probability of an event. For instance, instead of associating relations A(x) and B(y) with 0.3 and 0.5 respectively, we associate them with the intervals [0.3, 0.3] and [0.5, 0.5]. The probability interval of relation A(x) is denoted I_A. To compute the probability interval of the predicate T(z) as shown in Listing 6.3, we first determine the joint probability of the relations A(x), B(y), R(x,y), S(z), by conjoining all the intervals: I_A, I_B, I_R, I_S. Our method for computing interval conjunction is detailed in Section 6.3.
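The three-event inclusion-exclusion identity above can be sanity-checked by enumerating a toy joint distribution over the eight possible worlds (all names here are illustrative):

```python
import random
from itertools import product

# Build a random joint distribution over three dependent events A, B, C.
rng = random.Random(1)
worlds = {w: rng.random() for w in product([0, 1], repeat=3)}
total = sum(worlds.values())
worlds = {w: p / total for w, p in worlds.items()}  # normalize to sum to 1

def P(pred):
    """Probability of an event, summed over the worlds that satisfy it."""
    return sum(p for w, p in worlds.items() if pred(w))

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[2] == 1

lhs = P(lambda w: A(w) or B(w) or C(w))
rhs = (P(A) + P(B) + P(C)
       - P(lambda w: A(w) and B(w)) - P(lambda w: A(w) and C(w))
       - P(lambda w: B(w) and C(w)) + P(lambda w: A(w) and B(w) and C(w)))
assert abs(lhs - rhs) < 1e-12
```

The check illustrates the cost argument: evaluating the union exactly requires all pairwise and triple intersection probabilities, which motivates interval approximation when those are unknown.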
Here, in Listing 6.3, given that the conditional probability P(A(x)|B(y)) is 0.45, our tool, QDatalog, takes this 0.45 into account and establishes user-customized rules for computing the interval conjunction. The process can be formalized as follows:

Definition 6.1.1 (Customized Conjunction Rules). Given relations A(x) and B(y), we define their intervals as I_A = [A_L, A_H] and I_B = [B_L, B_H]. The conditional probability is defined as P(A(x)|B(y)) = K, and we want to compute the interval of the relation C(x,y) where C(x,y) = A(x) ∧ B(y). We define I_C = [C_L, C_H] where C_L = min(B_L · K, B_L · K · A_L / A_H) and C_H = max(B_H · K, K · B_H · A_H / A_L).

The proof of Definition 6.1.1 is presented in Section 6.3. With Definition 6.1.1, I_A = [0.3, 0.3], I_B = [0.5, 0.5], and K = 0.45, we obtain the interval [0.2025, 0.2025], which matches the ground truth. ProbLog's result, in contrast, falls outside this bound.

6.1.4 Another Example

In this section, we use another example to show a shortcoming of ProbLog, where it fails to produce the correct result.

0.7::burglary.
0.2::earthquake.
0.9::alarm :- burglary, earthquake.
0.8::alarm :- burglary, \+earthquake.
0.1::alarm :- \+burglary, earthquake.
query(alarm).

Listing 6.2: In this Bayesian network, predicates earthquake and burglary do not have dependency.

0.7::burglary.
0.2::earthquake.
0.1::earthquake :- burglary.
0.9::\+earthquake :- burglary.
0.9::alarm :- burglary, earthquake.
0.8::alarm :- burglary, \+earthquake.
0.1::alarm :- \+burglary, earthquake.
query(alarm).

Listing 6.3: In this Bayesian network, predicates earthquake and burglary have dependency. If the burglary happens, the probability of earthquake decreases: a burglary may indicate that people are still outside, so an earthquake is less likely, i.e., P(earthquake|burglary) = 0.1.

In Listing 6.2, when predicates earthquake and burglary do not have dependency, P(alarm) = 0.9 × 0.7 × 0.2 + 0.8 × 0.7 × 0.8 + 0.1 × 0.3 × 0.2 = 0.58.
In Listing 6.3, when predicates earthquake and burglary are dependent, P(earthquake|burglary) = 0.1. Based on Bayes' theorem, we obtain the following:

P(earthquake|burglary) · P(burglary) = P(burglary|earthquake) · P(earthquake).

Given P(earthquake) = 0.2 and P(burglary) = 0.7, we can infer that P(burglary|earthquake) = 0.35.

P(alarm1) = 0.9 · P(earthquake|burglary) · P(burglary) = 0.9 · 0.1 · 0.7 = 0.063
P(alarm2) = 0.8 · P(burglary ∧ ¬earthquake) = 0.8 · P(¬earthquake|burglary) · P(burglary) = 0.8 · (1−0.1) · 0.7 = 0.504
P(alarm3) = 0.1 · P(¬burglary ∧ earthquake) = 0.1 · P(¬burglary|earthquake) · P(earthquake) = 0.1 · (1−0.35) · 0.2 = 0.013
P(alarm) = 0.063 + 0.504 + 0.013 = 0.58

However, ProbLog returns the value 0.05856, which is not equal to the ground truth 0.58.

6.2 Preliminary

In this section, we introduce Datalog, together with its corresponding solver, Soufflé. Our tool QDatalog, which is built on top of Soufflé, aims at computing the probability intervals of Datalog relations.

6.2.1 Datalog

Datalog has been widely used in program analysis. An extractor transforms the input program into a set of Datalog relations describing the syntax and semantics of the input program. These input relations are called the Extensional Database (EDB) in database terminology. The program analysis itself can be expressed and encoded as Datalog rules. Starting from the input relations, a Datalog solver (e.g., Z3 or Soufflé) iteratively applies the Datalog rules to infer new relations, called the Intensional Database (IDB), until the set of IDB predicates no longer changes and the solver reaches a fixed point. Datalog rules are represented as follows:

R(x,y) :- R_1(x,x_1), R_2(x_1,x_2), ..., R_n(x_n,y).   (6.1)

In Equation 6.1, R(x,y) is the head of the rule, and the body consists of the predicates R_1(x,x_1), R_2(x_1,x_2), ..., R_n(x_n,y). It has the following meaning: R_1(x,x_1) ∧ R_2(x_1,x_2) ∧ ... ∧ R_n(x_n,y) → R(x,y). Consider the following example.
We will use it to walk through our method in Section 6.3.

path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), edge(Y,Z).   (6.2)

Given the rules shown in Equation 6.2, the relation path(X,Y) is queried in the end. Here, edge(X,Y) is the input relation and is referred to as the EDB. The output relation (e.g., path(X,Y)) and the two rules are referred to as the IDB. The Datalog solver computes path(X,Y) as the transitive closure of the edge(X,Y) relation. In Soufflé, this transitive-closure computation is compiled into a C++ program, as described in Section 6.2.2.

6.2.2 Semi-naive Evaluation of Soufflé

Soufflé compiles Datalog evaluation into C++ programs to improve performance. For instance, the entries in the edge(X,Y) relation are implicitly sorted, which enables fast range queries. In practice, Soufflé outperforms other Datalog engines such as µZ, bddbddb, and an SQLite-based solver. Given the path-edge example, Algorithm 6 shows the pseudocode of the C++ program generated by Soufflé, named Semi(S_edge). The Datalog rules of Equation 6.2 are hardcoded in Algorithm 6, which takes the edge relation as input and returns the path relation as output. In Line 4, Semi(S_edge) first computes the path(X,Y) relation from the edge(X,Y) relation by applying the first rule in Equation 6.2. In Lines 6-22, it computes the path(X,Y) relation by applying the second rule in Equation 6.2. It maintains a delta set and iterates over each entry inside delta. Based on the rule path(X,Z) :- path(X,Y), edge(Y,Z)., for each relation path(X,Y) inside delta (Line 9), it looks for edge relations starting with Y, e.g., edge(Y,Z), in the S_edge set (Lines 10-12). Afterwards, it

Algorithm 6 Semi(S_edge): Semi-naive Evaluation from Soufflé [183]
Input: Set of edge relations S_edge
Output: Set of path relations tc
1: using Tuple = std::array<int,2>
2: using Relation = std::set<Tuple>
3: Relation edge, tc
4: edge = S_edge   ▷ Filling edge relation
5: tc = edge   ▷ Evaluating first rule path(X,Y) :- edge(X,Y).
6: auto delta = tc   ▷ Evaluating second rule path(X,Z) :- path(X,Y), edge(Y,Z).
7: while !delta.empty() do
8:   Relation nDelta   ▷ Computing new delta
9:   for const auto& t1 : delta do
10:    auto a = edge.lower_bound({t1[1], 0})
11:    auto b = edge.upper_bound({t1[1]+1, 0})
12:    for auto it = a; it != b; ++it do
13:      auto& t2 = *it
14:      Tuple tr({t1[0], t2[1]});  Tuple path({t1, t2});  G[tr] ← G[tr] ∪ {path}
15:      if !contains(tc, tr) then
16:        nDelta.insert(tr)
17:      end if
18:    end for
19:  end for
20:  tc.insert(nDelta.begin(), nDelta.end())   ▷ Inserting delta
21:  delta.swap(nDelta)   ▷ Swapping data
22: end while
23: return tc

produces the new relation path(X,Z), which is assembled as tr(t1[0], t2[1]) in Line 14. If this new relation is not contained in the output set, it is added back to both the delta set (Lines 16 and 21) and the output set tc (Line 20). The name semi-naive evaluation comes from its evaluation style. For instance, suppose we slightly modify the rules in Equation 6.2 as follows:

path(X,Y) :- edge(X,Y).
path(X,Z) :- path(X,Y), path(Y,Z).   (6.3)

The Soufflé-generated program for Equation 6.3 only changes Lines 10-11 of Algorithm 6 by replacing edge with tc. As we can see, whenever a new relation path(x,y) appears in the delta set, Algorithm 6 looks for path(y,_); it never looks for path(_,x). This evaluation is one-way, and hence semi-naive. We will show in Section 6.3 that this semi-naive evaluation may influence the probability computation.

Algorithm 7 IntervalCal(S_edge, M)
Input: Set of edge relations S_edge, and a map M s.t. ∀r ∈ S_edge, M(r) = I_r
Output: Set of output relations tc and a map M' s.t. ∀r ∈ tc, M'(r) = I_r
1: tc, G ← Semi'(S_edge)   ▷ Updating the Semi algorithm to record the derivation graph G
2: M' ← M
3: visited ← ∅, Gscc ← ∅, Id2Set ← ∅
4: for node n ∈ G do
5:   if !visited[n] then
6:     DFS_Tarjan(n, G, Gscc, Id2Set, visited)   ▷
Eliminating the cycles in the derivation graph G
7:   end if
8: end for
9: M' ← SccSymCon(Gscc, Id2Set, G, M, M')
10: return tc, M'

6.3 Our Method

In this section, we begin with a high-level overview of our algorithm for computing the probability interval. Following that, we describe each submodule in detail.

6.3.1 High-level Overview

Algorithm 7 shows the high-level procedure of QDatalog. It takes the set of input relations S_edge along with their probability map M, and returns the set of output relations tc along with their probability map M'. A probability map takes a relation as key and returns the corresponding probability interval as value, i.e., M[R] = I_R. In Algorithm 7, the first step is to run the semi-naive evaluation to generate the derivation map. We define the derivation map as follows: G = {(R, S) | R ∈ tc, S = {(r, r')}}. For each output relation R ∈ tc, the derivation map G records the set of all derivation paths leading to R. Each derivation path is represented as a tuple (r, r'), in line with the second rule from Equation 6.3, which mandates only two predicates in its body (e.g., r and r') to deduce the head R. As an example, r might signify path(X,Y) while r' could signify path(Y,Z). In order to obtain G, we modified Semi(S_edge) in Algorithm 6 to record the graph G. The modification is highlighted by the red lines in Algorithm 6. In Line 1 of Algorithm 7, we use Semi'(S_edge) to denote the modified version.

Given the derivation graph G, if it does not contain cycles, we can directly use our interval analysis in Section 6.3.2 to compute the probability. Otherwise, the interval analysis cannot be directly applied. For instance, for the relation path(1,5), the tuple ⟨path(1,7), path(7,5)⟩ ∈ G[path(1,5)]. Similarly, for another relation path(1,7), the tuple ⟨path(1,5), path(5,7)⟩ ∈ G[path(1,7)]. This denotes that the probability computation of path(1,5) relies on the probability of path(1,7) as input, and, similarly, the computation of path(1,7) requires path(1,5) to be known.
However, as both of them are unknown, the interval analysis can be applied to neither of them. We present our way of handling cycles in detail in Section 6.3.3. At a high level, we use the classic Tarjan's algorithm to identify the cycles in Lines 3-7. If a node is not inside any strongly connected component (SCC), we directly apply the interval analysis to compute the concrete probability. Otherwise, we represent the relations using symbolic variables, encode the probability interval computation as a monolithic formula, and invoke the solver to solve this formula all at once. This procedure is encapsulated by the function SccSymCon in Line 9 of Algorithm 7, detailed further in Section 6.3.3.

6.3.2 Probabilistic Interval Analysis

We begin by using an example to motivate our probabilistic interval analysis. Consider the derivation graph G, where the output relation path(1,5) has three derivation paths. Specifically, G[path(1,5)] = {⟨path(1,2), path(2,5)⟩, ⟨path(1,7), path(7,5)⟩, ⟨path(1,3), path(3,5)⟩}. To compute the probability interval of path(1,5), two logical operations are required: conjunction and disjunction. Conjunction calculates the probability along a single derivation path, while disjunction computes the union over multiple paths. For the computation of path(1,5), we first need to compute C_1 (the conjunction of path(1,2) and path(2,5)), C_2 (the conjunction of path(1,7) and path(7,5)), and C_3 (the conjunction of path(1,3) and path(3,5)). We then compute the disjunction of C_1, C_2 and C_3.

Conjunction. Given the conjunction P(A∧B) = P(A)·P(B|A) = P(B)·P(A|B), where P(A) ∈ [A_l, A_h] and P(B) ∈ [B_l, B_h], we need to compute the interval [AB_l, AB_h] for P(A∧B). The lower bound AB_l is 0, as events A and B might be mutually exclusive, in which case P(A|B) = 0. The upper bound AB_h is given by min(A_h, B_h). Intuitively, if event A implies event B, the joint probability is determined by whichever event has the smaller upper bound.
Given that P(A∧B) = P(A)·P(B|A) = P(B)·P(A|B), and considering that both P(B|A), P(A|B) ∈ [0, 1], in order to maximize P(A∧B), either P(A|B) or P(B|A) must be 1.
• If P(A) = P(B), then P(A|B) = P(B|A) = 1, and the upper bound of P(A∧B) is P(A).
• If P(A) ≥ P(B) and P(B|A) < 1, then P(A|B) = 1, and the upper bound of P(A∧B) = P(B)·P(A|B) is P(B).
To summarize, P(A∧B) ∈ [0, min(A_h, B_h)]. However, the lower bound 0 is not tight, and we would like to obtain a tighter one. For instance, consider two events A and B with P(A) = 90% and P(B) = 90%. Although we do not know the dependency relation between A and B, we can conclude that A and B are not exclusive, as P(A) + P(B) > 1. Consider the following four cases:
• ¬A ∧ ¬B: its upper bound is min(P(¬A), P(¬B)) = 10%
• A ∧ ¬B: its upper bound is min(P(A), P(¬B)) = 10%
• ¬A ∧ B: its upper bound is min(P(¬A), P(B)) = 10%
• A ∧ B: it is equivalent to 1 − P(¬A∧¬B) − P(A∧¬B) − P(¬A∧B).
Summarizing the four cases, we know that the lower bound of P(A∧B) is the complement of the upper bound of P(¬A∧¬B) + P(A∧¬B) + P(¬A∧B). Hence, the lower bound of P(A∧B) is 1 − UB{P(¬A∧¬B) + P(A∧¬B) + P(¬A∧B)}, which is 1 − (10% + 10% + 10%) = 70%. Inspired by the above example, we can infer that the lower bound of P(A∧B) equals 1 − min(1−A_l, B_h) − min(A_h, 1−B_l) − min(1−A_l, 1−B_l). Hence the interval of P(A∧B) is [1 − min(1−A_l, B_h) − min(A_h, 1−B_l) − min(1−A_l, 1−B_l), min(A_h, B_h)]. Since a probability always lies between 0 and 1, the lower bound becomes max(0, 1 − min(1−A_l, B_h) − min(A_h, 1−B_l) − min(1−A_l, 1−B_l)).

Disjunction. P(A∨B) = P(A) + P(B) − P(A∧B). We again assume that P(A) ∈ [A_l, A_h] and P(B) ∈ [B_l, B_h]. The disjunction operation represents the case where multiple rules are available to infer a predicate such as path(1,5). For instance, given two rules A → path(1,5) and B →
path(1,5), the probability of path(1,5) being true is equivalent to the probability of the disjunction of the two events A and B, i.e., P(path(1,5)) = P(A∨B). Since P(A∨B) = P(A) + P(B) − P(A∧B) = P(A) + P(B) − P(A)·P(B|A), the upper bound of P(A∨B) is A_h + B_h, attained when events A and B are exclusive (i.e., P(A∧B) = 0). The lower bound of P(A∨B) is max(A_l, B_l). This is because, when minimizing P(A∨B), we maximize P(A)·P(B|A) or P(B)·P(A|B); to do so, either P(A|B) or P(B|A) must be 1. There are two cases:
• If P(A) > P(B), then P(A|B) = 1 and P(B|A) < 1, as P(A)·P(B|A) = P(B)·P(A|B). Then the lower bound of P(A∨B) = P(A) + P(B) − P(B)·P(A|B) = P(A) + P(B) − P(B) = P(A).
• If P(A) ≤ P(B), then P(B|A) = 1 and P(A|B) < 1, as P(A)·P(B|A) = P(B)·P(A|B). Then the lower bound of P(A∨B) = P(A) + P(B) − P(A)·P(B|A) = P(A) + P(B) − P(A) = P(B).

Algorithm 8 IntervalAnalysis(R, G, M')
Input: relation node R, derivation graph G, and the probability map M'
Output: Updated probability map M'
1: disjuncL, disjuncH, conjL, conjH ← 0, 0, 0, 0
2: if Contains(M', R) then return M'
3: end if
4: for DPath ∈ G[R] do
5:   A, B ← DPath[0], DPath[1]
6:   conjL ← 1 − min(1 − M'[A][0], M'[B][1]) − min(M'[A][1], 1 − M'[B][0]) − min(1 − M'[A][0], 1 − M'[B][0])
7:   conjH ← min(M'[A][1], M'[B][1])   ▷ Conjunction computation
8:   disjuncL ← max(disjuncL, conjL)
9:   disjuncH ← min(1, disjuncH + conjH)   ▷ Disjunction computation
10: end for
11: M'[R] ← [disjuncL, disjuncH]
12: return M'

As a result, we can prove that the lower bound of P(A∨B) is equivalent to max(P(A), P(B)), which is max(A_l, B_l). To summarize, the interval of P(A∨B) is [max(A_l, B_l), min(1, A_h + B_h)].

User-defined Conjunction/Disjunction Rules. We also want to highlight that our framework is flexible: it allows users to specify the dependency information between different predicates, and our tool customizes the above conjunction and disjunction rules accordingly, so that the interval computed by our tool is tighter.
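The default conjunction and disjunction rules derived above can be sketched in a few lines; this is a minimal illustration of the bounds, not the QDatalog implementation:

```python
# Probability intervals for events with unknown dependency,
# represented as (low, high) pairs.

def conj(a, b):
    """Interval for P(A and B): lower bound from the four-case
    argument, upper bound min(A_h, B_h)."""
    al, ah = a
    bl, bh = b
    low = 1 - min(1 - al, bh) - min(ah, 1 - bl) - min(1 - al, 1 - bl)
    return (max(0.0, low), min(ah, bh))

def disj(a, b):
    """Interval for P(A or B): [max(A_l, B_l), min(1, A_h + B_h)]."""
    al, ah = a
    bl, bh = b
    return (max(al, bl), min(1.0, ah + bh))

# The 90%/90% example from the text: the two events cannot be
# exclusive, so the conjunction's lower bound tightens from 0 to 0.7.
cl, ch = conj((0.9, 0.9), (0.9, 0.9))
print(round(cl, 3), round(ch, 3))  # 0.7 0.9
dl, dh = disj((0.9, 0.9), (0.9, 0.9))
print(round(dl, 3), round(dh, 3))  # 0.9 1.0
```

Algorithm 8 applies the conjunction once per derivation path and folds the results with the disjunction; when a conditional probability is known, the user-customized rule of Definition 6.1.1 replaces the default conjunction.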
Definition 6.1.1 in Section 6.1.3 demonstrates the user-customized rules. Algorithm 8 illustrates how we use the conjunction and disjunction rules to compute the probability interval of a relation R. In Line 4, given the derivation graph G, the algorithm iterates over all possible derivation paths DPath that infer the relation R. Focusing on the example rules presented in Equation 6.3, each path includes two predicates, A and B, since the rule body consists of these two predicates, represented as R ← A ∧ B. Lines 6-7 calculate the lower and upper bounds of the conjunction operation. M'[A] denotes the probability interval of relation A; M'[A][0] and M'[A][1] are the lower and upper bounds of this interval, respectively. In Lines 8-9, the algorithm computes the lower and upper bounds of the disjunction operation. The algorithm concludes by updating the probability map and returning M'. Due to this mixed combination of conjunction and disjunction, depth-first search is ideally suited for our interval analysis, as detailed in Section 6.3.3.

Algorithm 9 SccSymCon(v, Gscc, Id2Set, G, M', visited)
Input: v, Gscc, Id2Set, G, M', visited
Output: Updated probability map M'
1: visited[v] ← true
2: for vn ∈ Gscc[v] do   ▷ Traversing the neighbor nodes (vn) of v in the reduced graph Gscc
3:   if !visited[vn] then
4:     SccSymCon(vn, Gscc, Id2Set, G, M', visited)
5:   end if
6: end for
7: Sv ← Id2Set[v]
8: if |Sv| = 1 then
9:   M' ← IntervalAnalysis(Sv[0], G, M')
10: else
11:  M' ← SymSolving(Sv, G, M')
12: end if
13: return M'

6.3.3 Handling Cycles in the Derivation Graph

In Section 6.3.1, we provided a high-level overview and observed potential cycles within the derivation graph G. Such cycles hinder the direct application of interval analysis. This hindrance arises from the requirement of Algorithm 8, which stipulates that, to compute the probability of the head predicate, the probabilities of the body predicates must be pre-determined.
To address this challenge, we begin by identifying the strongly connected components (SCCs) in the original derivation graph, denoted as G. Following this, we produce the reduced graph, termed G_scc, and the map Id2Set. The reduced graph G_scc consists of nodes defined by G_scc = {v | v ∈ ℤ}, where v denotes the id of an SCC group. The mapping function Id2Set is given by Id2Set = {(v, S) | v ∈ G_scc, S = {v'}, v' ∈ G}; it maps each SCC node id to a set of nodes from the original derivation graph G. Algorithm 9 showcases the computation of the probability using the cycle-free reduced graph G_scc. In Lines 1-6, it traverses G_scc in DFS order. After a node v has traversed all of its children, we compute its probability in Lines 7-12. S_v represents the node set within the original derivation graph G corresponding to the node v in the reduced graph G_scc. If S_v contains only one variable (i.e., S_v[0]), no cycle involves S_v[0], and we can directly apply the interval analysis of Algorithm 8 to S_v[0] in Line 9. Otherwise, we encode all the nodes inside S_v symbolically and invoke the SMT solver to compute their probability intervals.

6.4 Experiments

Our experiments were designed to answer the following research questions (RQs):
• How efficient is our tool in terms of running time as compared with classic Datalog solving?
• How useful are the computed probability intervals?

We implemented QDatalog on top of Soufflé [115]. More specifically, QDatalog takes the C++ program generated by Soufflé and compiles it into another C++ program that performs the probability interval computation. The symbolic solving of the SCC components is implemented using the CVC5 solver [18]. We ran all experiments on a computer with a 2.9 GHz Intel Core i5 CPU and 64 GB RAM.

6.4.1 Benchmarks

Our evaluation used two kinds of benchmarks.
The first class was collected from the probabilistic programming community, while the second class was gathered from the side-channel community. The first class includes examples such as Bayesian networks, probabilistic graphs, social networks, path connection, stochastic memoization, the crowds protocol, and the Monty Hall puzzle. These examples are represented as small Datalog programs, which are widely used in statistical inference and machine learning. The second class consists of masked multiplication, masked S-box, masked AES, and various masked MAC-Keccak functions. We manually associate probabilities with the input relations based on their respective problem domains.

6.4.2 Results

As shown in Table 6.1, we evaluated four benchmarks, all from Bayesian inference. Benchmarks 3 and 4 have more relations; for ease of presentation, we do not list all of them in Table 6.1. The second row (T) denotes the running time of our tool QDatalog, while the third row (T_w/o)
denotes the running time of Soufflé without probabilistic inference. Starting from the fourth row, Table 6.1 reports the probability intervals computed by our tool. In terms of running time, our tool is slower than Soufflé, which is reasonable, as we compute probabilistic intervals and the additional CVC5 solver invocations also increase the running time. In terms of the usefulness of QDatalog, we have shown that most intervals we compute are quite tight, e.g., [0.4, 0.45]. Although Section 6.1 showed that the results of ProbLog are incorrect (and fall outside the intervals computed by our tool QDatalog), we did not include ProbLog's results in Table 6.1, because ProbLog does not accept intervals as inputs. We will conduct another head-to-head comparison between QDatalog and ProbLog: we plan to feed a single probability input x to ProbLog and the interval [x, x] to our tool, for benchmarks with user-specified conditional dependency. In addition, we plan to run the side-channel benchmarks to evaluate whether QDatalog is able to prune the unknown cases, falsify more leaky cases, and increase the accuracy.

Table 6.1: Running time and results of QDatalog

         Bayesian Inference 1   Bayesian Inference 2   Bayesian Inference 3   Bayesian Inference 4
T        26.037 ms              27.822 ms              27.894 ms              28.531 ms
T_w/o    3.383 ms               3.459 ms               3.582 ms               4.183 ms
R1       [0.3, 0.33]            [0.4, 0.45]            [0.2, 0.34]            [0.63, 0.81]
R2       [0.3, 0.33]            [0.4, 0.45]            [0.2, 0.34]            [0.63, 0.81]
R3       [0, 0.99]              [0, 1]                 [0.2, 0.34]            [0, 1]
R4       [0.3, 0.33]            [0.4, 0.45]            [0, 1]                 [0.63, 0.81]
R5       [0, 0.66]              [0, 0.9]               [0.2, 0.34]            [0, 1]
R6       [0.3, 0.33]            [0.4, 0.45]            [0, 1]                 [0, 1]
R7       [0, 0.33]              [0, 0.45]              [0, 0.68]              [0, 0.81]
R8       [0.3, 0.33]            [0.4, 0.45]            [0, 0.34]              [0.63, 0.81]
R9       [0, 0.33]              [0, 0.45]              [0.2, 0.34]            [0, 0.81]
R10      [0, 0.33]              [0, 0.45]              [0, 0.34]              [0.63, 0.81]
R11      [0.3, 0.33]            [0.4, 0.45]            [0.2, 0.34]            [0.63, 0.81]
R12      [0, 0.33]              [0, 0.45]              [0.2, 0.34]            [0.63, 0.81]
R13      [0.3, 0.33]            [0.4, 0.45]            [0.2, 0.34]            [0.63, 0.81]
R14      [0.3, 0.33]            [0.4, 0.45]            [0.2, 0.34]            [0, 1]
R15      [0, 0.33]              [0, 0.45]              [0, 0.68]              [0, 0.81]

6.5 Summary

I have presented a method for quantifying side-channel leaks by computing the probability intervals of Datalog predicates. This method leverages both interval analysis and on-demand cycle handling to quickly and soundly determine the probability interval associated with a specific predicate. I have evaluated my method across a varied range of benchmarks, including statistical inference, and plan to incorporate evaluation results for side-channel benchmarks. Preliminary results indicate that this method consistently produces tight intervals for the majority of relations with little runtime overhead. Moreover, my tool QDatalog has demonstrated its ability to compute the correct result where prior works fell short.

Chapter 7

Conclusion

In this chapter, I give my conclusions and then discuss promising directions for future research.
7.1 Summary

As side-channel attacks affect almost all of our processors, it is urgent for us to fix these vulnerabilities and ensure that mission-critical applications are immune to power side-channel attacks. While program analysis and synthesis are promising techniques for detecting and mitigating power side-channel attacks, e.g., by checking and then automatically removing the statistical dependencies in software code, existing methods and tools for program analysis and synthesis fall short in terms of both scalability and accuracy. That is, they are either fast but extremely inaccurate, or accurate but extremely slow. To fill the gap, this dissertation proposes four innovative methods to overcome the scalability/accuracy bottlenecks.

In Chapter 3, I proposed a data-driven method for synthesizing static analyses to detect side-channel information leaks in cryptographic software. The analyzer consists of a set of type-inference rules learned from the training data, i.e., example code snippets annotated with the ground truth. To implement this technique, I used syntax-guided synthesis (SyGuS) to generate new recursive features and decision tree learning (DTL) to generate analysis rules based on these features. Soundness is guaranteed by proving each learned analysis rule via a technique called query containment checking. My results show that, in addition to being automated and provably sound during synthesis, the learned analyzer can achieve the same empirical accuracy as two state-of-the-art, manually crafted analyzers while being significantly faster.

In Chapter 4, I proposed a program analysis and transformation based method to eliminate power side-channel leaks. To implement this technique, I first proposed a type-based technique for detecting compiler-induced leaks, which leverages Datalog-based declarative analysis and domain-specific optimizations to achieve high efficiency and accuracy.
I also developed a mitigation technique for the compiler's backend, more specifically the register allocation modules, to ensure that leaky intermediate computation results are stored in different CPU registers or memory locations. Experimental evaluations demonstrated that the method is effective in removing the side channel while being efficient, i.e., the mitigated code is more compact and runs faster than code mitigated using state-of-the-art techniques.

In Chapter 5, I proposed a method for synthesizing invariants that help verify the functional equivalence of the programs before and after the mitigation. To implement this technique, I first generated invariant candidates using syntax-guided synthesis (SyGuS) and then filtered them using an SMT-solver-based verifier, to ensure they are both inductive invariants and sufficient for verifying the property at hand. To improve performance, I proposed two learning-based techniques: a logical reasoning (LR) technique to determine which part of the search space can be pruned away, and a reinforcement learning (RL) technique to determine which part of the search space to prioritize. Experimental evaluations demonstrated that our method can generate invariants of a higher quality in much shorter time, which makes the equivalence verification more efficient and scalable.

In Chapter 6, I proposed a new method for quantifying side-channel leaks. This was achieved by augmenting the Datalog-based side-channel analysis with probability tracking. To implement this technique, I first defined the probability interval for Datalog predicates and developed a sound and general interval analysis framework to compute the interval. Then I introduced a novel on-demand cycle handling technique to address the challenges faced by forward interval analysis. Experimental evaluations demonstrated that
our method can generate tight and sound intervals for most relations with little run-time overhead, which can help quantify side-channel leaks.

7.2 Future Work

There are two future research directions that I consider promising.

7.2.1 Side-channel security for cyber-physical systems

One important direction for future research is improving the side-channel security of cyber-physical systems, e.g., autonomous driving systems that leverage deep learning models and adaptive localization algorithms. Prior research shows that correlations between the physical state of the vehicle and various side channels pose security risks to the vehicle's control software [131]. I am interested in analyzing these correlations and developing mitigation techniques. Unlike other application domains, this requires a deep understanding of not only the intrinsic logic of the vehicle's control software, but also the interactions with the user. Thus, techniques for modeling the computer-user interaction, together with techniques for tracking the statistical dependencies, must be incorporated.

7.2.2 Quantitative analysis of probabilistic programs

Another important direction for future research is quantitative analysis of probabilistic programs. As random variables are increasingly used in cryptographic software to protect secret data, modeling the related probabilistic behavior is useful for ensuring security. Meanwhile, in Chapters 3 and 4, we primarily focused on qualitative analysis that decides whether a certain variable is leak-free or leaky. Our analysis is sound in that, if it reports leak-free, the variable is guaranteed to be secure; otherwise, the variable may leak information. In the latter case, we also want to quantify the leaks; more specifically, what is the probability of leaking information? This brings us to another important area of research: quantitative analysis of probabilistic programs.
While I have developed type inference rules for quantifying statistical dependencies where random variables, drawn from a particular type of probability distribution, are involved in the data flow (in Chapter 4), there is a need to generalize the technique, e.g., to quantify the dependencies of probabilistic programs in which both the control flow and the data flow are associated with random variables drawn from arbitrary distributions. Probabilistic programs are useful in modeling various types of security risks, robustness levels, or fairness. However, scalable techniques for analyzing and quantifying them are still severely lacking. One promising direction that I would like to pursue is to augment existing program analysis techniques (e.g., Datalog-based declarative program analysis and SMT-solver-based verification techniques) with the capability of tracking probability distributions of symbolic variables.

Bibliography

[1] Alessandro Abate, Iury Bessa, Dario Cattaruzza, Lucas Cordeiro, Cristina David, Pascal Kesseli, Daniel Kroening, and Elizabeth Polgreen. “Automated formal synthesis of digital controllers for state-space physical plants”. In: International Conference on Computer Aided Verification. CAV 2017. Springer. 2017, pp. 462–482.
[2] Tarek S Abdelrahman and Robert Sawaya. “Improving the structure of loop nests in scientific programs”. In: Comput. Syst. Sci. Eng. 19.1 (2004), pp. 11–25.
[3] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases: The Logical Level. 1st. Pearson, 1994.
[4] Giovanni Agosta, Alessandro Barenghi, and Gerardo Pelosi. “A code morphing methodology to automate power analysis countermeasures”. In: Proceedings of the 49th Annual Design Automation Conference (DAC). 2012, pp. 77–82.
[5] Mehdi-Laurent Akkar and Louis Goubin. “A generic protection against high-order differential power analysis”. In: International Workshop on Fast Software Encryption. 2003, pp. 192–205.
[6] José Bacelar Almeida, Manuel Barbosa, Jorge Sousa Pinto, and Bárbara Vieira. “Formal verification of side-channel countermeasures using self-composition”. In: (2013).
[7] Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. “Syntax-guided synthesis”. In: International Conference on Formal Methods in Computer-Aided Design. 2013.
[8] Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo MK Martin, Mukund Raghothaman, Sanjit A Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. “Syntax-guided synthesis”. In: 2013 Formal Methods in Computer-Aided Design. IEEE. 2013, pp. 1–8.
[9] Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. “Scaling enumerative program synthesis via divide and conquer”. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer. 2017, pp. 319–336.
[10] Shengwei An, Rishabh Singh, Sasa Misailovic, and Roopsha Samanta. “Augmented example-based synthesis using relational perturbation properties”. In: Proceedings of the ACM on Programming Languages 4.POPL (2019), pp. 1–24.
[11] Nikolaos Athanasios Anagnostopoulos, Stefan Katzenbeisser, John A. Chandy, and Fatemeh Tehranipoor. “An overview of DRAM-based security primitives”. In: Cryptography 2.2 (2018), p. 7.
[12] Emanuele De Angelis, Fabio Fioravanti, Alberto Pettorossi, and Maurizio Proietti. “Relational verification through horn clause transformation”. In: International Static Analysis Symposium. Springer. 2016, pp. 147–169.
[13] Timos Antonopoulos, Paul Gazzillo, Michael Hicks, Eric Koskinen, Tachio Terauchi, and Shiyi Wei. “Decomposition instead of self-composition for proving the absence of timing channels”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2017, pp. 362–375.
[14] Timos Antonopoulos, Paul Gazzillo, Michael Hicks, Eric Koskinen, Tachio Terauchi, and Shiyi Wei. “Decomposition instead of self-composition for proving the absence of timing channels”. In: ACM SIGPLAN Notices 52.6 (2017), pp. 362–375.
[15] David F Bacon, Susan L Graham, and Oliver J Sharp. “Compiler transformations for high-performance computing”. In: ACM Computing Surveys (CSUR) 26.4 (1994), pp. 345–420.
[16] Josep Balasch, Benedikt Gierlichs, Vincent Grosso, Oscar Reparaz, and François-Xavier Standaert. “On the cost of lazy engineering for masked software implementations”. In: International Conference on Smart Card Research and Advanced Applications. Springer. 2014, pp. 64–81.
[17] Lucas Bang, Abdulbaki Aydin, Quoc-Sang Phan, Corina S. Pasareanu, and Tevfik Bultan. “String analysis for side channels with segmented oracles”. In: ACM SIGSOFT Symposium on Foundations of Software Engineering. 2016, pp. 193–204.
[18] Haniel Barbosa, Clark Barrett, Martin Brain, Gereon Kremer, Hanna Lachnitt, Makai Mann, Abdalrhman Mohamed, Mudathir Mohamed, Aina Niemetz, Andres Nötzli, et al. “cvc5: A versatile and industrial-strength SMT solver”. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer. 2022, pp. 415–442.
[19] Pablo Barceló, Miguel Romero, and Moshe Y Vardi. “Does query evaluation tractability help query containment?” In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM. 2014, pp. 188–199.
[20] Gilles Barthe, Sonia Belaïd, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, and Pierre-Yves Strub. “Verified proofs of higher-order masking”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2015, pp. 457–485.
[21] Gilles Barthe, Sonia Belaïd, François Dupressoir, Pierre-Alain Fouque, Benjamin Grégoire, Pierre-Yves Strub, and Rébecca Zucchini. “Strong non-interference and type-directed higher-order masking”.
In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM. 2016, pp. 116–129.
[22] Gilles Barthe, Sonia Belaïd, Pierre-Alain Fouque, and Benjamin Grégoire. “maskVerif: a formal tool for analyzing software and hardware masked implementations.” In: IACR Cryptology ePrint Archive 2018 (2018), p. 562.
[23] Gilles Barthe, Juan Manuel Crespo, and César Kunz. “Relational verification using product programs”. In: International Symposium on Formal Methods. Springer. 2011, pp. 200–214.
[24] Gilles Barthe, François Dupressoir, Sebastian Faust, Benjamin Grégoire, François-Xavier Standaert, and Pierre-Yves Strub. “Parallel implementations of masking schemes and the bounded moment leakage model”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2017, pp. 535–566.
[25] Gilles Barthe, Benjamin Grégoire, and Vincent Laporte. “Secure compilation of side-channel countermeasures: the case of cryptographic “constant-time””. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF). IEEE. 2018, pp. 328–343.
[26] Gilles Barthe, Boris Köpf, Laurent Mauborgne, and Martín Ochoa. “Leakage Resilience against Concurrent Cache Attacks”. In: International Conference on Principles of Security and Trust. 2014, pp. 140–158.
[27] Gilles Barthe and César Kunz. “Certificate translation in abstract interpretation”. In: European Symposium on Programming. Springer. 2008, pp. 368–382.
[28] Rohan Bavishi, Hiroaki Yoshida, and Mukul R Prasad. “Phoenix: automated data-driven synthesis of repairs for static analysis violations”. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM. 2019, pp. 613–624.
[29] Ali Galip Bayrak, Francesco Regazzoni, Philip Brisk, François-Xavier Standaert, and Paolo Ienne. “A first step towards automatic application of power analysis countermeasures”.
In: ACM/IEEE Design Automation Conference. 2011, pp. 230–235.
[30] Ali Galip Bayrak, Francesco Regazzoni, David Novo, and Paolo Ienne. “Sleuth: Automated verification of software power analysis countermeasures”. In: International Workshop on Cryptographic Hardware and Embedded Systems. 2013, pp. 293–310.
[31] Nick Benton. “Simple relational correctness proofs for static analyses and program transformations”. In: ACM SIGPLAN Notices 39.1 (2004), pp. 14–25.
[32] Guido Bertoni, Joan Daemen, Michael Peeters, Gilles Van Assche, and Ronny Van Keer. “Keccak implementation overview”. In: URL: http://keccak.noekeon.org/Keccak-implementation-3.2.pdf (2012).
[33] Swarup Bhunia, Michael S Hsiao, Mainak Banga, and Seetharam Narasimhan. “Hardware Trojan attacks: threat analysis and countermeasures”. In: Proceedings of the IEEE 102.8 (2014), pp. 1229–1247.
[34] Pavol Bielik, Veselin Raychev, and Martin Vechev. “Learning a static analyzer from data”. In: International Conference on Computer Aided Verification. CAV 2017. Springer. 2017, pp. 233–253.
[35] Elia Bisi, Filippo Melzani, and Vittorio Zaccaria. “Symbolic analysis of higher-order side channel countermeasures”. In: IEEE Transactions on Computers 66.6 (2016), pp. 1099–1105.
[36] Elia Bisi, Filippo Melzani, and Vittorio Zaccaria. “Symbolic analysis of higher-order side channel countermeasures”. In: IEEE Trans. Computers 66.6 (2017), pp. 1099–1105.
[37] Roderick Bloem, Hannes Gross, Rinat Iusupov, Bettina Könighofer, Stefan Mangard, and Johannes Winter. “Formal verification of masked hardware implementations in the presence of glitches”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. Springer. 2018, pp. 321–353.
[38] Johannes Blömer, Jorge Guajardo, and Volker Krummel. “Provably secure masking of AES”. In: International workshop on selected areas in cryptography. Springer. 2004, pp. 69–83.
[39] Arthur Blot, Masaki Yamamoto, and Tachio Terauchi.
“Compositional Synthesis of Leakage Resilient Programs”. In: International Conference on Principles of Security and Trust. 2017, pp. 277–297.
[40] Martin Bravenboer and Yannis Smaragdakis. “Strictly Declarative Specification of Sophisticated Points-to Analyses”. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications. OOPSLA 2009. ACM, 2009, pp. 243–262.
[41] Martin Bravenboer and Yannis Smaragdakis. “Strictly declarative specification of sophisticated points-to analyses”. In: ACM SIGPLAN Notices 44.10 (2009), pp. 243–262.
[42] Tegan Brennan, Seemanta Saha, and Tevfik Bultan. “Symbolic path cost analysis for side-channel detection”. In: International Conference on Software Engineering. 2018, pp. 424–425.
[43] Tegan Brennan, Seemanta Saha, Tevfik Bultan, and Corina S Păsăreanu. “Symbolic path cost analysis for side-channel detection”. In: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM. 2018, pp. 27–37.
[44] Eric Brier, Christophe Clavier, and Francis Olivier. “Correlation power analysis with a leakage model”. In: International Workshop on Cryptographic Hardware and Embedded Systems. Springer. 2004, pp. 16–29.
[45] Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, and Pushmeet Kohli. “Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis”. In: International Conference on Learning Representations. 2018.
[46] Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. “On the decidability of query containment under constraints”. In: PODS. Vol. 98. 1998, pp. 149–158.
[47] Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y Vardi. “Reasoning on regular path queries”. In: ACM SIGMOD Record 32.4 (2003), pp. 83–92.
[48] Diego Calvanese, Giuseppe De Giacomo, and Moshe Y Vardi. “Decidable containment of recursive queries”. In: Theoretical Computer Science 336.1 (2005), pp. 33–56.
[49] David Canright and Lejla Batina.
“A very compact “perfectly masked” S-box for AES”. In: International Conference on Applied Cryptography and Network Security. Springer. 2008, pp. 446–459.
[50] Sunjay Cauligi, Craig Disselkoen, Klaus v Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. “Constant-time foundations for the new spectre era”. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 2020, pp. 913–926.
[51] Gregory Chaitin. “Register allocation and spilling via graph coloring”. In: ACM SIGPLAN Notices 39.4 (2004), pp. 66–74.
[52] Suresh Chari, Charanjit S Jutla, Josyula R Rao, and Pankaj Rohatgi. “Towards sound approaches to counteract power-analysis attacks”. In: Annual International Cryptology Conference. 1999, pp. 398–412.
[53] Swarat Chaudhuri, Sumit Gulwani, and Roberto Lublinerman. “Continuity analysis of programs”. In: Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 2010, pp. 57–70.
[54] Jia Chen, Yu Feng, and Isil Dillig. “Precise detection of side-channel vulnerabilities using quantitative cartesian hoare logic”. In: ACM SIGSAC Conference on Computer and Communications Security. 2017, pp. 875–890.
[55] Jia Chen, Yu Feng, and Isil Dillig. “Precise detection of side-channel vulnerabilities using quantitative cartesian hoare logic”. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017, pp. 875–890.
[56] Jia Chen, Jiayi Wei, Yu Feng, Osbert Bastani, and Isil Dillig. “Relational verification using reinforcement learning”. In: Proceedings of the ACM on Programming Languages 3.OOPSLA (2019), pp. 1–30.
[57] Yanju Chen, Chenglong Wang, Osbert Bastani, Isil Dillig, and Yu Feng. “Program Synthesis Using Deduction-Guided Reinforcement Learning”. In: International Conference on Computer Aided Verification. Springer. 2020, pp. 587–610.
[58] Berkeley Churchill, Oded Padon, Rahul Sharma, and Alex Aiken.
“Semantic program alignment for equivalence checking”. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2019, pp. 1027–1040.
[59] Christophe Clavier, Jean-Sébastien Coron, and Nora Dabbous. “Differential power analysis in the presence of hardware countermeasures”. In: International Workshop on Cryptographic Hardware and Embedded Systems. 2000, pp. 252–263.
[60] Keith D. Cooper and Anshuman Dasgupta. “Tailoring graph-coloring register allocation for runtime compilation”. In: Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 26-29 March 2006, New York, New York, USA. 2006, pp. 39–49.
[61] Jean-Sébastien Coron, Christophe Giraud, Emmanuel Prouff, Soline Renner, Matthieu Rivain, and Praveen Kumar Vadnala. “Conversion of security proofs from one leakage model to another: A new issue”. In: International Workshop on Constructive Side-Channel Analysis and Secure Design. Springer. 2012, pp. 69–81.
[62] Jean-Sébastien Coron, Aurélien Greuet, Emmanuel Prouff, and Rina Zeitoun. “Faster evaluation of sboxes via common shares”. In: International Conference on Cryptographic Hardware and Embedded Systems. Springer. 2016, pp. 498–514.
[63] Jean-Sébastien Coron, Emmanuel Prouff, Matthieu Rivain, and Thomas Roche. “Higher-order side channel security and mask refreshing”. In: International Workshop on Fast Software Encryption. 2013, pp. 410–424.
[64] Patrick Cousot and Radhia Cousot. “Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints”. In: Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages. 1977, pp. 238–252.
[65] Cristina David, Daniel Kroening, and Matt Lewis. “Using program synthesis for program analysis”. In: Logic for programming, artificial intelligence, and reasoning. Springer. 2015, pp. 483–498.
[66] Leonardo De Moura and Nikolaj Bjørner. “Z3: An efficient SMT solver”.
In: International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer. 2008, pp. 337–340.
[67] Leonardo De Moura and Nikolaj Bjørner. “Z3: an efficient SMT solver”. In: Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Budapest, Hungary: Springer-Verlag, 2008, pp. 337–340.
[68] Luc De Raedt, Angelika Kimmig, and Hannu Toivonen. “ProbLog: A probabilistic Prolog and its application in link discovery”. In: IJCAI 2007, Proceedings of the 20th international joint conference on artificial intelligence. IJCAI-INT JOINT CONF ARTIF INTELL. 2007, pp. 2462–2467.
[69] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. “RobustFill: Neural program learning under noisy i/o”. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org. 2017, pp. 990–998.
[70] Isil Dillig and Thomas Dillig. “Explain: a tool for performing abductive inference”. In: International Conference on Computer Aided Verification. Springer. 2013, pp. 684–689.
[71] Isil Dillig, Thomas Dillig, Boyang Li, and Ken McMillan. “Inductive invariant generation via abductive inference”. In: ACM SIGPLAN Notices 48.10 (2013), pp. 443–456.
[72] Alastair F Donaldson, Leopold Haller, Daniel Kroening, and Philipp Rümmer. “Software verification using k-induction”. In: International Static Analysis Symposium. Springer. 2011, pp. 351–368.
[73] Goran Doychev, Dominik Feld, Boris Köpf, Laurent Mauborgne, and Jan Reineke. “CacheAudit: A tool for the static analysis of cache side channels”. In: Proceedings of the 22nd USENIX Security Symposium. 2013, pp. 431–446.
[74] Alexandre Duc, Sebastian Faust, and François-Xavier Standaert. “Making masking security proofs concrete”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2015, pp. 401–429.
[75] Mnacho Echenim, Nicolas Peltier, and Yanis Sellami. “A generic framework for implicate generation modulo theories”. In: International Joint Conference on Automated Reasoning. Springer. 2018, pp. 279–294.
[76] Inès Ben El Ouahma, Quentin L Meunier, Karine Heydemann, and Emmanuelle Encrenaz. “Symbolic approach for side-channel resistance analysis of masked assembly codes”. In: Security Proofs for Embedded Systems. 2017.
[77] Hassan Eldib and Chao Wang. “Synthesis of masking countermeasures against side channel attacks”. In: International Conference on Computer Aided Verification. CAV 2014. Springer. 2014, pp. 114–130.
[78] Hassan Eldib, Chao Wang, and Patrick Schaumont. “Formal verification of software countermeasures against side-channel attacks”. In: ACM Transactions on Software Engineering and Methodology 24.2 (2014), p. 11.
[79] Hassan Eldib, Chao Wang, and Patrick Schaumont. “SMT-based verification of software countermeasures against side-channel attacks”. In: Tools and Algorithms for the Construction and Analysis of Systems: 20th International Conference, TACAS 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings 20. Springer. 2014, pp. 62–77.
[80] Hassan Eldib, Chao Wang, Mostafa Taha, and Patrick Schaumont. “QMS: Evaluating the side-channel resistance of masked software from source code”. In: ACM/IEEE Design Automation Conference. 2014, pp. 1–6.
[81] Azadeh Farzan and Zachary Kincaid. “Compositional recurrence analysis”. In: 2015 Formal Methods in Computer-Aided Design (FMCAD). IEEE. 2015, pp. 57–64.
[82] Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. “Program synthesis using conflict-driven learning”. In: ACM SIGPLAN Notices 53.4 (2018), pp. 420–435.
[83] Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. “Component-based synthesis of table consolidation and transformation tasks from examples”. In: ACM SIGPLAN Notices. Vol. 52. 6.
ACM. 2017, pp. 422–436.
[84] John K Feser, Swarat Chaudhuri, and Isil Dillig. “Synthesizing data structure transformations from input-output examples”. In: ACM SIGPLAN Notices 50.6 (2015), pp. 229–239.
[85] Robert W. Floyd. “Assigning Meanings to Programs”. In: Proceedings of Symposium on Applied Mathematics. 1967.
[86] Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. “Example-directed synthesis: a type-theoretic interpretation”. In: ACM SIGPLAN Notices 51.1 (2016), pp. 802–815.
[87] Mikhail YR Gadelha, Hussama I Ismail, and Lucas C Cordeiro. “Handling loops in bounded model checking of C programs via k-induction”. In: International Journal on Software Tools for Technology Transfer 19.1 (2017), pp. 97–114.
[88] Pengfei Gao, Hongyi Xie, Jun Zhang, Fu Song, and Taolue Chen. “Quantitative Verification of Masked Arithmetic Programs Against Side-Channel Attacks”. In: International Conference on Tools and Algorithms for Construction and Analysis of Systems. 2019, pp. 155–173.
[89] Pranav Garg, Christof Löding, P Madhusudan, and Daniel Neider. “ICE: A robust framework for learning invariants”. In: International Conference on Computer Aided Verification. Springer. 2014, pp. 69–87.
[90] Pranav Garg, Daniel Neider, Parthasarathy Madhusudan, and Dan Roth. “Learning invariants using decision trees and implication counterexamples”. In: ACM SIGPLAN Notices 51.1 (2016), pp. 499–512.
[91] Timon Gehr, Dimitar Dimitrov, and Martin Vechev. “Learning commutativity specifications”. In: International Conference on Computer Aided Verification. Springer. 2015, pp. 307–323.
[92] Daniel Genkin, Itamar Pipman, and Eran Tromer. “Get your hands off my laptop: Physical side-channel key-extraction attacks on PCs”. In: Journal of Cryptographic Engineering 5.2 (2015), pp. 95–112.
[93] Lal George and Andrew W. Appel. “Iterated register coalescing”.
In: Conference Record of POPL’96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Papers Presented at the Symposium, St. Petersburg Beach, Florida, USA, January 21-24, 1996. 1996, pp. 208–218.
[94] Louis Goubin. “A sound method for switching between boolean and arithmetic masking”. In: International Workshop on Cryptographic Hardware and Embedded Systems. Springer. 2001, pp. 3–15.
[95] Marc Gourjon. Towards Secure Compilation of Power Side-Channel Countermeasures. 2019.
[96] Radu Grigore and Hongseok Yang. “Abstraction refinement guided by a learnt probabilistic model”. In: ACM SIGPLAN Notices. Vol. 51. 1. ACM. 2016, pp. 485–498.
[97] Sumit Gulwani, William R Harris, and Rishabh Singh. “Spreadsheet data manipulation using examples”. In: Communications of the ACM 55.8 (2012), pp. 97–105.
[98] Shengjian Guo, Meng Wu, and Chao Wang. “Adversarial symbolic execution for detecting concurrency-related cache timing leaks”. In: ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2018, pp. 377–388.
[99] Zheng Guo, Michael James, David Justo, Jiaxiao Zhou, Ziteng Wang, Ranjit Jhala, and Nadia Polikarpova. “Program synthesis by type-guided abstraction refinement”. In: Proceedings of the ACM on Programming Languages 4.POPL (2019), pp. 1–28.
[100] Tihomir Gvero and Viktor Kuncak. “Synthesizing Java expressions from free-form queries”. In: ACM SIGPLAN Notices. Vol. 50. 10. ACM. 2015, pp. 416–432.
[101] Travis Hance, Marijn Heule, Ruben Martins, and Bryan Parno. “Finding Invariants of Distributed Systems: It’s a Small (Enough) World After All.” In: NSDI. 2021, pp. 115–131.
[102] Kihong Heo, Hakjoo Oh, and Hongseok Yang. “Learning a variable-clustering strategy for octagon from labeled data generated by a static analysis”. In: International Static Analysis Symposium. Springer. 2016, pp. 237–256.
[103] Kihong Heo, Hakjoo Oh, and Hongseok Yang.
“Resource-aware program analysis via online abstraction coarsening”. In: Proceedings of the 41st International Conference on Software Engineering. IEEE Press. 2019, pp. 94–104.
[104] Kihong Heo, Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. “Adaptive Static Analysis via Learning with Bayesian Optimization”. In: ACM Transactions on Programming Languages and Systems (TOPLAS) 40.4 (2018), p. 14.
[105] Kihong Heo, Hakjoo Oh, and Kwangkeun Yi. “Machine-learning-guided selectively unsound static analysis”. In: Proceedings of the 39th International Conference on Software Engineering. IEEE Press. 2017, pp. 519–529.
[106] C. A. R. Hoare. “An Axiomatic Basis for Computer Programming”. In: Commun. ACM 12.10 (1969), pp. 576–580.
[107] Kryštof Hoder and Nikolaj Bjørner. “Generalized property directed reachability”. In: International Conference on Theory and Applications of Satisfiability Testing. 2012, pp. 157–171.
[108] Krystof Hoder, Nikolaj Bjørner, and Leonardo de Moura. “muZ - An efficient engine for fixed points with constraints”. In: Computer Aided Verification - 23rd International Conference, CAV 2011, Snowbird, UT, USA, July 14-20, 2011. Proceedings. Vol. 6806. Lecture Notes in Computer Science. 2011, pp. 457–462.
[109] Hossein Hojjat and Philipp Rümmer. “The ELDARICA horn solver”. In: 2018 Formal Methods in Computer Aided Design (FMCAD). IEEE. 2018, pp. 1–7.
[110] Shourong Hou, Yujie Zhou, Hongming Liu, and Nianhao Zhu. “Improved DPA attack on rotating S-boxes masking scheme”. In: Communication Software and Networks (ICCSN), 2017 IEEE 9th International Conference on. 2017, pp. 1111–1116.
[111] Jinru Hua, Mengshi Zhang, Kaiyuan Wang, and Sarfraz Khurshid. “Towards practical program repair with on-demand candidate generation”. In: Proceedings of the 40th International Conference on Software Engineering. ACM. 2018, pp. 12–23.
[112] Ralf Hund, Carsten Willems, and Thorsten Holz. “Practical timing side channel attacks against kernel space ASLR”.
In: IEEE Symposium on Security and Privacy. 2013, pp. 191–205.
[113] Yuval Ishai, Amit Sahai, and David Wagner. “Private circuits: Securing hardware against probing attacks”. In: Annual International Cryptology Conference. Springer. 2003, pp. 463–481.
[114] Yuval Ishai, Amit Sahai, and David A. Wagner. “Private Circuits: Securing Hardware against Probing Attacks”. In: Advances in Cryptology - CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, California, USA, August 17-21, 2003, Proceedings. 2003, pp. 463–481.
[115] Herbert Jordan, Bernhard Scholz, and Pavle Subotić. “Soufflé: On synthesis of program analyzers”. In: Computer Aided Verification: 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II 28. Springer. 2016, pp. 422–430.
[116] Omer Katz, Ran El-Yaniv, and Eran Yahav. “Estimating types in binaries using predictive modeling”. In: ACM SIGPLAN Notices. Vol. 51. 1. ACM. 2016, pp. 313–326.
[117] Zachary Kincaid, Jason Breck, Ashkan Forouhi Boroujeni, and Thomas Reps. “Compositional recurrence analysis revisited”. In: ACM SIGPLAN Notices 52.6 (2017), pp. 248–262.
[118] Zachary Kincaid, John Cyphert, Jason Breck, and Thomas Reps. “Non-linear reasoning for invariant synthesis”. In: Proceedings of the ACM on Programming Languages 2.POPL (2017), pp. 1–33.
[119] Tristan Knoth, Di Wang, Nadia Polikarpova, and Jan Hoffmann. “Resource-guided program synthesis”. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI 2019. ACM. 2019, pp. 253–268.
[120] Paul Kocher, Joshua Jaffe, and Benjamin Jun. “Differential power analysis”. In: Annual International Cryptology Conference. 1999, pp. 388–397.
[121] Paul C Kocher. “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems”. In: Annual International Cryptology Conference. 1996, pp. 104–113.
[122] Anvesh Komuravelli, Arie Gurfinkel, and Sagar Chaki.
“SMT-based model checking for recursive programs”. In: Formal Methods in System Design 48.3 (2016), pp. 175–205.
[123] Sudipta Kundu, Zachary Tatlock, and Sorin Lerner. “Proving optimizations correct using parameterized program equivalence”. In: ACM SIGPLAN Notices 44.6 (2009), pp. 327–337.
[124] Monica S Lam, John Whaley, V Benjamin Livshits, Michael C Martin, Dzintars Avots, Michael Carbin, and Christopher Unkel. “Context-sensitive program analysis as database queries”. In: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM. 2005, pp. 1–12.
[125] Chris Lattner and Vikram Adve. “LLVM: A compilation framework for lifelong program analysis & transformation”. In: International Symposium on Code Generation and Optimization. 2004, p. 75.
[126] Xuan-Bach D Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. “S3: syntax- and semantic-guided repair synthesis via programming by examples”. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. ACM. 2017, pp. 593–604.
[127] Alan Leung, John Sarracino, and Sorin Lerner. “Interactive parser synthesis by example”. In: ACM SIGPLAN Notices 50.6 (2015), pp. 565–574.
[128] Peng Li, Peng Zhang, Louis-Noel Pouchet, and Jason Cong. “Resource-aware throughput optimization for high-level synthesis”. In: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM. 2015, pp. 200–209.
[129] Ziyang Li, Jiani Huang, and Mayur Naik. “Scallop: A Language for Neurosymbolic Programming”. In: Proceedings of the ACM on Programming Languages 7.PLDI (2023), pp. 1463–1487.
[130] Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V Le, and Ni Lao. “Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing”. In: NeurIPS. 2018.
[131] Mulong Luo, Andrew C Myers, and G Edward Suh. “Stealthy tracking of autonomous vehicles with cache side channels”.
In: 29th USENIX Security Symposium (USENIX Security 20). 2020, pp. 859–876.
[132] Abhranil Maiti and Patrick Schaumont. “Improved Ring Oscillator PUF: An FPGA-friendly Secure Primitive”. In: J. Cryptology 24.2 (2011), pp. 375–397.
[133] Pasquale Malacaria, MHR Khouzani, Corina S Pasareanu, Quoc-Sang Phan, and Kasper Luckow. “Symbolic side-channel analysis for probabilistic programs”. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF). IEEE. 2018, pp. 313–327.
[134] Ravi Mangal, Xin Zhang, Aditya V Nori, and Mayur Naik. “A user-guided approach to program analysis”. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ACM. 2015, pp. 462–473.
[135] Stefan Mangard. “A simple power-analysis (SPA) attack on implementations of the AES key expansion”. In: International Conference on Information Security and Cryptology. Springer. 2002, pp. 343–358.
[136] Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power analysis attacks - revealing the secrets of smart cards. 2007.
[137] Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B Tenenbaum, and Jiajun Wu. “The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision”. In: arXiv preprint arXiv:1904.12584 (2019).
[138] Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin Zorn, and Sumit Gulwani. “User interaction models for disambiguation in programming by example”. In: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology. 2015, pp. 291–301.
[139] David McCann, Carolyn Whitnall, and Elisabeth Oswald. “ELMO: Emulating Leaks for the ARM Cortex-M0 without Access to a Side Channel Lab.” In: IACR Cryptology ePrint Archive 2016 (2016), p. 517.
[140] Kenneth L McMillan. “Lazy annotation revisited”. In: International Conference on Computer Aided Verification. Springer. 2014, pp. 243–259.
[141] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury.
“Angelix: Scalable multiline program patch synthesis via symbolic analysis”. In: Proceedings of the 38th international conference on software engineering. ACM. 2016, pp. 691–701.
[142] Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, and Michael Carbin. “Compiler Auto-Vectorization with Imitation Learning”. In: Advances in Neural Information Processing Systems. 2019, pp. 14598–14609.
[143] Thomas S Messerges. “Using second-order power analysis to attack DPA resistant software”. In: International Workshop on Cryptographic Hardware and Embedded Systems. 2000, pp. 238–251.
[144] Thomas S Messerges, Ezzat A Dabbish, and Robert H Sloan. “Examining smart-card security under the threat of power analysis attacks”. In: IEEE transactions on computers 51.5 (2002), pp. 541–552.
[145] Thomas S Messerges, Ezzy A Dabbish, and Robert H Sloan. “Investigations of power analysis attacks on smartcards.” In: Smartcard 99 (1999), pp. 151–161.
[146] Amir Moradi. “Side-channel leakage through static power”. In: International Workshop on Cryptographic Hardware and Embedded Systems. Springer. 2014, pp. 562–579.
[147] Dmitry Mordvinov and Grigory Fedyukovich. “Property directed inference of relational invariants”. In: 2019 Formal Methods in Computer Aided Design (FMCAD). IEEE. 2019, pp. 152–160.
[148] Andrew Moss, Elisabeth Oswald, Dan Page, and Michael Tunstall. “Compiler assisted masking”. In: International Conference on Cryptographic Hardware and Embedded Systems. 2012, pp. 58–75.
[149] Leonardo Mendonça de Moura and Nikolaj S. Bjørner. “Z3: An Efficient SMT Solver”. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Ed. by C. R. Ramakrishnan and Jakob Rehof. 2008.
[150] Stephen Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. “Meta-interpretive Learning of Higher-order Dyadic Datalog: Predicate Invention Revisited”. In: Machine Learning 100.1 (July 2015), pp. 49–73.
[151] ThanhVu Nguyen, Timos Antonopoulos, Andrew Ruef, and Michael Hicks. “Counterexample-guided approach to finding numerical invariants”. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. 2017, pp. 605–615.
[152] Thanhvu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest. “Dig: A dynamic invariant generator for polynomial and array invariants”. In: ACM Transactions on Software Engineering and Methodology (TOSEM) 23.4 (2014), pp. 1–30.
[153] ThanhVu Nguyen, Deepak Kapur, Westley Weimer, and Stephanie Forrest. “Using dynamic analysis to discover polynomial and array invariants”. In: 2012 34th International Conference on Software Engineering (ICSE). IEEE. 2012, pp. 683–693.
[154] Shirin Nilizadeh, Yannic Noller, and Corina S. Pasareanu. “DifFuzz: differential fuzzing for side-channel analysis”. In: International Conference on Software Engineering. 2019, pp. 176–187.
[155] NIST. “NIST Selects Winner of the Secure Hash Algorithm (SHA-3) Competition”. In: https://www.nist.gov/news-events/news/2012/10/nist-selects-winner-secure-hash-algorithm-sha-3-competition (2012).
[156] Hakjoo Oh, Hongseok Yang, and Kwangkeun Yi. “Learning a strategy for adapting a program analysis via bayesian optimisation”. In: ACM SIGPLAN Notices. Vol. 50. 10. ACM. 2015, pp. 572–588.
[157] Peter-Michael Osera and Steve Zdancewic. “Type-and-example-directed program synthesis”. In: ACM SIGPLAN Notices 50.6 (2015), pp. 619–630.
[158] Inès Ben El Ouahma, Quentin Meunier, Karine Heydemann, and Emmanuelle Encrenaz. “Symbolic approach for Side-Channel resistance analysis of masked assembly codes”. In: Security Proofs for Embedded Systems. 2017.
[159] Saswat Padhi, Rahul Sharma, and Todd Millstein. “Data-driven precondition inference with learned features”. In: ACM SIGPLAN Notices 51.6 (2016), pp. 42–56.
[160] Saswat Padhi, Rahul Sharma, and Todd Millstein. “Loopinvgen: A loop invariant generator based on precondition inference”.
In: arXiv preprint arXiv:1707.02029 (2017).
[161] Kostas Papagiannopoulos and Nikita Veshchikov. “Mind the gap: towards secure 1st-order masking in software”. In: International Workshop on Constructive Side-Channel Analysis and Secure Design. Springer. 2017, pp. 282–297.
[162] Corina S. Pasareanu, Quoc-Sang Phan, and Pasquale Malacaria. “Multi-run Side-Channel analysis using symbolic execution and Max-SMT”. In: IEEE Computer Security Foundations Symposium. 2016, pp. 387–400.
[163] Corina S Pasareanu, Quoc-Sang Phan, and Pasquale Malacaria. “Multi-run side-channel analysis using Symbolic Execution and Max-SMT”. In: 2016 IEEE 29th Computer Security Foundations Symposium (CSF). IEEE. 2016, pp. 387–400.
[164] Quoc-Sang Phan, Lucas Bang, Corina S. Pasareanu, Pasquale Malacaria, and Tevfik Bultan. “Synthesis of Adaptive Side-Channel Attacks”. In: IEEE Computer Security Foundations Symposium. 2017, pp. 328–342.
[165] Lauren Pick, Grigory Fedyukovich, and Aarti Gupta. “Exploiting synchrony and symmetry in relational verification”. In: International Conference on Computer Aided Verification. Springer. 2018, pp. 164–182.
[166] Massimiliano Poletto and Vivek Sarkar. “Linear scan register allocation”. In: ACM Trans. Program. Lang. Syst. 21.5 (1999), pp. 895–913.
[167] Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. “Program synthesis from polymorphic refinement types”. In: ACM SIGPLAN Notices. Vol. 51. 6. ACM. 2016, pp. 522–538.
[168] Emmanuel Prouff and Matthieu Rivain. “Masking against side-channel attacks: A formal security proof”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2013, pp. 142–159.
[169] Jean-Jacques Quisquater and David Samyde. “Electromagnetic analysis (ema): Measures and counter-measures for smart cards”. In: Smart Card Programming and Security. 2001, pp. 200–210.
[170] Veselin Raychev, Martin Vechev, and Andreas Krause. “Predicting program properties from big code”. In: ACM SIGPLAN Notices. Vol. 50. 1.
ACM. 2015, pp. 111–124.
[171] Oscar Reparaz, Begül Bilgin, Svetla Nikova, Benedikt Gierlichs, and Ingrid Verbauwhede. “Consolidating masking schemes”. In: Annual Cryptology Conference. Springer. 2015, pp. 764–783.
[172] Thomas W. Reps. “Demand Interprocedural Program Analysis Using Logic Databases”. In: Applications of Logic Databases. Springer, 1995, pp. 163–196.
[173] Andrew Reynolds, Haniel Barbosa, Daniel Larraz, and Cesare Tinelli. “Scalable Algorithms for Abduction via Enumerative Syntax-Guided Synthesis”. In: International Joint Conference on Automated Reasoning. Springer. 2020, pp. 141–160.
[174] Matthieu Rivain and Emmanuel Prouff. “Provably secure higher-order masking of AES”. In: International Workshop on Cryptographic Hardware and Embedded Systems. Springer. 2010, pp. 413–427.
[175] Enric Rodríguez-Carbonell and Deepak Kapur. “Automatic generation of polynomial loop invariants: Algebraic foundations”. In: Proceedings of the 2004 international symposium on Symbolic and algebraic computation. 2004, pp. 266–273.
[176] Enric Rodríguez-Carbonell and Deepak Kapur. “Generating all polynomial invariants in simple loops”. In: Journal of Symbolic Computation 42.4 (2007), pp. 443–476.
[177] Lior Rokach and Oded Maimon. “Top-down induction of decision trees classifiers - a survey”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 35.4 (2005), pp. 476–487.
[178] Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. “Learning syntactic program transformations from examples”. In: Proceedings of the 39th International Conference on Software Engineering. IEEE Press. 2017, pp. 404–415.
[179] Ulrich Rührmair, Heike Busch, and Stefan Katzenbeisser. “Strong PUFs: models, constructions, and security proofs”. In: Towards Hardware-Intrinsic Security - Foundations and Practice. 2010, pp. 79–96.
[180] Philipp Rümmer, Hossein Hojjat, and Viktor Kuncak.
“Disjunctive interpolants for Horn-clause verication”. In: International Conference on Computer Aided Verication. Springer. 2013, pp. 347–363. [181] Stuart Russell and Peter Norvig. “Articial intelligence: a modern approach”. In: (2002). [182] Gabriel Ryan, Justin Wong, Jianan Yao, Ronghui Gu, and Suman Jana. “CLN2INV: learning loop invariants with continuous logic networks”. In: arXiv preprint arXiv:1909.11542 (2019). 140 [183] Bernhard Scholz, Herbert Jordan, Pavle Subotić, and Till Westmann. “On fast large-scale program analysis in datalog”. In: Proceedings of the 25th International Conference on Compiler Construction. 2016, pp. 196–206. [184] Kai Schramm and Christof Paar. “Higher order masking of the AES”. In: Cryptographers’ track at the RSA conference. Springer. 2006, pp. 208–225. [185] Rahul Sharma and Alex Aiken. “From Invariant Checking to Invariant Inference Using Randomized Search”. In: International Conference on Computer Aided Verication. Springer. 2014, pp. 88–105. [186] Rahul Sharma and Alex Aiken. “From invariant checking to invariant inference using randomized search”. In: Formal Methods in System Design 48.3 (2016), pp. 235–256. [187] Rahul Sharma, Saurabh Gupta, Bharath Hariharan, Alex Aiken, and Aditya V Nori. “Verication as learning geometric concepts”. In: International Static Analysis Symposium. Springer. 2013, pp. 388–411. [188] Rahul Sharma, Eric Schkufza, Berkeley Churchill, and Alex Aiken. “Data-driven equivalence checking”. In: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications. 2013, pp. 391–406. [189] Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. “Checking safety properties using induction and a SAT-solver”. In: International conference on formal methods in computer-aided design. Springer. 2000, pp. 127–144. [190] Ron Shemer, Arie Gurnkel, Sharon Shoham, and Yakir Vizel. “Property directed self composition”. 
In: International Conference on Computer Aided Verication. Springer. 2019, pp. 161–179. [191] Xujie Si, Hanjun Dai, Mukund Raghothaman, Mayur Naik, and Le Song. “Learning loop invariants for program verication”. In: Neural Information Processing Systems. 2018. [192] Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. “Syntax-guided synthesis of Datalog programs”. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM. 2018, pp. 515–527. [193] Xujie Si, Aaditya Naik, Hanjun Dai, Mayur Naik, and Le Song. “Code2Inv: a deep learning framework for program verication”. In: International Conference on Computer Aided Verication. Springer. 2020, pp. 151–164. [194] Xujie Si, Mukund Raghothaman, Kihong Heo, and Mayur Naik. “Synthesizing Datalog Programs using Numerical Relaxation”. In: arXiv preprint arXiv:1906.00163 (2019). [195] Gagandeep Singh, Markus Püschel, and Martin Vechev. “Fast numerical program analysis with reinforcement learning”. In: International Conference on Computer Aided Verication. CAV 2018. Springer. 2018, pp. 211–229. [196] Rishabh Singh and Sumit Gulwani. “Predicting a correct program in programming by example”. In: International Conference on Computer Aided Verication. CAV 2015. Springer. 2015, pp. 398–414. 141 [197] Rishabh Singh and Sumit Gulwani. “Synthesizing number transformations from input-output examples”. In: International Conference on Computer Aided Verication. CAV 2012. Springer. 2012, pp. 634–651. [198] Calvin Smith and Aws Albarghouthi. “MapReduce program synthesis”. In: Acm Sigplan Notices 51.6 (2016), pp. 326–340. [199] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. “Combinatorial sketching for nite programs”. In: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems. 2006, pp. 404–415. 
[200] Marcelo Sousa and Isil Dillig. “Cartesian hoare logic for verifying k-safety properties”. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation. 2016, pp. 57–69. [201] François-Xavier Standaert, Tal G Malkin, and Moti Yung. “A unied framework for the analysis of side-channel key recovery attacks”. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. 2009, pp. 443–461. [202] Chungha Sung, Brandon Paulsen, and Chao Wang. “CANAL: A cache timing analysis framework via LLVM transformation”. In: IEEE/ACM International Conference On Automated Software Engineering. 2018. [203] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018. [204] Ross Tate, Michael Stepp, Zachary Tatlock, and Sorin Lerner. “Equality saturation: a new approach to optimization”. In: Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 2009, pp. 264–276. [205] Saeid Tizpaz-Niari, Pavol Čern` y, and Ashutosh Trivedi. “Quantitative mitigation of timing side channels”. In: International Conference on Computer Aided Verication. CAV 2019. Springer. 2019, pp. 140–160. [206] Omer Tripp, Salvatore Guarnieri, Marco Pistoia, and Aleksandr Aravkin. “Aletheia: Improving the usability of static security analysis”. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM. 2014, pp. 762–774. [207] Chao Wang and Patrick Schaumont. “Security by compilation: an automated approach to comprehensive side-channel resistance”. In: ACM SIGLOG News 4.2 (2017), pp. 76–89. [208] Chao Wang and Patrick Schaumont. “Security by compilation: an automated approach to comprehensive side-channel resistance”. In: ACM SIGLOG News 4.2 (2017), pp. 76–89. [209] Jingbo Wang, Chungha Sung, Mukund Raghothaman, and Chao Wang. “Data-Driven Synthesis of Provably Sound Side Channel Analyses”. 
In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE. 2021, pp. 810–822. 142 [210] Jingbo Wang, Chungha Sung, and Chao Wang. “Mitigating power side channels during compilation”. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2019, pp. 590–601. [211] Jingbo Wang and Chao Wang. “Learning to Synthesize Relational Invariants”. In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 2022, pp. 1–12. [212] Shuai Wang, Yuyan Bao, Xiao Liu, Pei Wang, Danfeng Zhang, and Dinghao Wu. “Identifying Cache-Based Side Channels through Secret-Augmented Abstract Interpretation”. In: CoRR abs/1905.13332 (2019). [213] Shuai Wang, Pei Wang, Xiao Liu, Danfeng Zhang, and Dinghao Wu. “CacheD: Identifying cache-based timing channels in production software”. In: USENIX Security Symposium. 2017, pp. 235–252. [214] Xinyu Wang, Isil Dillig, and Rishabh Singh. “Program synthesis using abstraction renement”. In: Proceedings of the ACM on Programming Languages 2.POPL (2017), pp. 1–30. [215] Yingchen Wang, Riccardo Paccagnella, Elizabeth Tang He, Hovav Shacham, Christopher W Fletcher, and David Kohlbrenner. “Hertzbleed: Turning powerfSide-Channelg attacks into remote timing attacks on x86”. In: 31st USENIX Security Symposium (USENIX Security 22). 2022, pp. 679–697. [216] Yuepeng Wang, Isil Dillig, Shuvendu K Lahiri, and William R Cook. “Verifying equivalence of database-driven applications”. In: Proceedings of the ACM on Programming Languages 2.POPL (2017), pp. 1–29. [217] John Whaley, Dzintars Avots, Michael Carbin, and Monica S Lam. “Using datalog with binary decision diagrams for program analysis”. In: Asian Symposium on Programming Languages and Systems. Springer. 2005, pp. 97–118. [218] John Whaley and Monica Lam. “Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams”. 
In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI 2004. ACM, 2004, pp. 131–144. [219] John Whaley and Monica S Lam. “Cloning-based context-sensitive pointer alias analysis using binary decision diagrams”. In: ACM SIGPLAN Notices. Vol. 39. 6. ACM. 2004, pp. 131–144. [220] Meng Wu, Shengjian Guo, Patrick Schaumont, and Chao Wang. “Eliminating timing side-channel leaks using program repair”. In: International Symposium on Software Testing and Analysis. 2018. [221] Meng Wu and Chao Wang. “Abstract Interpretation under Speculative Execution”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation. 2019. [222] Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. “Precise condition synthesis for program repair”. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE. 2017, pp. 416–426. [223] Hongseok Yang. “Relational separation logic”. In: Theoretical Computer Science 375.1-3 (2007), pp. 308–334. 143 [224] Jianan Yao, Gabriel Ryan, Justin Wong, Suman Jana, and Ronghui Gu. “Learning nonlinear loop invariants with gated continuous logic networks”. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 2020, pp. 106–120. [225] Yuan Yao, Mo Yang, Conor Patrick, Bilgiday Yuce, and Patrick Schaumont. “Fault-assisted side-channel analysis of masked implementations”. In: 2018 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). IEEE. 2018, pp. 57–64. [226] Manzil Zaheer, Jean-Baptiste Tristan, Michael L Wick, and Guy L Steele Jr. “Learning a Static Analyzer: A Case Study on a Toy Language”. In: (2016). [227] Jun Zhang, Pengfei Gao, Fu Song, and Chao Wang. “SC Infer: renement-based verication of software countermeasures against side-channel attacks”. In: International Conference on Computer Aided Verication. CAV 2018. Springer. 2018, pp. 157–177. 
[228] Jun Zhang, Pengfei Gao, Fu Song, and Chao Wang. “SCInfer: Renement-based verication of software countermeasures against Side-Channel attacks”. In: International Conference on Computer Aided Verication. 2018. [229] Rui Zhang, Shuang Qiu, and Yongbin Zhou. “Further improving eciency of higher order masking schemes by decreasing randomness complexity”. In: IEEE Transactions on Information Forensics and Security 12.11 (2017), pp. 2590–2598. [230] Xin Zhang, Ravi Mangal, Radu Grigore, Mayur Naik, and Hongseok Yang. “On abstraction renement for program analyses in Datalog”. In: ACM SIGPLAN Notices. Vol. 49. 6. ACM. 2014, pp. 239–248. [231] Yongbin Zhou and Dengguo Feng. “Side-Channel Attacks: Ten years after its publication and the impacts on cryptographic module security testing.” In:IACRCryptologyePrintArchive (2005), p. 388. [232] He Zhu, Stephen Magill, and Suresh Jagannathan. “A data-driven CHC solver”. In: ACM SIGPLAN Notices 53.4 (2018), pp. 707–721. 144
Abstract
The objective of my dissertation research is to develop rigorous methods and analysis tools for improving the security of software systems. My primary focus is on an emerging class of security threats termed side-channel attacks. During a side-channel attack, the adversary exploits statistical dependencies between the secret data (e.g., passwords or encryption keys) and seemingly unrelated non-functional properties (e.g., power consumption or execution time) of the computer. In particular, power side-channel leaks are caused by statistical dependencies, rather than syntactic or semantic dependencies, between sources and sinks; thus, existing techniques that focus primarily on classic information-flow security (e.g., taint analysis) do not apply.
I have designed and implemented an automated framework leveraging program analysis and synthesis to help detect and mitigate statistical dependencies. First, I developed a set of type inference rules to capture and detect these dependencies, and then a set of transformation-based methods to mitigate them. Second, to adapt these type inference rules to constantly evolving program characteristics, I proposed a data-driven method for learning provably sound side-channel analysis rules from annotated programs. Third, to ensure the correctness of the mitigation, I developed new methods to help prove the equivalence of the original and mitigated programs. Finally, I developed an extension of the side-channel analysis framework to add the capability of quantifying the security risk. Experimental evaluations demonstrated the efficiency and precision of these methods in detecting and eliminating side-channel-related statistical dependencies, which in turn leads to more secure software for critical applications.
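The abstract's central point, that power side-channel leaks arise from statistical rather than syntactic dependence, can be illustrated with a toy Monte Carlo check. This is a sketch, not the dissertation's actual tooling; the 4-bit secret, the Hamming-weight power model, and the detection threshold are all illustrative assumptions:

```python
import random

def hw(v):
    """Hamming weight of a value: a standard proxy for power consumption."""
    return bin(v).count("1")

def leaks(intermediate, trials=20000):
    """Estimate first-order leakage: does the mean Hamming weight of the
    intermediate value shift as the 4-bit secret k varies?"""
    means = []
    for k in range(16):
        samples = [hw(intermediate(k, random.randrange(16)))
                   for _ in range(trials)]
        means.append(sum(samples) / trials)
    return max(means) - min(means) > 0.1  # threshold is illustrative

# Both functions are syntactically dependent on the secret k, so a taint
# analysis would flag both; only the unmasked one leaks statistically.
unmasked = lambda k, r: k ^ 0b1010  # deterministic: hw tracks k directly
masked   = lambda k, r: k ^ r       # r is a fresh uniform mask, so k ^ r
                                    # is uniformly distributed for every k

print(leaks(unmasked))  # True
print(leaks(masked))    # False (with overwhelming probability)
```

The masked variant shows why mitigation by transformation works: inserting a fresh random mask makes the observable distribution independent of the secret, even though the data flow from secret to observation is unchanged.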
Asset Metadata
Creator
Wang, Jingbo
(author)
Core Title
Side-channel security enabled by program analysis and synthesis
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Degree Conferral Date
2023-12
Publication Date
09/14/2023
Defense Date
08/24/2023
Publisher
Los Angeles, California (original), University of Southern California (original), University of Southern California. Libraries (digital)
Tag
OAI-PMH Harvest, program analysis, program synthesis, side-channel security
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Wang, Chao (committee chair), Deshmukh, Jyotirmoy (committee member), Medvidovic, Nenad (committee member), Nuzzo, Pierluigi (committee member), Raghothaman, Mukund (committee member)
Creator Email
jingbow@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113376975
Unique identifier
UC113376975
Identifier
etd-WangJingbo-12381.pdf (filename)
Legacy Identifier
etd-WangJingbo-12381
Document Type
Thesis
Rights
Wang, Jingbo
Internet Media Type
application/pdf
Type
texts
Source
20230918-usctheses-batch-1098 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu