USC Digital Library / University of Southern California Dissertations and Theses / Data-driven and logic-based analysis of learning-enabled cyber-physical systems (USC Thesis)
Data-driven and Logic-based Analysis of Learning-enabled Cyber-Physical Systems

by

Xin Qin

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2024

Copyright 2024 Xin Qin

Dedication

To my advisor, mom, dad, husband, and the school.

Acknowledgements

I would like to express my sincere gratitude to my advisor, Prof. Jyotirmoy V. Deshmukh, who has provided exceptional guidance and support over the past six years. I am also thankful to the school for offering a beautiful campus, delicious dining halls, engaging activities, and the opportunity to meet wonderful students in my cohort.

I am deeply grateful to my parents for their unwavering support, both emotional and financial, especially during the early days of my marriage. To my extended family members, thank you for your warm greetings and care. To my husband, whom I met online and married in real life, thank you for your constant care and for sharing experiences in your field. I look forward to combining our knowledge from both worlds in the future.

I extend my thanks to my previous advisors, Prof. Yi Ma, Prof. Hao Chen, Prof. Angjoo Kanazawa, Richard Zhang, Prof. Alexei Efros, and Prof. Jitendra Malik. Without their mentorship and support, I would not be where I am today. I truly appreciate the encouragement, valuable feedback, and insightful discussions we had.

My sincere thanks also go to the professors at USC. Thank you, Prof. Chao Wang, Prof. Souti Chattopadhyay, Prof. Yan Liu, and Prof. Paul Bogdan, for being on my thesis committee and providing valuable feedback. I am grateful for the enriching discussions we had. USC has given me the invaluable opportunity to meet esteemed professors, both within the university and through connections with other institutions. I am especially thankful to Prof. Nenad Medvidovic, Prof. William G.J. Halfond, Prof. Pierluigi Nuzzo, Prof. Lars Lindemann, Prof. Chuchu Fan, Dr. Dejan Nickovic, Prof. Ezio Bartocci, and Dr. Cristinel Mateis for the inspiring conversations we shared.

I cherish the friendships I have formed with my schoolmates. Special thanks go to Violeta Padilla, Dr. Yannan Li, Dr. Jingbo Wang, Dr. Zunchen Huang, Dr. Shengjian Guo, Dr. Yuchen Lin, Dr. Yufeng Yin, and Dr. Jun Yan for their encouragement, companionship, and support. I am also grateful to my brilliant and kind labmates for the fun times and great collaborations. Thank you, Dr. Sara Mohammadinejad, Dr. Aniruddh Puranic, Anand Balakrishnan, Navid Hashemi, Sheryl Paul, Vidisha Kudalkar, Yuan Xia, Yuriy Biktairov, and Merve Atasever.

During challenging times and moments of nostalgia, the greetings from my childhood friends have been invaluable. Thank you, Jianwen Chen, Yue Wu, Xiwen Shen, Menghan Guo, Jiaying Li, Zibei Zhang, Yizhou Shen, Dongyu Gu, and Yuxin Li, for staying in touch and sending your kind wishes.

Finally, I extend my appreciation to the conference and journal reviewers for their feedback, and to the funding bodies for supporting our research. Your support fuels our progress.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
    1.1 Verification
        1.1.1 Statistical verification
        1.1.2 Runtime Verification
    1.2 Testing
        1.2.1 Conformance Testing
        1.2.2 Robust Testing
    1.3 Specification Languages
        1.3.1 Shape Expressions
        1.3.2 Mining Shape Expressions
Chapter 2: Preliminaries
    2.1 Signal Temporal Logic
    2.2 Conformal Prediction
    2.3 Risk Measures
Chapter 3: Statistical Verification using Surrogate Models and Conformal Inference and a Comparison with Risk-aware Verification
    3.1 Introduction
    3.2 Learning Surrogate Models
    3.3 Conformal Inference
        3.3.1 Conformal Inference Recap
        3.3.2 Computing (c, ϵ)-probabilistic surrogate models
        3.3.3 Naïve Parameter Space Partitioning
        3.3.4 Gaussian Processes for Refinement
    3.4 Risk Estimation
        3.4.1 Risk Measures
        3.4.2 Computing risk against different system parameters
    3.5 Case Studies
        3.5.1 Comparing with risk measures
    3.6 Related Work and Conclusions
Chapter 4: Conformal Prediction for STL Runtime Verification
    4.1 Introduction
        4.1.1 Related Work
    4.2 Problem Formulation
        4.2.1 Trajectory Predictors
        4.2.2 Predictive Runtime Verification
    4.3 Conformal Prediction for Predictive Runtime Verification
        4.3.1 Direct STL Predictive Runtime Verification
        4.3.2 Indirect STL Predictive Runtime Verification
    4.4 Case Studies
        4.4.1 F-16 Aircraft Simulator
        4.4.2 Autonomous Driving in CARLA
    4.5 Conclusion
Chapter 5: Conformance Testing for Stochastic Cyber-Physical Systems
    5.1 Introduction
    5.2 Problem Statement and Preliminaries
        5.2.1 Distance Metrics
    5.3 Conformance for Stochastic Input-Output Systems
    5.4 Transference of System Properties under Conformance
        5.4.1 Transference under stochastic conformance
        5.4.2 Transference under non-conformance risk
    5.5 Statistical Estimation of Stochastic Conformance
        5.5.1 Estimating stochastic conformance
        5.5.2 Estimating non-conformance risk
    5.6 Case Studies
        5.6.1 Dubin's car
        5.6.2 F-16 aircraft
        5.6.3 Autonomous Driving using the CARLA simulator
        5.6.4 Spacecraft Rendezvous
    5.7 Related Work
    5.8 Conclusion
Chapter 6: Robust Testing for Cyber-Physical Systems using Reinforcement Learning
    6.1 Introduction
    6.2 Problem Statement and Background
        6.2.1 Adversarial Testing through Policy Synthesis
        6.2.2 Policy synthesis through Reinforcement Learning
    6.3 Learning Constrained Adversarial Agents
    6.4 Rationale for Robust Testing
    6.5 Case Studies
        6.5.1 Benchmarking Generalizability
        6.5.2 Autonomous Driving Case studies
    6.6 Related Work and Conclusions
Chapter 7: Shape Expressions for Specifying and Extracting Signal Features
    7.1 Introduction
    7.2 Shape Expressions and Automata
        7.2.1 Definitions
        7.2.2 Shape Expressions
        7.2.3 Shape Automata
    7.3 Pattern Matching
    7.4 Policy Scheduler for Shape Matching Automata
    7.5 Implementation and Evaluation
        7.5.1 Detection of Anomalous Patterns in ECG
    7.6 Conclusion
Chapter 8: Mining Shape Expressions from Positive Examples
    8.1 Introduction
    8.2 Related Work
    8.3 Learning Shape Expressions from Examples
        8.3.1 Approximating Time Series with Sequences of Linear Segments
        8.3.2 Abstracting Sequences of Linear Segments to Finite Traces over Finite Alphabets
        8.3.3 Inferring Expressions from Finite Traces
        8.3.4 Discussion
    8.4 Experimental Evaluation
        8.4.1 Experimental Results
        8.4.2 Mining Patterns in ECG Data
        8.4.3 Mining robot motion patterns
    8.5 Conclusions and Future Work
Chapter 9: Conclusions
Bibliography

List of Tables

3.1 Comparison of Alg. 2 for different partitioning strategies. (1 − ϵ) = 0.95
3.2 Performance of Algorithm 2 using the GP-based greatest uncertainty split method with 95% confidence level
3.3 Number of CVaR and VaR not bounded by the upper and lower bound of Conformal Inference, with −ρ as loss function and 95% as confidence level
3.4 Number of CVaR and VaR not bounded by the upper and lower bound of Conformal Inference, with ρ as loss function and 95% as confidence level
5.1 Effect of calibration set size on the validation score and risk measures. The size of the test set, i.e., |Dtest|, is 1000. We use the conformal prediction procedure from Section 5.5 to obtain δ as defined in Definition 5.3.1 for ϵ = 0.05.
5.2 Empirical evaluation of transference
5.3 Transference results for various case studies. We use ϵ = 0.05 and ϵ̄ = 0.05. As before, ρ1 is used as short-hand for ρ(ϕ, Y1) for each spec, and d∞ is used as short-hand for d∞(Y1, Y2).
5.4 Empirical validation of risk transference for all case studies. As before, ρi is short-hand for ρ(ϕ, Yi), and d∞ is short-hand for d∞(Y1, Y2). Here, we set the risk level β = ϵ in each case.
6.1 Empirical demonstration of the robustness of adversarial testing.
6.2 Ego Specifications and Adversarial Rules for case studies
6.3 Demonstration of Theorem 6.4.5. In all cases, the mean robustness degradation is bounded below as predicted by the theorem. Initial conditions with small value function degradation β are more likely to yield counterexamples, as predicted by the theory. The columns "success init" and "failure init" show the number of initial conditions that lead to a successful or a non-successful falsifying case.
7.1 Experimental Results
8.1 Computational cost of the specification mining algorithm.
8.2 Sensitivity of specification mining to the maximum error threshold.

List of Figures

2.1 Trajectories satisfying / violating STL formulas
3.1 Overview of our approach.
3.2 Illustration of the Value-at-Risk, and the Conditional Value-at-Risk with ε = 0.7
3.3 Mountain Car parameter space partitioning using different approaches
3.4 Lane Keep Assist
3.5 F16 - Pull up (top), Level Flight (bottom).
3.6 F16 - Ground Collision Avoidance (GCAS)
3.7 F16 - Pull up. 100 samples per region.
3.8 F16 - Pull up. With more samples (1000) per region.
3.9 F16 - Pull up. 100 samples per region.
4.1 Overview of the proposed STL predictive runtime verification algorithms. Both algorithms use past observations (x0, ..., xt) to obtain state predictions (x̂t+1, x̂t+2, ...). The direct algorithm calculates the satisfaction measure ρ(ϕ, x̂) of the specification ϕ based on these predictions, and obtains a prediction region C for the unknown satisfaction measure ρ(ϕ, x) using conformal prediction. The indirect method first obtains prediction regions for the unknown states xt+1, xt+2, ... using conformal prediction, and then obtains a lower bound on the unknown satisfaction measure ρ(ϕ, x) based on the state prediction regions.
4.2 Ten realizations of two stochastic systems (solid lines) and corresponding LSTM predictions at time t := 100 (red dashed lines). The specification is that trajectories should be within the green box between 150 and 250 time units.
4.3 LSTM predictions of the altitude h on Dtest (left, left-mid) and direct predictive runtime verification method (right-mid, right). Left: five best (in terms of mean square error) predictions on Dtest; left-mid: five worst predictions on Dtest; right-mid: histogram of the nonconformal scores R(i) on Dcal for the direct method; right: predicted robustness ρ(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest.
4.4 Left: F-16 Fighting Falcon within the high fidelity aircraft simulator from [124]. Right: Self-driving car within the autonomous driving simulator CARLA [92].
4.5 Indirect predictive runtime verification method. Left, left-mid, and right-mid: histograms of the nonconformal scores R(i) of τ step ahead prediction on Dcal for τ ∈ {50, 100, 200} and the indirect method; right: worst case predicted robustness ρ̄(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest.
4.6 LSTM predictions of the imitation learning controller on Dtest. Left: five best (in terms of mean square error) ce predictions; left-mid: five worst ce predictions; right-mid: five best θe predictions; right: five worst θe predictions.
4.7 LSTM predictions of the control barrier function controller on Dtest. Left: five best (in terms of mean square error) ce predictions; left-mid: five worst ce predictions; right-mid: five best θe predictions; right: five worst θe predictions.
4.8 Histograms of the nonconformal scores R(i) on Dcal and prediction region C. Left: IL controller and ϕ1; left-mid: CBF controller and ϕ1; right-mid: IL controller and ϕ2; right: CBF controller and ϕ2.
4.9 Predicted robustness ρ(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest. Left: IL controller and ϕ1; left-mid: CBF controller and ϕ1; right-mid: IL controller and ϕ2; right: CBF controller and ϕ2.
5.1 The solid lines refer to Y1 and the dashed lines refer to Y2; in each of the displayed plots, the initial condition for each pair of realizations is the same.
5.2 Distance and robustness histogram for Dubin's car with ϵ = ϵ̄ = 0.05. We use CVaR(d) to denote CVaR(d(Y1, Y2)). The z1 and δ are the values of conformal prediction on the calibration set of ρ(ϕdubin, Y1) and d∞(Y1, Y2).
6.1 Simulation environments for case studies in the CARLA simulator [91].
6.2 The ego agent E is embedded in a simulation with a collection of adversarial agents Hθi, which learn (possibly from a bank of past experience) to stress-test the ego via a reward function derived from the constraints for the adversary and the ego specification.
6.3 Simulation environments for case studies
6.4 Traces in (a) show an early episode in the driving-in-the-lane case study. The adversary is unable to cause a collision, and the distance between the ego vehicle and the adversarial vehicle remains above the collision threshold for the duration of the episode. Traces in (b) show a later episode in which the adversary successfully causes a collision. Traces in (c) show a different behavior that the adversary learned in order to cause a collision.
6.5 Adversarial vehicle behaviors across episodes in the lane change maneuvers case study.
6.6 An illustration of our implementation structure
6.7 Yellow light case study. The green region represents the region in which the ego vehicle will run the yellow light. The adversary learns to drive the ego car into the target region.
7.1 (a) Two pulse shapes (b) Idealized pulse shape (Color figure online)
7.2 Shape automaton Apulse
7.3 Pulse train - three runs κ1, κ2 and κ3 over ξ in Âpulse.
7.4 Recognizing pulses in ECG signals
8.1 Illustrative example - set of pulses.
8.2 Learning shape expressions from examples - an overview.
8.3 Example - inferring linear segments from pulses.
8.4 Approximating exponential decay with linear segments - robustness to noise in data.
8.5 Example - clustering.
8.6 Inferred automaton and expression.
8.7 Train of pulses and its segmentation.
8.8 Example of a heart beat from two patients.

Abstract

Rigorous analysis of cyber-physical systems (CPS) is becoming increasingly important, especially for safety-critical applications incorporating learning-enabled components.
Given a system requirement, such as "if the vehicle deviates from the center of the road, it should return to the center in time," we would like to evaluate how well the system satisfies this requirement. The uncertain environment in which the system operates complicates the reasoning process, making it challenging to consider all possible behaviors and provide formal guarantees. We propose that, rather than striving to address every possible scenario or to offer formal guarantees through a white-box model, a black-box model used in conjunction with sampling-based methods can be effective. For uncertainties arising from varying environmental conditions as well as stochastic parameters in the system design, we demonstrate that statistical verification can effectively quantify these uncertainties. However, it is often impossible to eliminate all sources of uncertainty at design time; we show that runtime verification techniques can reason about stochasticity using predictive techniques and statistical analysis methods.

A crucial element of formal reasoning is the use of mathematically unambiguous specification languages to articulate desirable or anomalous system behaviors. We show that our proposed methods can reuse analysis results by leveraging the quantitative satisfaction semantics of specification languages like Signal Temporal Logic. This approach enables reasoning about the safety of unseen scenarios without actually simulating or executing the system.

Most specification languages are not robust to sensor noise, so enhancing resistance to noise and enabling the analysis of shape-related signal properties becomes crucial. We propose a novel specification language to address these challenges more effectively.
Since these stages all involve a similar approach of black-box modeling with properties related to specification languages, we anticipate that future work could integrate the results from various stages of this thesis. This integration would allow for the sharing and reuse of findings at each stage, thereby enhancing the analysis of system safety and improving the scalability of the reasoning process.

Chapter 1: Introduction

Cyber-physical systems (CPS) integrate both software and hardware components, functioning in uncertain environments and often having stringent safety requirements. These systems have applications in diverse fields including automotive, avionics, healthcare, robotics, industrial automation, and power grids and distribution. Such cyber-physical systems are often safety-critical; any failures of such systems can cause harm to human lives or property. Thus, research has focused on the safety analysis of individual components within CPS, and there is increasing attention on synthesizing these individual analyses to develop a comprehensive safety assessment for the entire system.

There are two prevalent approaches to modeling CPS applications: the white-box approach and the black-box approach. The white-box approach assumes the availability of a symbolic representation of the dynamics of the environment and physical processes in the system, as well as access to the source code of the cyber components (e.g., the control software). The black-box modeling approach, on the other hand, only assumes the availability of a simulator that can be used to sample behaviors (i.e., trajectories) of the system, or access to a dataset containing system behaviors.
While white-box methods can offer strong deterministic guarantees, they require such symbolic models and access to source code, which pose several challenges: a modeling effort has to balance model fidelity with simulation speed, models need to be validated, symbolic models may be highly nonlinear, and software code may contain features that make it difficult to analyze. Black-box methods, in contrast, do not require symbolic representations of the physical or cyber components, but may not be able to provide equally strong guarantees.

The fundamental hypothesis of this dissertation is that, by combining expressive specification languages with powerful analysis methods from statistics, formal verification, and machine learning/optimization, we can provide probabilistic guarantees on the temporal behaviors of even black-box systems. We begin by examining how to quantify uncertainties arising from various sources of disturbance, such as initial states [205, 204] and runtime disturbances [167]. Next, we explore how to reuse analysis results to avoid exhaustively enumerating or triggering all possible system behaviors [203] while still covering rare events [201]. Finally, we investigate the invention of new specification languages [189, 40] as a means to enhance system analysis and provide valuable properties. The techniques in this thesis center on three main pillars: verification, testing, and specification languages.

1.1 Verification

Factors such as the complexity and stochasticity of operating environments, the curse of dimensionality, and the nonlinearity of dynamics pose significant scalability challenges for verification procedures.

1.1.1 Statistical verification

Uncertainty in safety-critical cyber-physical systems can be modeled using a finite number of parameters or parameterized input signals. Signal Temporal Logic is a popular formalism that has been widely used to express safety specifications for many CPS applications.
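Signal Temporal Logic comes with quantitative (robustness) semantics that the later chapters build on: the robustness value is positive when a trace satisfies the formula and its magnitude measures the satisfaction margin. As a minimal sketch for the two core temporal operators over a discrete-time signal (the function names and the lane-keeping predicate are illustrative, not taken from the thesis):

```python
import numpy as np

def robustness_always(signal, predicate, t_start, t_end):
    """Robustness of G_[t_start, t_end] (predicate >= 0): the worst-case
    margin by which the predicate holds over the time interval."""
    window = signal[t_start:t_end + 1]
    return np.min([predicate(x) for x in window])

def robustness_eventually(signal, predicate, t_start, t_end):
    """Robustness of F_[t_start, t_end] (predicate >= 0): the best-case margin."""
    window = signal[t_start:t_end + 1]
    return np.max([predicate(x) for x in window])

# Illustrative requirement: "the deviation from the lane center stays below
# 0.5 m during time steps [0, 10]".
deviation = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.1, 0.05, 0.2, 0.1, 0.0, 0.1])
rho = robustness_always(deviation, lambda d: 0.5 - d, 0, 10)
# rho > 0 means the property is satisfied, with margin rho.
```

A positive `rho` here (0.5 minus the worst deviation) is exactly the quantitative satisfaction value whose distribution the statistical verification chapters reason about.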
Given a system specification in Signal Temporal Logic (STL), we would like to verify that for all (infinitely many) values of the model parameters/input signals, the system satisfies its specification. Unfortunately, this problem is undecidable in general. Statistical model checking (SMC) offers a solution approach that provides guarantees on the correctness of CPS models through statistical reasoning on model simulations. We propose a new statistical verification approach based on uncertainty quantification using conformal prediction. We assume that the user provides a distribution on the sources of stochastic uncertainty in the model (e.g., initial states, model parameters, etc.). Our technique then provides probabilistic guarantees on the model satisfying a given STL property. Our technique uses model simulations to learn surrogate models with standard data-driven learning techniques, which are then combined with conformal inference/prediction. Additionally, we can provide prediction intervals containing the quantitative satisfaction values of the given STL property for any user-specified confidence level. We also propose a refinement procedure based on Gaussian Process (GP) surrogate models for obtaining fine-grained probabilistic guarantees over sub-regions in the parameter space. This in turn enables the CPS designer to choose assured validity domains in the parameter space for safety-critical applications. We demonstrate the efficacy of our technique on several CPS models.

A different approach to uncertainty quantification is based on risk estimation procedures drawn from the area of quantitative financial analysis. We also use risk estimation techniques to provide probabilistic guarantees, and empirically compare the two types of techniques.

1.1.2 Runtime Verification

To address the uncertainty encountered during runtime, we formulate methods to predict failures in cyber-physical systems during their operation.
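Both the statistical verification approach and the predictive runtime methods rest on the same split conformal prediction step: compute nonconformity scores on a held-out calibration set and take a conservative empirical quantile. A minimal sketch, with an illustrative function name and a synthetic score distribution (not the thesis's implementation):

```python
import numpy as np

def conformal_upper_bound(calib_scores, epsilon):
    """Split conformal prediction: given nonconformity scores on a held-out
    calibration set, return a threshold C such that a fresh, exchangeable
    score exceeds C with probability at most epsilon."""
    n = len(calib_scores)
    # Conservative quantile index: ceil((n + 1) * (1 - epsilon)).
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    if k > n:
        return np.inf  # Too few calibration samples for this epsilon.
    return np.sort(calib_scores)[k - 1]

# Illustrative use: scores could be the errors of a surrogate model when
# predicting the STL robustness value; the bound then yields a
# (1 - epsilon)-confidence prediction interval around the surrogate output.
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 1.0, size=1000)
C = conformal_upper_bound(scores, epsilon=0.05)
```

The guarantee is distribution-free: it needs only exchangeability of calibration and test scores, which is why the same step reappears in the verification, runtime, and conformance chapters.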
Particularly, we consider stochastic systems and Signal Temporal Logic specifications, and we aspire to calculate the probability that the current system trajectory may violate the specification at a future time step. We present two predictive runtime verification algorithms that predict future violations of the specification from the currently observed system trajectory. Our first algorithm directly constructs a prediction region for the satisfaction measure of the specification so that we can predict specification violations with a desired confidence. The second algorithm first constructs prediction regions for future system states, and uses these to obtain a prediction region for the satisfaction measure. We again leverage conformal prediction to provide probabilistic guarantees on the failure predictions. To the best of our knowledge, these are the first formal guarantees for a predictive runtime verification algorithm, while being computationally simple and making no assumptions on the underlying distribution. Our numerical experiments on an F-16 aircraft model and a self-driving car simulation model provide an empirical evaluation of the methods.

1.2 Testing

1.2.1 Conformance Testing

The notion of conformance tries to measure how close two systems are to each other. There are numerous notions of conformance, such as IOCO [265], and conformance measures based on distances between system trajectories [84, 2]. Conformance can capture the distance between design models and their real implementations and thus aid in robust system design. However, previous notions of conformance target deterministic systems. We argue that probabilistic reasoning over the distribution of distances between model trajectories is a good measure for stochastic conformance [203]. Additionally, we propose the non-conformance risk to reason about the risk of stochastic systems not being conformant.
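As a hedged sketch of how such a stochastic conformance bound might be estimated from sampled trajectory pairs: draw paired trajectories from the two systems, compute the sup-norm distance for each pair, and apply the conformal quantile step to the resulting distance samples. The helper names and the synthetic data below are illustrative, not the thesis's exact estimator:

```python
import numpy as np

def sup_distance(traj1, traj2):
    """d_inf: the largest pointwise distance between two trajectories."""
    return np.max(np.abs(np.asarray(traj1) - np.asarray(traj2)))

def conformance_bound(traj_pairs, epsilon):
    """Estimate delta such that, with probability at least 1 - epsilon, a
    fresh trajectory pair (one realization from each system, same initial
    condition) is within sup distance delta (split conformal step)."""
    dists = np.sort([sup_distance(y1, y2) for y1, y2 in traj_pairs])
    n = len(dists)
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    return np.inf if k > n else dists[k - 1]

# Illustrative data: two noisy variants of the same nominal trajectory,
# standing in for a design model and its implementation.
rng = np.random.default_rng(1)
nominal = np.sin(np.linspace(0, 2 * np.pi, 100))
pairs = [(nominal + 0.05 * rng.standard_normal(100),
          nominal + 0.05 * rng.standard_normal(100)) for _ in range(500)]
delta = conformance_bound(pairs, epsilon=0.05)
```

A small `delta` certifies (probabilistically) that the two systems stay close; replacing the quantile with a tail expectation over the same distance samples would give a risk-style measure instead.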
We show that both notions have the desirable transference property, meaning that conformant systems satisfy similar system specifications; i.e., if the first model satisfies a desirable specification, the second model will satisfy (nearly) the same specification. Lastly, we propose how stochastic conformance and non-conformance risk can be estimated from data using statistical tools such as conformal prediction. We present empirical evaluations of our method on an F-16 aircraft model, an autonomous vehicle simulation model, a spacecraft simulation model, and a Dubins vehicle simulation model.

1.2.2 Robust Testing

Sampling-based methods for verification can overlook rare events. Hence, it is important to consider techniques that can identify adversarial scenarios that trigger rare failures of the system. We propose a robust testing framework [201] for cyber-physical systems (CPS) that operate in uncertain environments. Our framework is based on the use of reinforcement learning. We model the environment of a system under test (SUT) as a collection of agents, and synthesize adversarial policies for these agents that guarantee that the environment satisfies user-defined constraints while the SUT fails to satisfy its specification. In our framework, the test generation tool can provide meaningful and challenging tests even when there are small changes to the SUT. Such a method can be quite valuable in incremental design flows, where small changes to the design do not necessitate expensive test generation from scratch. We demonstrate the efficacy of our method on three example systems in autonomous driving implemented within a photo-realistic autonomous driving simulator.

1.3 Specification Languages

There have been efforts to use temporal logic to detect similar shape information in signals. However, this process is challenging, as different techniques usually can only handle changes in shape according to either time or space, but not both.
1.3.1 Shape Expressions

Detecting the occurrence of continuous signal patterns, such as pulses and ramps, in simulation or execution traces is a common yet tedious and error-prone task in systems engineering. It is usually performed either by "eyeballing" the traces or by ad-hoc signal processing scripts that may or may not capture the behavior under consideration. We provide a simple pattern matching language [189] that allows one to define arbitrary parameterized shapes and automatically detect their occurrences in noisy data. Basic signal shapes consist of linear, exponential, or sine signal segments parameterized by their offset, slope, rate, frequency, phase and/or magnitude. They can be composed into complex shape expressions using union, concatenation, Kleene star, and parameter constraints. We define two feature extraction problems: (i) find a decomposition of a complete signal according to a given shape expression; (ii) find sub-segments of a signal matching a given shape expression. Idealized shapes typically do not occur in pure form in a given signal; hence, we consider the problem of extracting the shapes whose parameter instantiations best fit the given signal, and proceed by multiple (linear, exponential, sine) regressions of basic shapes. We provide an online algorithm to solve the shape decomposition and matching problems by introducing shape automata, which are weighted automata with real-valued parameters whose costs are defined according to the regression of basic shapes.

1.3.2 Mining Shape Expressions

We further demonstrate that the newly proposed specification formalism preserves advantageous properties enjoyed by other specification languages, such as the ability to be mined from data. This means the specification can be automatically derived from data without human input.
We propose a novel method for mining a broad and interesting fragment of Shape Expressions from time-series data using a combination of techniques from linear regression, unsupervised clustering, and learning finite automata from positive examples. The learned SE for a given dataset provides an explainable and intuitive model of the observed system behavior. We demonstrate the applicability of our approach in two case studies and experimentally evaluate the implemented specification mining procedure.

Chapter 2

Preliminaries

In this chapter, we introduce definitions, notations, operations, and background results that will be used throughout the dissertation.

Definition 2.0.1 (Signals, Black-box Models). We define a signal∗ or a trajectory ξ as a function from a finite set dom ⊆ [0, T], for some T ∈ R≥0, to a compact set of values X. The signal value at time t is denoted ξ(t). A parameter space Θ is a compact subset of R^k. A model M is a function that maps a parameter value θ ∈ Θ to an output signal ξθ.

We note that the above definition permits parameterized input signals for the model. We can define such signals using a function known as a signal generator that maps specific parameter values to signals. For example, a piecewise linear signal containing k linear segments can be described using k + 1 parameters: k corresponding to the starting point of each segment and 1 for the end-point of the final segment. We assume that θ ∈ Θ is a random variable that follows a (truncated) distribution Dθ with probability density function (PDF) f(θ), where f(θ) = 0 for all θ ∉ Θ. If we only wish to draw samples from a subset S ⊆ Θ

∗ Conventionally, signals are defined over continuous time; however, in a practical setting, such as in a simulator, we only obtain signal values at a finite set of time points. In such a case, it is common to assume that the underlying continuous-time signal can be recovered using an appropriate interpolation scheme. The discrete-time vs.
continuous-time interpretation is not of particular importance in this thesis. The only bearing it has is on the satisfaction of a given STL formula: we use point-wise semantics of STL over piecewise-constant interpolated signals, which produce identical satisfaction values for continuous- and discrete-time signals.

(by dropping samples from Θ \ S), the corresponding distribution of the samples is denoted by Dθ ↓ S and follows the PDF shown below:

f′(θ) = f(θ) / (∫_{τ∈S} f(τ) dτ)  if θ ∈ S,  and f′(θ) = 0 otherwise.   (2.1)

Instead of closed-form descriptions of the generator for ξθ (e.g., differential or difference equations), we assume that there is a simulator that can generate signals compatible with the semantics of the model M.

Definition 2.0.2. A simulator for a (deterministic) set Ξ of trajectories is a function (or a program) sim that takes as input a parameter θ ∈ Θ and a finite sequence of time points t0, . . . , tk, and returns the signal (t0, sim(θ, t0)), . . . , (tk, sim(θ, tk)), where for each i ∈ {0, . . . , k}, sim(θ, ti) = ξθ(ti). In the rest of the thesis, unless otherwise specified, we ignore the distinction between the signals sim(θ, ·) and ξθ.

2.1 Signal Temporal Logic

Signal Temporal Logic [179] is a popular formalism that has been widely used to express safety specifications for many CPS applications. STL formulas are defined over signal predicates of the form f(ξ) ≥ c or f(ξ) ≤ c, where ξ is a signal, f : R^n → R is a real-valued function, and c ∈ R. STL formulas are written using the grammar shown in Eq. (2.2). Here, we assume that I = [a, b], where a, b ∈ R≥0, a ≤ b, and ∼ ∈ {≤, ≥}.

φ, ψ := true | f(ξ) ∼ c | ¬φ | φ ∧ ψ | φ ∨ ψ | F_I φ | G_I φ | φ U_I ψ   (2.2)

In the above syntax, F (eventually), G (always), and U (until) are temporal operators. Given t ∈ R≥0 and I = [a, b], we use t + I to denote [t + a, t + b]. Given a signal ξ and a time t, we use (ξ, t) |= φ to denote that ξ satisfies φ at time t, and ξ |= φ as short-hand for (ξ, 0) |= φ.
The Boolean satisfaction semantics of an STL formula can be defined recursively in terms of the satisfaction of its subformulas over a signal. For ∼ ∈ {≤, ≥}: (ξ, t) |= f(ξ) ∼ c iff f(ξ(t)) ∼ c is true. The semantics of the Boolean operators for negation (¬), conjunction (∧), and disjunction (∨) are obtained in the usual fashion by applying the operator to the Boolean satisfaction of its operand(s). The value of (ξ, t) |= F_I φ is true iff ∃t′ ∈ t + I s.t. (ξ, t′) |= φ, while (ξ, t) |= G_I φ iff ∀t′ ∈ t + I, (ξ, t′) |= φ. The formula φ U_I ψ is satisfied at time t if there exists a time t′ ∈ t + I s.t. (ξ, t′) |= ψ, and for all t′′ ∈ [t, t′], (ξ, t′′) |= φ.

STL is also equipped with quantitative semantics that define the robust satisfaction value or robustness: a function mapping a formula φ and the signal ξ to a real number [97, 90]. Informally, robustness can be viewed as a degree of satisfaction of an STL formula φ. While many competing definitions of the robust satisfaction value exist [10, 213, 137], we use the original definitions [90] in this thesis.

Definition 2.1.1. The robustness value is a function ρ mapping φ, the trajectory ξ, and a time t ∈ ξ.dom as follows:

ρ(f(ξ) ≥ c, ξ, t) = f(ξ(t)) − c
ρ(¬φ, ξ, t) = −ρ(φ, ξ, t)
ρ(φ ∧ ψ, ξ, t) = min(ρ(φ, ξ, t), ρ(ψ, ξ, t))
ρ(φ U_I ψ, ξ, t) = sup_{t1 ∈ t+I} min( ρ(ψ, ξ, t1), inf_{t2 ∈ [t,t1)} ρ(φ, ξ, t2) )

The robustness values for other Boolean and temporal operators can be derived from the above definition; for example, G_I φ and F_I φ are special cases of the semantics for until (U_I), evaluating respectively to the minimum and maximum of the robustness of φ over the interval t + I.

Example 2.1.2. Consider the time-reversed van der Pol oscillator specified as ẋ₁ = −x₂, ẋ₂ = 4(x₁² − 1)x₂ + x₁.
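On discretely sampled signals, the recursive semantics of Definition 2.1.1 (with min/max in place of inf/sup) turn into a few lines of code. The following sketch is illustrative only: the helper names are ours, time is indexed by sample position, and only predicates and the G and F operators are covered.

```python
# Robustness of simple STL fragments over a discretely sampled signal,
# following the quantitative semantics of Definition 2.1.1.
# Helper names (rob_pred, rob_always, rob_eventually) are illustrative.

def rob_pred(sig, t, f, c):
    """rho(f(xi) >= c, xi, t) = f(xi(t)) - c."""
    return f(sig[t]) - c

def rob_always(sig, t, interval, f, c):
    """rho(G_[a,b] (f(xi) >= c), xi, t): min over t' in t + [a, b]."""
    a, b = interval
    window = range(t + a, min(t + b, len(sig) - 1) + 1)
    return min(rob_pred(sig, tp, f, c) for tp in window)

def rob_eventually(sig, t, interval, f, c):
    """rho(F_[a,b] (f(xi) >= c), xi, t): max over t' in t + [a, b]."""
    a, b = interval
    window = range(t + a, min(t + b, len(sig) - 1) + 1)
    return max(rob_pred(sig, tp, f, c) for tp in window)

# A predicate x(t) < 0.5 is encoded as -x(t) >= -0.5, i.e., f = -x, c = -0.5.
signal = [0.0, 0.25, 0.375, 0.25, 0.125]
rho_g = rob_always(signal, 0, (0, 4), lambda x: -x, -0.5)
rho_f = rob_eventually(signal, 0, (0, 4), lambda x: -x, -0.5)
```

Here rho_g = 0.5 − max(signal), and a strictly positive robustness indicates Boolean satisfaction of G(x < 0.5) over the sampled horizon.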
Figure 2.1 illustrates the satisfaction (indicated in blue) and violation (indicated in red) of two example specifications by x1(t): (a) φ1 specifies that for any time t ∈ [0, 10], the value of the trajectory x(t) should be less than 0.5, and (b) φ2 specifies that starting from some time within the first 2 time units, x(t) settles in the region [−0.3, 0.3] for 8 time units.

Figure 2.1: Trajectories satisfying/violating STL formulas: (a) φ1 = G[0,10](x(t) < 0.5); (b) φ2 = F[0,2] G[0,8](∥x(t)∥ < 0.3).

2.2 Conformal Prediction

Conformal prediction was introduced in [258, 231] to obtain valid prediction regions for complex prediction algorithms, e.g., neural networks, without making assumptions on the underlying distribution or the prediction algorithm [19, 103, 160, 242, 64]. Let R(0), . . . , R(k) be k + 1 independent and identically distributed random variables. The variable R(i) is usually referred to as the nonconformity score. In supervised learning, it may be defined as R(i) := ∥Y(i) − µ(X(i))∥, where the predictor µ attempts to predict an output Y(i) based on an input X(i). A large nonconformity score indicates a poor predictive model. Our goal is to obtain a prediction region for R(0) based on R(1), . . . , R(k), i.e., the random variable R(0) should be contained within the prediction region with high probability. Formally, given a failure probability ϵ ∈ (0, 1), we want to construct a valid prediction region C that depends on R(1), . . . , R(k) such that

P(R(0) ≤ C) ≥ 1 − ϵ.

As C depends on R(1), . . . , R(k), the probability measure P is defined over the product measure of R(0), . . . , R(k). This is an important observation, as conformal prediction guarantees marginal coverage but not conditional coverage; see [19] for a detailed discussion. By a surprisingly simple quantile argument (see [242, Lemma 1]), one can obtain C as the (1 − ϵ)th quantile of the empirical distribution of the values R(1), . . . , R(k) and ∞. By assuming that R(1), . . .
, R(k) are sorted in non-decreasing order, and by adding R(k+1) := ∞, we can equivalently obtain C := R(p), where p := ⌈(k + 1)(1 − ϵ)⌉; i.e., C is the pth smallest nonconformity score.

2.3 Risk Measures

A risk measure is a function Risk : F(Ω, R) → R that maps the set of real-valued random variables F(Ω, R) to the real numbers. Typically, the input of Risk indicates a cost. There exist various risk measures that capture different characteristics of the distribution of the cost random variable, such as the mean or the variance. However, we are particularly interested in tail risk measures that capture the right tail of the cost distribution, i.e., the potentially rare but costly outcomes. In this thesis, we particularly consider the value-at-risk VaRβ and the conditional value-at-risk CVaRβ at risk level β ∈ (0, 1). The VaRβ of a random variable Z : Ω → R is defined as

VaRβ(Z) := inf{α ∈ R | Prob(Z ≤ α) ≥ β},

i.e., VaRβ(Z) captures the 1 − β quantile of the distribution of Z from the right. Note that there is an obvious connection between value-at-risk and chance constraints: Prob(Z ≤ α) ≥ β is equivalent to VaRβ(Z) ≤ α. The CVaRβ of Z, on the other hand, is defined as

CVaRβ(Z) := inf_{α∈R} ( α + (1 − β)^{−1} E([Z − α]+) ),

where [Z − α]+ := max(Z − α, 0) and E(·) denotes the expected value. When the function Prob(Z ≤ α) is continuous (in α), it holds that CVaRβ(Z) = E(Z | Z ≥ VaRβ(Z)), i.e., CVaRβ(Z) is the expected value of Z conditioned on the outcomes where Z is greater than or equal to VaRβ(Z). Finally, note that VaRβ(Z) ≤ CVaRβ(Z), i.e., CVaRβ is more risk-sensitive. For the risk transference results that we present later, we will require that Risk is monotone, positively homogeneous, and subadditive:

• For two random variables Z, Z′ ∈ F(Ω, R), the risk measure Risk is monotone if Z(ω) ≤ Z′(ω) for all ω ∈ Ω implies that Risk(Z) ≤ Risk(Z′).
• For a random variable Z ∈ F(Ω, R), the risk measure Risk is positively homogeneous if, for any constant K ≥ 0, it holds that Risk(KZ) = K·Risk(Z).

• For two random variables Z, Z′ ∈ F(Ω, R), the risk measure Risk is subadditive if Risk(Z + Z′) ≤ Risk(Z) + Risk(Z′).

We remark that VaRβ and CVaRβ satisfy all three properties [176].

Chapter 3

Statistical Verification using Surrogate Models and Conformal Inference and a Comparison with Risk-aware Verification

3.1 Introduction

Most cyber-physical systems are highly complex systems with nonlinear behaviors that operate in uncertain environments. As these systems are often safety-critical, it is desirable to obtain strong assurances on their safe operation. To achieve this goal, recent research has focused on effective and sound verification algorithms [94, 134, 144, 238, 244, 95, 130], and on scalable best-effort approaches that lack explicit coverage guarantees [270]. However, factors like the complexity and stochasticity of the operating environment, the curse of dimensionality, and the nonlinearity of the dynamics pose a significant scalability challenge for verification procedures. In this chapter, we address the problem of analyzing the effects of uncertainty in the environment on the correctness of a given CPS model M. We assume that the uncertainty in the environment is modeled as a parameter vector θ that takes values from some set Θ, distributed according to some user-provided distribution DΘ. Such a parameter vector could also include time-varying parameters (representing time-discretized input signals). For a given sample of θ, we assume that the output trajectory of the model (denoted ξθ) is deterministic, i.e., the model is free of any internal stochastic behavior. We assume that the correctness of the given CPS model is expressed using a real-valued function of its input/output trajectories.
In many of our examples, we assume this function to be the robust satisfaction value or robustness of a given Signal Temporal Logic (STL) formula [178]. Given a formula φ and a trajectory x(t), the robustness ρ(φ, x) approximates the degree of satisfaction of φ by x [90, 97]. We are primarily interested in building a surrogate model μ̂ to approximate the joint distribution of θ and ρ(φ, M(θ)), and we explore the use of such a model to help answer the following specific questions:

1. Given a threshold ϵ and θ ∼ DΘ, does the probability of the model satisfying a given STL property φ exceed 1 − ϵ?

(θ ∼ DΘ) ⇒? P(M(θ) |= φ) ≥ 1 − ϵ   (3.1)

2. For some user-provided threshold ϵ and θ ∼ DΘ, can we find an interval [ℓ, u] s.t. the probability that the robustness value of a model behavior M(θ) w.r.t. a given STL property φ lies in [ℓ, u] is greater than 1 − ϵ? I.e.,

θ ∼ DΘ ⇒ P(ρ(φ, M(θ)) ∈ [ℓ, u]) > 1 − ϵ   (3.2)

Statistical model checking (SMC) [159, 9, 285, 278, 217, 230] approaches have been used in the past to establish the above two assertions. The most popular SMC methods use statistical hypothesis testing procedures to check whether the hypothesis that (3.1) or (3.2) is true can be accepted with confidence exceeding user-specified thresholds α, β for, respectively, committing a type I error (i.e., rejecting the hypothesis when it is true) or a type II error (i.e., accepting the hypothesis when it is not true). SMC methods provide the user with conditions on the number of simulations required, α, β, and ϵ in order to accept or reject the hypotheses. Unlike SMC, which requires a certain number of samples to answer the above questions, our use of surrogate models can establish the assertions without requirements on the number of samples, as we explain later. Furthermore, our use of surrogate models can help automatically provide a new distribution θ ∼ D′Θ such that (3.2) holds true.

Approach.
To establish assertions such as (3.1) or (3.2), we present an approach based on conformal inference, a technique for producing confidence intervals with marginal coverage guarantees. A unique feature of our technique is that it does not make any assumptions on the user-provided distribution on the parameter space or on the dynamics represented by the model. In contrast, existing techniques based on uncertainty quantification using Gaussian-Process-based surrogate models assume that the joint distribution of sampled parameter values and target robustness values is Gaussian [196]. SMC techniques, although they make no assumption on the input distribution, assume that the Boolean outcomes of successive runs of a given program follow a binomial distribution, and their probabilistic guarantees are based on analyzing properties of that binomial distribution. An overview of our approach is shown in Fig. 3.1. The first step of our approach is to learn a surrogate model; this can be thought of as a statistical, data-driven approximation of the black-box model. For this step, we sample N parameter values from Θ, obtain their corresponding trajectories ξθ, and then compute the robust satisfaction value ρ(φ, ξθ) for each trajectory. We partition this set into a training set and a test set. We then use an off-the-shelf regression technique on the training set, treating the sampled parameters in the training set as inputs and the robustness values as outputs of the learned regressor. Here, we can use any parametric technique, such as polynomial regression, or a non-parametric technique based on Gaussian process regression or neural networks [49]. Due to the generalizability inherent in regression, the surrogate model can predict the robustness value for any parameter value in Θ. However, most good regression techniques avoid over-fitting to the data, and hence will result in some residual error (i.e., between the predicted values and the actual values).
Conformal inference is a technique that can leverage these residual errors to give confidence intervals on predicted values. The main idea in conformal inference is: for any given threshold 1 − ϵ, there is a systematic way to find a prediction interval [μ̂(θ) − d, μ̂(θ) + d] in which the true value must lie with probability greater than 1 − ϵ. We couple this idea with a global optimizer to obtain confidence intervals for regions in the parameter space rather than for individual parameter values.

Figure 3.1: Overview of our approach. (The CPS model M(θ), the parameter region Θ with distribution Dθ, and the STL property φ yield (θ, ρ) robustness data, which is split into training and test sets; conformal inference with threshold α over the predictive surrogate model μ̂(θ) reports that Θ is δ-robustly safe/unsafe with probability 1 − ϵ, or that Θ is unknown.)

This allows us to provide the guarantee specified in Eq. (3.3), where vmin and vmax are, respectively, under- and over-approximations of the predicted robustness over the region Θ:

θ ∈ Θ ⇒ P(ρ(φ, ξθ) ∈ [vmin − d, vmax + d]) ≥ 1 − ϵ.   (3.3)

A strictly positive or strictly negative interval indicates that Θ is respectively safe or unsafe. However, if the interval contains 0, then the status of Θ remains unknown. The above procedure naturally yields a refinement scheme that allows us to start with a larger region in the parameter space and split it into smaller regions whenever a region is deemed unknown. In a smaller region, the accuracy of the surrogate model improves (due to more data in a smaller region), and hence previously inconclusive regions can be resolved as safe/unsafe. A naïve version of this splitting algorithm faces the curse of dimensionality: if the parameter space is high-dimensional, then the branch-and-bound procedure ends up creating too many branches, which can make the procedure intractable.
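The classification implied by Eq. (3.3), together with the naïve refinement loop, can be sketched in a few lines. Everything in this sketch is an illustrative stand-in: a fixed linear "surrogate", 1-D interval regions, and a grid search in place of the global optimizer.

```python
def classify_region(mu_hat, region, d, grid=100):
    """Classify a 1-D parameter region via Eq. (3.3):
    safe if v_min - d > 0, unsafe if v_max + d < 0, else unknown."""
    lo, hi = region
    values = [mu_hat(lo + (hi - lo) * i / grid) for i in range(grid + 1)]
    v_min, v_max = min(values), max(values)
    if v_min - d > 0:
        return "safe"
    if v_max + d < 0:
        return "unsafe"
    return "unknown"

def refine(mu_hat, region, d, depth=4):
    """Naive branch-and-bound: bisect unknown regions up to a depth bound."""
    status = classify_region(mu_hat, region, d)
    if status != "unknown" or depth == 0:
        return [(region, status)]
    lo, hi = region
    mid = (lo + hi) / 2
    return (refine(mu_hat, (lo, mid), d, depth - 1)
            + refine(mu_hat, (mid, hi), d, depth - 1))

# Illustrative surrogate: robustness positive for theta < 1, negative beyond.
def mu_hat(theta):
    return 1.0 - theta

result = refine(mu_hat, (0.0, 2.0), d=0.05)
```

The result is a list of sub-regions labeled safe, unsafe, or unknown; only the slivers around the zero crossing of the surrogate remain unknown at the depth bound.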
The naïve splitting algorithm crucially does not use any smart heuristic to decide where a region should be split, and this can lead to unnecessary exploration of larger regions that could be (statistically) verified as safe or unsafe. To overcome this shortcoming, we propose the use of Gaussian Process (GP) [210] regression models. There is a rich literature on the use of GP-based models for black-box function optimization, and a key idea therein is to explicitly encode sample uncertainty as an optimization objective. This allows GP-based methods to trade off between exploitation (searching in the vicinity of a local minimum) and exploration (searching in neighborhoods with high sample uncertainty). We propose a similar heuristic that adaptively splits regions based on sample uncertainty. Our method can scale to CPS models encoding complex dynamics and large state spaces, as well as reasonably large parameter spaces. The results of our method can be used to characterize safe operating regions in the parameter space and to build (probabilistic) safety assurance cases. With respect to analysis times, our method compares favorably with approaches based on statistical model checking (SMC). A key difference is that, unlike SMC and PAC-based methods, the first step in our method is to construct a surrogate model. This crucially allows us to provide a guarantee that is not a function of the number of samples. In fact, our method can potentially provide the needed level of probabilistic guarantees with any number of samples. This is because we build a surrogate model from samples; if the surrogate model is of poor accuracy due to a limited number of samples, conformal inference will predict a wider prediction interval at the same probability 1 − ϵ, while for a more accurate model, the prediction interval will be narrower.
Thus, conformal inference allows a trade-off between sample complexity and the tightness of the guarantee, independent of the level of the guarantee itself. Finally, while SMC-based methods focus solely on the problem of probabilistic verification, our method enables other model-based analyses: (1) we can give probabilistic safe regions in the parameter space, which can, for example, be used to define high-confidence operating regimes for the model; (2) our technique can be used for statistical debugging approaches such as [81, 41]; and (3) we can extend our technique to identify parameter sensitivity by combining the core regression procedure with dimensionality reduction techniques such as principal component analysis [49]. While conformal prediction has emerged as an important statistical technique to provide probabilistic guarantees, an important concurrent development is the work on scenario-based verification using the notion of risk measures [12]. Measures such as value-at-risk and conditional value-at-risk allow quantifying the risk of a given CPS application failing a particular quantitative specification (e.g., specified as an STL formula). In this chapter, we empirically compare the probabilistic guarantees obtained using the risk estimation formulation with those obtained using conformal prediction. To summarize, the main contributions of this chapter are:

1. A technique based on surrogate models, learned using off-the-shelf regression techniques, to approximate the robustness of a given specification.

2. A new technique for generating prediction intervals for the robustness of a specification with user-specified probabilistic thresholds.

3. Algorithms to partition the parameter space of a model into safe, unsafe, and unknown regions based on conformal inference over the surrogate models.

4. Experimental validation on CPS models demonstrating the real-world applicability of our methods.

5.
Empirical comparison with methods that provide probabilistic guarantees based on estimating tail risk measures.

3.2 Learning Surrogate Models

In this section, we discuss learning surrogate models for a given black-box model M. A surrogate model is essentially a quantitative abstraction of the original black-box model. Quantitative abstractions have been explored in the theory of weighted transition systems (WTS) [66]. A WTS is a transition system where every transition is associated with weights, and a quantitative property of the WTS maps sequences of states of the WTS to a real number computed using arithmetic operations on the weights. Quantitative abstractions focus on sound proofs for quantitative properties. We observe that we can view the robustness of an STL property as a quantitative property evaluated on a system trajectory. We introduce two new notions of quantitative abstraction defined on the trajectories of a system.

Definition 3.2.1 (δ-surrogate model). Let ξθ be the trajectory obtained by simulating M with the parameter θ, where θ ∈ Θ. Let γ be a quantitative property on ξθ, i.e., γ maps ξθ to a real number. We say that a model μ̂ mapping θ to a real number is a δ-distance-preserving quantitative abstraction, or a δ-surrogate model, of M and γ if

∀θ ∈ Θ : |γ(ξθ) − μ̂(θ)| ≤ δ   (3.4)

Essentially, the δ-surrogate model guarantees that the value of the quantitative property γ evaluated on ξθ (obtained from the original model M) is no more than δ away from the value that it predicts. The idea is that the δ-surrogate model could be systematically derived from the original model, and could be significantly simpler than the original model, making it amenable to formal analysis. For example, if we have a δ-surrogate model, then we can prove that a given property holds by systematically sampling the parameter space Θ.
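On any finite sample of Θ, the smallest δ for which condition (3.4) can hold is the worst-case absolute deviation between γ and μ̂ over the sampled parameters (a necessary, though not sufficient, condition for (3.4) over all of Θ). A toy sketch, in which both the "true" property γ and the surrogate are invented stand-ins:

```python
def smallest_delta(gamma_values, mu_hat, thetas):
    """Smallest delta s.t. |gamma(xi_theta) - mu_hat(theta)| <= delta
    on the sampled parameters (empirical check of Eq. (3.4))."""
    return max(abs(g - mu_hat(t)) for g, t in zip(gamma_values, thetas))

# Toy stand-ins: true property gamma(theta) = theta^2, surrogate mu_hat(theta) = theta.
thetas = [0.0, 0.25, 0.5, 0.75, 1.0]
gammas = [t * t for t in thetas]
delta = smallest_delta(gammas, lambda t: t, thetas)  # max |t^2 - t| on the grid
```

Any claimed δ is then validated on the sample by comparing it against this worst-case deviation.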
In general, such models can be hard to obtain; hence, we propose a probabilistic relaxation known as the (δ, ϵ)-probabilistic surrogate model, where condition (3.4) is replaced by (3.5).

Definition 3.2.2 ((δ, ϵ)-probabilistic surrogate model). Given a model M, a quantitative property γ, and a user-specified bound ϵ ∈ [0, 1), we say that μ̂ is a (δ, ϵ)-probabilistic surrogate model if:

P(|γ(ξθ) − μ̂(θ)| ≤ δ | θ ∼ Dθ) ≥ 1 − ϵ   (3.5)

We now explain how we can obtain (δ, ϵ)-probabilistic surrogate models for an arbitrary quantitative property γ. The basic idea is to use statistical learning techniques: we sample Θ in accordance with the distribution Dθ to obtain a finite set of parameter values Θ̂. For each θi ∈ Θ̂, we simulate the model to obtain ξθi and compute γ(ξθi). We then compute the surrogate model μ̂ using parametric regression models (e.g., linear or polynomial functions) or non-parametric regression methods (e.g., neural networks and Gaussian Processes) [104, 34, 210]. We now briefly review some of these regression methods, and in Section 3.3 we explain how we can obtain δ values for a user-provided bound ϵ.

Polynomial Regression. Polynomial regression assumes a polynomial relationship between the independent variables X and the dependent variable Y. It aims to fit a polynomial curve to the input and output data in a way that minimizes a suitable loss function. A commonly used loss function is the least-squares error (the sum of squares of residuals). Typically, polynomial regression requires the user to specify the degree of the polynomial to use. Polynomial regression generally has high tolerance to the function's curvature, but is highly sensitive to outliers. In our experiments, we restrict the polynomial degree to 2.

Neural Network Regression. Neural networks [49] offer a high degree of flexibility for regressing arbitrary nonlinear functions.
While there are many different NN architectures, we use a simple multi-layer perceptron model with a stochastic gradient-based optimizer. This model updates its parameters by taking iterative steps along the partial derivatives of the loss function.

Gaussian Process Regression [210]. A Gaussian Process (GP) is a stochastic process, i.e., a collection of random variables Wθ indexed by θ, where θ ranges over some discrete or dense set. The key property of a GP is that any finite sub-collection of these random variables has a multivariate Gaussian distribution. GP models are popular non-parametric regression methods used for approximating arbitrary continuous functions given appropriate kernel functions. A GP can be used to express a prior distribution on a space of functions, e.g., from a domain R^n to R. Let F : R^n → R be a random function. Then we say that F is a centered Gaussian process with kernel k if, for every finite collection of points x1, . . . , xm ∈ R^n, there exists a positive semi-definite matrix Σ such that [F(x1), . . . , F(xm)] ∼ N(0, Σ). The (i, j)-th entry of Σ is Σij = k(xi, xj) for some kernel function k. The matrix Σ is called the covariance matrix, and the function k measures the joint variability of xi and xj. Several kernel functions are popular in the literature: the squared exponential kernel, the 5/2 Matérn kernel, etc. In our experiments, we use a sum kernel that adds a dot-product kernel and a white-noise kernel (explained in Section 3.3.4).

3.3 Conformal Inference

Conformal inference [160, 161] is a framework to quantify the accuracy of predictions in a regression setting [258]. It can provide guarantees using a finite number of samples, without making assumptions on the distribution of the data used for regression or on the regression technique. We first recap the basic idea of conformal inference, and then explain how we adapt it to our problem setting.
3.3.1 Conformal Inference Recap

Consider i.i.d. regression data Z1, · · · , Zm drawn from an arbitrary joint distribution DXY, where each Zi = (Xi, Yi) is a random variable in R^n × R consisting of an n-dimensional feature vector Xi and a response variable Yi. Suppose we fit a surrogate model to the data, and we now wish to use this model to predict a new response Ym+1 for a new feature value Xm+1, with no assumptions on DXY. Formally, given a value α ∈ (0, 1), conformal inference constructs a prediction band B ⊆ R^n × R based on Z1, · · · , Zm with the property in (3.6):

P(Ym+1 ∈ B(Xm+1)) ≥ 1 − α.   (3.6)

Here, the probability is over the m + 1 i.i.d. draws Z1, · · · , Zm+1 ∼ DXY, and for a point x ∈ R^n we denote B(x) = {y ∈ R : (x, y) ∈ B}. The parameter α is called the miscoverage level, and 1 − α is called the probability threshold. Let

µ(x) = E(Y | X = x), x ∈ R^n

denote the regression function, where E(W) denotes the expected value of the random variable W. The regression problem is to estimate this conditional mean of the test response Ym+1 given the test feature Xm+1 = x. Common regression methods use a regression model g(x, η), where η are the parameters of the model, and minimize the sum of squared residuals of the model on the m training data points Z1, · · · , Zm. An estimator for µ is given by μ̂(x) = g(x, η̂), where

η̂ = argmin_η (1/m) Σ_{i=1}^{m} (Yi − g(Xi, η))² + R(η),

and R(η) is a regularizer. In [160], the authors provide a technique called split conformal prediction, which we use to construct prediction intervals satisfying finite-sample guarantees as in Equation (3.6). The procedure is described in Algorithm 1 as a function ConfInt, which takes as input the i.i.d. training data {(Xi, Yi)}_{i=1}^{m}, a miscoverage level α, and any regression algorithm Reg. Algorithm 1 begins by splitting the training data into two equal-sized disjoint subsets.
Then a regression estimator µ̂ is fit to the training set {(X_i, Y_i) : i ∈ I_1} using the regression algorithm Reg (Line 2). Then the algorithm computes the absolute residuals R_i on the calibration set {(X_i, Y_i) : i ∈ I_2} (Line 3). For the desired miscoverage level α ∈ (0, 1), the algorithm sorts the residuals {R_i : i ∈ I_2} in ascending order and takes the residual at the position given by ⌈(m/2 + 1)(1 − α)⌉. This residual is used as the confidence range c. In [160], the authors prove that the prediction interval at a new point X_{m+1} is given by µ̂ and c such that Theorem 3.3.1 is valid.

Algorithm 1: Conformal regression algorithm ConfInt({(X_i, Y_i)}_{i=1}^m, α, Reg)
input: Data {(X_i, Y_i)}_{i=1}^m, miscoverage level α, regression algorithm Reg
output: Regression estimator µ̂, confidence range c
1 Randomly split {1, …, m} into two equal-sized subsets I_1, I_2
2 µ̂ = Reg((X_i, Y_i) : i ∈ I_1)
3 R_i = |Y_i − µ̂(X_i)|, i ∈ I_2
4 c = the k-th smallest value in {R_i : i ∈ I_2}, where k = ⌈(m/2 + 1)(1 − α)⌉
5 return µ̂, c

Theorem 3.3.1 (Theorem 2.1 in [160]). If (X_i, Y_i), i = 1, …, m are i.i.d., then for a new i.i.d. draw (X_{m+1}, Y_{m+1}), using µ̂ and c constructed in Algorithm 1, we have that P(Y_{m+1} ∈ [µ̂(X_{m+1}) − c, µ̂(X_{m+1}) + c]) ≥ 1 − α. Moreover, if we additionally assume that the residuals {R_i : i ∈ I_2} have a continuous joint distribution, then P(Y_{m+1} ∈ [µ̂(X_{m+1}) − c, µ̂(X_{m+1}) + c]) ≤ 1 − α + 2/(m + 2). □

Generally speaking, as we improve our surrogate model µ̂ of the underlying regression function µ, the resulting conformal prediction interval decreases in length. Intuitively, this happens because a more accurate µ̂ leads to smaller residuals (or ϵ in Section 3.2), and conformal intervals are essentially defined by the quantiles of the (augmented) residual distribution. Note that Theorem 3.3.1 asserts a marginal coverage guarantee, which should be distinguished from the conditional coverage guarantee P(Y_{m+1} ∈ B(x) | X_{m+1} = x) ≥ 1 − α for all x ∈ R^n.
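As a concrete illustration, Algorithm 1 fits in a few lines of Python. The least-squares line fit standing in for Reg, along with the synthetic linear data, is a hypothetical choice for this sketch, not the regressors used in the chapter's experiments:

```python
import numpy as np

def conf_int(X, Y, alpha, reg):
    """Split conformal prediction (sketch of Algorithm 1).

    reg(X, Y) -> callable surrogate mu_hat. Returns (mu_hat, c) such that
    [mu_hat(x) - c, mu_hat(x) + c] covers a fresh response w.p. >= 1 - alpha."""
    m = len(X)
    idx = np.random.default_rng(1).permutation(m)
    I1, I2 = idx[: m // 2], idx[m // 2:]           # Line 1: random equal split
    mu_hat = reg(X[I1], Y[I1])                     # Line 2: fit on first half
    R = np.sort(np.abs(Y[I2] - mu_hat(X[I2])))     # Line 3: calibration residuals
    k = int(np.ceil((len(I2) + 1) * (1 - alpha)))  # Line 4: quantile index
    c = R[min(k, len(I2)) - 1]                     # k-th smallest residual
    return mu_hat, c

def fit_line(X, Y):
    """Hypothetical Reg: ordinary least-squares line fit."""
    a, b = np.polyfit(X, Y, 1)
    return lambda x: a * x + b

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, 200)
Y = 2.0 * X + 0.1 * rng.normal(size=200)
mu_hat, c = conf_int(X, Y, alpha=0.05, reg=fit_line)
```

On fresh draws from the same distribution, the interval µ̂(x) ± c then empirically covers the response at close to the 1 − α rate, mirroring the marginal guarantee of Theorem 3.3.1.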
The latter is a much stronger property and is hard to achieve without assumptions on D_XY.

3.3.2 Computing (c, ϵ)-probabilistic surrogate models

We assume that the parameter value θ and ρ(φ, ξ_θ) follow a joint (unknown) distribution D_{θ,ρ(φ)} that we wish to empirically estimate. As indicated in Section 3.2, the first step in learning a (c, ϵ)-probabilistic surrogate model is to sample D_{θ,ρ(φ)} and apply regression methods. We draw m i.i.d. samples Θ̂ = {θ_1, …, θ_m} from D_θ and compute the robustness value ρ_i = ρ(φ, ξ_{θ_i}) for the model trajectory corresponding to each parameter θ_i. Lemma 3.3.2 follows from Theorem 3.3.1.

Lemma 3.3.2. Let (µ̂, c) = ConfInt({θ_i, ρ_i}, ϵ, Reg), where ConfInt is as defined in Algorithm 1, 1 − ϵ is a user-provided probability threshold, Reg is some regression algorithm, and c ∈ R. Then µ̂ is a (c, ϵ)-probabilistic surrogate model.

We now show how we can use (c, ϵ)-probabilistic surrogate models to perform statistical verification. Theorem 3.3.3 shows that the confidence range returned by the conformal inference procedure can be extended over the entire parameter space.

Theorem 3.3.3. Let
1. (θ_i, ρ_i), i = 1, …, m be i.i.d. samples drawn from the joint distribution D_{θ,ρ(φ)} of θ ∈ Θ and ρ(φ, ξ_θ),
2. Reg be a regression algorithm,
3. 1 − ϵ be a user-provided probability threshold,
4. (µ̂, c) = ConfInt({θ_i, ρ_i}, ϵ, Reg), i.e., µ̂ is the surrogate model and c is the confidence range returned by Algorithm 1,
5. v*_max = max_{θ∈Θ} µ̂(θ), and v*_min = min_{θ∈Θ} µ̂(θ).
Then,

P(ρ(φ, ξ_θ) ∈ [v*_min − c, v*_max + c] | θ ∼ D_θ) ≥ 1 − ϵ (3.7)

Proof. From Theorem 3.3.1, we know that any new i.i.d. sample (θ′, ρ(φ, ξ_{θ′})) from D_{θ,ρ(φ)} satisfies:

P(ρ(φ, ξ_{θ′}) ∈ [µ̂(θ′) − c, µ̂(θ′) + c]) ≥ 1 − ϵ. (3.8)

By definition, v*_min ≤ µ̂(θ′) ≤ v*_max. Combining this with Eq. (3.8), we get the desired result.
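To make Theorem 3.3.3 concrete, the sketch below extends a conformal confidence range c over a whole region by bounding the surrogate's extrema. A dense evaluation grid stands in for the global optimizer, and the quadratic surrogate and the value of c are toy assumptions:

```python
import numpy as np

def region_interval(mu_hat, c, thetas):
    """Probabilistic robustness interval over a region (Theorem 3.3.3 sketch):
    [v_min - c, v_max + c], where v_min / v_max bound the surrogate mu_hat
    over the candidate parameter values `thetas`."""
    vals = mu_hat(thetas)
    return vals.min() - c, vals.max() + c

mu_hat = lambda th: 1.0 - th ** 2          # toy surrogate: theta -> robustness
grid = np.linspace(-0.5, 0.5, 101)         # grid over the region of interest
lo, hi = region_interval(mu_hat, c=0.05, thetas=grid)
# lo > 0 lets us declare the region safe with probability >= 1 - epsilon;
# hi < 0 would make it unsafe; an interval straddling 0 leaves it unknown.
```

Here the interval is [0.70, 1.05], so the toy region would be declared safe at the 1 − ϵ level.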
Theorem 3.3.3 requires us to obtain the minimum/maximum values of the surrogate model over a given region in the parameter space. If µ̂(θ) is a non-convex function and the chosen optimization algorithm cannot compute the exact optimal values v*_min or v*_max, but can only give conservative estimates of them, we can update the predicted interval in Theorem 3.3.3 as follows.

Corollary 3.3.4. Let v_min and v_max be respectively under- and over-approximations of v*_min and v*_max. Then

P(ρ(φ, ξ_θ) ∈ [v_min − c, v_max + c] | θ ∼ D_θ) ≥ 1 − ϵ (3.9)

The bounds v_min and v_max in Corollary 3.3.4 can be computed using global optimization solvers, SMT solvers, or range analysis tools for neural networks [244, 95] (for neural network regression). We can use the bounds obtained in Theorem 3.3.3 (similarly those from Corollary 3.3.4) to derive probabilistic bounds on the Boolean satisfaction of a given STL property φ, as expressed in Theorem 3.3.5.

Theorem 3.3.5. If v*_min − c > 0, then P_{D_θ}(ξ_θ |= φ | θ ∈ Θ) ≥ 1 − ϵ. If v*_max + c < 0, then P_{D_θ}(ξ_θ ̸|= φ | θ ∈ Θ) ≥ 1 − ϵ.

Proof. From [97], we know that ρ(φ, ξ_θ) > 0 =⇒ ξ_θ |= φ. Thus, if the lower bound of the prediction interval in Theorem 3.3.3 is positive, then ξ_θ |= φ. The second case follows by a similar argument.

If the first statement in the above theorem holds, we say that Θ is safe; if the second statement holds, we say that Θ is unsafe; and if neither statement holds (i.e., the predicted interval contains 0), we say that Θ is unknown.
While Theorem 3.3.5 gives us a way to identify whether a region in the parameter space is safe (or unsafe), there are unfortunately two challenges: (1) the function mapping θ to ρ(φ, sim(θ)) is in general highly nonlinear, and an a priori choice of a regression algorithm Reg that fits this function with small residual values may be difficult; (2) if there is large variation in the value of the regression function over Θ, it is likely that the conformal interval contains 0, thereby marking Θ as unknown. To circumvent this issue, one solution is to split the parameter space Θ into smaller regions where it may be possible to get narrow conformal intervals at the same probability threshold. We present a naïve algorithm based on parameter-space partitioning next.

3.3.3 Naïve Parameter Space Partitioning

We now present an algorithm that uses Theorem 3.3.3 (or Corollary 3.3.4) to provide probabilistic guarantees by recursively splitting the parameter space Θ into smaller regions such that each region can be labeled as safe, unsafe, or unknown. The basic idea of this algorithm is to compute the conformal interval using Theorem 3.3.3 and then check whether v*_min − c < 0 and v*_max + c > 0. If so, we need to partition the region and then repeat the process of computing the conformal interval for each of the sub-regions. Note that the probability in Theorem 3.3.3 (and Theorem 3.3.1) is marginal, being taken over all the i.i.d. samples {θ_i, ρ_i} from D_{θ,ρ(φ)}. Therefore, when we work on each subset S ⊆ Θ after partitioning, we have to restrict θ to be in S (according to Equation (2.1)) to ensure that Theorem 3.3.3 remains valid. We abuse notation and denote by D_{θ,ρ(φ)} ↓ S the joint distribution D_{θ,ρ(φ)} when θ is restricted to be sampled from S ⊆ Θ.

Algorithm 2: Parameter space partition with respect to STL formulas using conformal regression.
input: Parameter space Θ and corresponding distribution D_θ, simulator sim and an interpolation method to obtain trajectories from sim, miscoverage level α, regression algorithm Reg, an STL formula φ, a vector ∆
output: Parameter set Θ+ that leads to satisfaction of φ, parameter set Θ− that leads to violation of φ, and the remaining parameter set Θ_U that is undecided
1 Θ+, Θ−, Θ_U ← ∅, Θ_r ← {Θ}
2 while Θ_r ̸= ∅ do
3   S ← Pop(Θ_r)
4   θ_1, …, θ_m ← IID_Sample(D_θ ↓ S)
5   for i = 1, …, m do
6     ρ_i ← ρ(φ, ξ_{θ_i})
7   µ̂, c ← ConfInt({(θ_i, ρ_i)}_{i=1}^m, α, Reg)
8   v_max ← max_{θ∈S} µ̂(θ), v_min ← min_{θ∈S} µ̂(θ)
9   if v_min − c ≥ 0 then
10    Θ+ ← Θ+ ∪ (S, [v_min − c, v_max + c])
11  else if v_max + c ≤ 0 then
12    Θ− ← Θ− ∪ (S, [v_min − c, v_max + c])
13  else if Diameters(S) < ∆ Diameters(Θ) then
14    Θ_U ← Θ_U ∪ (S, [v_min − c, v_max + c])
15  else
16    Θ_r.Push(Partition(S, Reg))
17 return Θ+, Θ−, Θ_U

Algorithm 2 searches over the parameter space Θ and partitions it into the sets Θ+, Θ−, and Θ_U, along with the prediction intervals for the robustness values in each set. We first check if the robustness value is strictly positive or negative and accordingly add the region being inspected, S, to Θ+ or Θ− (Lines 10 and 12). When Algorithm 2 cannot decide whether S belongs to Θ+ or Θ− (i.e., the interval contains 0), we first check whether, for every n, the diameter of S along the n-th parameter dimension is less than the fraction ∆_n Diameters(Θ)_n. We assume that the vector ∆ is provided by the user. If so, the region is marked as unknown. Otherwise, we partition S into a number of subregions, which are then added to a worklist of regions (Line 16). In our implementation, in order to keep the number of subsets to be explored bounded, we randomly pick a dimension in the parameter space and split the parameter space into two equal subsets along that dimension. Note that the partitioning can be accelerated using parallel computation, but we leave that for future exploration.
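The control flow of Algorithm 2 can be sketched in one dimension as follows. The linear robustness function, the fixed confidence range c, and the grid evaluation standing in for the regression and optimization steps are all toy assumptions for illustration:

```python
import numpy as np

def classify_regions(rho, lo0, hi0, c=0.02, min_width=0.05):
    """1-D sketch of Algorithm 2: recursively bisect [lo0, hi0] until each
    piece is provably safe (interval > 0), unsafe (interval < 0), or too
    small to split further. `rho` plays the role of the fitted surrogate."""
    safe, unsafe, unknown, work = [], [], [], [(lo0, hi0)]
    while work:
        lo, hi = work.pop()
        vals = rho(np.linspace(lo, hi, 50))
        vmin, vmax = vals.min() - c, vals.max() + c
        if vmin >= 0:
            safe.append((lo, hi))
        elif vmax <= 0:
            unsafe.append((lo, hi))
        elif hi - lo < min_width:              # diameter below threshold
            unknown.append((lo, hi))
        else:                                  # bisect and revisit
            mid = 0.5 * (lo + hi)
            work += [(lo, mid), (mid, hi)]
    return safe, unsafe, unknown

# Toy robustness: positive for theta > 0.3, negative below.
safe, unsafe, unknown = classify_regions(lambda t: t - 0.3, 0.0, 1.0)
```

The returned pieces tile the original interval exactly, with the thin unknown band concentrated around the zero-crossing at θ = 0.3, which is the behavior the partition refinement is designed to produce.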
For each subset S, Algorithm 2 additionally gives the corresponding prediction interval, which indicates how robustly the trajectories satisfy (or violate) φ.

Theorem 3.3.6. In Algorithm 2, P(ξ_θ |= φ | θ ∼ D_{θ,ρ(φ)} ↓ S, S ∈ Θ+) ≥ 1 − ϵ, and P(ξ_θ ̸|= φ | θ ∼ D_{θ,ρ(φ)} ↓ S, S ∈ Θ−) ≥ 1 − ϵ.

Theorem 3.3.6 directly follows from Theorems 3.3.3 and 3.3.5 and the law of total probability.

3.3.4 Gaussian Processes for Refinement

A drawback of Algorithm 2 is that the naïve splitting procedure does not scale to high dimensions and may perform poorly if the safe/unsafe regions have arbitrary shapes. In this section, we instead suggest the use of Gaussian Processes (GP) coupled with Bayesian updates to intelligently partition regions. Recall from Section 3.2 that for each parameter value θ, the GP model represents the mean µ(θ) and variance σ²(θ) in terms of the samples already explored in the parameter space. In a GP model, the variance at sampled parameter values is zero, but at points away from the sampled values, the variance can be high. We now give the symbolic expressions for the mean and variance of a GP model in terms of a kernel function k(θ, θ′). A kernel function of a GP is the covariance function of the random variables inside the GP. It influences the flexibility and capacity of the GP, as well as its ability to generalize to new data points. For ease of exposition, let Θ̂ denote the vector of parameter values already sampled, and let Y denote the vector of robustness values for the parameter values in Θ̂. Then, from [210], Chapter 2, the posterior distribution given the observed samples is characterized by:

µ(θ) = k(θ, Θ̂)^⊤ k(Θ̂, Θ̂)^{−1} Y,
Σ(θ) = k(θ, θ) − k(θ, Θ̂)^⊤ k(Θ̂, Θ̂)^{−1} k(Θ̂, θ),
σ(θ) = √Σ(θ).

The main idea is to use the mean and variance of the GP model to prioritize searching parameter values where: (a) the robustness may be close to zero, or (b) the variance of the GP model may be high.
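The posterior expressions above translate directly into a few lines of numpy. The squared-exponential kernel and the jitter term here are illustrative assumptions for the sketch (the chapter's experiments use the sum kernel of Section 3.3.4); note how, as stated in the text, the posterior variance vanishes at already-sampled parameters and reverts to the prior far away from them:

```python
import numpy as np

def gp_posterior(theta, Theta_hat, Y, k, jitter=1e-8):
    """Posterior mean and variance of a centered GP at query points `theta`,
    given sampled parameters Theta_hat with robustness values Y:
        mu(theta)    = k(theta, Theta_hat) K^-1 Y
        Sigma(theta) = k(theta, theta) - k(theta, Theta_hat) K^-1 k(Theta_hat, theta)
    with K = k(Theta_hat, Theta_hat); `jitter` stabilizes the linear solve."""
    K = k(Theta_hat, Theta_hat) + jitter * np.eye(len(Theta_hat))
    k_star = k(theta, Theta_hat)
    mean = k_star @ np.linalg.solve(K, Y)
    quad = np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    var = np.diag(k(theta, theta)) - quad
    return mean, var

sq_exp = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
Theta_hat = np.array([0.0, 1.0, 2.0])      # parameters sampled so far
Y = np.array([0.5, -0.2, 0.1])             # their robustness values
mean, var = gp_posterior(Theta_hat, Theta_hat, Y, sq_exp)
# At already-sampled parameters the posterior interpolates: mean ~ Y, var ~ 0.
```

In the partitioning scheme, `var` (or mean minus standard deviation) evaluated over candidate points is exactly the quantity the acquisition functions of the next subsection optimize.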
These two choices give us two different ways to partition the parameter region, which we now explain. In the literature on GP-based Bayesian optimization, there is work on defining acquisition functions that are used as targets for optimization. Examples include the UCB (Upper Confidence Bound) acquisition function, which is a combination of the mean and variance of the GP, and EI (Expected Improvement), which focuses on the expected value of the improvement in the function value. Inspired by the UCB function, which allows a trade-off between exploration and exploitation, we consider two acquisition functions: (1) the first is focused on pure exploration and uses the variance of the GP as the objective for maximization; (2) the second is the difference between the mean and the standard deviation. The rationale for the second function is that if µ(θ) − σ(θ) is lower than 0, it is an indicator of a low-robustness region.

Algorithm 3: Parameter Space Partition using GP models Partition(S, Reg)
input: Parameter set S, regression algorithm Reg
output: Parameter sets S_1, … that will be pushed into the set Θ_r
1 f(θ) ← acquisition(µ(θ), σ(θ))
2 θ′ ← argmax_θ f(θ)
3 return Split(S, θ′)

Algorithm 3 presents the new Partition function of Algorithm 2 using GP-based acquisition. The function Split(S, θ′) partitions the given region S into 2^{|θ|} new regions that all share θ′ as a vertex.

3.4 Risk Estimation

In previous sections, we explored how surrogate models and conformal inference can be used to obtain probabilistic guarantees on the behavior of black-box cyber-physical system models. A different way of obtaining high-confidence probabilistic statements about the correctness of systems that has recently emerged is based on the idea of risk estimation [166, 176, 68, 12]. Such a correctness statement provided by risk estimation is quantitative and takes the uncertainty of the behaviors of the system into account.
A risk measure is simply a function g that maps a scalar random variable X to a real value g(X). Typically, the random variable X may represent an observed state of the system, where the probability distribution of X may depend on particular decision parameters for the system. The distribution of X allows us to estimate how risky a particular set of the system's design parameters is, and g(X) can be thought of as an empirical estimate of the magnitude of risk at a given confidence threshold ε. For example, consider the risk-aware verification problem considered in [12]. Here, the authors treat the robustness of a given STL formula with respect to a given model trajectory as a scalar random variable. The probability distribution of this random variable is induced by the uncertainty in the initial states and parameters of the model. A popular risk measure is known as Value-at-Risk (VaR). In risk-aware verification, given a risk level ε ∈ [0, 1], the Value-at-Risk at level∗ ε computes the value ρ∗ such that the probability of the robustness of a trajectory being less than ρ∗ is greater than ε. Clearly, this is yet another way to obtain probabilistic guarantees on a model's behavior through simulations (which can be used to obtain empirical estimates of the probability distribution of the desired quantity, e.g., robustness with respect to a given formula). In what follows, we formally recap two important tail risk measures that are commonly used, and compare the bounds we obtain through risk estimation with those obtained through conformal inference.

3.4.1 Risk Measures

The behavior of a given CPS application can vary with the values of the system parameters. If we assume that the parameter values of a system are a priori unknown, then we can consider the system behavior as uncertain – where the uncertainty is induced by the distribution on the parameter values. Risk measures can then be used to quantify whether the system is safe with a given probability threshold. We assume that the
parameter value θ and ρ(φ, ξ_θ) follow a joint (unknown) distribution D_{θ,ρ(φ)}. A risk measure r_ε can provide the following probabilistic guarantee about the robustness of the system, given an STL specification and a confidence level ε:

Pr(−ρ(φ, ξ_θ) ≤ r_ε) ≥ ε (3.10)

∗Our definition is slightly different from the one in [12] and is consistent with the definitions in [212, 251, 211]. The main difference is that in [12], the authors use VaR at level ε to denote inf_ζ P(x < ζ) > 1 − ε, while the probability threshold in our technique is ε.

We now recall two important risk measures used in the literature [166].

Definition 3.4.1 (Value-at-Risk (VaR), Conditional-Value-at-Risk (CVaR) [166]). Let Z be shorthand for ρ(φ, ξ_θ). The Value-at-Risk is defined as follows:

VaR_ε(−Z) = inf_{ζ∈R} {ζ | Pr(−Z ≤ ζ) ≥ ε} (3.11)

The Conditional-Value-at-Risk is defined as follows:

CVaR_ε(−Z) = E[−Z | −Z ≥ VaR_ε(−Z)] (3.12)

Essentially, both risk measures provide probabilistic upper bounds on the negative of the robustness value, or equivalently lower bounds on the actual robustness value, as is required in risk-aware verification [12, 166]. To compare with the bounds provided by conformal inference, we also need to compute probabilistic guarantees on upper bounds on the robustness. These can be simply given by the risk measures VaR_ε(ρ(φ, ξ_θ)) and CVaR_ε(ρ(φ, ξ_θ)). For brevity, we refer to VaR_ε(−ρ(φ, ξ_θ)) as VaR^ℓ_ε and VaR_ε(ρ(φ, ξ_θ)) as VaR^u_ε, and we use the analogous notation CVaR^ℓ_ε and CVaR^u_ε.

Figure 3.2: Illustration of the Value-at-Risk and the Conditional Value-at-Risk with ε = 0.7

3.4.2 Computing risk against different system parameters

Algorithm 2 returns regions of the parameter space that satisfy or do not satisfy a given STL formula. We now investigate whether conformal inference and risk measures give the same conclusion on the safety of a region.
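Definitions (3.11) and (3.12) admit simple empirical estimators, sketched below. The standard-normal loss samples are a stand-in for the −ρ values obtained from simulations, and the tail-average form of CVaR used here is the empirical counterpart of the conditional expectation in (3.12):

```python
import numpy as np

def var_cvar(losses, eps):
    """Empirical VaR and CVaR at level eps for loss samples (here, -rho).
    VaR is the (100*eps)-percentile of the samples; CVaR averages the
    losses at or beyond the VaR, matching Definition 3.4.1."""
    var = np.percentile(losses, 100 * eps)
    cvar = losses[losses >= var].mean()
    return var, cvar

rng = np.random.default_rng(3)
losses = rng.normal(0.0, 1.0, 100_000)   # stand-in samples of -rho(phi, xi_theta)
var95, cvar95 = var_cvar(losses, 0.95)
# For a standard normal loss, VaR_0.95 ~ 1.645 and CVaR_0.95 ~ 2.063;
# CVaR always sits at or beyond VaR since it averages the tail.
```

Because CVaR averages the entire tail beyond the VaR, it is the more conservative of the two measures, which is visible in the region comparisons later in the chapter.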
We remark that the system under consideration is deterministic, but the random choice of the model parameter values θ induces a distribution over the robustness values ρ(φ, ξ_θ) of the given STL formula φ. Thus, the probability appearing in equation (3.11) is associated with this distribution.

Estimation of VaR, CVaR. The definition of VaR_ε presumes that we know the joint probability distribution of ρ(φ, ξ_θ) and θ. However, in our problem setting, this distribution is unknown. We use the (100 · ε)-percentile of the sample values to approximate VaR_ε [127]. To estimate CVaR_ε, we note that CVaR_ε can be rewritten as the following integral:

CVaR_ε(−Z) = ∫₀^ε VaR_γ(−Z) dγ (3.13)

The value of the above integral can be estimated using standard Monte Carlo integration by randomly sampling the values of γ.

Figure 3.3: Mountain Car parameter space partitioning using different approaches: (a) partitioning using naïve split (δ_min = 0.02), (b) partitioning using GP-based split (δ_min = 0.02), (c) Mountain Car ground truth.

3.5 Case Studies

In this section, we present case studies of CPS models and identify regions in the parameter space that we can mark as safe, unsafe, or unknown with high probability. We tried each of the case studies with different regression algorithms, with Gaussian Process regression leading to smaller residuals and, hence, narrower conformal intervals. We tried both (a) the naïve algorithm that recursively splits the parameter space, and (b) the algorithm that adaptively partitions the parameter space by exploiting the uncertainty expressed by a Gaussian Process prior. For all case studies, we used a miscoverage level of ϵ = 0.05 (i.e., a correctness threshold of 95% probability). We first compared the performance of Algorithm 2 under the different partition splitting methods.
For the GP-based partitioning method, we use the sum kernel k(θ_i, θ_j) = k_1(θ_i, θ_j) + k_2(θ_i, θ_j), where k_1 is the dot product kernel, i.e., k_1(θ_i, θ_j) = σ_0² + θ_i · θ_j, and k_2 is the white kernel, where k_2(θ_i, θ_j) = 1 if θ_i = θ_j and 0 otherwise.

Comparing partitioning schemes using Mountain Car. For the comparison experiment, we used the mountain car model, popular in the reinforcement learning literature [278]. Here, the model describes an under-powered car attempting to drive up a hill. A successful strategy involves the car accumulating potential energy by first going in the opposite direction and then using the gained momentum. Details of this model can be found in [278]. The parameter space for mountain car is defined by the initial position x_init and velocity v_init of the car. We wish to identify regions of the space that satisfy or violate the property of reaching the goal. The region we choose for analysis is defined as Θ = (x_init, v_init) ∈ [−0.7, 0.2] × [−0.5, 0.5], which is comparable to the region used in [278]. We consider a parameter value safe if it satisfies the STL formula φ_mc = F[0,10](x(t) > 0.45). In Fig. 3.3c, we show an approximation of the ground truth obtained by uniform grid sampling of the parameter space†; here, green and red dots respectively denote satisfaction and violation of φ_mc. Results comparing the naïve splitting method (Fig. 3.3a) and GP-based partitioning (Fig. 3.3b) are shown in Table 3.1.

Table 3.1: Comparison of Alg. 2 for different partitioning strategies, (1 − ϵ) = 0.95.
Partition method | num. regions explored | Safe (%) | Unsafe (%) | Unk. (%)
Naïve (Section 3.3.3) | 457 | 89.01 | 10.99 | 0.00
Greatest uncertainty (Section 3.3.4) | 364 | 88.72 | 11.28 | 0.00

We note that the number of regions explored with GP-based partitioning is much lower than with naïve splitting. As a smaller number of regions explored translates into fewer simulations, it is clear that the GP-based method has superior performance.
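The sum kernel defined at the start of this comparison can be written directly; the value σ₀ = 1 below is an illustrative choice for the sketch, not the fitted hyperparameter from the experiments:

```python
import numpy as np

def sum_kernel(A, B, sigma0=1.0):
    """k = k1 + k2: dot-product kernel sigma0^2 + theta_i . theta_j plus a
    white kernel equal to 1 iff theta_i and theta_j are the same point."""
    dot = sigma0 ** 2 + A @ B.T                                # k1
    white = (np.abs(A[:, None, :] - B[None, :, :]).sum(-1) == 0).astype(float)
    return dot + white                                         # k1 + k2

A = np.array([[0.0, 1.0], [1.0, 0.5]])   # two 2-D parameter values
K = sum_kernel(A, A)
```

The dot-product term lets the GP capture a (near-)linear trend of robustness in the parameters, while the white-kernel term models observation noise along the diagonal of the covariance matrix.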
We observed similar results for the other case studies in the chapter, but we omit those results for brevity. Given the superiority of the GP-based method, we use it for the rest of the case studies in the chapter.

Reinforcement Learning Lane Keep Assist. Lane-keep assist (LKA) is an automated driver assistance technique used in semi-autonomous vehicles to keep the ego vehicle traveling along the centerline of a lane. We consider a reinforcement learning (RL)-based agent to perform LKA from the Matlab® RL toolbox (based on [183]). The agent contains a Deep Q-Network (DQN), which makes this case study a learning-enabled application. The inputs to the agent are the lateral deviation e_1, the relative yaw angle (i.e., yaw error) e_2, their derivatives, and their integrals. The parameter space for this model consists of initial values for e_1 and e_2, where we consider the region Θ = (e_1, e_2) ∈ [−0.3, 0.3] × [−0.2, 0.2]. We are interested in checking properties such as overshoot/undershoot bounds and the settling time for the lateral deviation and yaw error signals. In this experiment, we consider two properties characterizing bounds on e_2 and the settling time for e_1: φ_lka,settle : G[2,15](|e_1| < 0.025) and φ_lka,bounds : G[0,15](e_2 < 0.4 ∧ e_2 > −0.4). Figure 3.4 shows the parameter space partitioning results and the ground truth with respect to φ_lka,settle. Our technique was able to certify that φ_lka,bounds is satisfied by the entire region with 95% confidence.

†Note that the number of grid samples used to generate the approximate ground truth far exceeds the number of simulations required for the experiments, and is only provided to enable validation of our results.

Figure 3.4: Lane Keep Assist: (a) partitioning using greatest uncertainty split (δ_min = 0.01), (b) LKA (settle) ground truth.

F-16 Control System. Next, we consider the verification challenge presented in [124].
This is a model of an F-16 flight control system – a hierarchical control system containing an outer-loop autopilot, an inner-loop tracking and stabilizing controller (ILC), and a 13-dimensional non-linear dynamical plant model. The plant dynamics are based on a 6-degrees-of-freedom standard airplane model [237] represented by a system of 13 ODEs describing the force equations, kinematics, moments, and a first-order lag model for the afterburning turbofan engine. These ODEs describe the evolution of the system states, namely velocity v_t, angle of attack α, sideslip β, altitude h, the attitude angles roll ϕ, pitch θ, and yaw ψ and their corresponding rates p, q, r, engine power, and two more states for translation along north and east. The non-linear plant model uses linearly interpolated lookup tables to incorporate wind tunnel data. The control system is composed of an autopilot that sets the references on upward acceleration, stability roll rate, and the throttle. The ILC uses an LQR state feedback law to track the references and computes the control input for the aileron, rudder, and the elevator.

Figure 3.5: F16 – Pull up (top), Level Flight (bottom): (a) partitioning using greatest uncertainty split (δ_min = 0.1), (b) ground truth, (c) partitioning using greatest uncertainty split (δ_min = 100), (d) ground truth.

We consider three separate scenarios capturing specific contexts; each scenario defines the parameter set and an associated specification.

F16-Pull up maneuver. This scenario demonstrates the tracking of a constant autopilot command requesting an upward acceleration (N_z = 5g). The ILC tries to track the reference without undesirable transients such as pitch oscillations and exceeding pitch rate limits. We modify the controller gains to highlight violations of the spec φ_f16,pullup : G[0,10](q ≤ 120°/s). The parameter space is described by the initial values α ∈ [−10°, 0°], θ ∈ [−30°, 0°], and the results are shown in Figure 3.5a (results of Alg. 2) and Fig.
3.5b (ground truth).

F16-Level Flight. This scenario describes straight and level flight with a constant attitude and zero initial angular rates. The bounded parameter space is defined by the initial altitude h ∈ [500, 65000] and velocity v_t ∈ [130, 1200]. The autopilot references are set to zero, and the ILC tries to maintain a constant altitude and angle of attack α. As the F-16 can fly over a large range of altitudes and velocities, a single LQR computed against the linearized model cannot satisfy the goal and results in a stall, captured by φ_f16,level : G[0,10](α ≤ 35°). This is shown in Fig. 3.5.

F16-Ground Collision Avoidance (GCAS). The final scenario describes the F-16 diving towards the ground and the GCAS autopilot trying to prevent the collision. The GCAS brings the roll angle and its rate to 0 and then accelerates upwards to avoid ground collision, as specified by φ_f16,gcas : G[0,10](h ≥ 0 ft). The parameter space is described by the initial values α ∈ [0.075, 0.1]^c and ϕ ∈ [−0.1, −0.075]^c. In this case study, the ground truth and our results are less well-matched than in the other case studies. There are a couple of reasons for this. First, observe that the ground truth is highly non-monotonic. Given this nonlinearity, the fitted surrogate model tends to better fit the values of the majority of the points, which in this case are negative; this makes the optimization results err toward negative values and leads to regions being marked unsafe. To remedy this, we would have to increase the number of simulations per region used to train the GP regression model and possibly experiment with other regression models (such as a deep neural network regressor). We provide an illustration of the results in Figures 3.6a and 3.6b.

Figure 3.6: F16-Ground Collision Avoidance (GCAS): (a) partitioning using greatest uncertainty split (δ_min = 0.002), (b) GCAS ground truth.

Artificial Pancreas.
Type-1 diabetes (juvenile diabetes) is a chronic condition caused by the inability of the pancreas to secrete the required amount of insulin. Simglucose [268] is a Python implementation of the FDA-approved Type-1 Diabetes simulator [181], which models glucose kinetics. We input a list of tuples of meal times and meal sizes to Simglucose and set the same scenario environment. Choosing patients of different ages results in different simulation traces. The meal times are constrained to be strictly increasing, and the last meal of a day must be taken within 24 hours. For each scenario, the simulator provides trace records of different blood indicators based on a given environment setting. We are interested in checking whether patients avoid becoming hyperglycemic on the first day (i.e., whether the blood glucose (BG) ever exceeds a certain threshold). We use δ = 0.5 as the termination criterion for region splitting. We study 4 scenarios describing an adolescent patient who takes 2, 3, 4, and 5 meals a day, respectively. The meals of size s_i are consumed at times t_i. We denote by n the total number of meals; the parameter space is then defined by S_1 × S_2 × · · · × S_n, so its dimension equals the number of meals taken. The meal time t_i lies in the interval [(i − 1) · 24/n + 1, i · 24/n], and S_i ⊆ [1, 20]. The property φ_hyper,c specifies that the patient should not become hyperglycemic, i.e., that BG stays below the threshold c. Our results predict the entire region as safe with 95% confidence for all cases where the patient had 3 or more meals. For the two-meal case with the property φ_hyper,155, our implementation's result of 54.44% matches well with the ground truth, where we see a 52.47% unsafe volume (obtained by expensive grid-based sampling). For the higher-dimensional cases, the initial S_i considered are different. This could imply that more frequent meals with smaller amounts each can help control blood glucose.

Impact and Discussion of Results.
The results obtained for all the case studies are summarized in Table 3.2. Our tool is capable of producing heatmap-style representations of the unsafe parameter regions when projected onto 2 parameter dimensions. For a higher number of parameter dimensions, visualization is more difficult; hence, we also report the percentage volume of the regions found safe or unsafe by our method. We observe that in most cases, the volume of the regions that remain unknown is quite low. As some of the case studies are learning-enabled CPS applications, it is expected to see a high volume of unsafe regions – this can happen if the learning-enabled components (LECs) are not effectively trained in all parameter regions. Thus, our tool can provide useful information to algorithms for training such LECs.

Table 3.2: Performance of Algorithm 2 using the GP-based greatest uncertainty split method with 95% confidence level.
Case Study | Safe (%) | Unsafe (%) | Unk. (%) | Sims./region | Spec.
Mountain Car 1 | 88.72 | 11.28 | 0.00 | 100 | φ_mc
Lane Keep Assist 1 | 100 | 0.00 | 0.00 | 100 | φ_lka,bounds
Lane Keep Assist 2 | 77.23 | 21.97 | 0.80 | 100 | φ_lka,settle
F16 Level Flight | 67.18 | 32.81 | 0.00 | 100 | φ_f16,level
F16 Pull up | 43.52 | 56.09 | 0.40 | 100 | φ_f16,pullup
F16 GCAS | 3.91 | 96.09 | 0.00 | 100 | φ_f16,GCAS
Simglucose 2D | 45.45 | 54.55 | 0.00 | 10 | φ_hyper,155
Simglucose 2D | 100 | 0.00 | 0.00 | 10 | φ_hyper,170
Simglucose 3D | 100 | 0.00 | 0.00 | 10 | φ_hyper,155
Simglucose 4D | 100 | 0.00 | 0.00 | 10 | φ_hyper,155
Simglucose 5D | 100 | 0.00 | 0.00 | 10 | φ_hyper,155

We remark that the runtime of our method is dominated by the time required for running the simulations – a step that is easily parallelizable. We could also reuse the simulations performed on a given sub-region of a coarser region when the region is split. Our prototype tool does not yet include either of these optimizations. Training the GP surrogate with 100 data points takes 0.035 seconds on average.
The naïve refinement procedure takes around 7.1 µs for models with 2D parameter spaces and 24 µs for 3D parameter spaces. With GP-based refinement (which requires optimization with acquisition functions), the runtime is 0.006 seconds. Thus, with the ability to parallelize and reuse simulations, the additional overhead induced by our method (e.g., in comparison to an SMC method) is minimal. We do acknowledge that SMC methods can perhaps obtain guarantees with fewer simulations using statistical hypothesis testing; however, SMC methods typically do not learn surrogate models and cannot generate a parameter-space partitioning. We finally remark that if the model parameters are chosen by an outer-loop supervisory controller, then the partitions that we generate create conditional contracts on the safety of the CPS model; such contracts can be used for constructing safety assurance cases [222].

3.5.1 Comparing with risk measures

Figures 3.7–3.9 demonstrate the VaR_ε and CVaR_ε risk assessments for each region. For the same confidence level ε = 95%, the conformal inference procedure computes [v_min, v_max] as the bounds on the robustness value, while for a given risk measure r_ε, we let [r^ℓ, r^u] indicate the (probabilistic) lower and upper bounds on the robustness values. Recall that if v_max < 0, the region is colored red, and if v_min > 0, the region is colored green. We use the following color coding:
1. For red regions: If v_max < r^u < 0, we use a lighter shade of red, and if 0 > v_max > r^u, we use a darker shade. Intuitively, a lighter shade indicates that the risk-based probability measures deem the region “less unsafe” (as compared to the bounds computed by the split conformal predictor), and a darker shade indicates riskier or more unsafe regions (for the same comparison).
2. For green regions: If r^ℓ > v_min > 0, we use a lighter shade of green, and if 0 < r^ℓ < v_min, we use a darker shade.
Intuitively, a darker shade indicates that the risk-based probability measures deem the region "less safe", and a lighter shade indicates a more robustly safe region (according to the risk estimation).

3. For blue regions: If v_max < 0 < r_u (i.e., the conformal bound deems the region unsafe, but the risk measure either deems the region safe or is inconclusive), then we color the region light blue. If r_ℓ < 0 < v_min (i.e., the conformal bound deems the region safe, but the risk measure either deems it unsafe or is inconclusive), then we color the region dark blue. Blue regions indicate that the two methods are unable to agree on the classification of a region as safe or unsafe.

In conclusion, conformal inference bounds are not guaranteed to bound the risk (as computed using risk measures). However, the risk measures largely agree with conformal inference on region safety. Table 3.3 and Table 3.4 show risk measure performance compared to the bounds computed by conformal inference, as a function of the number of samples used per region to compute either kind of bound. The column labeled "Diff" counts regions where (1) conformal inference did not label the region as unknown, and (2) the risk measure reaches a different conclusion about the safety of that region. We can see that the maximum disagreement is still less than 5%, and the two methods tend to agree more when we sample more per region.

Figure 3.7: F16 - Pull up, 100 samples per region. (a) CVaR with −ρ as loss; partitioning using the greatest uncertainty split (δ_min = 0.1). (b) Comparison of conformal inference and CVaR; blue regions indicate that the two reach different conclusions on the safety of the region. (c) VaR with −ρ as loss; same partitioning. (d) Comparison of conformal inference and VaR; blue coloring as in (b).
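The per-region risk estimates referenced above can be sketched with empirical quantiles. Following the chapter's convention, the loss is the negated robustness −ρ; the sample values below are purely illustrative.

```python
import numpy as np

def var_cvar(losses, eps):
    """Empirical VaR and CVaR of a loss sample at level eps: VaR is the
    eps-quantile (smallest sample value with at least eps of the mass at or
    below it), and CVaR is the average loss in the tail from that quantile
    onward, so CVaR >= VaR by construction."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = int(np.ceil(eps * len(losses))) - 1    # 0-based index of the quantile
    return losses[k], losses[k:].mean()

# Robustness values sampled in one parameter sub-region (illustrative numbers);
# as in the text, the loss is -rho, so a positive VaR/CVaR flags a probable
# violation of the STL property in this region.
rho = np.array([0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, -0.1, -0.3, -0.6])
var95, cvar95 = var_cvar(-rho, eps=0.95)
var90, cvar90 = var_cvar(-rho, eps=0.90)
```

Since CVaR averages the tail at and beyond the ε-quantile, it is never below VaR at the same level, which is why CVaR-based coloring is the more conservative of the two.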
We remark that a surrogate model's training set could still miss rare unsafe points due to sampling vagaries, so the surrogate model can produce an optimistic v_min, which may mark some regions containing unsafe points as safe (because the regression-based surrogate model lacks some critical data points). However, risk-based analysis performs quantile estimation directly on the robustness values (without fitting a surrogate model) and hence, in some cases, may help (conservatively) label regions as unsafe.

Figure 3.8: F16 - Pull up, with more samples (1000) per region. (a) CVaR with −ρ as loss; partitioning using the greatest uncertainty split (δ_min = 0.1). (b) Comparison of conformal inference and CVaR; blue regions indicate that the two reach different conclusions on the safety of the region. (c) VaR with −ρ as loss; same partitioning. (d) Comparison of conformal inference and VaR; blue coloring as in (b).

Sims./region | CVaR Lower   | CVaR Higher | Diff        | # Regions | VaR Lower    | VaR Higher | Diff        | # Regions
5            | 17 (54.84%)  | 0 (0.0%)    | 3 (9.68%)   | 31        | 17 (60.71%)  | 2 (7.14%)  | 1 (3.57%)   | 28
10           | 34 (55.74%)  | 7 (11.48%)  | 3 (4.92%)   | 61        | 26 (65.0%)   | 2 (5.0%)   | 6 (15.0%)   | 40
50           | 148 (62.18%) | 0 (0.0%)    | 24 (10.08%) | 238       | 158 (55.83%) | 1 (0.35%)  | 17 (6.01%)  | 283
100          | 879 (35.37%) | 1 (0.04%)   | 61 (2.45%)  | 2485      | 647 (27.79%) | 0 (0.0%)   | 30 (1.24%)  | 2425
200          | 830 (36.24%) | 3 (0.13%)   | 50 (2.18%)  | 2290      | 637 (28.0%)  | 0 (0.0%)   | 15 (0.66%)  | 2275
300          | 801 (33.61%) | 2 (0.08%)   | 50 (2.10%)  | 2383      | 612 (27.70%) | 2 (0.09%)  | 13 (0.59%)  | 2209
500          | 757 (32.50%) | 2 (0.09%)   | 34 (1.46%)  | 2329      | 592 (25.19%) | 6 (0.26%)  | 10 (0.43%)  | 2350
1000         | 739 (31.73%) | 8 (0.34%)   | 35 (1.50%)  | 2329      | 579 (24.21%) | 11 (0.46%) | 7 (0.29%)   | 2392

Table 3.3: Number of CVaR and VaR values not bounded by the upper and lower bounds of conformal inference, with −ρ as the loss function and 95% as the confidence level.

Figure 3.9: F16 - Pull up, 100 samples per region. (a) CVaR with ρ as loss; partitioning using the greatest uncertainty split (δ_min = 0.1). (b) Comparison of conformal inference and CVaR; blue regions indicate that the two reach different conclusions on the safety of the region. (c) VaR with ρ as loss; same partitioning. (d) Comparison of conformal inference and VaR; blue coloring as in (b).

Sims./region | CVaR Lower | CVaR Higher | Diff        | # Regions | VaR Lower  | VaR Higher  | Diff        | # Regions
5            | 0 (0.0%)   | 10 (62.5%)  | 9 (56.25%)  | 16        | 0 (0.0%)   | 5 (71.43%)  | 5 (71.43%)  | 7
10           | 0 (0.0%)   | 15 (48.39%) | 9 (29.03%)  | 31        | 1 (3.57%)  | 16 (57.14%) | 13 (46.43%) | 28
50           | 0 (0.0%)   | 97 (62.99%) | 40 (25.97%) | 154       | 1 (1.06%)  | 57 (60.64%) | 26 (27.66%) | 94
100          | 1 (0.04%)  | 37 (1.51%)  | 1 (0.04%)   | 2458      | 8 (0.34%)  | 1 (0.04%)   | 0 (0.0%)    | 2350
200          | 3 (0.13%)  | 150 (6.34%) | 8 (0.34%)   | 2365      | 5 (0.21%)  | 30 (1.28%)  | 4 (0.17%)   | 2338
300          | 3 (0.13%)  | 169 (7.16%) | 7 (0.30%)   | 2359      | 9 (0.37%)  | 27 (1.10%)  | 0 (0.0%)    | 2446
500          | 5 (0.21%)  | 208 (8.62%) | 4 (0.17%)   | 2413      | 8 (0.35%)  | 48 (2.09%)  | 1 (0.04%)   | 2293
1000         | 16 (0.70%) | 223 (9.80%) | 14 (0.62%)  | 2275      | 17 (0.74%) | 60 (2.61%)  | 2 (0.09%)   | 2299

Table 3.4: Number of CVaR and VaR values not bounded by the upper and lower bounds of conformal inference, with ρ as the loss function and 95% as the confidence level.

3.6 Related Work and Conclusions

Related Work. Methods based on Statistical Model Checking (SMC) [157, 159, 285] can overcome hurdles like scalability and nonlinearity and provide probabilistic guarantees [278, 217, 264, 72, 4]. These methods are based on statistical inference techniques such as sequential probability ratio tests [157, 217, 230, 72], Bayesian statistics [285], and Clopper-Pearson bounds [278]. Another line of work uses Probably Approximately Correct (PAC) learning theory to give probabilistic bounds for Markov decision processes and black-box systems [106, 98].
In contrast to SMC and PAC-learning techniques, our approach is agnostic to the number of samples and can provide the required probabilistic guarantees with any number of samples. This is because we build a guaranteed regression model from the system parameters to the robust satisfaction value of the corresponding STL properties. If the regression model is of poor quality (due to few samples), then, using the calibration step in conformal regression, the predicted (but wider) interval still carries the same level of guarantee. Conformal regression thus lets us trade off the quality of the regression model (w.r.t. the data) against the width of the interval for which we have high-confidence property satisfaction – but not the level of the guarantee itself. Recent work on using conformity measures is quite relevant to our work [56, 58]; the main contribution there is that, in order to handle high-dimensional inputs in real time, the authors compute a nonconformity score using an embedding representation of deep neural network models. That work, however, focuses on a classification problem and not on obtaining probabilistic guarantees on (closed-loop) system correctness. The idea of obtaining trusted confidence bounds is nevertheless similar, albeit applied in a different context. The work in [60, 119] focuses on detecting regions of the input space of a learning-enabled component that lack training data and hence can potentially have large (prediction) errors. These approaches are more suitable for runtime assurance or for statically characterizing uncertainty in predictions performed by learning-enabled components. In our work, we use regression-style learning algorithms to approximate the model itself and use such surrogate models to obtain probabilistic guarantees. While the models we consider may themselves contain learning-enabled components (LECs), our approach is black-box: it does not reason about the LECs themselves.

Risk is an excellent way to analyze the robustness of systems.
In control design, risk measures are seen as essential quantities to calculate and optimize during the design process, and work on stochastic system verification has found that risk measures provide more information than expected cost values alone. We perform a detailed comparison of our work in [204] with the guarantees obtained using risk measures. We see that the verification results of the two methods agree with each other in most cases. We argue that when the two guarantees do not agree, the regions where such disagreements occur require further investigation. A potential reason for a mismatch is that the empirical distribution of the residual error (when training the surrogate model) may differ from the distribution of robustness values in the region (due to non-linearities in the CPS model's dynamics, how well the chosen surrogate model fits the data, etc.).

Conclusions. In this chapter, we proposed a verification framework that searches the parameter space to find the regions that lead to satisfaction or violation of a given specification, with probabilistic coverage guarantees. There are a couple of directions we aim to explore as future work: 1) We used a very basic version of conformal regression in Algorithm 1, which gives a constant confidence range c across all X. Techniques based on quantile regression [216] and locally-weighted conformal prediction [160] can make c a function of X and give much shorter prediction intervals. 2) We plan to explore probabilistic regret bounds for Gaussian process optimization to help obtain (probabilistic) upper and lower bounds on the value of the surrogate model when using GP-based regression.

Chapter 4

Conformal Prediction for STL Runtime Verification

4.1 Introduction

Cyber-physical systems may be subject to a small yet non-zero failure probability, especially when using data-enabled perception and decision-making capabilities, e.g., self-driving cars using high-dimensional sensors. Rare yet catastrophic system failures hence have to be anticipated.
In this chapter, we aim to detect system failures with high confidence, early on during the operation of the system. Verification aims to check the correctness of a system against specifications expressed in mathematical logics, e.g., linear temporal logic [199] or signal temporal logic (STL) [179]. Automated verification tools have been developed for deterministic systems, e.g., model checking [28, 73] or theorem proving [233, 232]. Non-deterministic system verification has been studied using probabilistic model checking [47, 123, 151, 135] or statistical model checking [274, 275, 158, 157]. Such offline verification techniques have been applied to verify cyber-physical systems, e.g., autonomous race cars [134, 133, 166], cruise controllers and emergency braking systems [245, 243], autonomous robots [238], or aircraft collision avoidance systems [30, 31]. These verification techniques, however, are: 1) applied to a system model that may not capture the system sufficiently well, and 2) performed offline and not during the runtime of the system. We may hence certify a system to be safe a priori (e.g., with a probability of 0.99), but during the system's runtime we may observe an unsafe system realization (e.g., belonging to the fraction of 0.01 unsafe realizations).

Figure 4.1: Overview of the proposed STL predictive runtime verification algorithms. Both algorithms use past observations (x_0, . . . , x_t) to obtain state predictions (x̂_{t+1}, x̂_{t+2}, . . .). The direct algorithm calculates the satisfaction measure ρ(ϕ, x̂) of the specification ϕ based on these predictions and obtains a prediction region C for the unknown satisfaction measure ρ(ϕ, x) using conformal prediction. The indirect method first obtains prediction regions for the unknown states x_{t+1}, x_{t+2}, . . . using conformal prediction, and then obtains a lower bound on the unknown satisfaction measure ρ(ϕ, x) based on the state prediction regions.
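For concreteness, the satisfaction measure ρ used throughout can be sketched for a simple box-invariance specification of the kind shown later in Figure 4.2. This mirrors the standard STL robustness of an "always" formula over a single predicate; the interval, bounds, and signals below are purely illustrative.

```python
import numpy as np

def rho_always_in_box(x, lo, hi, t_start, t_end):
    """Robustness of the STL formula 'always on [t_start, t_end]: x in [lo, hi]':
    the minimum over the window of the signed distance to the nearer box bound.
    Positive iff the specification is satisfied, with magnitude indicating
    how robustly."""
    window = np.asarray(x[t_start:t_end + 1], dtype=float)
    return float(np.min(np.minimum(window - lo, hi - window)))

# A signal that stays well inside [0, 3] over the window -> positive robustness.
x_sat = 1.5 + 0.5 * np.sin(np.linspace(0.0, 6.0, 300))
# A ramp that exits the box inside the window -> negative robustness.
x_viol = np.linspace(0.0, 4.0, 300)

rho_sat = rho_always_in_box(x_sat, 0.0, 3.0, 150, 250)
rho_viol = rho_always_in_box(x_viol, 0.0, 3.0, 150, 250)
```

Scalars of this kind are exactly what the direct method later turns into nonconformity scores.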
Runtime verification aims to detect unsafe system realizations by using online monitors that observe the current realization (referred to as the prefix) to determine if all extensions of this partial realization (referred to as suffixes) either satisfy or violate the specification; see [43, 163, 63] for deterministic and [267, 234, 136] for non-deterministic systems. The verification answer can be inconclusive when not all suffixes are satisfying or violating. Predictive runtime verification instead predicts suffixes from the prefix to obtain a verification result more reliably and quickly [27, 272, 202]. We are interested in the predictive runtime verification of a stochastic system, modeled by an unknown distribution D, against a system specification ϕ expressed in STL. In particular, we want to calculate the probability that the current system execution violates the specification based on the currently observed trajectory, see Figure 4.1. To the best of our knowledge, existing predictive runtime verification algorithms do not provide formal correctness guarantees unless restrictive assumptions are placed on the prediction algorithm or the underlying distribution D. We allow the use of complex prediction algorithms such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, while making no assumptions on D. Our contributions are as follows:

• We present two predictive runtime verification algorithms, illustrated in Figure 4.1, that use: i) trajectory predictors to predict future system states, and ii) conformal prediction to quantify prediction uncertainty.

• We show that our algorithms enjoy valid verification guarantees, i.e., the verification answer is correct with a user-defined confidence, under minimal assumptions on the predictor and the underlying distribution D. We provide technical proofs of the Theorems and Lemmas in an appendix.
• We provide realistic empirical validation of our approach on an F-16 aircraft and a self-driving car, and we compare the two proposed runtime verification algorithms.

4.1.1 Related Work

Statistical model checking. Statistical model checking is a lightweight alternative to computationally expensive probabilistic model checking, used to verify black-box systems [274, 275, 158, 157]. The idea is to sample system trajectories and use statistical tools to obtain valid verification guarantees. Statistical model checking has gained popularity due to the complexity of modern machine learning architectures, for which it is difficult to obtain meaningful, i.e., not overly conservative, analytical results. We focus on signal temporal logic (STL) as a rich specification language [179] that admits robust semantics to quantify how robustly a system satisfies a specification, spatially and/or temporally [97, 90, 214]. Statistical model checking under STL specifications was first considered in [36, 37], while [225, 224, 135] proposed a combination of statistical and model-based approaches. The authors in [264, 278, 263, 217] use statistical testing to derive high-confidence bounds on the probability of a cyber-physical system satisfying an STL specification. In [67, 166, 169, 12, 13], risk verification algorithms were proposed using mathematical notions of risk.

Predictive Runtime Verification. Runtime verification complements system verification by observing the current system execution (prefix) to determine if all extensions (suffixes) either satisfy or violate the specification [43, 163, 63, 267, 234, 136]. Runtime verification is an active research area [173, 220, 57], and algorithms were recently proposed for verifying STL properties and hyperproperties in [116, 82, 229] and [102, 121], respectively.
While the verification result in runtime verification can be inconclusive, predictive runtime verification predicts a set of possible suffixes (e.g., a set of potential trajectories) to provide a verification result more reliably and quickly. In [272, 198, 277, 276, 14, 150], knowledge of the system is assumed in order to obtain predictions of system trajectories. However, the system is not always exactly known, so that in [100, 26, 27] a system model is learned first, while in [273, 71, 202, 240] future system trajectories are predicted from past observed data using trajectory predictors. To the best of our knowledge, none of these works provide valid verification guarantees unless the system is exactly known or strong assumptions are placed on the prediction algorithm.

Conformal Prediction. Conformal prediction was introduced in [258, 231] as a statistical tool to quantify the uncertainty of prediction algorithms. In [174], conformal prediction was used to obtain guarantees on the false negative rate of an online monitor. Conformal prediction was used for verification of STL properties in [204] by learning a predictive model of the STL semantics. For reachable set prediction, the authors in [54, 55, 61] used conformal prediction to quantify the uncertainty of a predictive runtime monitor that predicts reachability of safe/unsafe states. However, the works in [204, 54, 55, 61] train task-specific predictors, while we use task-independent trajectory predictors to predict future system states, from which we infer information about the satisfaction of the task. This is significant as no expensive retraining is required when the specification changes. The authors of the work in [62], which appeared concurrently with our chapter, also consider predictive runtime verification under STL specifications.
Similar to our work, they provide probabilistic guarantees for the quantitative semantics of STL, but they consider a different runtime verification setting in which systems have to be Markovian. Again, their predictors are task-specific, while our predictors are task-independent, so that we avoid expensive retraining when specifications change.

4.2 Problem Formulation

Let D be an unknown distribution over system trajectories that describes our system, i.e., let X := (X_0, X_1, . . .) ∼ D be a random trajectory where X_τ denotes the state of the system at time τ and is drawn from R^n. Modeling stochastic systems by a distribution D provides great flexibility, and D can generally describe the motion of Markov decision processes. It can capture stochastic systems whose trajectories follow the recursive update equation X_{τ+1} = f(X_τ, θ_τ), where θ_τ is a random variable and the (unknown) function f describes the system dynamics. Stochastic systems can describe the behavior of engineered systems such as robots and autonomous systems, e.g., drones or self-driving cars, but they can also describe weather patterns, demographics, and human motion. We use lowercase letters x_τ for realizations of the random variable X_τ. We make no assumptions on the distribution D, but we assume the availability of training and calibration data drawn from D.

Assumption 4.2.1. We have access to K independent realizations x^(i) := (x^(i)_0, x^(i)_1, . . .) of the distribution D that are collected in the dataset D := {x^(1), . . . , x^(K)}.

Informal Problem Formulation. Assume now that we are given a specification ϕ for the stochastic system D, e.g., a safety or performance specification defined over the states X_τ of the system. In "offline" system verification, e.g., in statistical model checking, we are interested in calculating the probability that (X_0, X_1, . . .) ∼ D satisfies the specification. In runtime verification, on the other hand, we have already observed the partial realization (x_0, . .
. , x_t) of (X_0, . . . , X_t) online at time t, and we want to use this information to calculate the probability that (X_0, X_1, . . .) ∼ D satisfies the specification.∗ In this chapter, we use predictions x̂_{τ|t} of the future states X_τ for this task, in a predictive runtime verification approach.

∗We note that we consider unconditional probabilities in this chapter.

Figure 4.2: Ten realizations of two stochastic systems (solid lines) and corresponding LSTM predictions at time t := 100 (red dashed lines). The specification is that trajectories should be within the green box between 150 and 250 time units.

While in "offline" verification all realizations of D are taken into account, only a subset of these are relevant in runtime verification. One hence gets different types of verification guarantees. For example, consider a stochastic system (X_0, X_1, . . .) ∼ D of which we have plotted ten realizations in Figure 4.2 (left). In an offline approach, this system satisfies the specification that X_τ ∈ [0, 3] for all τ ∈ [150, 250] with a probability of 0.5. However, given an observed partial realization (x_1, . . . , x_100), we are able to give a better answer. In this case, we used the LSTM predictions x̂_{τ|100} (red dashed lines) to say more confidently whether the specification is satisfied. While the stochastic system in Figure 4.2 (left) has a simple structure, the same task for the stochastic system in Figure 4.2 (right) is already more challenging.

Assumption 4.2.2. We consider bounded STL formulas ϕ, i.e., all time intervals I within the formula ϕ are bounded.

4.2.1 Trajectory Predictors

Given an observed partial sequence (x_0, . . . , x_t) at the current time t ≥ 0, we want to predict the states (x_{t+1}, . . . , x_{t+h}) for a prediction horizon of h > 0. Our runtime verification algorithm is in general compatible with any trajectory prediction algorithm. Assume therefore that Predict is a measurable function that maps observations (x_0, . . . , x_t) to predictions (x̂_{t+1|t}, . . . , x̂_{t+h|t}) of (x_{t+1}, . .
. , x_{t+h}). Trajectory predictors are typically learned. We therefore split the dataset D into training and calibration datasets D_train and D_cal, respectively, and learn Predict from D_train.

A specific example of Predict is a recurrent neural network (RNN), which has shown good performance [171, 221]. For τ ≤ t, the recurrent structure of an RNN is given as

a^1_τ := N(x_τ, a^1_{τ−1}),
a^i_τ := N(x_τ, a^i_{τ−1}, a^{i−1}_τ), ∀i ∈ {2, . . . , n},
y_{τ+1|τ} := O(a^n_τ),

where x_τ is the input that is sequentially applied to the RNN and where N is a function that can parameterize different types of RNNs, e.g., LSTMs [126]. Furthermore, n is the RNN's depth and a^1_τ, . . . , a^n_τ are the hidden states. The output y_{t+1|t} := (x̂_{t+1|t}, . . . , x̂_{t+h|t}) provides an estimate of (x_{t+1}, . . . , x_{t+h}) via the function O, which typically parameterizes a linear last layer.

4.2.2 Predictive Runtime Verification

We recall that (x_0, x_1, . . .) denotes a realization of X := (X_0, X_1, . . .) ∼ D. Assume that we have observed x_obs := (x_0, . . . , x_t) at time t, i.e., all states up until time t are known, while the realizations of x_un := (x_{t+1}, x_{t+2}, . . .) are not known yet. Consequently, we have that X := (X_obs, X_un).† In this chapter, we are interested in calculating the probability that (X, τ_0) |= ϕ, as formally stated next.‡

Problem 4.2.3. Given a distribution (X_0, X_1, . . .) ∼ D, the current time t, the observations x_obs := (x_0, . . . , x_t), a bounded STL formula ϕ that is enabled at τ_0, and a failure probability ϵ ∈ (0, 1), determine if P((X, τ_0) |= ϕ) ≥ 1 − ϵ holds.

Several comments are in order. Note that we use the system specification ϕ (and not its negation ¬ϕ) to determine if ϕ is satisfied. From P((X, τ_0) |= ϕ) ≥ 1 − ϵ, we can infer that P((X, τ_0) |= ¬ϕ) ≤ ϵ,

†For convenience, we chose the notations X_obs, X_un, and X, which do not explicitly reflect the dependence on the current time t.
‡We remark that the semantics and the robust semantics are measurable, so that probabilities over these functions are well defined [lindemann2022risk, 37].

i.e., we get an upper bound on the probability that the specification is violated. We further remark that, as a byproduct of our solution to Problem 4.2.3, we obtain a probabilistic lower bound C̄ ∈ R on the robust semantics ρ(ϕ, X, τ_0), i.e., such that P(ρ(ϕ, X, τ_0) ≥ C̄) ≥ 1 − ϵ. We would like to point out two special instances of Problem 4.2.3. When τ_0 := 0, we recover the "standard" runtime verification problem in which a specification is enabled at time zero, such as in the example from Figure 4.2 requiring x_τ ∈ [0, 3] for all τ ∈ [150, 250]. When τ_0 := t, the current time coincides with the time at which the specification is enabled. This may, for instance, be important when monitoring the current quality of a system, e.g., when monitoring the output of a neural network used for perception in autonomous driving.

4.3 Conformal Prediction for Predictive Runtime Verification

In this section, we first provide an introduction to conformal prediction for uncertainty quantification. We then propose two predictive runtime verification algorithms to solve Problem 4.2.3. We refer to these algorithms as direct and indirect. This naming convention is motivated by the fact that the direct method applies conformal prediction directly to obtain a prediction region for the robust semantics ρ(ϕ, X, τ_0), while the indirect method uses conformal prediction to get prediction regions for the future states X_τ first, which are subsequently used indirectly to obtain a prediction region for ρ(ϕ, X, τ_0); see Figure 4.1.

4.3.1 Direct STL Predictive Runtime Verification

Recall that we can obtain predictions x̂_{τ|t} of x_τ for all future times τ > t using the Predict function. However, the predictions x̂_{τ|t} are only point predictions that are not sufficient to solve Problem 4.2.3, as they do not contain any information about the uncertainty of x̂_{τ|t}.
We first propose a solution based on a direct application of conformal prediction. Let us therefore define h := τ_0 + L^ϕ − t as the maximum prediction horizon that is needed to estimate the satisfaction of the bounded STL specification ϕ. Define now the predicted trajectory

x̂ := (x_obs, x̂_{t+1|t}, . . . , x̂_{t+h|t}),     (4.1)

which is the concatenation of the current observations x_obs and the predictions of the future states x̂_{t+1|t}, . . . , x̂_{t+h|t}. For an a priori fixed failure probability ϵ ∈ (0, 1), our goal is to directly construct a prediction region defined by a constant C so that

P(ρ(ϕ, x̂, τ_0) − ρ(ϕ, X, τ_0) ≤ C) ≥ 1 − ϵ.     (4.2)

Note that ρ(ϕ, x̂, τ_0) is the predicted robust semantics for the specification ϕ, which we can calculate at time t based on the observations x_obs and the predictions x̂_{t+1|t}, . . . , x̂_{t+h|t}. Now, if equation (4.2) holds, then we know that ρ(ϕ, x̂, τ_0) > C is a sufficient condition for P(ρ(ϕ, X, τ_0) > 0) ≥ 1 − ϵ to hold. To obtain the constant C, we thus consider the nonconformity score R := ρ(ϕ, x̂, τ_0) − ρ(ϕ, X, τ_0). In fact, let us compute the nonconformity score for each calibration trajectory x^(i) ∈ D_cal as R^(i) := ρ(ϕ, x̂^(i), τ_0) − ρ(ϕ, x^(i), τ_0), where x̂^(i) := (x^(i)_obs, x̂^(i)_{t+1|t}, . . . , x̂^(i)_{t+h|t}) resembles equation (4.1), but is now defined for the calibration trajectory x^(i).§ A positive nonconformity score R^(i) indicates that our predictions are too optimistic, i.e., the predicted robust semantics ρ(ϕ, x̂^(i), τ_0) is greater than the actual robust semantics ρ(ϕ, x^(i), τ_0) obtained when using the ground-truth calibration trajectory x^(i). Conversely, a negative value of R^(i) means that our predictions are too conservative.

§This means that x̂^(i) is the concatenation of the observed calibration trajectory x^(i)_obs := (x^(i)_0, . . . , x^(i)_t) and the predictions x̂^(i)_{t+1|t}, . . . , x̂^(i)_{t+h|t} obtained from x^(i)_obs.
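The calibration computation just described — scoring each calibration trajectory and then taking an empirical quantile of the sorted scores, with the quantile index used in standard split conformal prediction — can be sketched as follows. The synthetic robustness values stand in for ρ(ϕ, x̂^(i), τ_0) and ρ(ϕ, x^(i), τ_0); they are not data from the case studies.

```python
import numpy as np

def conformal_constant(rho_pred_cal, rho_true_cal, eps):
    """Split-conformal constant C: the p-th smallest nonconformity score
    R^(i) = rho_pred - rho_true, with p = ceil((n + 1) * (1 - eps)); C is
    infinite when p exceeds n (too little calibration data for this eps)."""
    scores = np.sort(np.asarray(rho_pred_cal) - np.asarray(rho_true_cal))
    n = len(scores)
    p = int(np.ceil((n + 1) * (1 - eps)))
    return scores[p - 1] if p <= n else np.inf

# Synthetic calibration data standing in for the predicted and ground-truth
# robustness of 199 calibration trajectories.
rng = np.random.default_rng(1)
rho_true = rng.normal(0.5, 0.3, size=199)
rho_pred = rho_true + rng.normal(0.0, 0.05, size=199)

C = conformal_constant(rho_pred, rho_true, eps=0.05)
# Verification decision: report satisfaction with confidence 1 - eps iff the
# predicted robustness of the test trajectory exceeds C.
rho_pred_test = 0.8
satisfied = rho_pred_test > C
```

Note that C grows with the typical over-optimism of the predictor: a perfectly calibrated predictor yields scores centered at zero and hence a small C.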
We can now directly obtain a constant C that makes equation (4.2) valid, and use this C to solve Problem 4.2.3, by a direct application of [242, Lemma 1]. Assume therefore, without loss of generality, that the values of R^(i) are sorted in non-decreasing order, and let us add R^(|D_cal|+1) := ∞ as the (|D_cal| + 1)th value.

Theorem 4.3.1. Given a distribution (X_0, X_1, . . .) ∼ D, the current time t, the observations x_obs := (x_0, . . . , x_t), a bounded STL formula ϕ that is enabled at τ_0, the dataset D_cal, and a failure probability ϵ ∈ (0, 1), the prediction region in equation (4.2) is valid with C defined as

C := R^(p) where p := ⌈(|D_cal| + 1)(1 − ϵ)⌉,     (4.3)

and it holds that P((X, τ_0) |= ϕ) ≥ 1 − ϵ if ρ(ϕ, x̂, τ_0) > C.

Proof. The nonconformity scores R^(i) are independent and identically distributed by their definition and by Assumption 4.2.1. By [242, Lemma 1], we hence know that equation (4.2) is valid for the specific choice of C in equation (4.3). Consequently, we have that P(ρ(ϕ, X, τ_0) ≥ ρ(ϕ, x̂, τ_0) − C) ≥ 1 − ϵ. If now ρ(ϕ, x̂, τ_0) > C, it holds that P(ρ(ϕ, X, τ_0) > 0) ≥ 1 − ϵ, by which it follows that P((X, τ_0) |= ϕ) ≥ 1 − ϵ, since ρ(ϕ, X, τ_0) > 0 implies (X, τ_0) |= ϕ [90, 97].

It is important to note that neither the direct method nor the indirect method presented in the next subsection needs to retrain its predictor when the specification ϕ changes, unlike existing work such as [204, 54]. This is because we use trajectory predictors to obtain state predictions x̂_{τ|t} that are specification-independent.

Remark 4.3.2. Note that Theorem 4.3.1 assumes a fixed failure probability ϵ. If one wants to find the tightest bound with the smallest failure probability ϵ such that P((X, τ_0) |= ϕ) ≥ 1 − ϵ holds, we can (approximately) find the smallest such ϵ by a simple grid search over ϵ ∈ (0, 1), repeatedly invoking Theorem 4.3.1.

Remark 4.3.3.
We emphasize that the prediction regions in equation (4.2), and hence the result that P((X, τ_0) |= ϕ) ≥ 1 − ϵ if ρ(ϕ, x̂, τ_0) > C, guarantee marginal coverage. This means that the probability measure P is defined over the randomness of the test trajectory X and the randomness of the calibration trajectories in D_cal. We thereby obtain probabilistic guarantees for the verification procedure, but we do not obtain guarantees conditional on D_cal.

4.3.2 Indirect STL Predictive Runtime Verification

We now present the indirect method, in which we first obtain prediction regions for the state predictions x̂_{t+1|t}, . . . , x̂_{t+h|t}, and then use these prediction regions to solve Problem 4.2.3. We later discuss the advantages and disadvantages of the direct and the indirect methods (see Remark 4.3.7) and compare them in simulations (see Section 5.6). For a failure probability of ϵ ∈ (0, 1), our first goal is to construct prediction regions defined by constants C_τ so that

P(∥X_τ − x̂_{τ|t}∥ ≤ C_τ, ∀τ ∈ {t + 1, . . . , t + h}) ≥ 1 − ϵ,     (4.4)

i.e., C_τ should be such that the state X_τ is C_τ-close to our prediction x̂_{τ|t} for all relevant times τ ∈ {t + 1, . . . , t + h} with a probability of at least 1 − ϵ. Let us thus consider the following nonconformity score, computed for each calibration trajectory x^(i) ∈ D_cal as R^(i)_τ := ∥x^(i)_τ − x̂^(i)_{τ|t}∥, where we recall that x̂^(i)_{τ|t} is the prediction obtained from the observed calibration trajectory x^(i)_obs. A large nonconformity score indicates that the state predictions x̂^(i)_{τ|t} of x^(i)_τ are not accurate, while a small score indicates accurate predictions. Assume again that the values of R^(i)_τ are sorted in non-decreasing order, and define R^(|D_cal|+1)_τ := ∞ as the (|D_cal| + 1)th value. To obtain the values of C_τ that make equation (4.4) valid, we use the results from [235, 165].

Lemma 4.3.4 ([235, 165]). Given a distribution (X_0, X_1, . . .) ∼ D, the current time t, the observations x_obs := (x_0, . .
. , x_t), the dataset D_cal, and a failure probability ϵ ∈ (0, 1), the prediction regions in equation (4.4) are valid with C_τ defined as

C_τ := R^(p)_τ where p := ⌈(|D_cal| + 1)(1 − ϵ̄)⌉ and ϵ̄ := ϵ/h.     (4.5)

Note the scaling of ϵ by the inverse of h, as expressed in ϵ̄. Consequently, the constants C_τ increase with an increasing prediction horizon h, i.e., with a larger formula length L^ϕ, as a larger h results in a smaller ϵ̄ and consequently in a larger p according to (4.5). We can now use the prediction regions for the predictions x̂_{τ|t} from equation (4.4) to obtain prediction regions for ρ(ϕ, X, τ_0), and thereby solve Problem 4.2.3. The main idea is to calculate the worst case of the robust semantics ρ(ϕ) over these prediction regions. To be able to do so, we assume that the formula ϕ is in positive normal form, i.e., that the formula ϕ contains no negations. This is without loss of generality, as every STL formula ϕ can be re-written in positive normal form, see e.g., [223]. Let us next define a worst-case version ρ̄(ϕ) of the robust semantics ρ(ϕ) that incorporates the prediction regions from equation (4.4). For predicates µ, we define these semantics as

ρ̄(µ, x̂, τ) := h(x_τ) if τ ≤ t, and ρ̄(µ, x̂, τ) := inf_{ζ∈B_τ} h(ζ) otherwise,

where we recall the definition of the predicted trajectory x̂ in equation (4.1) and where B_τ := {ζ ∈ R^n | ∥ζ − x̂_{τ|t}∥ ≤ C_τ} is a ball of size C_τ centered around the prediction x̂_{τ|t}, i.e., B_τ defines the set of states within the prediction region at time τ. The intuition behind this definition is that we know the value of the robust semantics ρ(µ, X, τ) = ρ̄(µ, x̂, τ) if τ ≤ t, since x_τ is known. For times τ > t, we know that X_τ ∈ B_τ holds with a probability of at least 1 − ϵ by Lemma 4.3.4, so that we compute ρ̄(µ, x̂, τ) := inf_{ζ∈B_τ} h(ζ) to obtain a lower bound on ρ(µ, X, τ) with a probability of at least 1 − ϵ.

Remark 4.3.5. For convex predicate functions h, computing inf_{ζ∈B_τ} h(ζ) is a convex optimization problem that can be solved efficiently.
However, note that the optimization problem inf_{ζ∈Bτ} h(ζ) may need to be solved for different times τ and for multiple predicate functions h. For non-convex functions h, we can obtain lower bounds of inf_{ζ∈Bτ} h(ζ) that we can use instead. Particularly, let Lh be the Lipschitz constant of h, i.e., let |h(ζ) − h(x̂τ|t)| ≤ Lh∥ζ − x̂τ|t∥. Then, we know that

inf_{ζ∈Bτ} h(ζ) ≥ h(x̂τ|t) − Lh Cτ.

For instance, the constraint h(ζ) := ∥ζ1 − ζ2∥ − 0.5, which can encode collision avoidance constraints, has Lipschitz constant one. The worst case robust semantics ρ̄(ϕ) for the remaining operators (True, conjunctions, until, and since) are defined recursively in the standard way, i.e., the same way as for the robust semantics ρ(ϕ):

ρ̄(True, x̂, τ) := ∞,
ρ̄(µ, x̂, τ) := h(xτ) if τ ≤ t, and inf_{ζ∈Bτ} h(ζ) otherwise,
ρ̄(¬ϕ, x̂, τ) := −ρ̄(ϕ, x̂, τ),
ρ̄(ϕ′ ∧ ϕ′′, x̂, τ) := min(ρ̄(ϕ′, x̂, τ), ρ̄(ϕ′′, x̂, τ)),
ρ̄(ϕ′ U_I ϕ′′, x̂, τ) := sup_{τ′′∈(τ⊕I)∩N} min(ρ̄(ϕ′′, x̂, τ′′), inf_{τ′∈(τ,τ′′)∩N} ρ̄(ϕ′, x̂, τ′)),
ρ̄(ϕ′ S_I ϕ′′, x̂, τ) := sup_{τ′′∈(τ⊖I)∩N} min(ρ̄(ϕ′′, x̂, τ′′), inf_{τ′∈(τ′′,τ)∩N} ρ̄(ϕ′, x̂, τ′)).

We can now use the worst case robust semantics to solve Problem 4.2.3.

Theorem 4.3.6. Let the conditions of Lemma 4.3.4 hold. Given a bounded STL formula ϕ in positive normal form that is enabled at τ0, it holds that P((X, τ0) |= ϕ) ≥ 1 − ϵ if ρ̄(ϕ, x̂, τ0) > 0.

Proof. Note first that Xτ ∈ Bτ for all times τ ∈ {t + 1, . . . , t + h} with a probability of at least 1 − ϵ by Lemma 4.3.4. For all predicates µ in the STL formula ϕ and for all times τ ∈ {0, . . . , t + h}, it hence holds that ρ(µ, X, τ) ≥ ρ̄(µ, x̂, τ) with a probability of at least 1 − ϵ by the definition of ρ̄(µ). Since the formula ϕ does not contain negations¶, it is straightforward to show (inductively on the structure of ϕ) that ρ(ϕ, X, τ) ≥ ρ̄(ϕ, x̂, τ) with a probability of at least 1 − ϵ.
Consequently, if ρ̄(ϕ, x̂, τ0) > 0, it holds that P((X, τ0) |= ϕ) ≥ 1 − ϵ since ρ(ϕ, X, τ0) > 0 implies (X, τ0) |= ϕ [90, 97]. Finally, let us point out conceptual differences with respect to the direct STL predictive runtime verification method.

¶Negations would in fact flip the inequality in an unfavorable direction, e.g., for ¬µ it would hold that ρ(¬µ, X, τ) ≤ ρ̄(¬µ, x̂, τ) with a probability of at least 1 − ϵ.

Figure 4.3: LSTM predictions of the altitude h on Dtest (left, left-mid) and direct predictive runtime verification method (right-mid, right). Left: five best (in terms of mean square error) predictions on Dtest, left-mid: five worst predictions on Dtest, right-mid: histogram of the nonconformity scores R(i) on Dcal for the direct method, right: predicted robustness ρ(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest.

Remark 4.3.7. The state prediction regions (4.4) obtained in Lemma 4.3.4 may lead to conservatism in Theorem 4.3.6, especially for larger prediction horizons h due to the scaling of ϵ with the inverse of h. In fact, we require larger calibration datasets Dcal compared to the direct method to achieve p ≤ |Dcal| (recall that Cτ = ∞ if p > |Dcal|). On the other hand, the indirect method is more interpretable and allows us to identify parts of the formula ϕ that may be violated by analyzing the uncertainty of predicates via the worst case robust semantics ρ̄(µ, x̂, τ). This information may be helpful and can be used subsequently in a decision-making context for plan reconfiguration.

4.4 Case Studies

We present two case studies in which we verify an aircraft and a self-driving car. We remark upfront that, in both case studies, we fix the calibration dataset Dcal a priori and then evaluate our proposed runtime verification method on several test trajectories. As alluded to in Remark 4.3.3, one would technically have to resample a calibration dataset for each test trajectory.
This is impractical and, in fact, shown not to be needed when the size of the calibration dataset is large enough; see [19, Section 3.3] for a detailed discussion on this topic.

Figure 4.4: Left: F-16 Fighting Falcon within the high-fidelity aircraft simulator from [124]. Right: Self-driving car within the autonomous driving simulator CARLA [92].

4.4.1 F-16 Aircraft Simulator

In our first case study, we consider the F-16 Fighting Falcon, which is a highly maneuverable aircraft. The F-16 has been used as a verification benchmark, and the authors in [124] provide a high-fidelity simulator for various maneuvers such as ground collision avoidance; see Figure 4.4 (left). The F-16 aircraft is modeled with six-degrees-of-freedom nonlinear equations of motion, and the aircraft control system consists of an outer and an inner control loop. The outer loop encodes the logic of the maneuver in a finite state automaton and provides reference trajectories to the inner loop. In the inner loop, the aircraft (modeled by 13 continuous states) is controlled by low-level integral tracking controllers (adding 3 additional continuous states); we refer the reader to [124] for details. In the simulator, we introduce randomness by uniformly sampling the initial conditions of the air speed, angle of attack, angle of sideslip, roll, pitch, yaw, roll rate, pitch rate, yaw rate, and altitude from a compact set. We use a ground collision avoidance maneuver, and are thus primarily interested in the plane's altitude, which we denote by h. We collected |Dtrain| := 1520 training trajectories, |Dcal| := 5680 calibration trajectories, and |Dtest| := 100 test trajectories. From Dtrain, we trained an LSTM of depth two and width 50 to predict future states of h.∥ We show the LSTM performance in predicting h in Figure 4.3. Particularly, we

∥We only used the observed sequence of altitudes (h0, . . . , ht) as the input of the LSTM.
Additionally using other states is possible and can improve prediction performance.

Figure 4.5: Indirect predictive runtime verification method. Left, left-mid, and right-mid: histograms of the nonconformity scores R(i) of the τ-step-ahead prediction on Dcal for τ ∈ {50, 100, 200} and the indirect method, right: worst case predicted robustness ρ̄(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest.

show plots of the best five and the worst five LSTM predictions, in terms of the mean square error, on the test trajectories Dtest in Figure 4.3 (left and left-mid). We are interested in a safety specification expressed as ϕ := G[0,T](h ≥ 750) that is enabled at time τ0 := t, i.e., a specification that is imposed online during runtime. Hereby, we intend to monitor whether the airplane dips below 750 meters within the next T := 200 time steps (the sampling frequency is 100 Hz). Additionally, we set ϵ := 0.05 and fix the current time to t := 230. Let us first use the direct predictive runtime verification algorithm and obtain prediction regions of ρ(ϕ, x̂, τ0) − ρ(ϕ, X, τ0) by calculating C according to Theorem 4.3.1. We show the histograms of R(i) over the calibration data Dcal in Figure 4.3 (right-mid). The prediction region C (i.e., the R(p)th nonconformity score) is highlighted as a vertical line. In a next step, we empirically evaluate the results of Theorem 4.3.1 by using the test trajectories Dtest. In Figure 4.3 (right), we plot the predicted robustness ρ(ϕ, x̂(i), τ0) and the ground truth robustness ρ(ϕ, x(i), τ0). We found that for 100 of the 100 = |Dtest| trajectories it holds that ρ(ϕ, x̂(i), τ0) > C implies (x(i), τ0) |= ϕ, confirming Theorem 4.3.1. We also validated equation (4.2) and found that 96/100 trajectories satisfy ρ(ϕ, x̂(i), τ0) − ρ(ϕ, x(i), τ0) ≤ C. Let us now use the indirect predictive runtime verification algorithm. We first obtain prediction regions of ∥Xτ − x̂τ|t∥ by calculating Cτ according to Lemma 4.3.4.
We show the histograms for three different τ in Figure 4.5 (left, left-mid, right-mid). We also indicate the prediction regions Cτ by vertical lines (note that ϵ̄ = ϵ/200 in this case). We can observe that larger prediction times τ result in larger prediction regions Cτ. This is natural, as the trajectory predictor is expected to perform worse for larger τ. In a next step, we empirically evaluate the results of Theorem 4.3.6 by calculating the worst case robust semantics ρ̄(ϕ, x̂(i), τ0) for the test trajectories Dtest. In Figure 4.5 (right), we plot the worst case robustness ρ̄(ϕ, x̂(i), τ0) and the ground truth robustness ρ(ϕ, x(i), τ0). We found that for 100 of the 100 = |Dtest| trajectories it holds that ρ̄(ϕ, x̂(i), τ0) > 0 implies (x(i), τ0) |= ϕ, confirming Theorem 4.3.6. By a direct comparison of Figures 4.3 (right) and 4.5 (right), we observe that the indirect method is more conservative than the direct method in the obtained robustness estimates. Despite this conservatism, the indirect method allows us to obtain more information in case of failure by inspecting the worst case robust semantics ρ̄(µ, x̂, τ), as previously noted in Remark 4.3.7.

4.4.2 Autonomous Driving in CARLA

We consider the case study from [166] in which two neural network lane keeping controllers, an imitation learning (IL) controller [218] and a learned control barrier function (CBF) controller [168], are verified within the autonomous driving simulator CARLA [92] using offline trajectory data. The controllers are supposed to keep the car within the lane during a long 180-degree left turn; see Figure 4.4 (right). The authors in [166] provide offline probabilistic verification guarantees, and find that not every trajectory satisfies the specification. This motivates our predictive runtime verification approach, in which we would like to alert of potential violations of the specification already during runtime.
For the analysis, we consider the cross-track error ce (deviation of the car from the center of the lane) and the orientation error θe (difference between the orientation of the car and the lane). Within CARLA, the control input of the car is affected by additive Gaussian noise, and the initial position of the car is drawn uniformly from (ce, θe) ∈ [−1, 1] × [−0.4, 0.4]. We obtained 1000 trajectories for each controller, and use |Dtrain| := 700 trajectories to train an LSTM, while we use |Dcal| := 200 trajectories to obtain conformal prediction regions. The remaining |Dtest| := 100 trajectories are used for testing. We have trained two LSTMs for each controller from Dtrain using the same settings as in the previous section. In Figures 4.6 and 4.7, we show the LSTMs' performance in predicting ce and θe for each controller, respectively. Particularly, the plots show the best five and the worst five LSTM predictions (in terms of the mean square error) on the test trajectories Dtest.

Figure 4.6: LSTM predictions of the imitation learning controller on Dtest. Left: five best (in terms of mean square error) ce predictions, left-mid: five worst ce predictions, right-mid: five best θe predictions, right: five worst θe predictions.

Figure 4.7: LSTM predictions of the control barrier function controller on Dtest. Left: five best (in terms of mean square error) ce predictions, left-mid: five worst ce predictions, right-mid: five best θe predictions, right: five worst θe predictions.

For the verification of the car, we consider the following two STL specifications that are enabled at τ0 := 0:

ϕ1 := G[10,∞)(|ce| ≤ 2.25),
ϕ2 := G[10,∞)((|ce| ≥ 1.25) ⇒ F[0,5]G[0,5](|ce| ≤ 1.25)).

The first specification is a safety specification that requires the cross-track error to not exceed a threshold of 2.25 in steady state (after 10 seconds of driving).
The second specification is a responsiveness requirement: a cross-track error above 1.25 must be followed, within the next 5 seconds, by a phase of 5 seconds during which the cross-track error stays below 1.25. As previously mentioned, we can use the same LSTM for both specifications, and we do not need any retraining when the specification changes, which is a major advantage of our method over existing works. We set ϵ := 0.05 and fix the current time to t := 273 for the IL controller and t := 190 for the CBF controller. At these times, the cars controlled by each controller are approximately at the same location in the left turn (the difference is caused by different sampling times). As we have limited calibration data Dcal available (CARLA runs in real time, so data collection is time intensive), we only evaluate the direct STL predictive runtime verification algorithm for these two specifications.∗∗ We hence obtain prediction regions of ρ(ϕ, x̂, τ0) − ρ(ϕ, X, τ0) for each specification ϕ ∈ {ϕ1, ϕ2} by calculating C according to Theorem 4.3.1. For the first specification ϕ1, we show the histograms of R(i) for both controllers over the calibration data Dcal in Figure 4.8 (left: IL, left-mid: CBF). The prediction regions C are again highlighted as vertical lines, and we can see that the prediction region C for the CBF controller is smaller, which may be caused by an LSTM that predicts the system trajectories more accurately (note that the CBF controller causes less variability in ce, which may make it easier to train a good LSTM). In a next step, we empirically evaluate the results of Theorem 4.3.1 by using the test trajectories Dtest. In Figure 4.9 (left: IL, left-mid: CBF), we plot the predicted robustness ρ(ϕ1, x̂, τ0) and the ground truth robustness ρ(ϕ1, X, τ0).
We found that for 99 of the 100 = |Dtest| trajectories under the IL controller, and for 100/100 trajectories under the CBF controller, it holds that ρ(ϕ1, x̂(i), τ0) > C implies (x(i), τ0) |= ϕ1, confirming Theorem 4.3.1. We also validated equation (4.2) and found that 95/100 trajectories under the IL controller and 95/100 trajectories under the CBF controller satisfy ρ(ϕ1, x̂(i), τ0) − ρ(ϕ1, x(i), τ0) ≤ C.

∗∗The indirect STL predictive runtime verification algorithm would require more calibration data; recall the discussion from Remark 4.3.7.

Figure 4.8: Histograms of the nonconformity scores R(i) on Dcal and prediction region C. Left: IL controller and ϕ1, left-mid: CBF controller and ϕ1, right-mid: IL controller and ϕ2, right: CBF controller and ϕ2.

For the second specification ϕ2, we again show the histograms of R(i) for both controllers over the calibration data Dcal in Figure 4.8 (right-mid: IL, right: CBF). We can now observe that the prediction regions C for both controllers are relatively small. However, the absolute robustness is also smaller than for the first specification, as can be seen in Figure 4.9 (right-mid: IL, right: CBF). We again empirically evaluate the results of Theorem 4.3.1 by using the test trajectories Dtest. In Figure 4.9 (right-mid: IL, right: CBF), we plot the predicted robustness ρ(ϕ2, x̂, τ0) and the ground truth robustness ρ(ϕ2, X, τ0). We found that for 99/100 trajectories under the IL controller and for 98/100 trajectories under the CBF controller it holds that ρ(ϕ2, x̂(i), τ0) > C implies (x(i), τ0) |= ϕ2, confirming Theorem 4.3.1. We also validated equation (4.2) and found that 98/100 trajectories under the IL controller and 92/100 trajectories under the CBF controller satisfy ρ(ϕ2, x̂(i), τ0) − ρ(ϕ2, x(i), τ0) ≤ C.
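The empirical validation above amounts to counting, over the test set, how often the conformal bound and the implied verdicts hold. A small helper, sketched in Python with made-up robustness values (the function names and toy data are ours, not from the evaluation code):

```python
def empirical_coverage(pred_rho, true_rho, C):
    """Fraction of test trajectories with pred - true <= C,
    i.e., how often the conformal bound of equation (4.2) holds."""
    assert len(pred_rho) == len(true_rho)
    hits = sum(1 for p, t in zip(pred_rho, true_rho) if p - t <= C)
    return hits / len(pred_rho)

def sound_verdicts(pred_rho, true_rho, C):
    """Fraction of flagged trajectories where a positive verdict (pred > C)
    is matched by actual satisfaction (true > 0), as in Theorem 4.3.1."""
    flagged = [(p, t) for p, t in zip(pred_rho, true_rho) if p > C]
    if not flagged:
        return 1.0  # no verdicts issued, nothing to refute
    return sum(1 for _, t in flagged if t > 0) / len(flagged)

# Toy data: predictions overestimate robustness by at most 0.1, C = 0.2,
# so both coverage and verdict soundness are 1.0 here.
pred = [0.5, 0.3, 0.25, -0.1]
true = [0.45, 0.25, 0.2, -0.15]
cov = empirical_coverage(pred, true, 0.2)
```

With marginal coverage at level 1 − ϵ, one expects `empirical_coverage` to be roughly 0.95 for ϵ = 0.05, which matches the 95/100 and 96/100 counts reported above.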
Finally, we would like to remark that the added Gaussian random noise on the control signals made the prediction task challenging, but the combination of LSTM and conformal prediction was able to deal with this particular type of randomness. In fact, poorly trained LSTMs lead to larger prediction regions.

Figure 4.9: Predicted robustness ρ(ϕ, x̂(i), τ0) and ground truth robustness ρ(ϕ, x(i), τ0) on Dtest. Left: IL controller and ϕ1, left-mid: CBF controller and ϕ1, right-mid: IL controller and ϕ2, right: CBF controller and ϕ2.

4.5 Conclusion

In this chapter, we presented two predictive runtime verification algorithms to compute the probability that the current system trajectory violates a signal temporal logic specification. Both algorithms use i) trajectory predictors to predict future system states, and ii) conformal prediction to quantify prediction uncertainty. The use of conformal prediction enables us to obtain valid probabilistic runtime verification guarantees. To the best of our knowledge, these are the first formal guarantees for a predictive runtime verification algorithm that applies to widely used trajectory predictors such as RNNs and LSTMs, while being computationally simple and making no assumptions on the underlying distribution. An advantage of our approach is that a changing system specification does not require expensive retraining as in existing works. We concluded with experiments of an F-16 aircraft and a self-driving car equipped with LSTMs.

Chapter 5

Conformance Testing for Stochastic Cyber-Physical Systems

5.1 Introduction

Cyber-physical systems (CPS) are usually designed using a model-based design (MBD) paradigm. Here, the designer models the physical parts and the operating environment of the system, and then designs the software used for perception, planning, and low-level control.
Such closed-loop systems are then rigorously tested against various operating conditions, where the quality of the designed software is evaluated against model properties such as formal design specifications (or other kinds of quantitative objectives). Examples of such property-based analysis techniques include requirement falsification [39, 241, 11, 85, 201], nondeterministic and statistical verification [29, 264, 157, 159, 9, 1, 282, 204], and risk analysis [166, 13]. MBD is a fundamentally iterative process in which the designer continuously modifies the software to tune performance or increase safety margins, or changes plant models to perform design space exploration [197], e.g., using model abstraction or simplification [227, 16, 45], or to incorporate new data [200]. Any change to the system model, however, requires repeating the property-based analyses as many times as the number of system properties. The fundamental problem that we consider in this chapter is that of conformance [215, 83, 3, 5, 7]. The notion of conformance is defined w.r.t. the input-output behavior of a model. Typically, model inputs include exogenous disturbances or user inputs to the model, user-controllable design parameters, and initial operating conditions. For a given input u, let y = S(u) denote the observable behavior of the model S. Furthermore, let d(y1, y2) be a metric defined over the space of the model behaviors. For deterministic models, two models S1, S2 are said to be δ-conformant if for all inputs u it holds that d(y1, y2) < δ, where y1 = S1(u) and y2 = S2(u) [83, 3, 7]. This notion of deterministic conformance is useful to reason about worst-case differences between models. However, most CPS applications use components that exhibit stochastic behavior; for example, sensors have measurement noise, actuators can have manufacturing variations, and most physical phenomena are inherently stochastic.
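The deterministic notion of δ-conformance above can be sketched as a simple check over a set of test inputs. A minimal Python sketch (the models, metric, and input set are toy placeholders of ours; over a finite sample, a check like this can only refute conformance, since the definition quantifies over all inputs):

```python
def delta_conformant(model1, model2, inputs, d, delta):
    """Deterministic delta-conformance check over a finite input set.

    model1, model2: deterministic maps from an input to an output trajectory.
    inputs:         finite collection of test inputs.
    d:              metric on output trajectories.
    Returns True if d(S1(u), S2(u)) < delta for every sampled input u.
    """
    return all(d(model1(u), model2(u)) < delta for u in inputs)

# Toy example: two scalar "systems" as maps from an input to a 3-sample trace.
S1 = lambda u: [u, 2 * u, 3 * u]
S2 = lambda u: [u + 0.1, 2 * u, 3 * u - 0.1]
d_sup = lambda y1, y2: max(abs(a - b) for a, b in zip(y1, y2))

# Pointwise gap is 0.1 for every input, so the check passes for delta = 0.2.
ok = delta_conformant(S1, S2, [0.0, 0.5, 1.0], d_sup, 0.2)
```

For stochastic models, as argued next, a fixed input no longer determines a single pair of traces, which is exactly why such a pointwise check has to be replaced by a probabilistic one.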
The central question that this chapter considers is: What is the notion of conformance between two stochastic CPS models? There are some challenges in comparing stochastic CPS models; even if two models are repeatedly excited by the same input, the pair of model behaviors that are observed may be different for every such simulation. Thus, the observable behavior of a stochastic model is more accurately characterized by a distribution over the space of trajectories. A possible way to compare two stochastic models is to use measure-theoretic techniques to compare the distance between the trajectory distributions. A number of divergence measures, such as the f-divergences (e.g., the Kullback-Leibler divergence and the total variation distance) or the Wasserstein distance, may look like candidate tools to compare the trajectory distributions. However, we argue in this chapter that a divergence is not the right notion to use to compare stochastic CPS models. There can be two stochastic models whose output trajectories are very close using any trajectory space metric, but the divergence between their trajectory distributions can be infinite. On the other hand, there can be two trajectory distributions with zero divergence for which the distance between trajectories can be arbitrarily far apart. This raises an interesting question: how do we then compare two stochastic models? In this chapter, we argue that probabilistic bounds derived from the distribution of the distances between model trajectories (excited by the same input) give us a general definition of conformance that has several advantages, as outlined below. We complement this probabilistic viewpoint further and capture the risk that the distance between model trajectories is large by leveraging risk measures [176]. First, we show that two stochastic systems that are conformant under our definition inherit the property of transference [83].
In simple terms, transference is the property that if the first model has certain logical or quantitative properties, then the second model also satisfies the same (or nearly the same) properties. This property brings several benefits. Consider the scenario where probabilistic guarantees that a model has certain quantitative properties have been established after an extensive and large number of simulations. Ordinarily, if any changes were made to the model, establishing such probabilistic guarantees would require repeating the extensive simulation-based procedure. However, transference allows us to potentially sample from existing simulations for the first model and sample a small number of simulations from the modified model to establish stochastic conformance between the models, thereby allowing us to establish probabilistic guarantees on the second model. We demonstrate examples of such transference w.r.t. quantitative properties arising from quantitative semantics of temporal logic specifications and control-theoretic cost functions. Next, we show how we can efficiently compute these probabilistic bounds using the notion of conformal prediction [242, 258] from statistical learning theory. At a high level, conformal prediction involves computing quantiles of the empirical distribution of nonconformity scores over a validation dataset to obtain prediction intervals at a given confidence threshold. The contributions of this chapter are summarized as follows:

• We define stochastic conformance as a probabilistic bound over the distribution of distances between model trajectories. We also define the non-conformance risk to detect systems that are at risk of not being conformant.
• We show that both notions have the desirable transference property, meaning that conformant systems satisfy similar system specifications.
• We show how stochastic conformance and the non-conformance risk can be estimated using statistical tools from risk theory and conformal prediction.

5.2 Problem Statement and Preliminaries

Consider the probability space (Ω, F, P) where Ω is the sample space, F is a σ-algebra∗ of Ω, and P : F → [0, 1] is a probability measure. In this chapter, our goal is to quantify conformance of stochastic systems, i.e., systems whose inputs and outputs form a probability space with an appropriately defined measure. Let the two stochastic systems be denoted by S1 and S2. The inputs and outputs of stochastic systems are signals, i.e., functions from a bounded interval of positive reals known as the time domain T ⊆ R≥0 to a metric space, e.g., with the standard Euclidean metric. Each stochastic system Si then describes an input-output relation Si : U × Ω → Y, where U and Y denote the sets of all input and output signals. We allow input signals† to be stochastic, and we use the notation U : T × Ω → Rᵐ to denote a stochastic input signal.‡ Modeling stochastic systems this way provides great flexibility, and Si can, e.g., describe the motion of stochastic hybrid systems, Markov chains, and stochastic difference equations. Assume now that we apply the input signal U : T × Ω → Rᵐ to systems S1 and S2, and let the resulting output signals be denoted by Y1 : T × Ω → Rⁿ and Y2 : T × Ω → Rⁿ, respectively.

∗A σ-algebra on a set Ω is a nonempty collection of subsets of Ω closed under complement, countable unions, and countable intersections.
†Probability spaces over signals are defined by standard notions of cylinder sets [29].
‡Instead of the probability measure P defined over (Ω, F), we will more generally use the notation Prob to be independent of the underlying probability space that we induce, e.g., as a result of transformations via U.

We assume that the functions S1, S2, and U are measurable so that the output signals Y1 and Y2 are well-defined stochastic signals.
One can hence think of Y1 and Y2 as being drawn from the distributions D1 and D2, respectively, which are functions of the probability space (Ω, F, P) as well as of the functions S1, S2, and U. In this chapter, we make no restricting assumptions on the functions S1, S2, and U, and consequently we make no assumptions on the distributions D1 and D2.

Informal Problem Statement. Let Y1 and Y2 be stochastic output signals of the stochastic systems S1 and S2, respectively, under the stochastic input signal U. How can we quantify closeness of the stochastic systems S1 and S2 under U?

To answer this question, we will explore different ways of defining "closeness" of Y1 and Y2, and we will present algorithms to compute these stochastic notions of closeness. A subsequent problem that we consider is related to transference of properties from one system to another. Particularly, given a signal temporal logic specification, can we infer guarantees about the satisfaction of the specification by one system from another system if the systems are close under a suitable definition of closeness?

5.2.1 Distance Metrics

To define a general framework for quantifying closeness of stochastic systems, we will use i) different signal metrics to capture the distance between individual realizations y1 := Y1(·, ω) and y2 := Y2(·, ω) of the stochastic signals Y1 and Y2, where ω ∈ Ω is a single outcome, and ii) probabilistic reasoning and risk measures to capture stochastic conformance and non-conformance, respectively, under these signal metrics. We first equip the set of output signals Y with a function d : Y × Y → R that quantifies the distance between signals. A natural choice of d is a signal metric that results in a metric space (Y, d). We use general signal metrics such as the metric induced by the Lp signal norm for p ≥ 1. Particularly, define

dp(y1, y2) := (∫_T ∥y1(t) − y2(t)∥ᵖ dt)^(1/p),

so that the L∞ norm can also be expressed as d∞(y1, y2) := sup_{t∈T} ∥y1(t) − y2(t)∥.
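For discrete-time trajectories sampled on a common time grid, these metrics reduce to sums and maxima. A minimal sketch (the discretization, sample step dt, and function names are ours, for illustration):

```python
def d_p(y1, y2, dt, p):
    """Discrete approximation of the L_p signal metric:
    (sum_t ||y1(t) - y2(t)||^p * dt)^(1/p), Euclidean norm per sample."""
    assert len(y1) == len(y2)
    norms = [sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5
             for s1, s2 in zip(y1, y2)]
    return (sum(n ** p for n in norms) * dt) ** (1.0 / p)

def d_inf(y1, y2):
    """L_infinity metric: largest pointwise Euclidean distance."""
    return max(sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5
               for s1, s2 in zip(y1, y2))

# Two 1-D trajectories, three samples each; they differ only at t = 1,
# so d_inf is 0.5 and d_p with p = 1, dt = 1 is also 0.5.
y1 = [(0.0,), (1.0,), (2.0,)]
y2 = [(0.0,), (1.5,), (2.0,)]
gap = d_inf(y1, y2)
```

Evaluating such a metric on paired realizations of Y1 and Y2 drawn under the same input yields exactly the samples of the distance distribution d(Y1, Y2) that the following section reasons about.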
It is now easy to see that a signal metric d(Y1, Y2) evaluated over the stochastic signals Y1 and Y2 results in a distribution over distances between realizations of Y1 and Y2. To reason over properties of d(Y1, Y2), we will use probabilistic reasoning, but we will also consider risk measures [176], as introduced next.

5.3 Conformance for Stochastic Input-Output Systems

Our goal is now to quantify closeness of two stochastic systems S1 and S2 under the input U. We present our definitions for stochastic conformance and non-conformance risk upfront, and provide motivation for them afterwards.

Definition 5.3.1. Let U : T × Ω → Rᵐ be a stochastic input signal, S1, S2 : U × Ω → Y be stochastic systems, and Y1, Y2 : T × Ω → Rⁿ be stochastic output signals with Y1 := S1(U, ·) and Y2 := S2(U, ·). Further, let δ ∈ R be a conformance threshold, ϵ ∈ (0, 1) be a failure probability, and d : Y × Y → R be a signal metric. Then, we say that the systems S1 and S2 under the input U are (δ, ϵ)-conformant if

Prob(d(Y1, Y2) ≤ δ) ≥ 1 − ϵ. (5.1)

Additionally, let R : F(Ω, R) → R be a risk measure and r ∈ R be a risk threshold. Then, S1 and S2 under the input U are at risk of being r-non-conformant if

R(d(Y1, Y2)) > r. (5.2)

Eq. (5.1) is referred to as stochastic conformance and Eq. (5.2) as non-conformance risk. Let us now motivate and discuss these two definitions. While the definition of conformance in equation (5.1) appears natural at first sight, there are at least two competing ways of defining stochastic conformance. First, as Y1 and Y2 are distributions, it would be possible to define conformance as D(Y1, Y2), where D is a distance function that measures the difference between two distributions, such as a divergence (Kullback–Leibler or f-divergence).
However, our definition provides an intuitive interpretation in the signal space where system specifications are typically defined, while it is usually difficult to provide such an interpretation for divergences between distributions. Additionally, the divergence between Y1 and Y2 may be unbounded (or zero) even when equation (5.1) holds (does not hold).

Proposition 5.3.2. There exist stochastic systems S1 and S2 and distance metrics d where equation (5.1): i) is satisfied for δ > 0 and ϵ = 0, i.e., with probability 1, but where the divergence between the systems is unbounded, and ii) is not satisfied for any given δ > 0 and ϵ ∈ (0, 1), but where the divergence between the systems is zero.

Proof. Let us first prove i). For simplicity, consider systems S1 and S2 where the stochastic input and output signals are defined over the time domain T := {t0, . . . , tT}. Further, for all t ∈ T let y1(t) := 0 and y2(t) := δ. Clearly, equation (5.1) is satisfied, e.g., for d∞. The distributions D1 and D2 (joint distributions of Y1(t) and Y2(t), respectively) are Dirac distributions centered at 0 and δ, respectively. The Kullback–Leibler divergence between these two distributions is ∞ [115]. Let us now prove ii). Let T consist of a single time point for simplicity, so that Y1 and Y2 are random variables defined over the sample space R. Let D1 and D2 be the same uniform distribution over [0, a], with Y1 and Y2 independent. Clearly, the divergence between D1 and D2 is zero. We know that the distribution of Y := Y1 − Y2 has support on [−a, a], and that the probability density function of Y is p(y) := 1/a − |y|/a². We can now compute Prob(|Y| ≤ δ) = 2δ/a − δ²/a². Given δ > 0 and ϵ ∈ (0, 1), we pick an ϵ̄ ∈ (ϵ, 1) so that 1 − ϵ̄ < 1 − ϵ. We then solve the quadratic equation 2δ/a − δ²/a² = 1 − ϵ̄ subject to the constraint that δ ≤ a. Consequently, we find that a ≥ δ(1 + √ϵ̄)/(1 − ϵ̄) = δ/(1 − √ϵ̄) results in Prob(|Y| ≤ δ) ≤ 1 − ϵ̄ < 1 − ϵ, so that (5.1) is not satisfied.
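The closed-form probability used in part ii) of the proof is easy to sanity-check by Monte Carlo simulation. A quick sketch (sample count and seed are arbitrary choices of ours):

```python
import random

def prob_close(a, delta, n=200_000, seed=0):
    """Monte Carlo estimate of Prob(|Y1 - Y2| <= delta) for independent
    Y1, Y2 ~ Uniform[0, a]; to be compared with 2*delta/a - delta**2/a**2."""
    rng = random.Random(seed)
    hits = sum(abs(rng.uniform(0, a) - rng.uniform(0, a)) <= delta
               for _ in range(n))
    return hits / n

a, delta = 2.0, 0.5
est = prob_close(a, delta)
exact = 2 * delta / a - delta ** 2 / a ** 2  # = 0.4375 for these values
```

For a = 2 and δ = 0.5 the estimate settles near 0.4375, matching the triangular-density computation in the proof, while the divergence between the two identical uniform marginals is zero.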
Another way of defining stochastic conformance was presented in [263], where the authors consider a task-specific definition in which satisfaction probabilities are required to be approximately equal. In other words, two stochastic systems are called p-approximately probabilistically conformant if |Prob((Y1, τ) |= ϕ) − Prob((Y2, τ) |= ϕ)| ≤ p. Under this definition, it may happen that two systems are p-approximately probabilistically conformant for a small value of p while the systems produce completely different behaviors and individual realizations y1 and y2 are vastly different. In addition to not being task-specific, our definition covers the risk of being r-non-conformant in equation (5.2). Finally, we would like to remark that the definition of conformance in equation (5.1) is related to the definition of non-conformance risk in equation (5.2). In fact, when the risk measure R is the value-at-risk VaRβ, then we know that VaRβ(d(Y1, Y2)) > r ⇔ Prob(d(Y1, Y2) ≤ r) < β, since VaRβ(d(Y1, Y2)) ≤ r is equivalent to Prob(d(Y1, Y2) ≤ r) ≥ β according to Section 5.2. Consequently, if β := 1 − ϵ and r := δ, then VaRβ(d(Y1, Y2)) > r implies that the systems S1 and S2 under U are not (δ, ϵ)-conformant. The notion of conformance in Definition 5.3.1 is useful when the input U describes internal inputs such as system parameters (an unknown mass), exogenous disturbances from known sources, or initial system conditions. In other words, the distribution of U is known, making U a known unknown. However, in the case of external inputs that could be manipulated (e.g., user inputs that represent rare malicious attacks), the input U may be unknown, making U an unknown unknown. We therefore provide an alternative definition of conformance.

Definition 5.3.3. Let U ∈ U be an unknown deterministic input signal, S1, S2 : U × Ω → Y be stochastic systems, and Y1, Y2 : T × Ω → R be stochastic output signals with Y1 := S1(U, ·) and Y2 := S2(U, ·).
Further, let δ ∈ R be a conformance threshold, ϵ ∈ (0, 1) be a failure probability, and d : Y × Y → R be a signal metric. Then, we say that the systems S1 and S2 are (δ, ϵ)-conformant if

Prob(sup_{U∈U} d(Y1, Y2) ≤ δ) ≥ 1 − ϵ. (5.3)

Additionally, let Risk : F(Ω, R) → R be a risk measure and r ∈ R be a risk threshold. Then, we say that the systems S1 and S2 under the input U are at risk of being r-non-conformant if

Risk(sup_{U∈U} d(Y1, Y2)) > r. (5.4)

Based on this definition, note that it will be inherently more difficult to verify Definition 5.3.3 compared to Definition 5.3.1 due to the sup-operator.

5.4 Transference of System Properties under Conformance

We expect two systems S1 and S2 that are (δ, ϵ)-conformant in the sense of Definitions 5.3.1 and 5.3.3 to have similar behaviors with respect to satisfying a given system specification. Therefore, we will define the notion of transference with respect to a performance function Q : Y → R that measures how well a signal y ∈ Y satisfies this system specification. Towards capturing similarity between S1 and S2 with respect to Q, the signal metric d has to be chosen carefully.

Definition 5.4.1. Let d : Y × Y → R be a signal metric and Q : Y → R be a performance function. Then, we say that Q is Hölder continuous w.r.t. d if there exist constants K, γ > 0 such that, for any two signals y1, y2 : T → R^n, it holds that

|Q(y1) − Q(y2)| ≤ K d(y1, y2)^γ. (5.5)

A specific example of the performance function Q is the robust semantics ρ(ϕ) of an STL specification ϕ. In fact, the robust semantics ρ(ϕ) are Hölder continuous w.r.t. the sup-norm d∞ for constants K = 1 and γ = 1 [74, Lemma 2]. A commonly used performance function in control is Q(y) = ∫₀ᵀ y(t)ᵀ y(t) dt, and we note that this choice of Q is Hölder continuous w.r.t. d1 as shown in Appendix D of [203]. Finally, note that the Hölder continuity condition in equation (5.5) implies that, for any constants z, δ ∈ R, it holds that

Q(y1) ≥ z ∧ d(y1, y2) ≤ δ ⇒ Q(y2) ≥ z − Kδ^γ. (5.6)
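As a concrete instance of equation (5.5) with K = γ = 1, the robustness of the simple STL formula G(y ≥ c) is the worst-case margin min_t (y(t) − c), and it is 1-Lipschitz with respect to d∞. A minimal sketch with toy discrete-time signals (not drawn from the case studies):

```python
def d_inf(y1, y2):
    """Sup-norm distance d_inf between two discrete-time signals."""
    return max(abs(a - b) for a, b in zip(y1, y2))

def robustness_always_ge(y, c):
    """Robust semantics of the STL formula G(y >= c): worst-case margin."""
    return min(v - c for v in y)

y1 = [1.0, 0.8, 1.2, 0.9]
y2 = [1.1, 0.6, 1.3, 1.0]

# Hoelder/Lipschitz bound (5.5) with K = gamma = 1:
gap = abs(robustness_always_ge(y1, 0.5) - robustness_always_ge(y2, 0.5))
assert gap <= d_inf(y1, y2) + 1e-12

# Implication (5.6): Q(y1) >= z and d(y1, y2) <= delta  =>  Q(y2) >= z - delta.
z, delta = robustness_always_ge(y1, 0.5), d_inf(y1, y2)
assert robustness_always_ge(y2, 0.5) >= z - delta - 1e-12
```

The same check works for any robustness function built from min/max of atomic margins, which is why d∞ is the natural metric for STL transference.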
5.4.1 Transference under stochastic conformance

With the definition of Q being Hölder continuous w.r.t. d, we can now derive a stochastic transference result under stochastic conformance as per Definition 5.3.1.

Theorem 5.4.2. Let the premises in Definitions 5.3.1 and 5.4.1 hold. Further, let the systems S1 and S2 under the input U be (δ, ϵ)-conformant so that equation (5.1) holds, and let Q be Hölder continuous w.r.t. d so that equation (5.5) holds. Then, it holds that

Prob(Q(Y1) ≥ z) ≥ 1 − ϵ̄ ⇒ Prob(Q(Y2) ≥ z − Kδ^γ) ≥ 1 − ϵ − ϵ̄.

Proof. By assumption, it holds that Prob(d(Y1, Y2) ≤ δ) ≥ 1 − ϵ and Prob(Q(Y1) ≥ z) ≥ 1 − ϵ̄, so that we know that Prob(d(Y1, Y2) > δ) ≤ ϵ and Prob(Q(Y1) < z) ≤ ϵ̄. We can now apply the union bound over these two events, so that

Prob(d(Y1, Y2) > δ ∨ Q(Y1) < z) ≤ ϵ + ϵ̄.

From here, we can see that

Prob(d(Y1, Y2) ≤ δ ∧ Q(Y1) ≥ z) ≥ 1 − ϵ − ϵ̄.

Since Q is Hölder continuous w.r.t. d, which implies that equation (5.6) holds, we conclude that Prob(Q(Y2) ≥ z − Kδ^γ) ≥ 1 − ϵ − ϵ̄.

Theorem 5.4.2 tells us that i) (δ, ϵ)-conformance of the systems S1 and S2 under U, and ii) Hölder continuity of the performance function Q w.r.t. the metric d enable us to derive a probabilistic lower bound on the performance of system S2 w.r.t. Q from the performance of system S1. We can derive a transference result similar to Theorem 5.4.2 when we assume that the systems S1 and S2 are (δ, ϵ)-conformant in the sense of Definition 5.3.3 instead of Definition 5.3.1.

Theorem 5.4.3. Let the premises in Definitions 5.3.3 and 5.4.1 hold. Further, let the systems S1 and S2 be (δ, ϵ)-conformant so that equation (5.3) holds, and let Q be Hölder continuous w.r.t. d so that equation (5.5) holds. Then, it holds that

Prob(inf_{U∈U} Q(Y1) ≥ z) ≥ 1 − ϵ̄ ⇒ Prob(inf_{U∈U} Q(Y2) ≥ z − Kδ^γ) ≥ 1 − ϵ − ϵ̄.

Proof.
By assumption, it holds that Prob(sup_{U∈U} d(Y1, Y2) ≤ δ) ≥ 1 − ϵ and Prob(inf_{U∈U} Q(Y1) ≥ z) ≥ 1 − ϵ̄, so that we know that Prob(sup_{U∈U} d(Y1, Y2) > δ) ≤ ϵ and Prob(inf_{U∈U} Q(Y1) < z) ≤ ϵ̄. We can now apply the union bound over these two events, so that

Prob(sup_{U∈U} d(Y1, Y2) > δ ∨ inf_{U∈U} Q(Y1) < z) ≤ ϵ + ϵ̄.

From here, we can see that

Prob(sup_{U∈U} d(Y1, Y2) ≤ δ ∧ inf_{U∈U} Q(Y1) ≥ z) ≥ 1 − ϵ − ϵ̄.

This equation tells us that, for each U ∈ U, we have Prob(d(Y1, Y2) ≤ δ ∧ Q(Y1) ≥ z) ≥ 1 − ϵ − ϵ̄. Since Q is Hölder continuous w.r.t. d, we know that equation (5.5) holds for each U ∈ U. Consequently, we can conclude that Prob(inf_{U∈U} Q(Y2) ≥ z − Kδ^γ) ≥ 1 − ϵ − ϵ̄.

5.4.2 Transference under non-conformance risk

On the other hand, by considering the notion of r-non-conformance risk, we expect that two systems S1 and S2 that are not at risk of being r-non-conformant have a similar risk of violating a specification. Here, we define the risk of violating a specification, following ideas from [lindemann2022risk], as Risk(−Q(Y1)) and Risk(−Q(Y2)).

Theorem 5.4.4. Let the premises in Definitions 5.3.1 and 5.4.1 hold. Further, let the systems S1 and S2 under the input U not be at risk of being r-non-conformant, so that equation (5.2) does not hold (i.e., Risk(d(Y1, Y2)) ≤ r), and let Q be Hölder continuous w.r.t. d with γ = 1 so that equation (5.5) holds. If the risk measure Risk is monotone, positive homogeneous, and subadditive, it holds that

Risk(−Q(Y2)) ≤ Risk(−Q(Y1)) + Kr.

Proof. We can derive the following chain of inequalities:

Risk(−Q(Y2)) ≤(a) Risk(−Q(Y1) + K d(Y1, Y2)) ≤(b) Risk(−Q(Y1)) + Risk(K d(Y1, Y2)) =(c) Risk(−Q(Y1)) + K Risk(d(Y1, Y2)) ≤(d) Risk(−Q(Y1)) + Kr,

where (a) follows since Q is Hölder continuous w.r.t. d and since Risk is monotone, (b) follows since Risk is subadditive, (c) follows since Risk is positive homogeneous, and the inequality (d) follows since S1 and S2 under U are not at risk of being r-non-conformant, i.e., Risk(d(Y1, Y2)) ≤ r.
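The three risk-measure properties invoked in steps (a)–(d) — monotonicity, positive homogeneity, and subadditivity — can be checked numerically for the empirical conditional value-at-risk. A sketch using a simple tail-average CVaR estimator and made-up paired samples (an illustration, not the chapter's estimation procedure):

```python
import random

def cvar(samples, beta):
    """Empirical CVaR_beta: average of the worst (1 - beta) fraction."""
    n = len(samples)
    k = max(1, round(n * (1 - beta)))
    return sum(sorted(samples)[-k:]) / k

rng = random.Random(7)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Subadditivity: CVaR(X + Y) <= CVaR(X) + CVaR(Y) for paired samples.
zs = [x + y for x, y in zip(xs, ys)]
assert cvar(zs, 0.9) <= cvar(xs, 0.9) + cvar(ys, 0.9) + 1e-9

# Positive homogeneity: CVaR(c X) = c CVaR(X) for c >= 0.
assert abs(cvar([3.0 * x for x in xs], 0.9) - 3.0 * cvar(xs, 0.9)) < 1e-9

# Monotonicity: X <= Y pointwise implies CVaR(X) <= CVaR(Y).
assert cvar(xs, 0.9) <= cvar([x + 0.5 for x in xs], 0.9)
```

Coherent risk measures such as CVaR satisfy all three properties, which is exactly what makes the chain of inequalities go through; VaR, by contrast, is not subadditive in general.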
This result implies that the risk of system S2 w.r.t. the performance function Q is upper bounded by the risk of system S1 w.r.t. Q plus Kr if the systems S1 and S2 are not at risk of being r-non-conformant. We remark that a similar result appeared in our prior work [74]. Here, we present these results in the more general context of conformance and extend them to general performance functions Q, which additionally requires Risk to be positive homogeneous. Additionally, we derive a transference result similar to Theorem 5.4.4 when we assume that the systems S1 and S2 are not at risk of being r-non-conformant in the sense of Definition 5.3.3 instead of Definition 5.3.1.

Theorem 5.4.5. Let the premises in Definitions 5.3.3 and 5.4.1 hold. Further, let the systems S1 and S2 not be at risk of being r-non-conformant, so that equation (5.4) does not hold (i.e., Risk(sup_{U∈U} d(Y1, Y2)) ≤ r), and let Q be Hölder continuous w.r.t. d with γ = 1 so that equation (5.5) holds. If the risk measure Risk is monotone, positive homogeneous, and subadditive, it holds that

Risk(−inf_{U∈U} Q(Y2)) ≤ Risk(−inf_{U∈U} Q(Y1)) + Kr.

Proof. We can derive the following chain of inequalities:

Risk(−inf_{U∈U} Q(Y2)) ≤(a) Risk(−inf_{U∈U} Q(Y1) + K sup_{U∈U} d(Y1, Y2)) ≤(b) Risk(−inf_{U∈U} Q(Y1)) + Risk(K sup_{U∈U} d(Y1, Y2)) =(c) Risk(−inf_{U∈U} Q(Y1)) + K Risk(sup_{U∈U} d(Y1, Y2)) ≤(d) Risk(−inf_{U∈U} Q(Y1)) + Kr,

where (a) follows since −inf_{U∈U} Q(Y2) = sup_{U∈U} −Q(Y2), since Q is Hölder continuous w.r.t. d, and since Risk is monotone; (b) follows since Risk is subadditive; and (c) follows since Risk is positive homogeneous. The inequality (d) follows since S1 and S2 are not at risk of being r-non-conformant, i.e., Risk(sup_{U∈U} d(Y1, Y2)) ≤ r.

5.5 Statistical Estimation of Stochastic Conformance

We propose algorithms to compute stochastic conformance and the non-conformance risk. In practice, note that one will be limited to discrete-time stochastic systems to apply these algorithms.
5.5.1 Estimating stochastic conformance

To estimate stochastic conformance, we use conformal prediction, a statistical tool introduced in [258, 231] to obtain valid uncertainty regions for complex prediction models without making assumptions on the underlying distribution or the prediction model [19, 103, 160, 242, 64]. Let R, R^(1), ..., R^(k) be k + 1 independent and identically distributed random variables modeling a quantity known as the nonconformity score. Our goal is to obtain an uncertainty region for R based on R^(1), ..., R^(k), i.e., the random variable R should be contained within the uncertainty region with high probability. Formally, given a failure probability ϵ ∈ (0, 1), we want to construct a valid uncertainty region over R (defined in terms of a value R̄) that depends on R^(1), ..., R^(k) such that Prob(R ≤ R̄) ≥ 1 − ϵ. By a surprisingly simple quantile argument, see [242, Lemma 1], one can take R̄ to be the (1 − ϵ)th quantile of the empirical distribution of the values R^(1), ..., R^(k) and ∞. Assuming that R^(1), ..., R^(k) are sorted in non-decreasing order, and adding R^(k+1) := ∞, we can equivalently obtain R̄ := R^(p), where p := ⌈(k + 1)(1 − ϵ)⌉ with ⌈·⌉ being the ceiling function.

We can now use conformal prediction to estimate stochastic conformance as defined in Definition 5.3.1 by setting R := d(Y1, Y2). We therefore assume that we have access to a calibration dataset Dcal that consists of realizations y1^(i) and y2^(i) of the stochastic signals Y1 ∼ D1 and Y2 ∼ D2, respectively.

Theorem 5.5.1. Let the premises of Definition 5.3.1 hold and Dcal be a calibration dataset with datapoints (y1^(i), y2^(i)) drawn from D1 × D2. Further, define R^(i) := d(y1^(i), y2^(i)) for all i ∈ {1, ..., |Dcal|} and R^(|Dcal|+1) := ∞, and assume that the R^(i) are sorted in non-decreasing order. Then, it holds that Prob(d(Y1, Y2) ≤ R̄) ≥ 1 − ϵ with R̄ defined as R̄ := R^(p), where p := ⌈(|Dcal| + 1)(1 − ϵ)⌉.
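The quantile computation R̄ := R^(p) with p = ⌈(k + 1)(1 − ϵ)⌉ from Theorem 5.5.1 is only a few lines of code; a sketch with made-up calibration scores:

```python
import math

def conformal_bound(scores, eps):
    """Conformal quantile: append infinity, sort, and return the
    ceil((k + 1)(1 - eps))-th smallest nonconformity score."""
    ordered = sorted(scores) + [math.inf]
    p = math.ceil((len(scores) + 1) * (1 - eps))  # 1-indexed rank
    return ordered[p - 1]

# Hypothetical calibration distances d(y1, y2).
scores = [0.12, 0.40, 0.25, 0.31, 0.18, 0.22, 0.35, 0.28, 0.15, 0.45]
r_bar = conformal_bound(scores, eps=0.2)  # p = ceil(11 * 0.8) = 9
assert r_bar == 0.40
```

With ϵ = 0.2 and k = 10 calibration points, p = 9 and R̄ is the 9th smallest score; the systems would then be declared (δ, 0.2)-conformant whenever R̄ ≤ δ. Note that for very small calibration sets the rank p can exceed k, in which case the bound is the appended ∞ and no finite guarantee is obtained.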
Thus, the systems S1 and S2 under the input U are (δ, ϵ)-conformant if R̄ ≤ δ.

We see that checking stochastic conformance as defined in Definition 5.3.1 is computationally simple when we have a calibration dataset Dcal. Checking stochastic conformance as defined in Definition 5.3.3, however, is more difficult due to the sup-operator. To compute this notion of conformance, we make two assumptions: i) the set U is compact, and ii) for every realization ω ∈ Ω, the function d(Y1(·, ω), Y2(·, ω)) is Lipschitz continuous with Lipschitz constant L. While knowledge of the Lipschitz constant L would presume knowledge about the closeness of the systems S1 and S2, it would only provide a conservative over-approximation. We will, however, not need to know the Lipschitz constant L; instead, we estimate L from data along with probabilistic guarantees.

Our approach is summarized in Algorithms 4 and 5. Algorithm 4 computes R̄ such that Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + Lκ) ≥ 1 − ϵ when L is known, where κ is a gridding parameter, while Algorithm 5 estimates the Lipschitz constant. We present a description of these algorithms upfront and state their theoretical guarantees afterwards. In line 1 of Algorithm 4, we construct a κ-net Ū of U, i.e., we construct a finite set Ū so that for each U ∈ U there exists a Ū ∈ Ū such that d̄(U, Ū) ≤ κ, where d̄ : U × U → R is a metric. For this purpose, simple gridding strategies can be used as long as the set U has a convenient representation. Alternatively, randomized algorithms that sample from U can be used [254]. In lines 2–4, we apply Theorem 5.5.1 for each element Ū ∈ Ū. To this end, we obtain realizations (y1^(i), y2^(i)) from D1 × D2 under Ū (line 3). We then compute R̄_Ū so that Prob(d(Y1(Ū, ·), Y2(Ū, ·)) ≤ R̄_Ū) ≥ 1 − ϵ (line 4). Finally, we set R̄ := max_{Ū∈Ū} R̄_Ū (line 5). In Algorithm 5, we compute L̄ such that Prob(L ≤ L̄) ≥ 1 − ϵL.
We uniformly sample control inputs (U′, U′′) (line 2), obtain realizations (y1′, y2′) from D1 × D2 under U′ and realizations (y1′′, y2′′) from D1 × D2 under U′′ (line 3), and compute the non-conformity score L^(i) (line 4). In line 5, we obtain an estimate L̄ of the Lipschitz constant L that holds with probability 1 − ϵL over the randomness introduced in Algorithm 5.

Algorithm 4: Conformance Estimation as per Definition 5.3.3
Input: failure probability ϵ ∈ (0, 1) and grid size κ > 0
Output: R̄ such that Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + Lκ) ≥ 1 − ϵ
1: Construct a κ-net Ū of U
2: for Ū ∈ Ū do
3:   Obtain a calibration set D^Ū_cal consisting of realizations (y1^(i), y2^(i)) under Ū
4:   Compute R̄_Ū := R^(p) by applying Theorem 5.5.1 with the dataset D^Ū_cal
5: R̄ := max_{Ū∈Ū} R̄_Ū

Algorithm 5: Lipschitz Constant Estimation of L
Input: failure probability ϵL ∈ (0, 1), grid size κ > 0, calibration size KL > 0
Output: L̄ such that Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + L̄κ) ≥ 1 − ϵ − ϵL
1: for i ← 1 to KL do
2:   Sample (U′, U′′) uniformly from U × U
3:   Obtain realizations (y1′, y2′) under U′ and (y1′′, y2′′) under U′′
4:   Compute L^(i) := |d(y1′, y2′) − d(y1′′, y2′′)| / d̄(U′, U′′)
5: Compute L̄ := L^(p) where p := ⌈(KL + 1)(1 − ϵL)⌉

Theorem 5.5.2. Let the premises of Definition 5.3.3 hold. If the Lipschitz constant L of d(Y1(·, ω), Y2(·, ω)) is known uniformly over ω ∈ Ω, then, for a gridding parameter κ > 0, the output R̄ of Algorithm 4 ensures that

Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + Lκ) ≥ 1 − ϵ.

Thus, the systems S1 and S2 are (δ, ϵ)-conformant if R̄ + Lκ ≤ δ. Otherwise, let ϵL ∈ (0, 1) be a failure probability; then the output L̄ of Algorithm 5 ensures that

Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + L̄κ) ≥ 1 − ϵ − ϵL,

where Prob is defined over the randomness introduced in Algorithm 5.

Proof. From line 4 of Algorithm 4, we know that Prob(d(Y1(Ū, ·), Y2(Ū, ·)) ≤ R̄_Ū) ≥ 1 − ϵ for each Ū ∈ Ū.
Due to Lipschitz continuity, we can conclude that, for each U ∈ U such that d̄(U, Ū) ≤ κ, it holds that Prob(d(Y1, Y2) ≤ R̄_Ū + Lκ) ≥ 1 − ϵ. Since Ū is a κ-net of U, it follows that Prob(sup_{U∈U} d(Y1, Y2) ≤ R̄ + Lκ) ≥ 1 − ϵ.

For the second part of the proof, note that from line 5 of Algorithm 5 we know that Prob(L ≤ L̄) ≥ 1 − ϵL. We can now union bound over this event and Prob(d(Y1(Ū, ·), Y2(Ū, ·)) ≤ R̄_Ū) ≥ 1 − ϵ, so that Prob(d(Y1(Ū, ·), Y2(Ū, ·)) ≤ R̄_Ū ∧ L ≤ L̄) ≥ 1 − ϵ − ϵL. The rest of the proof follows as in the first part.

5.5.2 Estimating non-conformance risk

We next briefly summarize how to estimate the value-at-risk and the conditional value-at-risk, following standard results such as [166, 182] and [262], respectively.

Proposition 5.5.3. Let the premises of Definition 5.3.1 hold and Dcal be a calibration dataset with datapoints (y1^(i), y2^(i)) drawn from D1 × D2. Let β ∈ (0, 1) be a risk level and γ ∈ (0, 1) be a failure threshold. Define R^(i) := d(y1^(i), y2^(i)) for each i ∈ {1, ..., |Dcal|} and assume that Prob(R ≤ α) is continuous in α. Then,

Prob(V̲aR_β ≤ VaR_β(d(Y1, Y2)) ≤ V̄aR_β) ≥ 1 − γ,

where V̄aR_β := inf{ α ∈ R | P̂rob(R ≤ α) − √(ln(2/γ) / (2|Dcal|)) ≥ β } and V̲aR_β := inf{ α ∈ R | P̂rob(R ≤ α) + √(ln(2/γ) / (2|Dcal|)) ≥ β }, with the empirical cumulative distribution function P̂rob(R ≤ α) := (1/|Dcal|) Σ_{i=1}^{|Dcal|} I(R^(i) ≤ α) and the indicator function I.

For estimating CVaR_β(R), we assume that the random variable d(Y1, Y2) has bounded support, i.e., that Prob(d(Y1, Y2) ∈ [a, b]) = 1. Note that d(Y1, Y2) is usually bounded from below by a := 0 if d is a metric. To obtain an upper bound, we assume that the distance function saturates at b, e.g., by clipping values larger than b to b. In practice, this means that realizations that are far apart already have a large distance, which is capped at b.
Figure 5.1: (a) Targets in Dubin's car; (b) CARLA: cross-track error signals for S1, S2; (c) F-16: altitude signals for S1, S2; (d) spacecraft trajectories. The solid lines refer to Y1 and the dashed lines refer to Y2; in each of the displayed plots, the initial condition for each pair of realizations is the same.

Proposition 5.5.4. Let the premises of Definition 5.3.1 hold and Dcal be a calibration dataset with datapoints (y1^(i), y2^(i)) drawn from D1 × D2. Let β ∈ (0, 1) be a risk level and γ ∈ (0, 1) be a failure threshold. Define R^(i) := d(y1^(i), y2^(i)) for each i ∈ {1, ..., |Dcal|} and assume that Prob(d(Y1, Y2) ∈ [a, b]) = 1. Then, it holds that

Prob(C̲VaR_β ≤ CVaR_β(d(Y1, Y2)) ≤ C̄VaR_β) ≥ 1 − γ,

where C̄VaR_β := ĈVaR_β + √(5 ln(3/γ) / (|Dcal|(1 − β))) (b − a) and C̲VaR_β := ĈVaR_β − √(11 ln(3/γ) / (|Dcal|(1 − β))) (b − a), and where the empirical estimate of CVaR_β(R) is ĈVaR_β := inf_{α∈R} ( α + (|Dcal|(1 − β))^(−1) Σ_{i=1}^{|Dcal|} [R^(i) − α]^+ ).

As a consequence of these two propositions, we know that with probability 1 − γ the systems S1 and S2 under the input U are at risk of not being conformant if V̲aR_β ≥ α or C̲VaR_β ≥ α, based on the risk measure of choice.

Figure 5.2: (a) Trajectory distance on the validation set; (b) robustness on the validation set for controller 1; (c) robustness on the validation set for controller 2. Distance and robustness histograms for Dubin's car with ϵ = ϵ̄ = 0.05. We use CVaR(d) to denote CVaR(d(Y1, Y2)). The values z1 and δ are obtained by conformal prediction on the calibration set for ρ(ϕdubin, Y1) and d∞(Y1, Y2), respectively.
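The empirical quantities behind Propositions 5.5.3 and 5.5.4 — the empirical-CDF quantile underlying VaR and the Rockafellar–Uryasev form of CVaR — can be sketched as follows (toy scores; the concentration corrections with the √ln(·) terms are omitted):

```python
import math

def var_beta(scores, beta):
    """Empirical value-at-risk: smallest sample alpha whose empirical CDF
    value Prob_hat(R <= alpha) reaches beta."""
    ordered = sorted(scores)
    return ordered[math.ceil(beta * len(ordered)) - 1]

def cvar_beta(scores, beta):
    """Empirical CVaR via inf_alpha { alpha + E[(R - alpha)^+] / (1 - beta) };
    the infimum is attained at one of the sample points."""
    n = len(scores)
    return min(a + sum(max(r - a, 0.0) for r in scores) / (n * (1 - beta))
               for a in scores)

scores = [float(i) for i in range(1, 101)]  # hypothetical distances 1..100
assert var_beta(scores, 0.9) == 90.0
assert abs(cvar_beta(scores, 0.9) - 95.5) < 1e-6  # mean of the worst 10%
```

For this discrete uniform example CVaR_0.9 is the average of the ten largest scores, which the Rockafellar–Uryasev minimization recovers exactly.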
Distance metric | |Dcal| | δ | VS(d(Y1, Y2)) | VaR(d(Y1, Y2)) | CVaR(d(Y1, Y2))
d∞ | 50 | 0.7825 | 0.987 | 0.7183 | 0.7947
d∞ | 1000 | 0.7163 | 0.956 | 0.7148 | 0.7647
d∞ | 2000 | 0.7122 | 0.952 | 0.712 | 0.7814
d∞ | 3000 | 0.7118 | 0.952 | 0.7117 | 0.7862
dsk (Skorokhod distance) | 50 | 0.6723 | 0.953 | 0.6517 | 0.7181
dsk (Skorokhod distance) | 1000 | 0.6722 | 0.972 | 0.6711 | 0.7156
dsk (Skorokhod distance) | 2000 | 0.6645 | 0.96 | 0.6639 | 0.7106
dsk (Skorokhod distance) | 3000 | 0.6619 | 0.952 | 0.6613 | 0.7079
d2 | 50 | 2.6086 | 0.937 | 2.503 | 2.612
d2 | 1000 | 2.7339 | 0.96 | 2.732 | 3.048
d2 | 2000 | 2.7071 | 0.944 | 2.706 | 3.044
d2 | 3000 | 2.7238 | 0.955 | 2.722 | 3.0929

Table 5.1: Effect of calibration set size on the validation score and risk measures. The size of the test set, i.e., |Dtest|, is 1000. We use the conformal prediction procedure from Section 5.5 to obtain δ as defined in Definition 5.3.1 for ϵ = 0.05.

5.6 Case Studies

We now demonstrate the practicality of stochastic conformance and risk analysis through various case studies. For validation, suppose we obtain the value R̄ using a conformal prediction procedure for a nonconformity score defined by the random variable R, i.e., such that Prob(R ≤ R̄) ≥ 1 − ϵ. Then, given a test set Dtest, the validation score is defined as VS(R) := |{k ∈ Dtest | k ≤ R̄}| / |Dtest|.

ϵ, ϵ̄ | |Dcal| | z1 | δ | VS(ρ1) | VS(d∞) | z2 | Thm 5.4.2 valid? | CVaR(d∞) | CVaR(−ρ1) | CVaR(−ρ2) | Thm 5.4.4 valid?
ϵ = 0.2, ϵ̄ = 0.05 | 100 | 0.31 | 0.59 | 0.95 | 0.76 | 0.21 | Y | 0.90 | -0.28 | 0.00 | Y
ϵ = 0.2, ϵ̄ = 0.05 | 3K | 0.30 | 0.60 | 0.95 | 0.79 | 0.20 | Y | 0.93 | -0.27 | 0.03 | Y
ϵ = 0.1, ϵ̄ = 0.05 | 1K | 0.30 | 0.67 | 0.96 | 0.92 | 0.15 | Y | 0.79 | -0.27 | 0.02 | Y
ϵ = 0.1, ϵ̄ = 0.05 | 3K | 0.30 | 0.66 | 0.95 | 0.91 | 0.15 | Y | 0.81 | -0.27 | 0.03 | Y
ϵ = 0.05, ϵ̄ = 0.05 | 2K | 0.31 | 0.71 | 0.94 | 0.95 | 0.11 | Y | 0.78 | -0.27 | 0.02 | Y
ϵ = 0.05, ϵ̄ = 0.05 | 3K | 0.30 | 0.71 | 0.95 | 0.95 | 0.11 | Y | 0.79 | -0.27 | 0.03 | Y

Table 5.2: Empirical evaluation of transference. Let ρi be short-hand for ρ(ϕdubin, Yi) for i = 1, 2, and d∞ be short-hand for d∞(Y1, Y2). Using Theorem 5.5.1, we show Prob(ρ1 ≥ z1) > 1 − ϵ̄ and Prob(d∞ ≤ δ) > 1 − ϵ. The validity scores for each guarantee on a test set Dtest with 1000 samples are shown. The value z2 is obtained using Theorem 5.5.1 on ρ2; observe that it exceeds z1 − δ, validating Theorem 5.4.2.
Similarly, we report the CVaR values for −ρ1 and d∞, and CVaR(−ρ1) + CVaR(d∞) ≥ CVaR(−ρ2) holds in all cases, validating Theorem 5.4.4.

Case Study | Spec | |Dcal| | |Dtest| | VS(ρ1) | VS(d∞) | δ | VaR(d∞)
F-16 | ϕgcas | 1K | 3K | 0.95 | 0.98 | 200 | 200
CARLA | ϕcte | 700 | 300 | 0.94 | 0.96 | 1.88 | 1.87
Satellite | ϕsat | 7K | 3K | 0.96 | 0.97 | 0.18 | 0.18

Table 5.3: Transference results for various case studies. We use ϵ = 0.05 and ϵ̄ = 0.05. As before, ρ1 is short-hand for ρ(ϕ, Y1) for each spec, and d∞ is short-hand for d∞(Y1, Y2).

5.6.1 Dubin's car

Dubin's car models the motion of a point-mass vehicle. The state variables are the x and y positions, the steering angle θ, and the longitudinal velocity v. While both θ and v are typically assumed to be control inputs, we adapt the case study from [256] where the angular velocity ω(t) at each time t is assumed to be given, so that θ(t) := Ts·π + Σ_{i=1}^{t} ω(i)·Ts, where Ts := 0.1 s. In this example, we assume that ω(i) := π/(50Ts) for i ∈ [1, 25] and ω(i) := −π/(50Ts) for i ∈ [26, 50]. The velocity v(t) is provided by a feedback controller.

Case Study | Spec | |Dcal| | |Dtest| | ϵ | CVaR(d∞) | CVaR(−ρ1) | CVaR(−ρ2)
F-16 | ϕgcas | 1K | 3K | 0.01 | 200.3 | -62.3 | -62.3
CARLA | ϕquad | 7K | 3K | 0.01 | 2.04 | -0.31 | 0.88
Satellite | ϕsat | 7K | 3K | 0.01 | 0.19 | 0.0 | 0.08

Table 5.4: Empirical validation of risk transference for all case studies. As before, ρi is short-hand for ρ(ϕ, Yi), and d∞ is short-hand for d∞(Y1, Y2). Here, we set the risk level β = ϵ in each case.

The dynamics are assumed to have additive white Gaussian noise η^x(t), η^y(t) ∼ N(0, 0.005). The dynamical equations of motion are:

x(t+1) = x(t) + Ts·v(t)·cos(θ(t)) + η^x(t)
y(t+1) = y(t) + Ts·v(t)·sin(θ(t)) + η^y(t)

The two systems that we compare have two different feedback controllers. The first feedback controller uses the method from [162, 256] and the second controller uses the method from [257]. We plot a set of sampled trajectories in Fig. 5.1a. This figure also shows the set of initial states I := [−1, 0] × [−1, 0].
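A minimal rollout sketch of these noisy dynamics (assumptions for illustration only: the feedback controller is replaced by a constant velocity v = 1, and 0.005 is treated as the noise standard deviation):

```python
import math
import random

TS = 0.1  # sampling time Ts

def simulate_dubins(x0, y0, horizon=50, seed=0):
    """One noisy rollout of the discretized Dubin's car above."""
    rng = random.Random(seed)
    x, y = x0, y0
    traj = [(x, y)]
    for t in range(1, horizon + 1):
        # theta(t) = Ts*pi + sum_{i=1}^{t} omega(i)*Ts, with
        # omega(i) = +pi/(50 Ts) for i in [1, 25], -pi/(50 Ts) for i in [26, 50].
        theta = TS * math.pi + sum(
            (math.pi if i <= 25 else -math.pi) / (50 * TS) * TS
            for i in range(1, t + 1))
        v = 1.0  # placeholder for the feedback-controlled velocity
        x += TS * v * math.cos(theta) + rng.gauss(0.0, 0.005)
        y += TS * v * math.sin(theta) + rng.gauss(0.0, 0.005)
        traj.append((x, y))
    return traj

traj = simulate_dubins(-0.5, -0.5)
assert len(traj) == 51
```

Sampling many such rollouts from initial states in I, once per controller, is how the calibration and test sets of paired trajectories are assembled in the experiments below.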
The controller aims to ensure that the system trajectory stays within a series of sets T1 through T50; the corresponding STL specification is ϕdubin := ∧_{i=1}^{50} F_{[i−1,i]}([x y]ᵀ ∈ Ti). For the experiments that follow, we uniformly sampled initial states from I and noise η^x, η^y from the described Gaussian distribution.

Effect of calibration set size. In the first experiment, we wish to benchmark the effect of the size of the calibration set Dcal for various distance metrics. The results are shown in Table 5.1. The table shows that with smaller calibration sets, we get a more conservative δ for d∞ (which translates into a higher validation score). The VaR is almost identical to the value of δ at larger Dcal sizes. We note that the CVaR values change with the value of VaR. A similar trend can be observed for the Skorokhod distance and the L2 metric.

Empirical evaluation of transference. We empirically demonstrate that Theorem 5.4.2 holds. We use Q(Y) = ρ(ϕdubin, Y), i.e., the robust semantics w.r.t. the property ϕdubin, and the L∞ signal metric d∞. The results are shown in Table 5.2. We can see that the predicted lower bound z1 − δ for the robustness of realizations of Y2 w.r.t. ϕdubin is negative, so it is not possible to conclude that the second system satisfies ϕdubin with probability greater than 1 − ϵ − ϵ̄. However, we note that z2 is indeed greater than the bound z1 − δ. Similarly, we show that Theorem 5.4.4 is also empirically validated by computing the CVaR values for the first system and the risk measure on d∞(Y1, Y2). We show the empirical distributions of d∞(Y1, Y2) and ρ(ϕdubin, Yi) for i = 1, 2 in Figure 5.2.

Empirical evaluation of Theorem 5.4.3. We next apply Algorithms 4 and 5 to this case study. We grid the initial set of states evenly into 25 cells with a grid size of κ = 0.02. We sample 650 trajectories in each cell to obtain their calibration sets. Algorithms 4 and 5 give R̄ = 0.7562 and L̄κ = 0.0687, yielding R̄ + L̄κ = 0.8249.
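The combination of Algorithms 4 and 5 applied here can be sketched end-to-end on a toy problem. In the sketch, the pair of stochastic systems is replaced by a hypothetical deterministic distance map dist(u) = |sin u| over U = [0, 1] (Lipschitz with L = 1), so the resulting bound R̄ + L̄κ can be compared directly against the true supremum:

```python
import math
import random

def dist(u):
    """Hypothetical stand-in for d(Y1, Y2) under input u (Lipschitz, L = 1)."""
    return abs(math.sin(u))

def conformal_bound(scores, eps):
    ordered = sorted(scores) + [math.inf]
    return ordered[math.ceil((len(scores) + 1) * (1 - eps)) - 1]

def algorithm4(grid, eps, n_cal=200):
    # R_bar := max over the kappa-net of the per-cell conformal bounds.
    return max(conformal_bound([dist(u) for _ in range(n_cal)], eps)
               for u in grid)

def algorithm5(lo, hi, eps_l, k_l=500, seed=0):
    # Conformal upper bound L_bar on the Lipschitz constant from sampled slopes.
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < k_l:
        u1, u2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if u1 != u2:
            slopes.append(abs(dist(u1) - dist(u2)) / abs(u1 - u2))
    return conformal_bound(slopes, eps_l)

kappa = 0.05
grid = [i * 2 * kappa for i in range(11)]  # kappa-net of U = [0, 1]
r_bar = algorithm4(grid, eps=0.1)
l_bar = algorithm5(0.0, 1.0, eps_l=0.1)
# The true supremum sup_u dist(u) = sin(1) is covered by R_bar + L_bar * kappa.
assert math.sin(1.0) <= r_bar + l_bar * kappa + 1e-9
```

In the chapter's experiments the cells of the initial-state grid play the role of the κ-net, and each per-cell calibration set contains the sampled trajectory pairs for that cell.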
We then evaluate on two test sets of unseen initial conditions with |Dtest| = 1000 and 2500. The success rates on the test sets are 0.9996 and 1.0, with the goal success rate being 0.9. These experiments demonstrate the effectiveness of Theorem 5.5.2.

5.6.2 F-16 aircraft

The F-16 aircraft control system from [124] uses a 13-dimensional non-linear plant model based on a 6-d.o.f. airplane model, and its dynamics describe force equations, moments, kinematics, and engine behavior. We alter the original system S1 from [124] to a modified version S2 by changing the controller gains. We evaluate the performance of the two systems in the ground collision avoidance scenario with the specification ϕgcas := G_{[0,T]}(halt ≥ 1000), where T is the mission time and halt is the altitude. For data collection, we perform uniform sampling of the initial states. We assume that the x-center of gravity (xcg) of the aircraft is a stochastic parameter with uniform distribution on [0, 0.8]. We obtain a calibration set Dcal of size 1000 by uniform sampling of the initial states and the xcg parameter. We separately sample 3000 signals for Dtest. The results of transference and risk estimates are shown in Table 5.3.

5.6.3 Autonomous Driving using the CARLA simulator

CARLA is a high-fidelity simulator for testing autonomous driving systems [92]. We consider two learning-based lane-keeping controllers from [166]: an imitation learning controller (S1) and a learned barrier function controller (S2). We obtain 1000 trajectories from each controller during a 180-degree left turn, and we use 700 of them for calibration and 300 for testing. In this data, the initial states (ce, θe) are drawn uniformly from [−1, 1] × [−0.4, 0.4], where ce is the deviation from the center of the lane (cross-track error) and θe is the orientation error. The STL specification ϕcte := G(|ce| ≤ 2.25) restricts |ce| to be bounded by 2.25. The results are shown in Table 5.3.
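The validation scores VS(·) reported in Tables 5.1–5.3 follow the definition given in Section 5.6; a sketch:

```python
def validation_score(test_scores, r_bar):
    """Fraction of test nonconformity scores covered by the bound r_bar;
    by Theorem 5.5.1 this should concentrate near 1 - eps."""
    return sum(s <= r_bar for s in test_scores) / len(test_scores)

# Hypothetical test-set distances and a conformal bound.
assert validation_score([0.1, 0.3, 0.5, 0.7, 0.9], r_bar=0.6) == 0.6
```

A score noticeably below 1 − ϵ on held-out data would indicate that the exchangeability assumption between calibration and test sets is violated.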
5.6.4 Spacecraft Rendezvous

Next, we consider a spacecraft rendezvous problem from [255]. Here, a deputy spacecraft must rendezvous with a master spacecraft while staying within a line-of-sight cone. The system is a 4D model s = [x, y, vx, vy]ᵀ, where x, y ∈ R are the relative horizontal and vertical distances between the two spacecraft and vx, vy ∈ R are the relative horizontal and vertical velocities. There are two different feedback controllers, using the same control algorithms as in the Dubin's car example (i.e., the controllers from [162, 256] and [257]). The STL specification is a reach-avoid specification (visually depicted in Fig. 5.1d), which requires the system to always stay in the yellow region and eventually reach the target rectangle T shown: ϕsat := G_{[1,5]}(y ≤ −|x| ∧ |y| ≤ ymax ∧ |vx| ≤ vx,max ∧ |vy| ≤ vy,max) ∧ F_{[1,5]}(s ∈ T). The set of initial states is I = [−0.1, 0.1] × [−0.1, 0.1]. The system is assumed to have additive Gaussian process noise with zero mean and a diagonal covariance matrix with variances 10⁻⁴, 10⁻⁴, 5×10⁻⁸, 5×10⁻⁸. We uniformly sample 100 different initial states from I and 100 noise values from the noise distribution. We divide the dataset into Dcal and Dtest with sizes 7K and 3K, respectively. The results are shown in Table 5.3.

Discussion of transference results across case studies. We omit from Table 5.3 the column showing the proportion of realizations of Y2 in Dtest whose robustness ρ2 exceeds the bound z1 − δ, where the zi's are the conformal bounds on the ρi's. For all case studies this ratio was either 1.0 or close to 1.0, establishing the empirical validity of Theorem 5.4.2. We also observe that the above results show that it is feasible to use stochastic conformance in a control improvisation loop, where we want to change a system controller (perhaps to optimize a performance objective) while allowing only some degradation of probabilistic safety guarantees.
5.7 Related Work

Conformance has found applications in cyber-physical system design [139, 21] as well as in drug testing and other domains [48, 86]. Our work is inspired by existing work on conformance of deterministic (i.e., non-stochastic) systems; see [215, 147] for surveys. The authors in [3, 5, 7] considered conformance testing between hybrid systems. To capture the distance between hybrid system trajectories that may exhibit discontinuities, signal metrics were considered that simultaneously quantify distance in space and time, resembling notions of system closeness in the hybrid systems literature [112, 113]. For instance, [3] proposes (T, J, (τ, ϵ))-closeness, where τ and ϵ capture timing distortions and state value mismatches, respectively, and where T and J quantify limits on the total time and the number of discontinuities, respectively. A stronger notion than (T, J, (τ, ϵ))-closeness was proposed in [83] using the Skorokhod metric; its benefit over the other notion is that it preserves the timing structure. All these works derive transference results with respect to timed linear temporal logic or metric interval temporal logic specifications.

Conformance of stochastic systems has been less studied. The authors in [155] propose precision and recall conformance measures based on the notion of entropy of stochastic automata. The authors in [156] use the Wasserstein distance to quantify the distance between two stochastic systems, which is fundamentally different from our approach. (Bi)simulation relations for stochastic systems were studied in [143, 141, 120]. Such techniques can define behavioral relations between systems [46, 142], and they can be used to transfer verification results between systems [279]. The authors in [80] utilize such behavioral relations to verify RL policies between a concrete and an abstract system.
We remark that bisimulations are difficult to compute (see, e.g., [110]), unlike our approach. Probably closest to our work is [263]. However, in that work conformance is task-specific, which allows two systems to be conformant w.r.t. a system specification even when the systems produce completely different trajectories. Additionally, we consider a worst-case notion of conformance where no information about the input that excites both stochastic systems is available.

5.8 Conclusion

We studied conformance of stochastic dynamical systems. In particular, we defined conformance between two stochastic systems as probabilistic bounds over the distribution of distances between model trajectories. Additionally, we proposed the non-conformance risk to reason about the risk of stochastic systems not being conformant. We showed that both notions have the transference property, meaning that conformant systems satisfy similar system specifications. Lastly, we showed how stochastic conformance and the non-conformance risk can be estimated from data using statistical tools such as conformal prediction.

Chapter 6

Robust Testing for Cyber-Physical Systems using Reinforcement Learning

6.1 Introduction

Autonomous and semi-autonomous cyber-physical systems (CPSs) such as vehicles with advanced driver assist systems (ADAS), unmanned aerial vehicles (UAVs), and medical devices use sophisticated control and planning algorithms to safely accomplish their mission objectives. However, in order to enable autonomous operation in uncertain and previously unseen environments, such CPSs increasingly use learning-enabled components (LECs) for perception and decision-making. There have been many approaches for open-loop testing of LECs (see [129, 206, 88] for excellent surveys on this topic). Of greater relevance to this chapter is work on closed-loop testing of learning-enabled CPSs.
The closed-loop testing problem seeks to identify environment scenarios under which the CPS behaves in an undesired fashion. Most techniques for closed-loop testing (including the one presented in this chapter) are search-based methods; they can be divided into two classes based on the LEC being tested: (1) techniques to test perception components [94, 93, 247], and (2) techniques to test decision-making/control logic [149, 75, 76, 246].

Irrespective of the LEC being tested, a key challenge for closed-loop testing is appropriately scoping the search problem. If the environment model used to generate test scenarios is too liberal, falsifying the safety conditions of the CPS becomes a trivial exercise. For example, consider the ADAS subsystem of adaptive cruise control (ACC); here, the system-under-test (SUT), or ego car, attempts to maintain a safe following distance from a lead vehicle. If the lead vehicle is allowed to travel backwards on a highway, then it is impossible to design safe ACC logic, and finding SUT violations of safety is trivial. Furthermore, it is important to have an unambiguous mathematical description of the desired behavior of the SUT. In this chapter, we address both challenges through the use of the logic-based specification language Signal Temporal Logic (STL) [179] to express both constraints on the environment and safety specifications for the SUT.

At a high level, our approach is similar to input-constrained falsification of STL properties [101, 35]. Falsification of STL properties is a well-studied area with many approaches (see [85, 39] for surveys). More recent work on falsification has focused on the use of deep reinforcement learning (RL) for falsifying STL formulas [11, 281, 280]. Related work on adaptive stress testing [76] also uses deep RL, but the authors incorporate environment constraints and undesirable behavior by manually encoding them in the reward function used by the deep RL algorithm.
However, none of the previous approaches have considered the robust closed-loop testing problem. In closed-loop testing, the emphasis is not only on identifying the environment scenarios that cause the SUT to violate its safety specifications, but also on discovering a test policy that can react to changes in the SUT. For example, we would like a test generation algorithm to produce vulnerable scenarios even when there is a change to the initial conditions of the SUT's state variables, a change to the SUT's system dynamics, or minor changes to the LECs. Such an approach is particularly useful in industrial development processes that rely on the paradigm of continuous integration and pre-merge tests. Here, changes to the software of the CPS should be small, and each should pass a suite of pre-merge tests before it is allowed to be merged into the main development branch. More complex tests and analytics may be run nightly or on longer timelines, but pre-merge tests are meant to be lightweight and need to run quickly to avoid hindering developer productivity. Running a full falsification procedure at pre-merge time is not feasible, and pre-recorded falsification traces are not robust to changes in the SUT. In this chapter, we show that our specific use of STL-based environment constraints and SUT specifications allows us to train adversarial policies that are robust test generators. In other words, the policy learned by our deep RL algorithm transfers to the modified model under certain conditions that characterize the degree of model change. Thus, our procedure has the potential to be invaluable in an incremental design framework where restarting closed-loop testing from scratch after every modification to the LEC may be expensive.
Furthermore, the value function induced by the RL policy allows quantifying regions of the state space that are more sensitive to counterexamples, allowing designers to focus on those simulation-based scenarios that are likely to transfer to real-world settings. In summary, our main contributions are:

1. We propose a deep reinforcement learning based framework where various sources of uncertainty in the environment are modeled as (one or more) agents that behave according to a reactive policy that we train through simulations.

2. We restrict the agents to respect dynamic constraints (expressed in STL) while causing an ego agent to violate its specification (also expressed in STL).

3. We formulate an automatic reward shaping mechanism that guarantees that the joint behavior of the environment agents and the SUT is such that the environment constraints are satisfied while the SUT violates its specification.

4. We identify assumptions under which the learned adversarial policies are robust. In particular, we show that if the learned adversarial policy demonstrates a violation of the SUT specification, then this policy will transfer to agents that (1) start from nearby initial configurations, and (2) have different dynamics than the original ego agent.

5. We demonstrate the efficacy of our approach on three case studies from the autonomous driving domain. We show that aspects such as other cars, traffic lights, pedestrians, etc. can be modeled as adversarial agents. We consider (1) an adaptive cruise control example where the leading car is modeled as an adversarial agent, (2) a controller that ensures safety during a lane merge scenario, and (3) a controller that ensures safety during a yellow light scenario.

The rest of the chapter is organized as follows. In Section 6.2 we provide the background and problem definition. We define rewards to be used by our RL-based testing procedure in Sec. 6.3.
We show how the adversarial agents generalize in Section 6.4, and provide a detailed evaluation of our technique in Sec. 6.5. Finally, we conclude with a discussion of related work in Section 6.6.

Figure 6.1: Simulation environments for case studies in the CARLA simulator [91]. (a) Case Study I: Driving in lane with lead vehicle. (b) Case Study II: Left vehicle merges in front. (c) Case Study III: Yellow light running.

6.2 Problem Statement and Background

We first introduce the formal description of a multi-agent system as a collection of deterministic dynamical agents.

Definition 6.2.1 (Deterministic Dynamical Agents). An agent H is a tuple (X, A, T, Xinit, π), where X is a set of agent states, A is the set of agent actions, T is a set of transitions of the form (x, a, x′), where a ∈ A, Xinit ⊆ X is a set of designated initial states for the agent, and finally the policy∗ π is a function mapping a state in X to an action in A.

A multi-agent system S = {ego, ado1, . . . , adok} is a set of agents, with a designated ego agent ego and a non-empty set of adversarial agents ado1, . . . , adok. The state space of the multi-agent system can be constructed as the product space of the individual agent state spaces, and the set of transitions of the multi-agent system corresponds to the synchronous product of the transitions of individual agents. The transitions of the multi-agent system, when projected to individual agents, are consistent with individual agent behaviors. A behavior trajectory for an agent is thus a finite or infinite sequence (t0, x0), (t1, x1), . . ., where xi ∈ X and ti ∈ R≥0. We use ξ to denote a trajectory variable, i.e., a function mapping ti to xi, i.e., ξ(ti) = xi. In many frameworks used for simulating multi-agent systems, it is common to consider timed trajectories with a finite time horizon tN and a fixed, discrete time step ∆ = ti+1 − ti, ∀i.

Problem Definition: Testing with Dynamically Constrained Adversaries.
Given a behavior of the multi-agent system, let the projection of the behavior onto agent H be denoted by the signal variable sH. Formally, the problem we wish to solve can be stated as follows:

1. Given a spec ψego on the ego agent,
2. Given a set of constraints φadoi on each adversarial agent Hi,
3. Find a multi-agent system policy that can generate behaviors such that: ∀i : sadoi |= φadoi ∧ sego ̸|= ψego.

In other words, we aspire to generate a compact representation for a possibly infinite number of counterexamples to the correct operation of the ego agent.

∗Our framework can alternatively include stochastic dynamical agents, where T is defined as a distribution over (X × A × X), and the control policy π is a stochastic policy representing a distribution over actions conditioned on the current state of the agent, i.e., π(a | x). Also, the states X and actions A can be finite sets, or can be dense, continuous sets.

6.2.1 Adversarial Testing through Policy Synthesis

In contrast to falsification approaches, we assume a deterministic (or stochastic) dynamical agent model for the adversarial agents (as defined in Def. 6.2.1), i.e., the i-th adversarial agent is specified as a tuple of the form (Xi, Ai, Ti, Xinit,i, πi). We assume that initially all agents have a randomly chosen policy πi. For adversarial agent adoi, let Πi = Ai^Xi denote the set of all possible policies. Let Πi(φadoi) be the set of policies such that for any π ∈ Πi(φadoi), using π guarantees that the sequence of states of agent adoi satisfies φadoi. Similarly, let Πi(¬ψego) be the set of policies that guarantees that the sequence of states of the ego agent ego does not satisfy ψego. The problem we wish to solve is: for each i, find a policy in Πi(¬ψego) ∩ Πi(φadoi). One approach to solve this problem is to use a reactive synthesis approach, when specifications are provided in a logic such as LTL or ATL [50, 87, 172].
There is limited work on reactive synthesis with STL objectives [208, 109], mainly requiring encoding STL constraints as Mixed-Integer Linear Programs; this approach may suffer from scalability issues in multi-agent settings. We defer a detailed comparison with reactive synthesis approaches to future work. In this chapter, we propose using the framework of deep reinforcement learning (RL) for controller synthesis, with a procedure for automatically inferring rewards from specifications and constraints.

6.2.2 Policy Synthesis through Reinforcement Learning

Reinforcement learning (RL) [239] and related deep reinforcement learning (DRL) [184] are procedures to train agent policies in deterministic or stochastic environments. In our setting, given a multi-agent system S = {ego, ado1, . . . , adok}, we wish to synthesize a policy πk for each adversarial agent adok. We can model multiple adversarial agents as a single agent whose state is an element of the Cartesian product of the state spaces of all agents, i.e., X = Xego × Xado1 × · · · × Xadok, and whose action is a tuple of actions of all adversarial agents, i.e., A = Aado1 × · · · × Aadok. In each step, we assume that the agent is in state x ∈ X and interacts with the environment by taking action a ∈ A. Then, the environment (i.e., the transition relation of the adversarial and the ego agents) picks a next state x′ s.t. (x, a, x′) ∈ T, and a reward reward(x, a). The reward provides reinforcement for the constrained adversarial behavior, and will be elaborated in Section 6.3. The goal of the RL agent is to learn a deterministic policy π(x) such that the long-term payoff of the agent from the initial state (i.e., the discounted sum of all rewards from that state) is maximized. As is common in RL, we define the notion of a value function V in Eq. (6.1); this is the expected reward over all possible actions that may be taken by the agent. In a deterministic environment, the expectation disappears.
Vπ(xt) = Eπ [ Σ_{k=0}^{∞} γ^k reward(xt+k, at+k) | at+k = π(xt+k) ]   (6.1)

RL algorithms use different strategies to find an optimal policy π∗, which for all x is defined as π∗(x) = arg max_π Vπ(x). We assume that the state of the agent at time t0, i.e., x0, is in Xinit. We now briefly review a classic model-free RL algorithm called Q-learning and summarize two deep RL algorithms: Deep Q-learning and Proximal Policy Optimization (PPO). In Q-learning, we learn a state-action value function q(x, a), which represents the believed value of taking action a when in state x. Note that V(x) = max_a q(x, a). In Q-learning, the agent maintains a table whose rows correspond to the states of the system and whose columns correspond to the actions. The entry q(x, a) encodes an approximation to the state-action value function computed by the algorithm. The table is initialized randomly. At each time step t, the agent uses the table to select an action at based on an ε-greedy policy, i.e., it chooses a random action with probability ε, and with probability 1 − ε chooses arg max_{a∈A} q(x, a). Next, at time step t + 1, the agent observes the reward received Rt+1 as well as the new state xt+1, and it uses this information to update its beliefs about its previous behavior. During the training process, we sample the initial state of each episode with a probability distribution µ(x) that is nonzero at all states. After a sufficient number of iterations, all states will eventually be selected as the initial state. We also fix the policies of the adversarial agents to be ε-soft. This means that for each state x and every action a, π(a | x) ≥ ε, where ε > 0 is a parameter. Random sampling of initial states and ε-soft policies ensure that the agent explores and avoids converging prematurely to local optima. We assume that during the training process the agent policies are stochastic, ε-soft policies, i.e., the policy represents a distribution over a set of actions conditioned on the current state.
However, at the end of training, we interpret the policy as deterministic by picking the most probable action for each state.

Deep RL. Deep RL is a family of algorithms that make use of Deep Neural Networks (DNNs) to represent either the value function or the policy of an agent. Deep Q-learning [184] is an extension of Q-learning in which the table q(s, a) is approximated by a DNN, q(s, a, w), where w are the network parameters. Deep Q-learning observes states and selects actions similarly to Q-learning, but it additionally uses experience replay, in which the agent stores previously observed tuples of states, actions, next states, and rewards. At each time step, the agent updates its q-function with the currently observed experience as well as with a batch of experiences sampled randomly from the experience replay buffer†. The agent then updates its approximation network by gradient descent on the quadratic loss function L = (yt − q(xt, at, w))², where yt = Rt+1 + γ max_{a′} q(xt+1, a′, w). In the case that xt is a terminal state, it is common to assume that all transitions are such that xt+1 = xt and Rt = 0. Although these learning algorithms learn the state-action value function q(x, a), in the theoretical exposition that follows, we will use the state value function V(x) for simplicity. The optimal state value function can be obtained from the optimal state-action value function by V⋆(x) = max_a q⋆(x, a). PPO [228] is a state-of-the-art policy gradient algorithm that performs gradient-based updates in the policy space directly while ensuring that the new policy is not too far from the old policy.

†Tabular Q-learning is guaranteed to converge to the optimal value function [239]. On the other hand, DQN may not converge, but it will eventually find a counterexample trace if one exists. In practice, DQN performs well and finds effective value functions, even if its convergence cannot be theoretically guaranteed.
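The ε-greedy tabular Q-learning update reviewed above can be sketched on a toy chain MDP. The environment below (a 5-state chain with a reward at the rightmost state), and all hyperparameter values, are illustrative assumptions, not the chapter's case studies:

```python
import random

# Toy deterministic chain MDP (illustrative): states 0..4, actions 0 = left,
# 1 = right; reward 1.0 whenever the transition lands in the rightmost state.
N_STATES, ACTIONS = 5, (0, 1)

def step(x, a):
    x2 = max(0, x - 1) if a == 0 else min(N_STATES - 1, x + 1)
    return x2, (1.0 if x2 == N_STATES - 1 else 0.0)

def q_learning(episodes=500, horizon=20, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(x, a): 0.0 for x in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        x = rng.randrange(N_STATES)  # random initial state: mu(x) > 0 everywhere
        for _ in range(horizon):
            # eps-greedy selection, as in the text
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q[(x, b)])
            x2, r = step(x, a)
            # Q-learning update toward the target r + gamma * max_a' q(x', a')
            q[(x, a)] += alpha * (r + gamma * max(q[(x2, b)] for b in ACTIONS) - q[(x, a)])
            x = x2
    return q

q = q_learning()
# At the end of training, read off the deterministic greedy policy.
policy = {x: max(ACTIONS, key=lambda a: q[(x, a)]) for x in range(N_STATES)}
```

After training, the greedy policy moves right from every state, i.e., toward the rewarding end of the chain; DQN replaces the table `q` with a network q(s, a, w) trained on the same target.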
6.3 Learning Constrained Adversarial Agents

Next, we describe how we construct a reward function that enables training constrained adversarial agents. The reward function needs to encode two aspects: (1) satisfying the adversarial constraints, and (2) violating the ego specification. We assume that adversarial constraints are hierarchically ordered with priorities, inspired by the Responsibility-Sensitive Safety rules for traffic scenarios in [65].

Definition 6.3.1 (Constrained adversarial reward). Suppose from initial state x0 the agent has produced a behavior trace ξ(x0) of duration T. We distinguish two cases.

• Case 1: All adversarial constraints are strictly satisfied, i.e., ρ(ϕ, ξ(x0)) > 0 for each constraint ϕ ∈ φado. In this case, the reward will be the robustness of the adversarial specification:

reward_t = 0 if t < T;  reward_t = ρ(¬ψego, ξ(x0)) if t = T.   (6.2)

• Case 2: Not all adversarial constraints are strictly satisfied. Let ϕ be the highest-priority rule that is not strictly satisfied, i.e., the highest-priority rule such that ρ(ϕ, ξ(x0)) ≤ 0. Then, every constraint with priority higher than ϕ will contribute zero, whereas every constraint with priority less than or equal to ϕ will contribute −ρmin. Let M be the number of constraints with priority less than or equal to ϕ:

reward_t = 0 if t < T;  reward_t = −M ρmin if t = T.   (6.3)

The following lemma shows that it is not possible for an adversarial agent to attain a high reward for satisfying lower-priority constraints at the expense of higher-priority constraints. The proof is straightforward (shown in the appendix).

Lemma 6.3.2 (Soundness of the reward function). Consider two traces, ξ1 and ξ2. Suppose that the highest-priority constraint violated by ξ1 is ϕ1 and the highest-priority constraint violated by ξ2 is ϕ2. Suppose ϕ1 has lower priority than ϕ2. Then, the reward for trace ξ1 will be higher than the reward for trace ξ2, i.e., reward(ξ1) > reward(ξ2).

Proof.
Let n1 be the number of rules with priority less than or equal to ϕ1. Similarly, let n2 be the number of rules with priority less than or equal to ϕ2. Then, reward(ξ1) = −n1 ρmin and reward(ξ2) = −n2 ρmin. Since n2 > n1 by the assumptions of the lemma, the result follows.

Pipeline. Our training pipeline is illustrated in Figure 6.2. Given a scenario composed of interacting agents (E, Je), (H1, J1), . . . , (Hn, Jn), the goal is to learn transition distributions D1, . . . , Dn and policies π1, . . . , πn for the adversarial agents such that each adversary satisfies its rules Jk, but the ego is not able to satisfy its rules Je. The ego agent interacts with several adversarial agents as part of a simulation. The adversarial agents are able to observe the state of the ego as well as of the other adversarial agents, and they may update their policies.

Figure 6.2: The ego agent E is embedded in a simulation with a collection of adversarial agents Hθi, which learn (possibly from a bank of past experience) to stress-test the ego via a reward function derived from the constraints for the adversary and the ego specification.

6.4 Rationale for Robust Testing

In this section we show how our RL-based testing approach makes the testing procedure itself robust by learning a closed-loop policy for testing. We first introduce some definitions that help us state the lemmas and theorems about robust testing.

Definition 6.4.1 (Definition 2 of [89]). Let fk : X → R be any computable function that appears in an STL formula φ. We call the vector of variables in ξ the primary signals of φ, and their images by fk the secondary signals, {yk}.

Next, we formalize the notion of distance between signals; Lemma 6.4.3 then tells us that if two signals have "nearby values" and also generate nearby secondary signals, then if one of them robustly satisfies an STL formula, the other will also satisfy that formula.

Definition 6.4.2 (Distance between signals).
Given two signals ξ and ξ′ with identical value domains X and identical time domains T, and a metric dS on X, the distance between ξ and ξ′, denoted ∥ξ − ξ′∥, is defined as sup_{t∈T} dS(ξ(t), ξ′(t)).

Now we are ready to present our theorems about robust testing. First, we show generalizability across initial conditions in two steps: (1) In Theorem 6.4.4, we assume that the RL algorithm has converged to the optimal value function, and that it has used this value function to find a counterexample trace ξ(x0). We consider a new state x′0 and want to bound the degradation of the robustness function of the specification, ρ(¬ψego, ξ(x′0)). (2) In Theorem 6.4.5, we relax this strong assumption and identify conditions under which generalization can be guaranteed even with approximate convergence.

Lemma 6.4.3 (Theorem 1 in [89]). If ρ(φ, ξ, t) = δval, then for every signal ξ′ s.t. every secondary signal satisfies ∥yk − y′k∥ < δval, the following is true: (ξ |= φ) =⇒ (ξ′ |= φ).

Theorem 6.4.4. Suppose that the adversarial agent has converged to the optimal value function V⋆(x), and that it has found a trace ξ(x0) that falsifies the target specification ψego with robustness ρ(¬ψego, ξ(x0)) = ρfalsify > 0 while satisfying all of the adversarial constraints. Given a new state x′0 such that |V⋆(x0) − V⋆(x′0)| < δval with δval < γ^T |ρmin|, the adversary will be able to find a new trajectory ξ(x′0) that satisfies all of the constraints. Furthermore, the robustness of the specification ¬ψego over the new trace will be at least ρfalsify − δval/γ^T.

Proof. We can expand the optimal value function at state x0 as V⋆(x0) = Σ_{t=0}^{T} γ^t reward_t, where reward_t is the reward function defined in Definition 6.3.1. Then, the optimal value function at x0 is V⋆(x0) = γ^T ρfalsify. Suppose for a contradiction that the new trajectory ξ(x′0) violates some number M of constraints.
Then, the following equations show that the two states must actually differ by a large amount, much larger than δval, leading to a contradiction. From Definition 6.3.1 we have

V⋆(x′0) = −γ^T M ρmin,   (6.4)

V⋆(x0) − V⋆(x′0) = γ^T ρfalsify + M γ^T ρmin > δval,   (6.5)

which contradicts the assumption of the theorem. As the constraints will be satisfied by ξ(x′0), their contribution to the value function at x′0 will be zero. Then, for the second part of the theorem we can expand the optimal value functions as:

V⋆(x0) − V⋆(x′0) = γ^T ρ(¬ψego, ξ(x0)) − γ^T ρ(¬ψego, ξ(x′0)).   (6.6)

By assumption the left-hand side is ≤ δval, which gives us

ρ(¬ψego, ξ(x0)) − ρ(¬ψego, ξ(x′0)) ≤ δval / γ^T,   (6.7)

and the theorem follows.

Note that if δval is chosen as a small enough perturbation such that ρfalsify − δval/γ^T > 0, then the new trace is also a trace in which the adversary causes the ego to falsify its specification. The tabular Q-learning algorithm converges asymptotically to the optimal value function, meaning that for any α, there exists a k such that at the k-th iteration, the estimate Vk differs from the optimal value function by at most α, i.e., ∀x ∈ X, |V⋆(x) − Vk(x)| < α. Some RL algorithms have even stronger guarantees. For example, Theorem 2.3 of [99] states that running the value iteration algorithm until successive iterates of the value function differ by at most α(1 − γ)/(2γ) produces a value function within α of the optimal value function, i.e., |Vk(x0) − V⋆(x0)| ≤ α. While useful for theoretical results, value iteration does not scale to problems with large state spaces. The following theorem states that if the RL algorithm has found a value function that is near optimal, the agent will be able to generalize counterexamples across different initial states. Finally, in Theorem 6.4.7, we show generalizability when the ego agent dynamics change.
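The value-iteration stopping rule quoted above (iterate until successive value functions differ by at most α(1 − γ)/(2γ), guaranteeing a value function within α of optimal) can be sketched as follows; the two-state MDP used to exercise it is an illustrative assumption:

```python
def value_iteration(states, actions, next_state, reward, gamma, alpha):
    """Iterate V_{k+1}(x) = max_a [reward(x,a) + gamma * V_k(next_state(x,a))]
    until successive iterates differ by at most alpha*(1-gamma)/(2*gamma);
    per the bound quoted above, the result is then within alpha of V*."""
    v = {x: 0.0 for x in states}
    threshold = alpha * (1 - gamma) / (2 * gamma)
    while True:
        v_next = {x: max(reward(x, a) + gamma * v[next_state(x, a)] for a in actions)
                  for x in states}
        if max(abs(v_next[x] - v[x]) for x in states) <= threshold:
            return v_next
        v = v_next

# Two-state example (illustrative): "go" toggles the state, "stay" keeps it;
# any transition landing in state 1 pays reward 1. At gamma = 0.9 the optimal
# values are V*(0) = V*(1) = 1/(1 - 0.9) = 10.
states, actions = (0, 1), ("stay", "go")
nxt = lambda x, a: x if a == "stay" else 1 - x
rew = lambda x, a: 1.0 if nxt(x, a) == 1 else 0.0
v = value_iteration(states, actions, nxt, rew, gamma=0.9, alpha=0.01)
```

With α = 0.01, the returned values land within 0.01 of the true value 10 for both states, as the stopping rule promises.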
The main idea is that if the new dynamics have a δ-approximate bisimulation relation to the original dynamics, then we can guarantee generalizability.

Theorem 6.4.5. Suppose that we have truncated an RL algorithm at iteration k. Suppose that, from the guarantees of this particular RL algorithm, we are within α of the optimal value function, i.e., for every x, |Vk(x) − V⋆(x)| ≤ α. Further suppose that the adversarial agent has found a falsifying trace from state x0 with robustness ρfalsify, i.e., ρ(¬ψego, ξ(x0)) = ρfalsify. Now consider a new state x′0 such that the degradation of our approximate value function is at most β, i.e., |Vk(x0) − Vk(x′0)| ≤ β. Then, the adversarial agent will be able to produce a new trace with robustness at least

ρ(¬ψego, ξ(x′0), T) ≥ ρfalsify − (2α + β)/γ^T.   (6.8)

Proof. By the triangle inequality,

α + β ≥ |V⋆(x′0) − Vk(x′0)| + |Vk(x′0) − Vk(x0)|   (6.9)
      ≥ |V⋆(x′0) − Vk(x0)|.   (6.10)

Further,

2α + β ≥ |V⋆(x′0) − Vk(x0)| + |Vk(x0) − V⋆(x0)| ≥ |V⋆(x′0) − V⋆(x0)|.   (6.11)

The result follows from Theorem 6.4.4 by substituting δval = 2α + β.

Finally, we show that the agent may cope with limited changes to the multi-agent system. This is useful in a testing and development situation, because we would like to be able to reuse a pre-trained adversarial agent to stress-test small modifications of the ego without expensive retraining. To do this, we will define a δ-approximate bisimulation relation that allows us to formally characterize the notion of similarity between different multi-agent systems.

Definition 6.4.6 (δ-approximate bisimulation, [111]). Let δ > 0, and let S1, S2 be systems with state spaces X1, X2 and transition relations T1, T2, respectively. A relation Rδ ⊆ X1 × X2 is called a δ-approximate bisimulation relation between T1 and T2 if for all (x1, x2) ∈ Rδ:

1. d(x1, x2) ≤ δ, where d is a distance metric;
2. ∀a ∈ A, ∀x′1 ∈ T1(x1, a), ∃x′2 ∈ T2(x2, a) such that (x′1, x′2) ∈ Rδ;
3.
∀a ∈ A, ∀x′2 ∈ T2(x2, a), ∃x′1 ∈ T1(x1, a) such that (x′1, x′2) ∈ Rδ.

Theorem 6.4.7. Suppose the adversarial agent has trained to convergence as part of a multi-agent system S1, and it has found a trace that satisfies the adversarial specification with robustness ρfalsify. Consider a new multi-agent system S2 and suppose there exists a δ-approximate bisimulation relation between the two systems, including the secondary signals of the formula ¬ψego. Further suppose that δ < ρfalsify. Then, the trajectory of the new system will also violate the ego specification while respecting the adversarial constraints.

Proof. If there is a δ-approximate bisimulation between the primary and secondary signals of the traces of the two systems, then for a trace ξ1(x0) of system S1 starting from initial state x0 and a trace ξ2(x0) of system S2 also starting from initial state x0, the δ-approximate bisimulation relation ensures that both the primary and secondary signals differ by at most δ, i.e., (|ξ1(x0) − ξ2(x0)| ≤ δ) ∧ (|y1 − y2| ≤ δ). From Lemma 6.4.3, ξ2 also causes the ego to falsify its specification while satisfying the adversarial specifications.

Figure 6.3: Simulation environments for case studies. (a) Case Study I: Driving in lane with lead vehicle. (b) Case Study II: Left vehicle merges in front. (c) Case Study III: Yellow light.

6.5 Case Studies

In this section we first empirically demonstrate the robustness of our adversarial testing procedure. Then, we demonstrate the scalability of adversarial testing by applying it to three case studies from the autonomous driving domain.

6.5.1 Benchmarking Generalizability

We introduce a grid-world example environment which consists of an n × n grid containing the ego agent and the adversarial agent. The objective of the ego agent is to escape the adversarial agents, assuming that the game begins with the ego and the adversarial agent at (cego, cado). The ego agent can move k cells in any time step.
The ego specification and adversarial constraints are as specified in Section 8.1. The ego policy is hand-crafted: it observes the position of the adversarial agents and selects the direction (up, down, left, or right) that maximizes its distance from the adversarial agent. If the target cell lies outside the map, it chooses a fixed direction to move away. It should not be trivial for the adversarial agent to capture the ego agent; thus, for every experiment, we provide a baseline comparison with an adversarial agent that has a randomly chosen policy. A random policy has some likelihood of succeeding from a given initial configuration of the ego and adversarial agents. Thus, the ratio of the number of initial conditions from which the random adversarial agent succeeds to the total number of initial conditions being tested quantifies the degree of difficulty of the experiment. We denote the random adversarial agent as ado[rand]. In all the experiments in this section, we train the adversarial agent using our RL-based procedure on a training arena characterized by the vector λtrain = (ntrain, Cpos, ktrain), i.e., (1) a fixed grid size (ntrain × ntrain), (2) a set of initial positions (cego, cado) ∈ Cpos, and (3) a fixed step size for the ego agent movement (ktrain). We use the Proximal Policy Optimization (PPO)-based deep RL algorithm to train the adversarial agent [228]. We denote this trained agent as ado[λtrain] for brevity. We frame the empirical validation of our robust adversarial testing in terms of the following research questions:

RQ1. How does the performance of ado[λtrain] compare against 10 uniformly sampled ado[rand] agents on the same set of initial positions used to train ado[λtrain] when all other arena parameters remain the same? [To demonstrate degree of difficulty.]

RQ2. How does the performance of ado[λtrain] compare against ado[rand] in an arena of varying map sizes when all other parameters remain the same?

RQ3.
How does the performance of ado[λtrain] compare against ado[rand] in an arena with varying ego agent step-sizes when all other parameters remain the same?

RQ4. Does the adversarial agent generalize across initial conditions, i.e., if we pick a small subset of the initial conditions to train an adversarial policy, does the policy discover counterexamples on initial states that were not part of the training set?

RQ5. Does the adversarial agent generalize to arenas with δval-bisimilar dynamics?

Experiment | Parameter            | Success Initial Condition Rate (%): ado[λtrain] | ado[rand]
RQ1        | Num. ado[rand] = 10  | 67.92 (163/240)  | 9.83 (237/2400)
RQ1        | Num. ado[rand] = 20  | 67.92 (163/240)  | 10.23 (491/4800)
RQ2        | Map size 2 × 2       | 66.67 (8/12)     | 33.33 (4/12)
RQ2        | Map size 4 × 4       | 67.92 (163/240)  | 2.91 (7/240)
RQ2        | Map size 5 × 5       | 70.33 (422/600)  | 1.0 (6/600)
RQ2        | Map size 10 × 10     | 9.26 (917/9900)  | 0.33 (33/9900)
RQ3        | Ego step-size 4      | 72.50 (174/240)  | 2.92 (7/240)
RQ3        | Ego step-size 3      | 72.50 (174/240)  | 2.92 (7/240)
RQ3        | Ego step-size 2      | 67.92 (163/240)  | 2.92 (7/240)
RQ3        | Ego step-size 1      | 67.92 (163/240)  | 7.08 (17/240)

Experiment | δval  | Avg. Success Initial Condition Rate
RQ4        | 1     | 0.9735
RQ4        | 0.5   | 0.9742
RQ4        | 0.1   | 0.9883
RQ4        | 0.01  | 1.0
RQ5        | 3     | 0.89
RQ5        | 5     | 0.78

Table 6.1: Empirical demonstration of the robustness of adversarial testing.

For the first three RQs, we use λtrain = (4, Cpos, 2), where Cpos is the set of all possible initial conditions for the agents. Table 6.1 shows the results for RQ1. Our trained agent easily surpasses the average performance of both 10 and 20 random adversarial agents across all initial locations in the training set Cpos. The average number of violations found by a random adversarial agent is around 10%, while our trained adversarial agent captures the ego within the given time limit from 68% of the initial conditions. Thus, finding an adversarial agent policy that works for a majority of the initial cells is sufficiently difficult.
From the results for RQ2, ado[λtrain] successfully causes the ego agent to violate its specification for varying map sizes, even ones as large as 10 × 10, though it was trained on a 4 × 4 map. In contrast, a random adversarial agent is rarely successful. From the results for RQ3, we observe that the trained adversarial agent succeeds even against an ego that uses different step sizes than those on which the adversarial agent was trained. For RQ4, we used λtrain = (10, C100pos, 1), where C100pos is a set of 100 randomly sampled initial positions (note that the total number of initial configurations is 4950). We found 100 counterexample states during training, of which we chose 5 at random. For each of these states x, we obtained the value of the state as maintained by the PPO algorithm, and found all states x′ s.t. |V(x) − V(x′)| < δval. We computed the fraction of these states that also led to counterexamples. For four of the identified counterexample states, δval satisfied the conditions outlined in Theorem 6.4.4, and the results are shown in Table 6.1. As expected, the smaller the value of δval, the higher the fraction of failing states with nearby values. For RQ5, we used λtrain = (10, C200pos, 1), where C200pos was a randomly chosen set of 200 initial states. We defined a refinement of the map: a δval × δval grid was imposed on each grid cell of the original map. We considered an adversarial policy that used the same action as that of the original coarser grid cell, while the ego agent used a refined policy. We can establish that the resulting transition system is δval-bisimilar to the original transition system. For different values of δval, we picked 3 sets of 300 random initial states and tested whether they led to counterexamples. The average success rates are shown in Table 6.1. We see that an abstract adversarial policy can violate the ego spec surprisingly often.
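The candidate-state selection used in RQ4 (collect all states whose learned value is within δval of a known counterexample state) can be sketched as follows; the value table below is a stand-in for the state values maintained by PPO, not data from the experiments:

```python
def nearby_value_states(values, x, delta_val):
    """States x' with |V(x) - V(x')| < delta_val: the candidate set checked in
    RQ4 for whether counterexamples transfer to nearby-value initial states."""
    vx = values[x]
    return [x2 for x2, v in values.items() if x2 != x and abs(vx - v) < delta_val]

# Illustrative value table (a stand-in for the PPO critic's state values).
values = {(0, 0): 0.95, (0, 1): 0.93, (1, 0): 0.40, (1, 1): 0.94}
candidates = nearby_value_states(values, (0, 0), delta_val=0.05)
# Shrinking delta_val shrinks the candidate set, mirroring the RQ4 trend that
# smaller delta_val yields a higher fraction of transferring states.
```

Here the counterexample state (0, 0) yields candidates (0, 1) and (1, 1); tightening delta_val to 0.015 leaves only (1, 1).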
Case Study    | Description                             | STL Formula
ACC           | Ego: Avoid collision                    | G[0,T](d ≥ dsafe)   (6.12)
ACC           | Adversarial agent: Velocity bounds      | G(vmin ≤ vado ≤ vlim)   (6.13)
Lane Change   | Ego: Avoid collision                    | G[0,T](d2 ≥ dsafe)   (6.14)
Lane Change   | Adversarial agent: Init. pos.           | dlong > dsafe   (6.15)
Yellow Light  | Ego: Don't run red light                | ¬F[0,T](ℓR ∧ dℓ,ego ∈ [−δval, 0])   (6.16)
Yellow Light  | Adversarial agent: Speed limits         | G[0,T](vado < vlim)   (6.17)
Yellow Light  | Adversarial agent: Don't run red light  | ¬F[0,T](ℓR ∧ dℓ,ado ∈ [−δval, 0])   (6.18)

Table 6.2: Ego specifications and adversarial rules for the case studies.

Figure 6.4: Traces in (a) show an early episode in the driving-in-lane case study. The adversary is unable to cause a collision, and the distance between the ego vehicle and the adversarial vehicle remains above the collision threshold for the duration of the episode. Traces in (b) show a later episode in which the adversary successfully causes a collision. Traces in (c) show a different behavior that the adversary successfully learned to cause a collision.

6.5.2 Autonomous Driving Case Studies

We apply our adversarial testing framework to three case studies from the autonomous driving domain‡. We used the CARLA driving simulator [91] as a means to stress-test a controller driving a car in three different scenarios: (1) freeway driving in a straight lane, (2) freeway driving with a car merging into the ego lane, and (3) driving through a yellow light. The adversarial agent was developed in Python, and the neural networks used in the DQN examples were developed in PyTorch [193]. Figure 6.6 illustrates the main implementation steps of our testing framework. Since CARLA cannot make a car travel directly at a given speed, we let the ego and adversarial cars accelerate until they reach the target speed specified by the initial condition, then teleport them to the distance specified by the initial condition, and then start training.

‡We provide one more case study in the appendix.

Adaptive Cruise Control.
In this experiment, two vehicles are driving in a single lane on a highway. The lead vehicle is the adversarial agent. The follower vehicle avoids colliding with the leader using an adaptive cruise controller (ACC). The purpose of adversarial testing is to find robust adversarial policies that cause the ACC system to collide with the adversarial agent. The ACC controller modulates the throttle (α) by observing the distance (d) to the lead vehicle and attempting to maintain a minimum safe following distance dsafe. The ACC controller is a Proportional-Derivative (PD) controller with saturation. The PD term is u = Kp(d − dsafe) + Kd(vado − vego), and the controller action α is defined as αmax if u > αmax, αmin if u < αmin, and u otherwise. The ego specification is given in Eq. (6.12). Here, T is the maximum duration of a simulation episode and dsafe is the minimum safe following distance. The adversarial agent should cause the ego to violate its spec in less than T seconds. The adversarial constraint specifies that it should not exceed the speed limit vlim and that it should maintain a minimum speed vmin. For our experiment, we choose vmin = 0.1 and dsafe = 4.7 m. The distance d is computed between the two front bumpers. This represents a car length of 4.54 m, plus a small safety margin. The state of the adversarial agent is the tuple (d, vego, vado). At each time step, the adversarial agent chooses an acceleration from a discretized space containing 3 possible actions. In this experiment, we explore two different RL algorithms: Q-learning and a DQN algorithm with replay buffers [184]. The average runtime using the DQN (1.93 hours) is less than that using a Q-table (4.83 hours), and the two give comparable success rates: 54.8% for the DQN agent vs. 55.79% for the agent using Q-tables. The average time to run a single episode is between 29 and 30 seconds. Fig.
6.4 shows 3 episodes from the same initial position for the ego and adversarial vehicles: initially the adversarial agent is not able to find an adversarial behavior, and then in the later two episodes it is able to cause the ego to collide with it.

Lane Change Maneuvers. In this experiment, 2 vehicles are driving on a two-lane highway. The ego vehicle is controlled by a switching controller that alternates between cruising and avoiding a collision by applying a "hard" brake. The ego controller predicts future adversarial agent positions based on the current state using a look-ahead distance dlka = dlat − vado,lat · tlka, where dlat is the lateral distance between the vehicles, vado,lat is the adversarial agent's lateral velocity and tlka is a fixed look-ahead time. Based on dlka, it switches between two control policies: if dlka > dsafe, then it continues to cruise, but if dlka ≤ dsafe, it applies the brakes. The adversarial agent is in the left lane and attempts to merge to the right in a way that causes the ego to collide with it. We add a constraint to ensure that the adversarial agent is always longitudinally in front of the ego car when it tries to merge, as specified in Eq. (6.15); here dlong is the longitudinal distance between the cars. Without this constraint, the adversarial agent can always induce a sideways crash.

Figure 6.5: Adversarial vehicle behaviors across episodes in the lane change maneuvers case study. (a) Adversarial vehicle changes lane far away from the ego vehicle. (b) Adversarial vehicle shifts left, then changes lane. (c) Adversarial vehicle shifts left substantially, then changes lane with a steep angle. (d) Adversarial vehicle attempts to change lane smoothly while staying close to the ego vehicle. (e) The adversarial vehicle changes lane aggressively and hits the ego vehicle, violating traffic rules. (f) Adversary changes lane and induces a crash without breaking the traffic rules.
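The ego's switching logic described above can be sketched as a single decision rule. The function and variable names here are ours, not the thesis implementation's; the rule and threshold mirror the text.

```python
# Minimal sketch of the lane-change ego controller's mode switch:
# compute the look-ahead distance d_lka = d_lat - v_ado_lat * t_lka and
# cruise while it stays above the safety threshold, otherwise brake.

def ego_mode(d_lat, v_ado_lat, t_lka, d_safe):
    """d_lat: lateral distance between vehicles; v_ado_lat: adversary's
    lateral velocity (positive when closing in); t_lka: fixed look-ahead
    time; d_safe: safety threshold."""
    d_lka = d_lat - v_ado_lat * t_lka
    return "cruise" if d_lka > d_safe else "brake"
```

For example, with d_lat = 3.0, t_lka = 1.0 and d_safe = 2.0, a slow lateral approach (v_ado_lat = 0.5) keeps the ego cruising, while a faster one (v_ado_lat = 1.5) triggers braking.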
The ego spec is given in Eq. (6.14). Here, d2 = √(dlong² + dlat²) is the Euclidean distance between the two cars. In the course of training, we observe that the behavior of the adversarial agent improves with time. Training for 106 episodes requires 2.53 hours and gives us a success rate of 71%, i.e., 71% of the episodes lead to a collision. With 206 episodes, the success rate improves to 72.35% but requires 4.71 hours of runtime. After 371 episodes, the success rate improves to 75.76% after 9.44 hours. This experiment demonstrates that even with a relatively small time budget, the constrained RL agent can learn a policy that induces failures with high probability.

β    | bound    | µρ      | σρ     | num  | success init | failure init
0.1  | 0.03367  | 0.1287  | 0.0064 | 1225 | 1192 (97.3%) | 33
0.2  | −0.08860 | 0.09965 | 0.0074 | 1633 | 1565 (95.8%) | 68
0.5  | −0.45539 | 0.04782 | 0.0254 | 1883 | 1565 (83.1%) | 318
1    | −1.0667  | −0.2932 | 0.1973 | 3159 | 1565 (49.5%) | 1594
2    | −2.2893  | −0.4013 | 0.2678 | 3571 | 1565 (43.8%) | 2006

Table 6.3: Demonstration of Theorem 6.4.5. In all cases, the mean robustness degradation is bounded below as predicted by the theorem. Initial conditions with small value function degradation β are more likely to yield counterexamples, as predicted by the theory. The columns success init and failure init show the number of initial conditions that lead to a successful or a non-successful falsifying case.

Generalizability. In both case studies, we observed generalizability of the adversarial policy to different initial conditions. In the ACC case study, we found several initial states within δval = 6.5×10⁻⁶ that were also counterexample states. Overall, the states have smaller values, as the episode lengths are longer and the γ^T term causes values to be small. We observed that for the failing initial state (vego ↦ 12, vado ↦ 12, d ↦ 15), we found failing initial states with values of both vado and d that were both smaller and larger than those in the original initial state.
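The generalizability checks in this subsection, freezing the trained adversary, sampling initial conditions whose learned values are close to a known failing one, and measuring the robustness of the resulting rollouts, can be sketched as follows. Here `value` and `rollout` are hypothetical helpers standing in for the learned value function and the simulator, and all names are ours, not the thesis implementation's.

```python
# Sketch of a generalization check: sample initial conditions whose learned
# value is within a degradation `beta` of V(x0) for a known failing x0, roll
# out a frozen adversarial policy from each, and summarize the robustness
# values (negative robustness = counterexample).
import statistics

def generalization_stats(x0, beta, candidates, value, rollout, policy):
    sampled = [x for x in candidates if abs(value(x0) - value(x)) <= beta]
    robs = [rollout(x, policy) for x in sampled]
    mu = statistics.mean(robs) if robs else float("nan")
    sigma = statistics.pstdev(robs) if len(robs) > 1 else 0.0
    n_fail = sum(1 for r in robs if r < 0)
    return mu, sigma, len(sampled), n_fail
```

With toy stand-ins (e.g. value(x) = 0.1·x and a rollout whose robustness is x − 5), the function returns the mean and spread of robustness over the value-nearby initial conditions together with the counterexample count, mirroring the µρ, σρ, num and failure-init columns of Table 6.3.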
However, some failing initial states did not have states with nearby values that were violating. This can be attributed to the fact that the RL algorithm may not have converged to a value close to optimal. For the second case study, we found that the failing initial condition (vego ↦ 12, vado ↦ 12, d2 ↦ 16) has several nearby failing initial states with a small value of δval that were not previously encountered during training. Table 6.3 shows the generalization statistics for the ACC case study. The leftmost column, β, is a degradation of the RL value function that we wish to consider. Given this degradation of the value function, the bound column is the bound predicted by Theorem 6.4.5. The values of the table are computed by first taking an initial condition x0 that produced a counterexample and then sampling multiple new initial conditions x′0 so that |V(x0) − V(x′0)| ≤ β. The column labeled num denotes the number of such initial conditions that are sampled. Then, we run a simulation from x′0 with a frozen version of the adversarial agent, i.e., one that is no longer learning. The column labeled µρ denotes the mean robustness of the simulation traces, and σρ denotes the standard deviation of these robustness values. We note that in all cases, the mean robustness is larger than the lower bound predicted by the theorem, and the table demonstrates that there is a high density of counterexamples where the value function degradation is smaller, and fewer counterexamples where the value function degradation is larger. This demonstrates that the adversarial agent has learned a generalizable policy, which correctly reflects the landscape of initial conditions that lead to counterexamples. In this table, the episode length is T = 20 and the discount factor is γ = 0.99.

Figure 6.6: An illustration of our implementation structure

Yellow Light. In this experiment, the ego vehicle is approaching a yellow traffic light, led by an adversarial vehicle.
Let the signed distances of the ego and adversarial vehicles from the light be dℓ,ego and dℓ,ado, respectively, and let the Boolean variables ℓY and ℓR be true if the light is yellow and red, respectively. We use the convention that dℓ,ego > −δval if the ego vehicle is approaching the light and dℓ,ego < −δval if it has passed the light (resp. for the adversarial vehicle). By traffic rules, a vehicle is expected to stop δval meters away from the traffic light (e.g., δval could be the width of the intersection being controlled by the light), i.e., the vehicle should stop at 0. Thus, if dℓ,ego ∈ [−δval, 0] when the light turns red, it has run the red light. This ego specification is shown in Eq. (6.16). The traffic light is modeled as a non-adversarial agent; it merely changes its state based on a pre-determined schedule. The goal of the adversarial vehicle is to make the ego vehicle run the red light. The rule-based constraints on the adversarial vehicle are that it may not drive backwards and it may not run the red light (shown in Eqs. (6.17) and (6.18), resp.). The state of the adversarial agent includes the speeds of both vehicles and the relative distance between the vehicles. At the start of an episode, dℓ,ado = 30 and ℓY is true, and ℓR becomes true after τ = 2 seconds. The ego controller is a switched-mode controller that either uses an ACC controller or applies the maximum available deceleration aego,max. At time t < τ, let d(t) denote the distance required for the ego vehicle to come to a stop before the light turns red by applying aego,max. We can calculate d(t) as vego·(τ−t) + 0.5·aego,max·(τ−t)². Then, at time t, the ego controller chooses to cruise if d(t) + δval < dℓ,ego, and brakes otherwise.

Figure 6.7: Yellow light case study. The green region represents the region in which the ego vehicle will run the yellow light. The adversary learns to drive the ego car into the target region.
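The switched-mode rule above can be sketched directly. One assumption to flag: we treat the braking input as a signed acceleration (negative for deceleration), so a_max is passed as a negative number here, whereas the thesis writes aego,max without fixing a sign convention; all names are ours.

```python
# Sketch of the yellow-light ego controller's decision at time t < tau:
# d(t) = v_ego*(tau - t) + 0.5*a_max*(tau - t)**2, and the ego cruises iff
# d(t) + delta_val < d_light (d_light is the signed distance d_l,ego).

def ego_decision(v_ego, a_max, tau, t, d_light, delta_val):
    """a_max: signed acceleration under maximum braking (negative value,
    a sign-convention assumption on our part)."""
    horizon = tau - t
    d_stop = v_ego * horizon + 0.5 * a_max * horizon ** 2
    return "cruise" if d_stop + delta_val < d_light else "brake"
```

For instance, at v_ego = 10, a_max = −2, τ = 2 and t = 0, the rule gives d(0) = 16: with the light 20 m away the ego keeps cruising, while at 10 m it brakes.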
Figure 6.7 shows that the ego vehicle maintains an appropriate distance to the lead car, but that it starts decelerating too late and is thus caught in the intersection when the light turns red. The adversarial vehicle successfully clears the intersection while the light is still yellow, consistent with its constraints. The adversarial agent we train uses the DQN RL algorithm. After 162 episodes of training, approximately 60% of its episodes find a violation of the ego specification, with a runtime of 1.88 hours. After 247 episodes, the success rate increases to 64%, with a runtime of 2.59 hours. This case study demonstrates that our adversarial testing procedure succeeds even in the presence of multiple adversarial rules and an interesting ego specification.

6.6 Related Work and Conclusions

Adaptive Stress Testing. The work of [149, 75] is closely related to ours. In these works, the authors also use deep RL (and the related Monte Carlo tree search) algorithms to seek behaviors of the vehicle under test that are failure scenarios. There are a few key differences in our approach. In [149], reward functions (that encode failure scenarios) are hand-crafted and require manual insight to make sure that the RL algorithms converge to behaviors that are failure scenarios. Furthermore, the constraints on the adversarial environment are also explicitly specified. The approach in [75] uses a subset of RSS (Responsibility-Sensitive Safety) rules to augment hand-crafted rewards that encode failure scenarios by the ego and responsible behavior by other agents in a scenario. By specifying STL constraints, we remove the step of manually crafting rewards, and the robustness value of STL makes our theorems possible.

Falsification. There is extensive related work on falsification of cyber-physical systems.
Most falsification techniques use a fixed finitary parameterization of system input signals to define a finite-dimensional search space, and use global optimizers to search for parameter values that lead to violation of the system specification. A detailed survey of falsification techniques can be found in [85]. A control-theoretic view of falsification tools is that they learn open-loop adversarial policies for falsifying a given ego model, while our approach focuses on closed-loop policies.

Falsification using RL. Also close to our work are recent approaches that use RL [281] and deep RL [11] for falsification. The key focus of [281] is on solving the problem of automatically scaling quantitative semantics for predicates and effectively handling Boolean connectives in an STL formula. The work in [11] focuses on a smooth approximation of the robustness of STL and thoroughly benchmarks the use of different deep RL solvers for falsification.

Comparison. Compared to previous approaches, the focus of our chapter is on the reusability of dynamically constrained adversarial agents trained using RL techniques. We identify conditions under which a trained adversarial policy is applicable to a system with a different initial condition or different dynamics with no retraining. This can be of immense value in an incremental design and verification approach. The other main contribution is that instead of using a monolithic falsifier, our technique packs multiple, dynamically constrained falsification engines as separate agents; dynamic constraints allow us to specify hierarchical traffic rules. Also, previous approaches to falsification do not consider dynamic constraints on the environment at all, only simple bounds on the parameter space. Finally, in our approach, both specifications and constraints are combined into a single reward function, which can then utilize off-the-shelf deep RL algorithms.
In comparison to [281, 11, 164, 122], our encoding of STL formulas into reward functions is simplistic, as it is not the main focus of this chapter; we defer extensions that consider nuanced encodings of STL constraints to future work. The emphasis in [11] is to use the training process (which includes exploration) to find a (possibly non-robust) single falsifying behavior. The work in [284] uses fuzzing algorithms to find multiple scenarios that cause the ego to fail, with coverage measures, whereas in our work we focus on training the RL agent to obtain a robust falsifying policy.

Conclusions. Our work addresses the problem of automatically performing constrained stress-testing of cyber-physical systems. We use STL to specify the target against which we are testing and constraints that specify the reasonableness of the testing regime. We use STL as a lightweight, high-level programming language to loosely specify the desired behaviors of a test scenario, and leverage RL algorithms to determine how to execute those behaviors. The learned adversarial policies are reactive, as opposed to testing schemes that rely on merely replaying pre-recorded behaviors, and under limited conditions can even provide valuable testing capability for modified versions of the system.

Chapter 7
Shape Expressions for Specifying and Extracting Signal Features

7.1 Introduction

Cyber-physical systems (CPS) and Internet-of-Things (IoT) applications are everywhere around us: smart buildings that adapt heating control to the user's habits, intelligent transportation systems that optimize traffic based on the continuous monitoring of road conditions, wearable health monitoring devices, and medical devices that fine-tune a given therapy depending on sensing a patient's health. These applications are inherently data-driven: the decisions of the system rely on the measurement and analysis of the dynamic behavior of the environment.
Low-cost sensing solutions, combined with the availability of powerful edge and cloud devices to store and process data, have led to a tremendous increase in the generation, measurement, and recording of time-series data. Processing these huge streams of available data in an efficient manner to extract useful information is challenging. It is often the case that only specific segments of the time series contain interesting and relevant patterns. For instance, an electricity provider may be interested in observing spikes or oscillations in voltage signals. A medical device manufacturer may want to detect anomalous cardiac behavior. A wearable device maker would like to associate specific patterns in the measurements from accelerometer and gyroscope sensors with a concrete user activity, such as running or walking. Such patterns can often be characterized by geometric shapes observed in the time-series data; e.g., a spike can be specified as an "upward triangle", i.e., a sequence of two contiguous line segments with slopes of opposite signs. There are also instances where the time-series data is multi-dimensional (say (x(t), y(t))), and the user may be interested in knowing if a "pulse" shape in x(t) is followed by an "exponential decay" shape in y(t). In this chapter, we propose shape expressions, a declarative language for specifying sophisticated temporal patterns over (possibly multi-dimensional) time series. A shape expression is in essence a regular expression whose atomic predicates are arbitrary (linear, exponential, sinusoidal, etc.) shapes with (slope, offset, frequency, etc.) parameters, and with additional parameter constraints. We associate to shape expressions a noisy language that allows observed data to approximately match the expression.
The noisy expression semantics combines classical regular expression semantics with statistical regression, which is used to match atomic shapes and infer parameter valuations that minimize the noise between the ideal shape and the observation. We allow either the mean squared error (MSE) or the coefficient of determination (CoD), statistical measures of how close the observed data are to the fitted regression (atomic) shape, as our noise metric. We define shape automata as an executable formalism for matching shape expressions and propose a heuristic for querying time series with shape expressions efficiently. We apply this algorithm to several case studies from diverse CPS and IoT domains to demonstrate its applicability.

Illustrating Example. We use the example depicted in Figure 7.1a to illustrate the concepts presented in this chapter. This figure shows a raw noisy signal that contains two pulses. The two pulses differ in duration, depth, and offset, but have the same qualitative shape that characterizes them as pulses. Fig. 7.1b shows a specification of an ideal pulse. We characterize a pulse as a sequence of 5 segments: (1) constant segment at some b; (2) linearly decreasing segment with slope a2 < 0; (3) constant segment at some b3; (4) linearly increasing segment with slope a4 > 0; and (5) constant segment at b. We observe that the above specification uses parametric shapes, where the parameters are possibly constrained (e.g., a2 < 0) or shared between shapes (e.g., b), and describes a perfect shape without accounting for noise.

Figure 7.1: (a) Two pulse shapes; (b) idealized pulse shape (color figure online)

Related Work. Regular expressions and temporal logics are the most common general-purpose specification languages for expressing temporal patterns in the formal methods community. However, specifying temporal patterns in data is a problem that has been pervasively studied.
For instance, the specification and recognition of a pulse in pulse-based communications is an IEEE standard [131] in its own right. Extracting unspecified motifs in time series has been studied in data mining [207], and feature extraction using patterns has been studied in machine learning [190, 108]. More recently, time series shapelets were introduced in [271] as a data mining primitive. A shapelet is a time series segment representing a certain shape identified from data. Our work is partially motivated by the concept of shapelets. In contrast to shapelets, which are extracted from unlabelled data, shape expressions provide a more supervised feature extraction mechanism, in which domain-specific knowledge is used to express shapes of interest. In the context of CPS, timed regular expressions (TRE) [25, 24], quantitative regular expressions (QRE) [15, 6, 17, 180], and Signal Temporal Logic (STL) [179] have been used as popular formalisms for specifying properties of continuous-time and real-valued behaviors. QREs are a powerful formalism that combines quantitative computations over data with regular-expression-based matching. An offline algorithm for matching TREs was proposed in [250, 248]. This thread of work was extended to online pattern matching in [249]. Automata-based matching for TREs has been developed in [259, 260, 261]. In contrast to our approach, pattern matching with QREs and TREs is sensitive to noise in data. The problem of uncertainty has been studied through parameterized TRE specifications, either by having parameters in time bounds [18] or in spatial atomic predicates [32]. These approaches are orthogonal to ours: instead of having parameters on standard TRE operators, we focus on a rich class of parameterized atomic shapes. Finally, a sophisticated algorithm to incrementally detect exponential decay patterns in CO2 measurements was proposed in [266] in the context of smart building applications.
We adapt and extend this basic idea to a general-purpose specification language that allows combining such atomic shapes with regular operators.

7.2 Shape Expressions and Automata

In this section, we define shape expressions as our pattern specification language. In essence, they are regular expressions over parameterized signal shapes, such as linear, exponential or sine segments, with additional parameter constraints. We then define shape automata, into which shape expressions translate and which provide an executable formalism for recognizing composite signals made of several types of segments. This executable formalism captures exactly the notion of shape expression, and will allow us to define a family of pattern matching algorithms, as we will see in Section 7.3. We first give a few basic definitions necessary to our framework, such as the notions of signals, parameters, and shapes.

7.2.1 Definitions

Let P = {θ1, . . . , θn} be a set of parameter variables. A parameter valuation val maps variables θ ∈ P to values val(θ) ∈ R ∪ {⊥}, where ⊥ represents the undefined value. We use the shortcut val(P) to denote {val(θ1), . . . , val(θn)}. A constraint γ over P is a Boolean combination of inequalities over P. We write val |= γ when the constraint γ is satisfied by the valuation val. Given θ ∈ P and a constraint θ ◦ k for ◦ ∈ {<, ≤, >, ≥} and some k ∈ R, we have that val(θ) = ⊥ implies val ̸|= θ ◦ k. We denote by Γ(P) the set of all constraints over P. Let X be a set of signal variables. A signal ξ over X is a function ξ : X × [0, l) → R, where [0, l) is the time domain of ξ, which we assume to be discrete, hence a subset of Z. We denote by |ξ| = l the length of ξ. Given two signals ξ1 : X × [0, l1) → R and ξ2 : X × [0, l2) → R, we denote by ξ ≡ ξ1 · ξ2 their concatenation ξ : X × [0, l1 + l2) → R, where for all x ∈ X, ξ(x, t) = ξ1(x, t) if t ∈ [0, l1) and ξ(x, t) = ξ2(x, t − l1) if t ∈ [l1, l1 + l2).
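The concatenation and restriction operations on signals can be mirrored with plain Python lists (one signal variable, unit time steps); this encoding is ours, purely for illustration.

```python
# A small sketch of the signal operations defined above, representing a
# discrete-time signal over one variable x as a plain list of samples.

def concat(xi1, xi2):
    """Concatenation xi1 . xi2: samples of xi1 followed by xi2 shifted in time."""
    return xi1 + xi2

def restrict(xi, l1, l2):
    """Restriction xi^[l1,l2): samples at times l1..l2-1, re-based to time 0."""
    assert 0 <= l1 < l2 <= len(xi)
    return xi[l1:l2]

xi = concat([1.0, 2.0], [3.0, 4.0, 5.0])  # length l1 + l2 = 5
```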
Let ξ : X × [0, l) → R be a signal, and l1 and l2 be two constants such that 0 ≤ l1 < l2 ≤ l. We denote by ξ^[l1,l2) : X × [0, l2 − l1) → R the restriction of ξ to the time domain [l1, l2), such that for all x ∈ X and t ∈ [0, l2 − l1), ξ^[l1,l2)(x, t) = ξ(x, t + l1). We allow signals of null duration l = 0, which results in the unique signal with the empty time domain∗.

Consider two sequences y = y1, . . . , yn and f = f1, . . . , fn of values, where y represents a sequence of observations and f the corresponding sequence of predictions given by a model which approximates the distribution of y. The mean squared error MSE(y, f) of f relative to y is a statistical measure of how well the predictions of a (regression) model approximate the observations, and is defined as follows:

MSE(y, f) = (1/n) Σ_{i=1}^{n} (yi − fi)²

Another statistical measure in a regression analysis of how well the predictions of a (regression) model approximate the observations is the coefficient of determination R², defined in terms of the mean y̅ of the sequence y, its total sum of squares SStot and the residual sum of squares SSres as follows:

R²(y, f) = 1 − SSres(y, f) / SStot(y)
y̅ = (1/n) Σ_{i=1}^{n} yi
SStot(y) = Σ_{i=1}^{n} (yi − y̅)²
SSres(y, f) = Σ_{i=1}^{n} (yi − fi)²

∗The signal with the empty time domain is equivalent to the empty word in classical language theory.

The coefficient of determination R² typically ranges from 0 to 1. An R² of 1 indicates that the predictions are a perfect match of the observations. On the contrary, an R² of 0 indicates that the model explains none of the variability of the response data around its mean. Negative values of R² can occur if the predictions fit the observations worse than a horizontal hyperplane.

7.2.2 Shape Expressions

We now define the syntax and semantics of shape expressions defined over the set X of signal variables and the set P of parameter variables.
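The two goodness-of-fit measures defined in Section 7.2.1 can be written out directly; this is an illustrative plain-Python sketch, not code from the thesis.

```python
# MSE and coefficient of determination R^2 between an observation sequence y
# and a prediction sequence f, exactly as defined above.

def mse(y, f):
    """Mean squared error: (1/n) * sum((y_i - f_i)^2)."""
    n = len(y)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / n

def r2(y, f):
    """Coefficient of determination 1 - SS_res / SS_tot; can be negative."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
f = [1.1, 1.9, 3.2, 3.8]  # a close linear fit
```

For this example fit, MSE(y, f) = 0.025 and R²(y, f) = 0.98, illustrating that a near-perfect fit drives MSE toward 0 and R² toward 1.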
A shape σx(P′) is an expression that maps parameter variables P′ ⊆ P and the signal variable x ∈ X to a parameterized family of idealized signals. To every shape σx, we associate a special duration variable lσ,x that is included in the set P of parameter variables.† Consider the basic shapes below:

linx(a, b, l) ≡ {w | ∃val. |w| = val(l) ∧ w(x, t) = t · val(a) + val(b)}  (7.1)
expx(a, b, c, l) ≡ {w | ∃val. |w| = val(l) ∧ w(x, t) = val(a) + val(b) · e^(t·val(c))}  (7.2)
sinx(a, b, c, d, l) ≡ {w | ∃val. |w| = val(l) ∧ w(x, t) = val(a) + val(b) · sin(val(c)·t + val(d))}  (7.3)

In (7.1), we describe a line segment parameterized by its slope a and intercept b. In (7.2), we describe an exponential shape with parameters a, b, c, and l, while (7.3) describes a parameterized family of sinusoidal shapes with the specified parameters‡. Given a valuation val and a shape σx(P′), we denote by ξ(x) = σx(val(P′)) the signal ξ that instantiates the shape σx to concrete parameter values defined by val. We assume a finite set Σ of shapes, without imposing further restrictions. Shape expressions (SE) are regular expressions, where shapes with unknown parameters play the role of atomic primitives, and which have an additional restriction operator for enforcing parameter constraints.

†We use l instead of lσ,x whenever its association to σx is clear from the context, and omit lσ,x altogether when we are not interested in the duration of the shape.
‡We omit the duration variable l whenever we are not interested in the duration of a shape; for instance, we then use the notation sin(a, b, c, d).

Definition 7.2.1 (SE syntax). Shape expressions are given by the grammar

φ ::= ϵ | σx(P′) | φ1 ∪ φ2 | φ1 · φ2 | φ* | φ : γ

where σ ∈ Σ, x ∈ X, P′ ⊆ P, and γ ∈ Γ(P). We write φ^i as an abbreviation of φ · · · φ (i times). We denote by ΣX(P) the set of expressions of the form σx(P′) for σ ∈ Σ, x ∈ X and P′ ⊆ P. The set of shape expressions over P and X is denoted Φ(P, X).

Example 7.2.2.
Consider the visual pulse specification from Figure 7.1a. We describe an ideal pulse as a shape expression φpulse as follows§:

φpulse ≡ linx(0, b) · (linx(a2, b2) : a2 < 0) · linx(0, b3) · (linx(a4, b4) : a4 > 0) · linx(0, b)

The semantics of shape expressions is given as a relation between signals and parameter valuations, which we call a language. We associate with every shape expression a noisy language Lν for some noise tolerance threshold ν ≥ 0, capturing the ν-approximate meaning of the expression. The exact language L, capturing the precise meaning of the expression, is obtained by setting ν to zero. To define the noisy language of an expression, we associate a goodness-of-fit measure of a signal to an ideal shape, describing how far the observed signal is from the ideal shape. We derive this measure by combining mean squared errors (MSE) computed on atomic shapes. The overall measure gives the quality of a match to a shape expression. We formally define the noisy language as follows.

§We abuse notation and replace a parameter variable by a constant; for instance, linx(0, b) is a shortcut for linx(a1, b) : a1 = 0.

Definition 7.2.3 (SE noisy language). Let ν ∈ R≥0 be a noise tolerance threshold. The noisy language Lν of a shape expression is defined as follows:

Lν(ϵ) = {(ξ, v) | |ξ| = 0}
Lν(σx(P′)) = {(ξ, v) | |ξ| = v(l) and µ(ξ(x), σx(v(P′))) ≤ ν}
Lν(φ1 · φ2) = {(ξ1 · ξ2, v) | (ξ1, v) ∈ Lν(φ1) and (ξ2, v) ∈ Lν(φ2)}
Lν(φ1 ∪ φ2) = Lν(φ1) ∪ Lν(φ2)
Lν(φ*) = ∪_{i=0}^{∞} Lν(φ^i)
Lν(φ : γ) = {(ξ, v) | (ξ, v) ∈ Lν(φ) and v |= γ}

where µ(y, f) is substituted by either MSE(y, f) or 1 − CoD(y, f).

Example 7.2.4. Consider the shape expression φpulse specifying a pulse, the signal ξ depicted in Figure 7.1a, and the signal ξ′ = ξ^I, the restriction of ξ to the interval I = [7, 26).
Let us consider v = (v(a2), v(a4), v(b), v(b2), v(b3), v(b4)) = (−0.67, 0.67, 9, 17, 7, −5), the valuation of parameter variables in φpulse that instantiates the ideal shape (red line) of the first pulse depicted in Figure 7.1a. Let ξ1 = ξ^[7,12), ξ2 = ξ^[12,15), ξ3 = ξ^[15,18), ξ4 = ξ^[18,21) and ξ5 = ξ^[21,26). We have that:

MSE(ξ1(x), linx(0, v(b))) = 0.04
MSE(ξ2(x), linx(v(a2), v(b2))) = 0.49
MSE(ξ3(x), linx(0, v(b3))) = 0.13
MSE(ξ4(x), linx(v(a4), v(b4))) = 0.35
MSE(ξ5(x), linx(0, v(b))) = 0.10

It follows that (ξ′, v) ∈ L0.5(φpulse) but (ξ′, v) ̸∈ L0.1(φpulse).

Figure 7.2: Shape automaton Apulse

7.2.3 Shape Automata

We now define shape automata, which will act as recognizers for shape expressions. They are akin to finite state automata in which edges are labeled by shape expressions with unknown parameters and parameter constraints. We will then show that they are inter-translatable with shape expressions. The syntax of a shape automaton (SA) is given as follows.

Definition 7.2.5 (Shape automata). A shape automaton is a tuple ⟨P, X, Q, ∆, S, F⟩, where (1) P is the set of parameters, (2) X is the set of real-valued signal variables, (3) Q is the set of control locations, (4) ∆ ⊆ Q × ΣX(P) × Γ(P) × Q is the set of edges, (5) S ⊆ Q is the set of starting locations, and (6) F ⊆ Q is the set of final locations.

Example 7.2.6. The shape automaton Apulse, shown in Figure 7.2, recognizes pulse shapes specified by the shape expression φpulse.

A state in a shape automaton is a pair (q, v) where q is a location and v is a parameter valuation. The runs of shape automata are akin to those in weighted automata and are defined as follows. For a signal ξ, we define transitions (q, v) −ξ/cost→ (q′, v′) between two states as follows.
We have (q, v) −ξ/cost→ (q′, v′) if there exists (q, σx(P′), γ, q′) ∈ ∆ such that P′ ⊆ P, cost = µ(ξ(x), σx(v′(P′))), v′ |= γ, v′(p) = v(p) for all p ∈ P \ P′, and v′(p) = v(p) also for all p ∈ P ∩ P′ such that v(p) ̸= ⊥. The semantics of a shape automaton is given as follows.

Definition 7.2.7 (Shape automaton run). A run of a shape automaton over some signal ξ is a sequence of transitions

(q0, v0) −ξ1/cost1→ (q1, v1) −ξ2/cost2→ · · · −ξn/costn→ (qn, vn)

such that q0 ∈ S, v0 = (⊥, . . . , ⊥) and qn ∈ F, where ξ1 · ξ2 · · · ξn is a decomposition of ξ. Such a run κ induces the cost cost(κ) = max_{i=1..n} costi and the parameter valuation val(κ) = vn. The set of runs of a shape automaton A over some signal ξ is denoted K(A, ξ). A shape automaton A associates any given signal ξ with a similarity measure that is the minimum among the similarity measures of all runs.

Definition 7.2.8 (SA language and noisy language). The noisy language of a shape automaton for a given noise tolerance threshold ν ∈ R+ is Lν(A) = {(ξ, v) | ∃κ ∈ K(A, ξ) s.t. val(κ) = v and cost(κ) ≤ ν}. The exact language of a shape automaton is L(A) = L0(A).

Example 7.2.9. Consider the signal ξ′ = ξ1ξ2ξ3ξ4ξ5 from Example 7.2.4 and let:

v1 = (⊥, ⊥, 9, ⊥, ⊥, ⊥)           c1 = 0.04
v2 = (−0.67, ⊥, 9, 17, ⊥, ⊥)      c2 = 0.49
v3 = (−0.67, ⊥, 9, ⊥, 7, ⊥)       c3 = 0.13
v4 = (−0.67, 0.67, 9, 17, 7, −5)  c4 = 0.35
v5 = (−0.67, 0.67, 9, 17, 7, −5)  c5 = 0.10

We then have, assuming v0 = (⊥, ⊥, ⊥, ⊥, ⊥, ⊥), that κ = (q0, v0) −ξ1/cost1→ (q1, v1) −ξ2/cost2→ · · · −ξ5/cost5→ (q5, v5) is a run of Apulse over ξ′ with cost(κ) = 0.49 and ξ′ ∈ L0.5(Apulse).

We now formally show the equivalence between shape expressions and shape automata. The first direction of the theorem allows us to construct automata recognizers for arbitrary expressions. The second direction of the theorem shows that shape expressions are expressively complete relative to the class of automata under consideration.

Theorem 7.2.10 (SE ⇔ SA).
For any shape expression φ there exists a shape automaton Aφ such that Lν(Aφ) = Lν(φ) for all ν ≥ 0. Conversely, for any shape automaton A there exists a shape expression φA such that Lν(φA) = Lν(A) for all ν ≥ 0.

7.3 Pattern Matching

In Section 7.2.3, we introduced shape automata to recognize signals that are close to a specified shape. However, a shape expression is not intended to represent a whole signal, but only a segment thereof. In this section, we extend shape automata to enable them to identify all signal segments that match specific shapes. We first define the notion of noisy match sets.

Definition 7.3.1 (Noisy match set). For any signal ξ defined over a time domain T = [0, l), shape expression φ and noise tolerance threshold ν, we define the noisy match set Mν(φ, ξ) as follows:

Mν(φ, ξ) = {(t, t′) ∈ T² | t ≤ t′ and ξ[t,t′) ∈ Lν(φ)}

The (exact) match set is M(φ, ξ) = M0(φ, ξ).

Given a shape automaton A, its associated shape pattern matching automaton Â is another shape automaton that extends A with dedicated initial and final locations, which allow Â to silently consume a prefix and a suffix of a signal. The construction follows [33] and is given in the definition below.

Definition 7.3.2 (Shape pattern matching automaton). Let A = ⟨P, X, Q, ∆, S, F⟩ be a shape automaton. Then the corresponding shape pattern matching automaton is Â = ⟨P, X, Q̂, ∆̂, Ŝ, F̂⟩, where

• Q̂ = Q ∪ {ŝ, f̂}, Ŝ = {ŝ}, F̂ = {f̂},
• ∆̂ = ∆ ∪ {(ŝ, any, true, q) | q ∈ S} ∪ {(q, any, true, f̂) | q ∈ F}, where any is a special shape such that µ(ξ, any) = 0 for all ξ.

Intuitively, given a signal ξ, a shape expression φ and its associated shape pattern matching automaton Âφ, an accepting run κ over ξ decomposed into ξ0 · ξ1 · · · ξn+1 in Âφ, of the form (ŝ, v0) --ξ0/0--> (q0, v0) --ξ1/cost1--> . . .
--ξn/costn--> (qn, vn) --ξn+1/0--> (f̂, vn), represents one potential match (defined by the segment (t, t′) in ξ, where t = |ξ0| and t′ = |ξ| − |ξn+1|) with one specific parameter instantiation (vn) and its associated similarity measure cost(κ) = max_{i∈[1,n]} costi. We denote by λ(κ) = (t, t′) the label of run κ over ξ in Â. We first note that, for a given decomposition of ξ, there is an infinite number of runs over ξ in Âφ that follow that decomposition, due to the parameters being valued as real numbers. We also note that, for a given signal ξ, there is a finite (but large) number of its decompositions.

Example 7.3.3. Figure 7.3 shows three runs κ1, κ2 and κ3 over ξ in Âpulse and the corresponding ideal shapes defined by the valuations computed during the runs. We can see that each run identifies one segment of ξ that could be a potential match of the shape expression φpulse with specific parameter values and cost. In particular, we can observe that runs κ1 and κ2 decompose ξ in the same manner but with different parameter valuations, resulting in cost(κ1) < cost(κ2).

From the above observations, we obtain that the labeling of the set of runs associated with a shape pattern matching automaton Â and a signal ξ gives us exactly the match set of L(A) relative to ξ.

Theorem 7.3.4. Let φ be a shape expression, Âφ the corresponding shape pattern matching automaton, ξ a signal and ν a noise tolerance threshold. We have that Mν(φ, ξ) = {(t, t′) | ∃κ ∈ K(Âφ, ξ) s.t. λ(κ) = (t, t′) and cost(κ) ≤ ν}.

Figure 7.3: Pulse train – three runs κ1, κ2 and κ3 over ξ in Âpulse.

We observe that while this in principle solves the SE pattern-matching problem, the complexity in terms of signal length is not practical. Let us define the dot-depth of an expression φ as the maximal number of concatenations featured on any branch of its syntax tree.

Theorem 7.3.5.
The size of the set of runs of a shape matching automaton Âφ is Ω(n^(k+2)), where n is the size of the trace and k is the dot-depth of φ. The dot-depth of any expression is nonnegative, hence this lower bound is at least quadratic in the length of the signal. This means that any exhaustive algorithm will not scale in many practical applications, where a typical signal can be over 10^6 samples long. We propose two ways to handle this complexity: (1) bound the length of matches, or (2) develop heuristics to efficiently match shape expressions. Bounding the length of matches is reflected in the following definition.

Definition 7.3.6 (Bounded shape expressions). A shape expression φ is said to be bounded (by k) when for all words ξ we have that ξ ∈ L(φ) implies |ξ| ≤ k.

Theorem 7.3.7 (Linear-time upper bound). For an expression φ bounded by k, the size of the set of accepting runs of the shape matching automaton can be represented by a dag of size O(n · k² · m · k^m), where n is the length of the trace and m is the length of the expression.

7.4 Policy Scheduler for Shape Matching Automata

In this section, we propose a heuristic in the form of a policy scheduler that efficiently approximates the complete match set by computing a representative subset of non-overlapping matches. Let ξ be a signal defined over X and σx(P′) a shape with x ∈ X. We denote by reg the statistical regression with constraints, which returns the pair consisting of the parameter valuation v(P′) that minimizes MSE under the constraint γ and the associated µ(ξ, σx(v(P′))), defined as follows:

reg(ξ, σx, γ) = ( argmin_v {MSE(ξ, σx(v(P′))) | v |= γ}, µ(ξ, σx(v(P′))) )

We now show that µ (i.e., both MSE and CoD) can be computed in an online fashion. Given the two sequences y = y1, . . . , yn and f = f1, . . . , fn of observations and predictions, we define MSE and CoD recursively as follows.
MSE(y, f, n+1) = (n/(n+1)) · MSE(y, f, n) + (1/(n+1)) · (y_{n+1} − f_{n+1})²
ȳ(n+1) = (n/(n+1)) · ȳ(n) + (1/(n+1)) · y_{n+1}
SStot(y, n+1) = SStot(y, n) + (y_{n+1} − ȳ(n)) · (y_{n+1} − ȳ(n+1))
SSres(y, f, n+1) = SSres(y, f, n) + (y_{n+1} − f_{n+1})²
R²(y, f, n+1) = 1 − SSres(y, f, n+1) / SStot(y, n+1)

We also require that matches of atomic shapes in a shape expression have a minimum length λ > 1 defined by the user. We further assume that the shape matching automaton Â, the signal ξ, the noise tolerance threshold ν and the minimum match length λ are given as global parameters to the main procedure policy_scheduler and are implicitly propagated to all the other methods. We also define two auxiliary methods outq and out∆ as follows:

outq(S) = {q′ | ∃(q, σx, γ, q′) ∈ ∆ for some q ∈ S}
out∆(S) = {δ | ∃δ = (q, σx, γ, q′) ∈ ∆ for some q ∈ S}

The policy scheduler policy_scheduler searches for non-overlapping shape expression matches in ξ from time 0, using the method expression_match. The call of expression_match at time t returns another time t′. If t′ > t, the segment [t, t′] successfully matches the expression; the segment is added to the set of matches and expression_match is invoked again at time t′ + 1. If t′ ≤ t, the expression could not be matched from time t, and expression_match is invoked again at time t + 1. The shape matching procedure expression_match (see Algorithm 6) attempts, in a recursive fashion, to reach a final location from a set of locations S and a time index t. The procedure returns another time index t′, where t′ ≥ t if a final location can be reached in t′ − t steps from a location in S, or t′ = −∞ (the initial value of t′, see line 1) otherwise. If one of the locations is a final location, we have that t′ = t (line 2). If none of the locations in S is final and we have not yet reached the end of ξ (lines 3−7), the procedure does the following.
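The online MSE and CoD updates defined earlier in this section can be sketched in a few lines of Python. This is an illustrative implementation, not the tool's code; the class and method names (OnlineError, update) are our own.

```python
# Minimal sketch of the online MSE / CoD (R^2) updates from Section 7.4.
# All names (OnlineError, update, r2) are illustrative.

class OnlineError:
    """Incrementally maintains MSE, SStot, SSres and R^2 as the signal
    prefix is extended by one sample at a time."""

    def __init__(self):
        self.n = 0          # number of samples seen so far
        self.mse = 0.0      # running mean squared error
        self.ybar = 0.0     # running mean of the observations
        self.sstot = 0.0    # total sum of squares
        self.ssres = 0.0    # residual sum of squares

    def update(self, y, f):
        """Consume one new observation y and its prediction f."""
        n = self.n
        err = (y - f) ** 2
        self.mse = n / (n + 1) * self.mse + err / (n + 1)
        ybar_new = n / (n + 1) * self.ybar + y / (n + 1)
        # Welford-style update: uses both the old and the new mean.
        self.sstot += (y - self.ybar) * (y - ybar_new)
        self.ybar = ybar_new
        self.ssres += err
        self.n = n + 1

    def r2(self):
        """Coefficient of determination of the prefix seen so far."""
        return 1.0 - self.ssres / self.sstot if self.sstot > 0 else 1.0
```

Each call to update costs O(1), which is what makes the prefix-extension loop of atomic_match (Algorithm 7) efficient: extending a candidate match by one sample does not require recomputing the error from scratch.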
For every transition with a source location in S, labeled by σx and γ (lines 4−7), atomic_match computes the end time τ of the longest match of σx that satisfies γ and starts at t (line 5). If there is no such match, τ equals −∞; otherwise τ ≥ t + λ.¶ For all the transitions that result in a match ending at time τ, we recursively call expression_match with the target location q′ and time τ as inputs, and τ′ as output (line 6). The procedure keeps the longest of the successful expression matches (line 7). This effectively allows the procedure to concurrently follow multiple paths and select the one that provides the longest match.

¶Recall that we require atomic matches of minimum length λ.

Algorithm 6: Shape expression match expression_match
  Input: Set of locations S, current end match time t
  Output: New end match time t′
  1  t′ ← −∞
  2  if S ∩ F ̸= ∅ then t′ ← t
  3  else if t < |ξ| then
  4      foreach δ = (q, σx, γ, q′) ∈ out∆(S) do
  5          τ ← atomic_match(δ, t)
  6          if τ > −∞ then τ′ ← expression_match({q′}, τ)
  7          t′ ← max{t′, τ′}
  8  return t′

The atomic shape matching procedure atomic_match, shown in Algorithm 7, efficiently computes the longest match of an atomic shape starting from a given time index. It takes as inputs a transition δ = (q, σx, γ, q′) and the time index t, and returns the end time t′ of the longest ν-noisy match [t, t′] of σx that satisfies γ. The algorithm starts by fitting the shape σx to the segment ξ′ = ξ[t,t+τ) under the constraint γ, using the regression method reg, and thus estimating the parameters v (line 3). The procedure reg also returns the value c of the µ measure of the performed regression. If c is greater than the allowed noise tolerance ν, the procedure returns t′ = −∞, meaning that the segment is not a good candidate for matching the shape.
Otherwise, the algorithm iteratively extends the size τ of the segment as long as the µ-value between the extended prefix and σx(v(P′)), instantiated with the fixed parameter valuation v, remains lower than or equal to ν (lines 4−10). We note that each extension of the signal prefix updates µ but not the parameter valuation. There are two possible reasons for µ becoming greater than ν: (i) the estimated parameter valuation v needs to be updated, or (ii) the current prefix no longer fits the shape under the tolerance ν with any valuation v. In the first case, the procedure re-estimates the parameter valuation and re-computes µ (line 9). If the re-computed µ is smaller than or equal to ν and we have not reached the end of the signal, we repeat the match extension procedure. Otherwise, we terminate the procedure and return the time index t′ where the current match ended (if there is no match, t′ equals −∞).

Algorithm 7: Atomic shape match atomic_match
  Input: Transition δ = (q, σx, γ, q′), start match time index t
  Output: End match time t′
  1  t′ ← −∞
  2  if t + λ ≤ |ξ| then
  3      τ ← λ; ξ′ ← ξ[t,t+τ); (v, c) ← reg(ξ′, σx(P′), γ)
  4      while c ≤ ν do
  5          t′ ← t + τ
  6          if t′ < |ξ| then
  7              τ ← τ + 1; ξ′ ← ξ′ · ξ(t′)
  8              c ← µ(ξ′, σx(v(P′)))
  9              if c > ν then (v, c) ← reg(ξ′, σx(P′), γ)
  10         else break
  11 return t′

7.5 Implementation and Evaluation

We implemented the algorithm from Section 7.4 in a prototype tool written in the Python programming language. We applied pattern matching of shape expressions to two applications: detection of patterns in electrocardiograms (ECG) and of oscillatory behaviors in an aircraft elevator control system. All experiments were run on a MacBook Pro with an Intel Core i7 2.6 GHz processor and 16 GB RAM.

7.5.1 Detection of Anomalous Patterns in ECG

In this case study, we consider ECG signals from the PhysioBank database, which contains 549 records from 290 subjects (209 male and 81 female, aged from 17 to 87).
Each record includes 15 simultaneously measured signals, digitized at 1,000 samples per second, with 16-bit resolution over a range of ±16.384 mV. The diagnostic classes for the subjects participating in the recordings include cardiovascular diseases such as myocardial infarction, cardiomyopathy, dysrhythmia and myocardial hypertrophy.

Specification of an Anomalous Heart Pulse

We consider the right bundle branch block (RBBB) heart condition, in which the right ventricle is not directly activated by impulses traveling through the right bundle branch. Figure 7.4a depicts a visual characterization of the RBBB heart condition as it can be observed on channels v1 and v6. In this work, we concentrate on specifying the shape of the pulse depicted in v6 using shape expressions.

Figure 7.4: Recognizing pulses in ECG signals – (a) RBBB characteristics on channels v1, v2 (©A. Rad), (b) signal on the v6 channel, (c) magnified anomalous pulse.

The specification φ of the anomalous v6 pulse consists of a sequence of 7 atomic shapes:

φ = exp(a1, b1, c1) : b1 > 0 · exp(a2, b2, c2) : b2 < 0 ·
    lin(a3, b3) : a3 > 0 · lin(a4, b4) : a4 < 0 · lin(a5, b5) : a5 > 0 ·
    exp(a6, b6, c6) : b6 > 0 · exp(a7, b7, c7) : b7 < 0

Evaluation

We evaluated our shape expression matching procedure on the recordings of a 70-year-old patient who suffers from the RBBB condition. The v6 channel recording of the patient, shown in Figure 7.4b, has 10,000 samples. In this experiment, we use CoD as our noise metric.∥ With noise threshold ν = 0.02, we were able to identify all the segments that match the specification in 28.98 s. The matches are depicted as colored vertical bands in Figure 7.4b. Figure 7.4c zooms in on a single match and shows the ideal shape that was inferred to match the pattern.

∥We recall that ν = 0 denotes zero noise tolerance and ν = 1 allows an arbitrary level of noise.
Table 7.1: Experimental Results

(a) Sensitivity to the noise threshold

  ν      |Match_human|   |Match_ν(φ)|   |Match_ν(φ′)|
  0.70         4               9               4
  0.24         4               7               4
  0.20         4               5               4
  0.10         4               4               4
  0.02         4               4               4
  0.01         4               0               0

(b) Runtime and memory requirements

  Num. Samples   Runtime (s)   Mem. (MB)
  1,000              0.64        33.13
  2,500              1.43        48.82
  5,000              3.39        70.80
  7,500              6.39        72.83
  10,000            10.12        89.18

We now experimentally study how sensitive the quality of the procedure's outcome is with respect to the noise threshold and the constraints on the parameters, and how well the procedure scales with the size of the input.

Sensitivity to the noise threshold and the constraints on the parameters

Domain knowledge in a particular application field can be used to derive more precise specifications. In the case of anomalous v6 pulses for patients with the RBBB condition, such knowledge can, for instance, be used to refine the specification φ by further constraining the slope a3 to be greater than 0.5, resulting in the specification φ′. We demonstrate the impact of the noise threshold on the quality of pattern matching in the cases of the under-specified (φ) and over-specified (φ′) shape expressions. Table 7.1a shows the results of the experiments, where column |Match_human| denotes the number of segments matched by inspection of the signal by a human with domain knowledge, and columns |Match_ν(φ)| and |Match_ν(φ′)| denote the number of segments matching the expressions φ and φ′ found by our procedure, respectively. We can make several observations from this experiment. First, the inclusion of even limited domain knowledge in the specification can significantly improve the quality of the matching process and make it much more robust to different noise thresholds. Second, our approach can result in missing patterns or detecting false patterns.
This result is expected – very low noise thresholds allow matching only shapes that are very close to the ideal one, while very high noise thresholds result in matching shapes that are far away from the specification. As a consequence, our procedure may require tuning of parameter constraints for complex specifications.

Scalability

We now evaluate the scalability of our procedure with respect to the size of the signal, taking into account both computation time and memory requirements. Table 7.1b summarizes the results.

7.6 Conclusion

In this chapter, we proposed shape expressions as a language for the specification of rich and complex temporal patterns. We studied essential properties of shape expressions and developed an efficient heuristic pattern matching procedure for this specification language. We believe that this work explores the expressiveness boundaries of declarative specification languages. We will pursue this work in several directions. We will apply our technique to examples from more application domains. We will study more sophisticated matching methods that minimize the need for tuning parameter constraints. We will compare our approach more closely to work on classical regular expression matching on the one hand, and to purely machine-learning feature extraction methods on the other. Finally, we will investigate the application of shape expressions to testing CPS, with a particular focus on generating test cases from such a specification language.

Chapter 8

Mining Shape Expressions from Positive Examples

8.1 Introduction

From self-driving cars and service robots to the rapidly proliferating Internet-of-Things (IoT) devices, cyber-physical systems (CPS) are becoming pervasive in every aspect of our daily life. CPS applications embed computational units with physical entities such as sensors and actuators designed to tightly interact with some physical component or the real-world environment.
With the recent strides in artificial intelligence and machine learning, CPS applications are evolving into tremendously complex systems that can operate (autonomously) in sophisticated and unpredictable environments. It is common in CPS applications to use many different kinds of sensors and monitors to gather time-series data about various aspects of the application's operation. This includes data about the device's environment, its internal system variables, or physical characteristics of the device (such as power, speed, temperature). A quandary facing many application developers is that there is a veritable deluge of gathered data in these systems, and designers are struggling to analyze, utilize and characterize it. One solution is to consider the vast literature on time-series analysis in the machine learning community. In the context of (unsupervised) learning, most of the work in the ML community focuses on identifying distance metrics on time-series data that enable effective clustering algorithms [146, 145], or on identifying features from the data itself [271, 185, 170]. Most ML techniques, however, suffer from a lack of interpretability: for example, it is almost impossible to relate the computations performed by successive layers of deep neural networks to humanly comprehensible reasoning steps. In recent years, several papers [107, 153, 253, 226] have highlighted the need to reconcile ML techniques with symbolic AI, such as logic and formal languages, to complement them and to address this shortcoming. Symbolic representations benefit from their declarative nature. They are reusable, data-efficient and compositional. They provide a high-level and abstract framework that facilitates generalization. Furthermore, since they are language-like, they are verifiable and generally closer to human understanding. One of the goals of this chapter is similar to that of ML methods: to extract information from data.
However, the specific usage scenarios that we discuss in this chapter require information mining techniques that result in structured artifacts that are interpretable, by the human or by the machine. Consider the problem of mining temporal and logical information about a particular system variable or physical quantity (i.e., the specification mining problem), the problem of mining the temporal and logical conditions on environment signals that ensure correct system behavior (i.e., the assumption mining problem), or the problem of automatically mining logical patterns from data (i.e., the explainable clustering problem). In each of these applications, explainability has a high value. Our research hypothesis in this chapter is that the recently developed language of Shape Expressions (SEs) [189] is an explainable and interpretable formalism for performing effective mining from a time-series dataset. SEs essentially allow us to express mean behaviors in the presence of noise in time-series data; we demonstrate their efficacy on real-world data. We validate our research hypothesis by developing a new procedure for mining shape expressions from (positive examples of) time-series data. In the CPS context, each time-series datum is a sequence of time-value pairs encoding system behaviors, or a discrete-time trace of the value of a particular system variable (e.g., sensor output, actuator input, state variable, etc.). Given a set of such time-series data, our specification mining procedure performs three main steps:

1. Segmentation: this step transforms each time-series datum into a (minimal) sequence of linear segments that optimally approximates the original data within a given error threshold. The algorithm combines linear regression with a recursive procedure to create such a piecewise-linear (PWL) sequence, where for each inferred piece we learn the parameters: slope, offset and duration.

2.
Abstraction: this step clusters linear segments, based on the similarity between their parameter values, into some finite number of clusters. For each resulting cluster we assign a unique symbol; thus the symbol representing a cluster conservatively approximates all the line segments in the cluster. This step defines an abstraction function from line segments to a finite alphabet of symbols, and thereby allows us to map raw time-series data into abstract sequences of symbols (finite words) over this alphabet.

3. Learning: this step infers temporal properties from the abstract traces by using a standard algorithm for learning deterministic finite automata (DFA) from positive examples. Finally, we map the DFA to the SE representation.

The resulting shape expression provides a structured model of the observed system behavior that can be explained and interpreted by a human or a machine. The resulting model can group similar and repeating patterns within the behavior. Learning shape expressions from multiple classes of data can facilitate characterizing similarities and differences between the classes. The presented approach is a natural fit for mining specifications from behaviors of systems that exhibit piecewise-linear behavior, but it can also be used to approximate nonlinear behavior. Approximating complex nonlinear dynamics with piecewise-linear [78, 77, 22, 118] or multi-affine functions [117] is also common practice in the literature on system identification and on learning hybrid models amenable to formal analysis. We demonstrate the applicability of our approach on two case studies from different application domains and experimentally evaluate the implemented specification mining procedure.

Figure 8.1: Illustrative example – a set of six noisy pulses.
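The abstraction step described above can be sketched as follows. This is an illustrative toy implementation: a naive k-means with deterministic initialization stands in for the clustering method the tool would use, and all names (kmeans, abstract_words) are our own.

```python
# Minimal sketch of the abstraction step: cluster the (slope, offset,
# duration) parameters of inferred linear segments, and map each segment
# sequence to a finite word over the cluster alphabet. Toy code; the
# actual procedure determines k dynamically and uses a library clusterer.

def kmeans(points, k, iters=20):
    """Naive k-means with deterministic init (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign

def abstract_words(segment_seqs, k):
    """Map each sequence of (slope, offset, duration) segments to a word
    over the alphabet {'a', 'b', ...} of cluster symbols."""
    flat = [seg for seq in segment_seqs for seg in seq]
    labels = kmeans(flat, k)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words, i = [], 0
    for seq in segment_seqs:
        words.append("".join(alphabet[labels[i + j]] for j in range(len(seq))))
        i += len(seq)
    return words
```

Two signals whose segments have similar slopes, offsets and durations are thus mapped to the same finite word, which is exactly the input expected by the passive DFA learning step.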
Illustrating example

We use the noisy pulses shown in Figure 8.1 to illustrate the various steps in our specification mining approach. Such analog pulses are common in many electronic applications. The Distributed Standard Interface (DSI3) is one such example from the automotive domain, where analog pulses are used to encode communication between the micro-controller and sensors in an airbag system-on-chip. The six examples depicted in Figure 8.1 were synthetically generated; they all appear visually to have a similar shape. However, we can observe that the segments in the three top pulses have a steeper slope than the corresponding ramping segments in the bottom pulses.

8.2 Related Work

In the last decade, specification mining [269, 8, 23, 128, 38, 140, 188, 283, 148, 53, 52, 59, 187, 186, 152, 138, 154] has become a very active research field supporting the analysis and development of CPS. Our approach builds on top of shape expressions [189], a recently introduced declarative language for specifying and monitoring sophisticated temporal patterns in possibly noisy data. In contrast, most of the current work in specification mining for CPS [51, 252, 269, 23, 128, 38, 140, 188, 283, 148, 53, 59, 187, 186, 152, 138, 154] centers its development around Signal Temporal Logic [177] (STL) (and its extensions), a temporal logic defined to reason about dense-time mixed-analog signals.
The structure of the formula can be determined, for example, by using decision trees [53], genetic algorithms [187, 59, 154, 138] or SATbased algorithms [186]. All the aforementioned methods, with the exception of [138], require both positive and negative examples in the learning phase, while our approach is based only on positive examples. The work in [138] introduces the notion of tightness metric to learn template-driven STL formulas that satisfy as tight as possible a set of user-provided positive examples. In the same paper, the authors show how to combine their approach with a genetic algorithm to infer also the structure of more arbitrary formulas, similarly to [187]. However, genetic algorithms, that are metaheuristics, have the tendency to converge towards local optima rather than the global optimum of the problem. This results in the impossibility to assess whether the learned formula is really the optimal one. Our approach differs from the work in [138] in many ways. First, we consider a different specification language closer to regular expressions than temporal logic. Second, we choose the mean squared error as our measure of tightness of the specification with respect to the observed signal. This measure is used to guide the optimal split of signals in basic shapes (in this chapter we choose for simplicity lines as our basic shapes). Using standard clustering algorithms, the obtained shapes can be grouped according to their parameters. Each cluster can be expressed as a basic shape expression with constrained parameters that would match the same signals matched by the basic shape expressions in the cluster without incrementing the mean squared 143 error. This allows to derive a finite alphabet of symbols (each representing a different shape expression with constrained parameters) and to represent each signal in the training set as a finite sequence of them. 
Finally, in such a setting, we show how to use more reliable passive automata learning algorithms [125], rather than metaheuristics (e.g., genetic algorithms), to infer the structure of the expression. Our approach does not require interaction with a reactive system and is indeed orthogonal to active automata learning (AAL), such as Angluin's L∗ algorithm [20] and its recent developments [132, 236]. AAL is generally employed to learn how to interact with the surrounding environment [69, 105] and needs to exercise the system in order to infer the relation between the provided input and the observed output. Our approach can instead be applied directly, without the necessity of providing an input. Finally, the problem of piecewise approximation of a signal has received considerable attention in the mathematical literature [44, 195, 194, 209]. In particular, Bellman and Roth introduced in [44] a dynamic programming algorithm to compute the curve fitting of a set of data by a set of straight lines. The key idea of their approach is to employ a two-dimensional grid of N points in the abscissa and M points in the ordinate, and to search for the segmentations of length L, L = 1, 2, . . . , N − 1, such that for each L the total error, measured as the sum of the maximum absolute errors of the data with respect to each fitting segment, is minimal among all possible segmentations of length L. For a sufficiently large M, their approach approximates our piecewise fit, which instead employs regression as the main ingredient. However, our approach does not require the user to provide the extra parameter M. Furthermore, the M points considered in [44] are generally chosen, for practical reasons, between the min and the max value of the signal. This does not necessarily produce the best possible local fit, because sometimes, for a linear segment to fit the data well, it needs to start from a value that is bigger or smaller than the max or the min value of the signal, respectively. Recent works of Ozay et al. [191, 70]
adapted the dynamic programming algorithm in [44] to infer the linear segments through least-squares error directly over the signal, and generalized the approach to the segmentation of time-varying affine autoregressive exogenous (ARX) models. Although our approach for computing the optimal (with respect to the maximum mean squared error) split of a signal into basic shapes can be considered a special case of the aforementioned approaches (see Section 8.3), the focus of this chapter is on mining a specification for a set of signals in a formal language, and we use the segmentation as one of the main ingredients.

8.3 Learning Shape Expressions from Examples

In this section, we present the procedure for mining shape expressions from positive examples. A high-level overview of the approach is depicted in Figure 8.2. The procedure starts by approximating each time series with a sequence of linear segments. Each individual dimension x in a segment is fully characterized by three parameters: (1) slope ax, (2) (relative) offset bx and (3) duration dx. In the next step, the procedure collects all the inferred segments from all examples and clusters them according to ax, bx and dx using the k-means clustering method, where k is dynamically determined. We use the clustering outcomes to generate a finite alphabet Σ of size k, where each letter in Σ corresponds to a specific cluster. In the next step, we map each line-segment approximation to its associated letter and hence obtain a set of finite traces over that alphabet. We finally use a passive automata learning algorithm to learn a deterministic finite automaton (DFA) from the set of words, which we translate to a shape expression. In the remainder of the section, we present each individual step of the procedure in detail.

8.3.1 Approximating Time Series with Sequences of Linear Segments

Let ξ = (t1, ξ(t1)) · · · (tn, ξ(tn)) be a signal of size n and ϵ an error threshold.
In this section, we present a dynamic programming-based method that finds the piecewise-linear approximation of ξ that minimizes the number of linear segments while ensuring that each segment is approximated with an MSE bounded by the threshold ϵ. The idea of using dynamic programming (DP) to compute a piecewise-linear approximation of a signal is not new. Bellman and Roth introduced a DP-based algorithm decades ago [44]. They showed that DP offers a simple and direct approach to determine the linear segmentation on a predefined grid which best approximates a given signal.

Figure 8.2: Learning shape expressions from examples – an overview.

This idea has been recently (i) adapted to fit the linear segments directly on the signal rather than selecting them from a predefined grid, and (ii) generalized to the problem of segmentation of time-varying affine autoregressive exogenous (ARX) models of the form:

y_t = Σ_{i=1}^{na} a_t^i · y_{t−i} + Σ_{i=1}^{nc} c_t^i · u_{t−i} + k_t + η_t    (8.1)

where u, y and η denote the input, output and noise, respectively, and t ∈ [t0, N] with t0 = max(na, nc) [191, 70]. The method we use in this work is a special case of the approach proposed in [191, 70], in which we set na = 0, nc = 1 and u_{t−1} = t in equation (8.1), and select the mean squared error, i.e., the squared ℓ2-norm, as the fitting error function. Moreover, we take the maximum among the fitting errors of the single segments, i.e., the ℓ∞-norm, instead of their sum, to define the overall fitting error of a segmentation. This slightly changes the recursion equation in the DP setting, but the approach remains essentially the same as in [191]. In the following, we sketch an algorithm which efficiently solves the problem of finding the minimum number of switches with bounded fitting error, as introduced in [191], under the above-mentioned settings. We first compute the linear regression and its associated MSE for every segment of ξ.
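This per-segment fitting step can be sketched as follows, for a one-dimensional signal. The closed-form simple linear regression replaces a library call, and the names (line_fit, all_segment_fits) are illustrative; the tables mirror the arrays p[i][j] and e[i][j] introduced below.

```python
# Minimal sketch: least-squares line fit and its MSE for every segment
# xi[i:j] of a 1-D signal, stored in tables p and e. Illustrative code:
# each segment is fit independently (O(n^3) overall); incremental sums
# would bring this down, but are omitted for clarity.

def line_fit(ts, ys):
    """Least-squares fit ys ~ a*ts + b; returns (a, b, mse)."""
    n = len(ts)
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    stt = sum((t - tbar) ** 2 for t in ts)
    a = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / stt if stt else 0.0
    b = ybar - a * tbar
    mse = sum((y - (a * t + b)) ** 2 for t, y in zip(ts, ys)) / n
    return a, b, mse

def all_segment_fits(ts, ys, min_len=2):
    """Tables p[(i, j)] = (a, b) and e[(i, j)] = MSE for segments [i, j)."""
    p, e = {}, {}
    n = len(ts)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            a, b, mse = line_fit(ts[i:j], ys[i:j])
            p[(i, j)] = (a, b)
            e[(i, j)] = mse
    return p, e
```

Precomputing these tables is what lets the DP split procedure below look up, in constant time, the best line and its error for any candidate segment.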
Let P = {ax, bx | x ∈ X} be the slope and relative offset parameters for each dimension x ∈ X. Given a segment u = ξ[i : j], 1 ≤ i < j ≤ |ξ|, we denote by lr(u) the linear regression of u, defined as lr(u) = argmin_{v∈V(P)} MSE(u, ûv), where ûv is the linear approximation of u with respect to v, i.e., we have ûv = (ti, v̂i) ··· (tj, v̂j) such that v̂k(x) = v(ax)·tk + (v(bx) − v(ax)·ti), for all i ≤ k ≤ j and x ∈ X. We store the linear regression and its associated MSE for every segment of ξ in arrays p and e, respectively. In particular, we have that p[i][j] = v⋆ and e[i][j] = MSE(u, ûv⋆), where u = ξ[i : j] and v⋆ = lr(u). For simplicity, we denote by MSElr(u) the MSE of a segment u relative to its optimal linear approximation ûv⋆ computed by lr(u), i.e., MSElr(u) = MSE(u, ûv⋆).

A split τ of ξ is a sequence τ = {s1, s2, . . . , sk, sk+1} of indices such that 1 ≤ k < n, s1 = 1 and sk+1 = n. We denote by |τ| = k + 1 the size of τ. Note that τ induces over ξ exactly k adjacent segments ξ[si : si+1], i ∈ [1, k]. If k = 1, τ = {1, n} induces exactly one segment ξ[1 : n] over ξ, which is the whole signal ξ. We now lift the definitions of linear regression and MSE to the piecewise-linear case, driven by the split τ. We define the τ-linear regression, denoted by lrτ(ξ), as the sequence lrτ(ξ) = {v⋆1, . . . , v⋆k} of parameter valuations, where for all i ∈ [1, k], v⋆i = lr(ξ[si : si+1]). We define the error of the τ-linear regression, denoted by Eξ(τ), as the dominant MSE among all MSElr(ξ[si : si+1]), i ∈ [1, k]:

Eξ(τ) = max_{i∈[1,k]} MSElr(ξ[si : si+1])

We say that τ ϵ-approximates ξ, denoted by τ ∼ϵ ξ, if Eξ(τ) ≤ ϵ. Given another split τ′ of ξ, we say that τ and τ′ are ϵ-equivalent, denoted by τ =ϵ τ′, if |τ| = |τ′| and Eξ(τ) = Eξ(τ′). We say that τ ϵ-refines τ′, denoted by τ <ϵ τ′, if τ ∼ϵ ξ, τ′ ∼ϵ ξ and either |τ| < |τ′| (first criterion) or |τ| = |τ′| and Eξ(τ) < Eξ(τ′) (second criterion).
Moreover, we say that τ is ϵ-optimal for ξ if τ ∼ϵ ξ and τ ≤ϵ τ′ for all splits τ′ of ξ such that τ′ ∼ϵ ξ.

We are now ready to define our method split, presented in Algorithm 8, which finds an optimal τ-split of ξ with respect to the threshold ϵmax. The procedure is implemented as a dynamic programming algorithm that recursively finds optimal splits for sub-segments of ξ. The algorithm maintains the arrays s and e that in each cell s[i][j] (initialized to the empty sequence {}) and e[i][j] (initialized to MSElr(ξ[i : j])) store the ϵmax-optimal split τ∗ of ξ[i : j] and Eξ(τ∗), respectively. The procedure is initially invoked with split(ξ). (To simplify the presentation, we assume that ϵmax, p, e and s are global variables which are initialized accordingly before split(ξ) is invoked.)

The inductive step split(ξ[i, j]) works as follows. The algorithm first checks whether an optimal split for ξ[i : j] has already been computed and stored in s[i][j]. If yes, the available result is returned (lines 2−3). If not, the procedure needs to compute the optimal split. There are two possibilities. If ξ[i : j] can be linearly approximated with regression MSE smaller than or equal to ϵmax, then no further split is needed (lines 5−6). Otherwise, ξ[i : j] must be split into smaller segments (lines 7−14). We first initialize τ∗ to the set of all indices of ξ[i : j] and ϵ∗ to the corresponding Eξ(τ∗), which is 0 since all segments induced by this τ∗ over ξ[i : j] consist of just 2 data points (line 8). We then generate all promising ϵmax-optimal split candidates of ξ[i : j] (lines 9−12) and update τ∗ and ϵ∗ whenever the current candidate ϵmax-refines the current τ∗ (lines 13−14). That is, for all k ∈ [i + 1, j − 1], the algorithm recursively applies split(ξ[i : k]) to compute an optimal split τL of the prefix segment ξ[i : k] and loads its Eξ(τL) from e[i][k] into ϵL.
Moreover, the algorithm approximates the remaining suffix segment ξ[k : j] with just one line segment given by the linear regression of ξ[k : j] and loads into ϵR its MSElr available in e[k][j] (line 11). It then builds a new ϵmax-optimal split candidate τ by joining τL with τR, and computes its Eξ(τ), denoted by ϵ, as the maximum of ϵL and ϵR (line 12). If τ ∼ϵmax ξ[i : j], i.e., ϵ ≤ ϵmax, and τ <ϵmax τ∗ (line 13), then τ∗ is updated with the new candidate τ (line 14). Once the loop completes, τ∗ and ϵ∗ contain the first encountered ϵmax-optimal split of ξ[i : j] and its corresponding Eξ(τ∗), respectively. Finally, τ∗ and ϵ∗ are cached in s[i][j] and e[i][j] (line 15) and the procedure returns τ∗.

Example 8.3.1. Figure 8.3 depicts the optimal τ-split of the raw pulse time series resulting from the application of the split procedure.

Next, we show that our split procedure computes an ϵmax-optimal segmentation of ξ in time quadratic in the size of the trace.

Theorem 8.3.2. Let ξ be a signal, ϵmax a threshold and τ = split(ξ). We have that τ is an ϵmax-optimal split of ξ.

Figure 8.3: Example - inferring linear segments from pulses (all pulses, pulse 1 and pulse 4).

Algorithm 8: split - optimal splitting of a signal segment.
Input: ξ - signal of the form (ti, vi) ··· (tj, vj)
Output: τ∗ - an ϵmax-optimal split of ξ
 1  Function split(ξ)
 2    if s[i][j] ≠ {} then
 3      τ∗ ← s[i][j]
 4    else
 5      if e[i][j] ≤ ϵmax then
 6        τ∗ ← {i, j}; ϵ∗ ← e[i][j]
 7      else
 8        τ∗ ← {i, i + 1, . . . , j}; ϵ∗ ← 0
 9        for k ∈ {i + 1, ..., j − 1} do
10          (τL, ϵL) ← (split(ξ[i, k]), e[i][k])
11          (τR, ϵR) ← ({k, j}, e[k][j])
12          τ ← τL ∪ τR; ϵ ← max(ϵL, ϵR)
13          if ϵ ≤ ϵmax and (|τ| < |τ∗| or (|τ| = |τ∗| and ϵ < ϵ∗)) then
14            τ∗ ← τ; ϵ∗ ← ϵ
15      s[i][j] ← τ∗; e[i][j] ← ϵ∗
16    return τ∗

Proof Sketch. Assume that split∗ is a procedure that computes an ϵmax-optimal split of a signal.
Consider an arbitrary signal ξ and let ξ = ξ1 · ξ2 ··· ξn be an optimal partition τ∗ = split∗(ξ) of ξ into n segments with Eξ(τ∗) = max_{1≤i≤n} MSElr(ξi). We prove that τ = split(ξ) splits ξ into exactly n segments with Eξ(τ) = Eξ(τ∗). We first prove, by induction on the number of segments in the optimal solution, that for every 1 ≤ i ≤ n, τi = split(ξ1 ··· ξi) splits ξ1 ··· ξi into at most i segments with Eξ(τi) ≤ max_{1≤j≤i} MSElr(ξj). Thus split(ξ1 ··· ξn) splits ξ into at most n segments with Eξ(τ) ≤ Eξ(τ∗). By the assumption that ξ = ξ1 · ξ2 ··· ξn is an optimal split, it follows that split(ξ) splits ξ into exactly n segments with Eξ(τ) = Eξ(τ∗).

Base case i = 1: by assumption, we have that MSElr(ξ1) ≤ ϵmax, hence by the definition of Algorithm 8, τ1 = split(ξ1) does not further split ξ1 and Eξ1(τ1) = MSElr(ξ1). For 1 < i < n, assume by the inductive hypothesis that τi = split(ξ1 ··· ξi) splits the signal ξ1 ··· ξi into i segments with Eξ1···ξi(τi) ≤ max_{1≤j≤i} MSElr(ξj). We now show that τi+1 = split(ξ1 ··· ξi · ξi+1) splits ξ1 ··· ξi · ξi+1 into i + 1 segments with Eξ1···ξi+1(τi+1) ≤ max_{1≤j≤i+1} MSElr(ξj). By assumption, MSElr(ξi+1) ≤ ϵmax. It follows that the partition of ξ1 ··· ξi+1 into the split of ξ1 ··· ξi and the split of ξi+1 is a valid candidate, according to Algorithm 8. By assumption, split splits ξ1 ··· ξi into i segments and splits ξi+1 into a single segment. In addition, we have that Eξ1···ξi+1(τi+1) = max(Eξ1···ξi(τi), MSElr(ξi+1)) by definition of split. By assumption, we have that Eξ1···ξi(τi) ≤ max_{1≤j≤i} MSElr(ξj). It follows that Eξ1···ξi+1(τi+1) ≤ max_{1≤j≤i+1} MSElr(ξj). Since there may be further optimal split candidates, it follows that split splits ξ1 ··· ξi+1 into at most i + 1 segments with error bounded by max_{1≤j≤i+1} MSElr(ξj).

The linear segmentation of ξ of size |ξ| = n requires computing the linear regressions for all ξ[i : j], 1 ≤ i < j ≤ n, which makes (n² − n)/2 linear regressions in total.
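Under simple per-dimension linear regression, all of these pairwise regressions can be obtained with running sums. The following one-dimensional Python sketch (function and variable names are illustrative, not taken from the thesis prototype) computes the slope, offset and MSE for every segment in O(n²) total time:

```python
def all_segment_regressions(ts, ys):
    """Least-squares line and MSE for every segment over points i..j (j > i).

    Running sums let each extension of a segment by one point be processed
    in O(1), so all O(n^2) segments cost O(n^2) overall. Returns dicts
    p[(i, j)] = (slope, intercept) and e[(i, j)] = MSE.
    """
    n = len(ts)
    p, e = {}, {}
    for i in range(n - 1):
        St = Sy = Stt = Sty = Syy = 0.0
        m = 0
        for j in range(i, n):
            t, y = ts[j], ys[j]
            St += t
            Sy += y
            Stt += t * t
            Sty += t * y
            Syy += y * y
            m += 1
            if m >= 2:
                a = (m * Sty - St * Sy) / (m * Stt - St * St)
                b = (Sy - a * St) / m
                sse = max(Syy - a * Sty - b * Sy, 0.0)  # guard against rounding
                p[i, j] = (a, b)
                e[i, j] = sse / m
    return p, e
```

For a perfectly linear signal every stored MSE is (numerically) zero, so the split procedure would keep the signal as a single segment.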
Lemma 8.3.3 shows that this can be done in O(n²) time by employing incremental linear regression.

Lemma 8.3.3. Let ξ be a signal of size n. Then the linear regressions for all ξ[i : j], 1 ≤ i < j ≤ n, are computed in O(n²) time.

Proof Sketch. The linear regressions for all ξ[i : j], j ∈ [i + 1, n], can be obtained as by-products of incrementally computing the linear regression of ξ[i : n]. Thus, we only need to incrementally compute the linear regressions for n − 1 segments comprising n, n − 1, . . . , 2 data points, respectively. Since the incremental simple linear regression of a segment comprising k data points is computed in O(k) time, it follows that the n − 1 incremental linear regressions are computed in O(n + (n − 1) + . . . + 2) = O(n²) time.

Proposition 8.3.4. Let ξ be a signal of size n and ϵmax a threshold. Then split(ξ) is computed in O(n²) time.

Proof Sketch. By construction, split(ξ) invokes itself recursively to first solve the optimization sub-problems for the sub-segments of ξ in the following order: split(ξ[1 : 2]), split(ξ[1 : 3]), ..., split(ξ[1 : n − 1]). That is, for each k, 2 ≤ k ≤ n − 1, split(ξ[1 : k]) is invoked exactly n − k times. Thus, we can optimize the split algorithm by caching the result of split(ξ[1 : k]) when it is invoked the first time and retrieving it from the cache in constant time at all successive invocations. Computing split(ξ[1 : k]) when it is invoked for the first time can be done in O(k) time, since it requires access to k − 2 cached results plus a constant number of other operations. The remaining n − k − 1 invocations of split(ξ[1 : k]), each requiring constant-time access to the cached result, are computed in O(n − k) time. It follows that split(ξ) can be computed in O(Σ_{k=2}^{n−1} n) = O(n²) time.
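The memoized recursion of Algorithm 8 can be sketched in Python as follows. This is an illustrative re-implementation, not the thesis prototype; for compactness, `mse_line` recomputes segment errors directly instead of reading the precomputed arrays p and e, and all names are hypothetical:

```python
def mse_line(ts, ys, i, j):
    """MSE of the least-squares line over points i..j (inclusive)."""
    m = j - i + 1
    t, y = ts[i:j + 1], ys[i:j + 1]
    St, Sy = sum(t), sum(y)
    Stt = sum(v * v for v in t)
    Sty = sum(a * b for a, b in zip(t, y))
    Syy = sum(v * v for v in y)
    a = (m * Sty - St * Sy) / (m * Stt - St * St)
    b = (Sy - a * St) / m
    return max(Syy - a * Sty - b * Sy, 0.0) / m

def optimal_split(ts, ys, eps):
    """Split minimizing the number of segments such that each segment's
    regression MSE is <= eps; ties broken by smaller maximal MSE."""
    memo = {}

    def split(i, j):
        if (i, j) in memo:
            return memo[i, j]
        if mse_line(ts, ys, i, j) <= eps:
            best = ([i, j], mse_line(ts, ys, i, j))
        else:
            # fallback: every adjacent pair is its own (exact) segment
            best = (list(range(i, j + 1)), 0.0)
            for k in range(i + 1, j):
                tau_l, eps_l = split(i, k)         # optimal prefix split
                eps_r = mse_line(ts, ys, k, j)     # suffix as one segment
                tau, eps_t = tau_l + [j], max(eps_l, eps_r)
                if eps_t <= eps and (len(tau) < len(best[0]) or
                                     (len(tau) == len(best[0]) and eps_t < best[1])):
                    best = (tau, eps_t)
        memo[i, j] = best
        return best

    return split(0, len(ts) - 1)[0]
```

On a signal made of a ramp followed by a plateau, the procedure returns the two-segment split at the breakpoint.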
From Lemma 8.3.3 and Proposition 8.3.4 it follows that, given a signal ξ of size n and a threshold ϵmax, an ϵmax-optimal split of ξ is computed in O(n²) time. Indeed, we first need to compute the linear regressions for all ξ[i : j], 1 ≤ i < j ≤ n, which is done in O(n²) time according to Lemma 8.3.3, and subsequently run split(ξ), which is also executed in O(n²) time according to Proposition 8.3.4. Note that the time complexity O(n²) includes the worst-case scenario, which occurs when split(ξ) performs the maximum possible number of recursive invocations in order to compute an ϵmax-optimal split, i.e., when the ϵmax-optimal split has n − 1 segments. This is a polynomial order lower than the complexity from [191, 70] applied to our simple linear regression setting. This optimization is possible by employing the incremental linear regression of a segment, which produces the linear regressions of all sub-segments along the way as by-products, so that they do not need to be computed independently from scratch.

Approximating Noisy Non-Linear Data with Sequences of Linear Segments

The segmentation algorithm presented in this section is particularly effective in approximating the behavior of piecewise-linear systems. For this class of systems, it is natural to attribute the MSE resulting from the segmentation algorithm to the presence of noise in the observed data. The same segmentation procedure can also be used to approximate non-linear behaviors. It should be noted that the MSE resulting from the application of our segmentation method to non-linear data does not only come from the noise in the data, but also from the linear approximation of non-linearities. There is a trade-off between having an accurate approximation that requires multiple linear segments and the risk of over-fitting the data.

Figure 8.4: Approximating exponential decay with linear segments – robustness to noise in data.
In addition, the non-linearity approximation error can reduce the robustness of the segmentation procedure to noise. We illustrate the challenge of segmenting noisy non-linear data with an exponential decay example to which we add normally distributed noise. Figure 8.4 shows the effect of varying the maximum noise threshold ν and the standard deviation σ of the normally distributed noise with mean 0 on the outcome of the segmentation procedure. We can observe that the procedure consistently approximates the signal with two segments for σ ∈ [0, 0.03] and ν ∈ [0.005, 0.01]. By lowering ν to 0.002, the less noisy signals are still approximated with two segments. However, the signal with noise that has σ = 0.03 requires a third approximation segment. Similarly, less noisy signals are approximated with three segments when ν equals 0.001, while the noisiest signal with σ = 0.03 requires a four-segment approximation. We can see that the level of noise and non-linearity in the data can affect the outcome of the segmentation procedure.

8.3.2 Abstracting Sequences of Linear Segments to Finite Traces over Finite Alphabets

Clustering is an unsupervised learning method for grouping data points with similar properties. Given a set of data points U and a finite alphabet Σ, clustering procedures aim at finding an appropriate labeling function λ : U → Σ that maps data points u ∈ U to letters in Σ. Clustering algorithms aim at grouping data points with similar features, while ensuring that different clusters have highly dissimilar features. Let P = {ax, bx, dx | x ∈ X} be the set of parameters, where ax is the slope, bx the relative offset and dx the duration of a segment defined over x ∈ X. A valuation v(P) assigns real values to these parameters, and V(P) denotes the set of all valuations over P. We note that valuations in V(P) provide unique characterizations of linear (multi-dimensional) segments.
Given the set U ⊆ V(P) of parameter valuations obtained in Section 8.3.1, our objective is to group and label similar segments. We use the well-known k-means clustering method [175] to find our labeling function λ. This procedure aims to partition the |U| valuations into k clusters in which each parameter valuation v ∈ U belongs to the cluster with the nearest mean (called the centroid), serving as a representative of the cluster. The implicit objective function in k-means is the within-cluster sum of squares (WCSS), i.e., the sum of squared distances of the data points from their cluster centroids. First, the procedure initializes k centroids; a common implementation of this step chooses random values for the centroids. In the second phase, the algorithm repeats the following actions until it converges: (1) it calculates the distance between all valuations and centroids, assigning each valuation to the cluster represented by its closest centroid, and (2) it updates the centroid positions by averaging the data points that belong to the cluster. We use the popular elbow method to find the optimal number k of clusters for U. It is typically used as a visual method, consisting of plotting the value of the WCSS produced by different values of k and finding the “elbow” in the plot. We automate this step by defining a threshold c and stopping when the difference between the WCSS for k and k − 1 clusters falls below c.

Figure 8.5: Example - clustering (line segment parameters, elbow plot, clustered data, and the over-approximation box for cluster A).

The k-means clustering algorithm maps U into an alphabet Σ of size |Σ| = k. In the next step, we provide a meaning to each letter a ∈ Σ.
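The clustering step with the automated elbow rule described above can be sketched as follows. All names and the threshold value are illustrative, and the hand-rolled Lloyd iteration stands in for whatever k-means implementation is actually used (a library such as scikit-learn could be substituted):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids and WCSS."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    wcss = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, wcss

def choose_k(X, c, k_max=10, seed=0):
    """Automated elbow rule: increase k until the WCSS improvement
    over k - 1 clusters drops below the threshold c."""
    prev = None
    for k in range(1, min(k_max, len(X)) + 1):
        cur = kmeans(X, k, seed=seed)
        if prev is not None and prev[2] - cur[2] < c:
            return prev  # (labels, centers, wcss) for k - 1 clusters
        prev = cur
    return prev
```

Each resulting cluster label then plays the role of one letter of the inferred alphabet Σ.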
Intuitively, for each cluster, we define the tightest box that includes all of its data points (parameter valuations) and represent it symbolically as a set of constraints. Let λ⁻¹ : Σ → 2^U denote the inverse labeling function. With each A ∈ Σ, we associate a constrained atomic shape expression ψA = ⋀_{x∈X} linx(ax, bx, dx) : ax ∈ [a¹x, a²x] ∧ bx ∈ [b¹x, b²x] ∧ dx ∈ [d¹x, d²x], where for all p ∈ P, p¹ = min_{v∈λ⁻¹(A)} v(p) and p² = max_{v∈λ⁻¹(A)} v(p).

Example 8.3.5. Figure 8.5 (top-left) shows all the pulse linear segment parameters projected on the slope and duration valuations. Figure 8.5 (top-right) depicts the result of applying k-means clustering to these segments for different values of k, while Figure 8.5 (bottom-left) shows the parameters grouped into k = 5 clusters, the optimal number of clusters according to the automated elbow method. Figure 8.5 (bottom-right) illustrates the bounding box that over-approximates all the data points in the cluster labeled by A. We associate to the letters of the inferred alphabet the following symbolic expressions:

A : lin(a1, b1, d1) : a1 ∈ [−0.261, −0.197] ∧ b1 ∈ [9.06, 9.70] ∧ d1 ∈ [18, 24]
B : lin(a2, b2, d2) : a2 ∈ [−0.007, 0.031] ∧ b2 ∈ [9.23, 10.09] ∧ d2 ∈ [26, 44]
C : lin(a3, b3, d3) : a3 ∈ [−0.001, 0.001] ∧ b3 ∈ [4.96, 5.12] ∧ d3 ∈ [69, 74]
D : lin(a4, b4, d4) : a4 ∈ [0.110, 0.116] ∧ b4 ∈ [4.43, 4.60] ∧ d4 ∈ [38, 39]
E : lin(a5, b5, d5) : a5 ∈ [0.199, 0.263] ∧ b5 ∈ [4.35, 4.45] ∧ d5 ∈ [20, 22]   (8.2)

8.3.3 Inferring Expressions from Finite Traces

In the previous section, we showed how we can partition the set of linear segments into a few equivalence classes through discretization of their parameter space. We also showed how we can associate each equivalence class with a symbol. Let the set of such symbols be denoted by Σ. Essentially, through the methods in the previous sections, we have mapped each timed trace into a string in Σ⋆.
In this section, we show how we can learn a deterministic finite automaton (DFA) that accepts every string corresponding to the positive examples in our dataset. For this purpose, we use an off-the-shelf DFA learning algorithm called RPNI (Regular Positive and Negative Inference) [125]. Given a set S of example strings, the first step in RPNI is to construct the prefix tree acceptor (PTA) from the given examples. The PTA A is described as a tuple (Q, Σ, qλ, δ, FA), where Q is the finite set of states, qλ is the initial state, δ : Q × Σ → Q is the (partial) transition function, and FA is the set of accepting states. The set Q is essentially the prefix-closure of S, and for every state qu ∈ Q and every a ∈ Σ, δ(qu, a) = qua. Finally, if u ∈ S, then qu ∈ FA. While a PTA constructed using the above rules is a DFA accepting the given set of examples, it is typically not the minimally consistent DFA. The algorithm RPNI is a heuristic to minimize the DFA. A common idea in passive learning of DFAs from examples is to label the states of the PTA as Red (initially just the initial state corresponding to the empty string) and Blue (initially the prefix-closure of the set S). The RPNI algorithm then repeatedly selects a Blue state and seeks to merge it with a Red state, while ensuring that the merged automaton is compatible with the given set of examples.

φ = A · B · C · (D ∪ E) · A   (8.3)

Figure 8.6: Inferred automaton and expression.

Example 8.3.6. We use the RPNI algorithm to learn an automaton from the set of finite traces obtained by abstracting the segmented pulses and map it to its associated shape expression. Figure 8.6 depicts the inferred automaton and the shape expression. We can observe that the variability in the slope of the second-to-last segment is translated into the choice of taking either the transition labeled by D or E from state 4, or equivalently into the union operation in the shape expression.
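The PTA construction that seeds RPNI can be sketched as follows (the Red/Blue merge loop itself is omitted; function names are illustrative, and states are represented directly by their prefixes):

```python
def build_pta(words):
    """Prefix tree acceptor: states are the prefixes of the sample,
    transitions extend a prefix by one letter, and the sample words
    themselves are the accepting states."""
    states, delta, accepting = {""}, {}, set(words)
    for w in words:
        for i, letter in enumerate(w):
            states.add(w[:i + 1])
            delta[w[:i], letter] = w[:i + 1]
    return states, delta, accepting

def accepts(delta, accepting, w):
    """Run a word through the (partial) transition function."""
    state = ""
    for letter in w:
        if (state, letter) not in delta:
            return False
        state = delta[state, letter]
    return state in accepting
```

By construction the PTA accepts exactly the sample; the subsequent state merges are what generalize it to a smaller automaton.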
The procedure is thus effectively able to distinguish between two different classes of pulses in an unsupervised fashion. Finally, we also illustrate how our procedure generalizes repeated shapes by learning repetitions in the form of a Kleene star. Using both the individual pulse shapes of the first class and the pulse train signal shown in Figure 8.7 results in learning the specification (A∗ · B · C · D)+.

8.3.4 Discussion

The specification mining procedure presented in this section adopts several design choices that we motivate below, while also discussing potential alternatives:

Figure 8.7: Train of pulses and its segmentation.

• Linear segments: the restriction to linear atomic shapes has a twofold motivation: (1) linear approximations are easy to understand and hence facilitate explainability of the inferred specifications, and (2) linear approximations admit efficient regression algorithms. For some applications, linear segments may not be sufficient to faithfully capture a shape. Our approach can be generalized to richer atomic shapes at the expense of computational efficiency.

• Duration as a parameter: we chose to treat segment duration as a parameter that we use in the process of mapping raw data to a finite trace. In practice, two segments with similar slopes and relative offsets, but different durations, may be mapped to two different letters of the inferred finite alphabet. An alternative approach would consist in treating the duration as a special parameter. We could use the slope and the relative offset to map sequences of linear segments to finite words, and then use durations to extend finite words to timed words. We could then use methods such as timed k-tails [192] to learn a timed automaton from the set of timed traces.

• Semantics of a Letter in the Alphabet: we map each letter in the finite alphabet to a box constraint over parameters that includes all the segments in the cluster that characterizes that letter.
There are several other approaches that could be used: (1) associating a more sophisticated over-approximation with the segments in the cluster, such as a zonotope, or (2) defining the cluster centroid as the representative of the cluster and increasing the noise tolerance based on the properties of the cluster. To simplify the presentation, we use the minimal bounding box that includes all the points in the cluster. This means that the over-approximation is not robust for the extreme points in the cluster. This problem can be addressed by bloating the bounding box by a user-defined amount.

• Clustering and Learning Algorithms: we use k-means clustering and RPNI as off-the-shelf algorithms for clustering and passive learning from positive examples. These algorithms can be replaced with other procedures that implement the same function in a different manner. For instance, we can use silhouette-based clustering [219] instead of k-means clustering.

8.4 Experimental Evaluation

In this section, we analyze both computational and qualitative aspects of our specification mining procedure and demonstrate its usability on two case studies. All the experiments were performed on a Razer computer with an Intel Core i7 4.1 GHz processor and 16 GB of RAM.

8.4.1 Experimental Results

We use our illustrative train of pulses to study computational and qualitative aspects of our specification mining procedure. We first analyze the scalability of our approach with respect to both the size and the number of the training examples. We summarize the results in Table 8.1, where ts, tc, tl and ttotal denote the segmentation, clustering, automata learning and total times, respectively. We can first observe that the experimental results follow the theoretical quadratic growth in the size of the signals and linear growth in the number of signals. We can also see that the segmentation part dominates the computation.
This result is expected given that the segmentation is used to map a usually large number of raw data samples into a relatively small number of segments. The computation time for automata learning is negligible with respect to the other parts of the procedure. This is mainly due to the fact that there are often equivalent abstract (finite) traces fed to the automata learning algorithm.

# traces  |w|   ts(s)     tc(s)     tl(s)   ttotal(s)
1         10    0.0001    0.1020    0.0006  0.1027
1         100   0.0105    0.1451    0.0146  0.1698
1         1000  0.7733    0.2982    0.3081  1.3796
10        10    0.0010    0.1109    0.0006  0.1124
10        100   0.0909    0.2162    0.0172  0.3242
10        1000  7.2220    1.6809    0.3838  9.2867
100       10    0.0101    0.1626    0.0007  0.1734
100       100   0.9508    0.9909    0.0313  1.9730
100       1000  72.7604   16.1123   0.4196  89.2922
1000      10    0.1012    0.5912    0.0007  0.6930
1000      100   9.8420    8.7352    0.0208  18.5980
1000      1000  722.1752  149.0021  0.3944  871.5717

Table 8.1: Computational cost of the specification mining algorithm.

ϵmax   φ                            # Clusters
0.05   A · B · C · (D ∪ E) · A      5
0.1    A · B · C · (D ∪ E) · A      5
0.5    F · (G · (H ∪ I) ∪ J · I)    5
1      K · L                        2
5      (M · M) ∪ (N · O)            3
10     P ∪ Q                        2

Table 8.2: Sensitivity of specification mining to the maximum error threshold.

Next, we analyze the sensitivity of our approach to the choice of the maximum error threshold ϵmax. Table 8.2 depicts shape expressions inferred from the same set of training examples (see Figure 8.1) under different maximum error thresholds and clustering termination criteria. As expected, the choice of the maximum error threshold can significantly influence the segmentation part of the algorithm and have an impact on the learned specification. It is interesting to observe that the size of the threshold does not monotonically affect the size of the specification.

8.4.2 Mining Patterns in ECG Data

In this case study, we consider ECG signals from the PhysioBank database [114], which contains 549 records from 290 subjects (209 male and 81 female, aged from 17 to 87).
Each record includes 15 simultaneously measured signals, digitized at 1,000 samples per second with 16-bit resolution over a range of ±16.384 mV. The diagnostic classes for the subjects participating in the recordings include cardiovascular diseases such as myocardial infarction, cardiomyopathy, dysrhythmia and myocardial hypertrophy.

Figure 8.8: Example of a heart beat from two patients ((a) Patient 1, (b) Patient 2).

In this experiment, we considered two sets of examples. The first set of examples comes from a 70-year-old male patient with myocardial infarction, an anterior acute infarction and an antero-septal former infarction. The patient had hyperlipoproteinemia as an additional diagnosis, was not a smoker at the time of admission, and had two coronary vessels involved in the disease. The second set of examples comes from a 52-year-old male patient with myocardial infarction, with an antero-septal acute infarction and no former infarctions. The patient had gastritis and rheumatoid arthritis as additional diagnoses, was not a smoker at the time of admission, and had one coronary vessel involved in the disease. Figure 8.8 depicts two heart beats from patient 1 and patient 2, respectively.

Each dataset contains 125 heart beat measurements. We use both datasets to infer a common four-letter alphabet, shown in Equation 8.4. The letters A and B represent short and long almost-constant segments, respectively, while C and D represent short segments with high negative and positive slopes, respectively.

A : lin(a1, b1, d1) : a1 ∈ [−1.1, 1.4] ∧ b1 ∈ [−1.8, 2.4] ∧ d1 ∈ [0.04, 0.29]
B : lin(a2, b2, d2) : a2 ∈ [−0.45, 0.14] ∧ b2 ∈ [−1.9, 0.4] ∧ d2 ∈ [0.32, 0.55]
C : lin(a3, b3, d3) : a3 ∈ [−69, −29] ∧ b3 ∈ [−158, −63] ∧ d3 ∈ [0.015, 0.025]
D : lin(a4, b4, d4) : a4 ∈ [5.3, 44] ∧ b4 ∈ [1.3, 92] ∧ d4 ∈ [0.008, 0.03]   (8.4)

We now set the maximum error threshold ϵmax = 0.001 and learn a separate specification for each patient, as shown in Equation 8.5.
φ1 = (A∗ · D · C · D · B)
φ2 = (A∗ · D · C · A · (A ∪ B))   (8.5)

The two specifications capture the approximated heart-beat behavior of the two patients – a relatively constant behavior (A∗), followed by a peak (D · C · D for patient 1 and D · C for patient 2), followed by another relatively constant behavior (B for patient 1 and A · (A ∪ B) for patient 2). We can observe that at this level of abstraction, the two patients share a similar constant prefix, followed by a peak. We note that the second increase in slope in the peak of patient 2 is small and is not captured at this level of abstraction. We can observe that our method captures both similarities and differences between the heartbeats of the two patients.

8.4.3 Mining Robot Motion Patterns

In [185], the authors described a time-series dataset obtained from a SONY AIBO robot – a small dog-shaped robot mounted with a tri-axial accelerometer. In the experimental setting, the robot walks on two different surfaces: carpet and cement. Cement floors, being harder than carpet, offer more reactive force to the robot, resulting in clear and sharp changes in the acceleration of the robot. The time-series data corresponds to the X-axis readings of the robot, labeled according to the surface on which it walks (i.e., cement or carpet). Each datum had 70 time steps, and we analyzed a total of 6 traces†. Our segmentation algorithm was able to partition traces into between 3 and 5 segments. The clustering algorithm was able to extract 5 distinct clusters from the segments, resulting in a finite alphabet of size 5. The specification that we mined from this data corresponds to the following shape expression:

†The actual data was acquired from the UCR time-series repository [79].
A∗ · (B ∪ D) · (B ∪ E) · C · B∗   (8.6)

A : lin(a1, b1, d1) : a1 ∈ [1.365, 1.545] ∧ b1 ∈ [−2.136, −1.488] ∧ d1 ∈ [2.0, 3.0]
B : lin(a2, b2, d2) : a2 ∈ [−0.239, 0.393] ∧ b2 ∈ [−0.986, 2.328] ∧ d2 ∈ [4.0, 13.0]
C : lin(a3, b3, d3) : a3 ∈ [0.009, 0.084] ∧ b3 ∈ [−1.623, 0.297] ∧ d3 ∈ [24.0, 31.0]
D : lin(a4, b4, d4) : a4 ∈ [−1.117, −1.051] ∧ b4 ∈ [4.440, 4.754] ∧ d4 ∈ [4.0, 4.0]
E : lin(a5, b5, d5) : a5 ∈ [0.04, 0.065] ∧ b5 ∈ [−1.208, −0.726] ∧ d5 ∈ [37.0, 41.0]   (8.7)

As we can see from the specification, the two classes of robot paths are similar, with the local differences in behavior captured by the unions. The disjunctions capture the differences between the robot walking on cement and on carpet. The shape expression serves as an explainable classifier of the data, similar to the result in the ECG case study. Such information could conceivably be used by a supervisory controller for the robot that makes different control decisions based on the surface on which the robot walks.

8.5 Conclusions and Future Work

We introduced a novel procedure for mining linear shape expressions from time series. It combines segmentation of raw data, clustering, abstraction and passive automata learning. We believe that the presented approach enables understanding and explaining data and facilitates discovering interesting patterns in time series. We implemented the algorithm in a prototype and applied it to two case studies from the medical and robotic domains.

We plan to explore the applicability of specification mining to the explainability of black-box models, testing and anomaly detection. In this work, we investigated a passive learning approach from positive examples. We also plan to study active learning of shape expressions and learning from both positive and negative examples. The quality of the resulting specification depends on the maximum error threshold and the number of clusters used to abstract the traces.
These two parameters are currently set manually and require domain knowledge. We will investigate criteria for automatically choosing these parameters. In this chapter, we have shown that we can use our approach to partition similar shapes into different classes, for instance to separate normal from anomalous heart beats. However, our approach does not guarantee that the intersection between the two mined specifications is empty. We plan to study refinement-based techniques to mine specifications that accurately characterize specific classes of shapes. Finally, we observe that linear shape expressions can be seen as a form of rectangular hybrid automata. We plan to build on our results to identify richer classes of hybrid automata from time series.

Chapter 9

Conclusions

In this thesis, we discuss various components of a broad framework that enable scalable reasoning techniques tuned to modern software design practices in autonomous CPS applications.

In Chapter 3, we propose a verification framework that can search the parameter space to find the regions that lead to satisfaction or violation of a given specification with probabilistic coverage guarantees. We treat the underlying CPS application as a black box and use distribution-free and model-free techniques to provide such probabilistic correctness guarantees.

In Chapter 4, we use physics-based or data-driven models of the system to continuously monitor logic-based requirements of systems operating in highly uncertain environments; this allows us to design runtime mitigation approaches that take corrective actions before a safety violation can occur. We present two predictive runtime verification algorithms to compute the probability that the current system trajectory violates a signal temporal logic specification. Both algorithms use (i) trajectory predictors to predict future system states, and (ii) conformal prediction to quantify prediction uncertainty.
In Chapter 5, we study conformance of stochastic dynamical systems. In particular, we define conformance between two stochastic systems as probabilistic bounds over the distribution of distances between model trajectories. Additionally, we propose the non-conformance risk to reason about the risk of stochastic systems not being conformant. We show that both notions have the transference property, meaning that conformant systems satisfy similar system specifications. Lastly, we show how stochastic conformance and the non-conformance risk can be estimated from data using statistical tools such as conformal prediction. In Chapter 6, we perform robust testing for CPS using reinforcement learning. We train an agent to produce a policy that initiates unsafe behaviors in similar target systems without the need for retraining, thereby allowing faulty behaviors to be elicited across various systems. We use STL to specify the target against which we are testing, as well as constraints that specify the reasonableness of the testing regime. We use STL as a lightweight, high-level programming language to loosely specify the desired behaviors of a test scenario, and we leverage RL algorithms to determine how to execute those behaviors. The learned adversarial policies are reactive, as opposed to testing schemes that rely on merely replaying pre-recorded behaviors, and under limited conditions they can even provide valuable testing capability for modified versions of the system. In Chapter 7, we develop a versatile specification language that can address the real-time, real-valued, and noisy nature of signals found in CPS applications. Specifically, we develop Shape Expressions, a language based on regular expressions to capture sequences of shapes in possibly noisy time-series data. We study essential properties of shape expressions and develop an efficient heuristic pattern matching procedure for this specification language.
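The core primitive behind matching an atomic linear shape such as lin(a, b, d) can be illustrated with a least-squares fit followed by constraint checks. This is a simplified sketch with hypothetical names; the matching procedure in the thesis also handles segmentation of long signals and noise-tolerant matching.

```python
import numpy as np

def matches_lin(segment, a_rng, b_rng, d_rng, max_err=0.5):
    """Check whether a 1-D signal segment matches an atomic shape
    lin(a, b, d): fit y ~ a*t + b by least squares, then test that the
    fitted slope, intercept, segment duration, and fit error all
    satisfy the shape's constraints."""
    t = np.arange(len(segment), dtype=float)
    a, b = np.polyfit(t, segment, 1)  # slope and intercept
    err = np.sqrt(np.mean((a * t + b - segment) ** 2))
    d = len(segment)
    return (a_rng[0] <= a <= a_rng[1]
            and b_rng[0] <= b <= b_rng[1]
            and d_rng[0] <= d <= d_rng[1]
            and err <= max_err)

# A noiseless ramp with slope 1.45 and offset -1.8 falls inside the
# parameter box of shape A from specification (8.7) above.
ramp = 1.45 * np.arange(3) - 1.8
ok = matches_lin(ramp, a_rng=(1.365, 1.545),
                 b_rng=(-2.136, -1.488), d_rng=(2, 3))
```

A full matcher composes such atomic checks according to the regular-expression structure (concatenation, union, Kleene star) of the shape expression.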
We believe that this work explores the expressiveness boundaries of declarative specification languages. In Chapter 8, we introduce a novel procedure for mining linear shape expressions from time series. It combines segmentation of raw data, clustering, abstraction, and passive automata learning. We believe that the presented approach enables understanding and explaining data and facilitates discovering interesting patterns in time series. We demonstrate that by combining the logic expressions of our system requirements with machine learning and statistical techniques, we can reason about the system from design time to deployment.

Future Directions

We investigate black-box modeling of cyber-physical systems, aiming to provide probabilistic guarantees regarding temporal specification requirements. We have explored various sources of uncertainty and applied different techniques to address them. All the techniques we propose involve using the robust semantics of specifications and sampling-based methods. We argue that, in the future, the results from each stage could be further integrated. For instance, statistical verification results on initial states could be used to accelerate the reasoning process for runtime verification. Additionally, analyzing the differences in robust semantics between various specification languages could enable the transfer of guarantees throughout the system when the specification language changes. We envision integrating these distinct techniques into a cyclic process: verifying the system, implementing necessary adjustments, and conducting conformance checking. This approach enables the system to autonomously refine its design. Counterexample-guided methods are well established in controller and program synthesis. However, given that cyber-physical systems (CPS) consist of multiple components, verifying a single component necessitates consideration of its ripple effects on other components.
For instance, adjusting and retraining the vision sensor could inadvertently alter the behavior of the planner. Moreover, comprehensive system verification, such as evaluating a car's speed and steering angle, poses the challenge of discerning which subsystem is responsible for a specific behavioral outcome. Consider a scenario where a segment of the system is altered, yet the resultant behavior shows no substantial deviation. If the system in its original form fails to meet a specific requirement, and this deficiency persists post-modification (as evidenced by conformance check outcomes), it implies that the modified subsystem is not the root cause of the issue. Furthermore, if the desired system behavior remains unattainable despite adjustments across all subsystems (as determined by verification results), this situation may necessitate the development or acquisition of new specifications and an exploration of alternative specification languages.
“Formal property verification in a conformance testing framework”. In: 2014 Twelfth ACM/IEEE Conference on Formal Methods and Models for Codesign (MEMOCODE). IEEE. 2014, pp. 155–164. [6] Houssam Abbas, Alena Rodionova, Ezio Bartocci, Scott A Smolka, and Radu Grosu. “Quantitative regular expressions for arrhythmia detection algorithms”. In: International Conference on Computational Methods in Systems Biology. Springer. 2017, pp. 23–39. [7] Houssam Y Abbas. Test-based falsification and conformance testing for cyber-physical systems. Arizona State University, 2015. [8] C. Ackermann, R. Cleaveland, S. Huang, A. Ray, C. P. Shelton, and E. Latronico. “Automatic Requirement Extraction from Test Cases”. In: Proc. of RV 2010. 2010, pp. 1–15. [9] Gul Agha and Karl Palmskog. “A survey of statistical model checking”. In: ACM Transactions on Modeling and Computer Simulation (TOMACS) 28.1 (2018), pp. 1–39. [10] Takumi Akazaki and Ichiro Hasuo. “Time Robustness in MTL and Expressivity in Hybrid System Falsification”. In: CAV. 2015, pp. 356–374. 169 [11] Takumi Akazaki, Shuang Liu, Yoriyuki Yamagata, Yihai Duan, and Jianye Hao. “Falsification of cyber-physical systems using deep reinforcement learning”. In: International Symposium on Formal Methods. Springer. 2018, pp. 456–465. [12] Prithvi Akella, Mohamadreza Ahmadi, and Aaron D Ames. “A scenario approach to risk-aware safety-critical system verification”. In: arXiv preprint arXiv:2203.02595 (2022). [13] Prithvi Akella, Anushri Dixit, Mohamadreza Ahmadi, Joel W Burdick, and Aaron D Ames. “Sample-Based Bounds for Coherent Risk Measures: Applications to Policy Synthesis and Verification”. In: arXiv preprint arXiv:2204.09833 (2022). [14] Matthias Althoff and John M Dolan. “Online verification of automated road vehicles using reachability analysis”. In: IEEE Transactions on Robotics 30.4 (2014), pp. 903–918. [15] Rajeev Alur, Dana Fisman, and Mukund Raghothaman. “Regular programming for quantitative properties of data streams”. 
In: European Symposium on Programming. Springer. 2016, pp. 15–40. [16] Rajeev Alur, Thomas A Henzinger, Gerardo Lafferriere, and George J Pappas. “Discrete abstractions of hybrid systems”. In: Proceedings of the IEEE 88.7 (2000), pp. 971–984. [17] Rajeev Alur, Konstantinos Mamouras, and Caleb Stanford. “Modular quantitative monitoring”. In: Proceedings of the ACM on Programming Languages 3.POPL (2019), p. 50. [18] Étienne André, Ichiro Hasuo, and Masaki Waga. “Offline Timed Pattern Matching under Uncertainty”. In: 23rd International Conference on Engineering of Complex Computer Systems, ICECCS 2018, Melbourne, Australia, December 12-14, 2018. 2018, pp. 10–20. [19] Anastasios N Angelopoulos and Stephen Bates. “A gentle introduction to conformal prediction and distribution-free uncertainty quantification”. In: arXiv preprint arXiv:2107.07511 (2021). [20] Dana Angluin. “Learning Regular Sets from Queries and Counterexamples”. In: Inf. Comput. 75.2 (1987), pp. 87–106. doi: 10.1016/0890-5401(87)90052-6. [21] Hugo Araujo, Gustavo Carvalho, Morteza Mohaqeqi, Mohammad Reza Mousavi, and Augusto Sampaio. “Sound conformance testing for cyber-physical systems: Theory and implementation”. In: Science of Computer Programming 162 (2018), pp. 35–54. [22] E. Asarin, T. Dang, and A. Girard. “Hybridization methods for the analysis of nonlinear systems”. In: Acta Informatica 43.7 (2007), pp. 451–476. [23] E. Asarin, A. Donzé, O. Maler, and D. Nickovic. “Parametric Identification of Temporal Properties”. In: RV. 2011, pp. 147–160. [24] Eugene Asarin, Paul Caspi, and Oded Maler. “A Kleene Theorem for Timed Automata”. In: Logic in Computer Science (LICS). 1997, pp. 160–171. [25] Eugene Asarin, Paul Caspi, and Oded Maler. “Timed Regular Expressions”. In: Journal of ACM 49.2 (2002), pp. 172–206. 170 [26] Reza Babaee, Vijay Ganesh, and Sean Sedwards. “Accelerated learning of predictive runtime monitors for rare failure”. In: International Conference on Runtime Verification. Springer. 
2019, pp. 111–128. [27] Reza Babaee, Arie Gurfinkel, and Sebastian Fischmeister. “Prevent: A Predictive Run-Time Verification Framework Using Statistical Learning”. In: International Conference on Software Engineering and Formal Methods. Springer. 2018, pp. 205–220. [28] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. 1st ed. Cambridge, MA: The MIT Press, 2008. isbn: 026202649X, 9780262026499. [29] Christel Baier and Joost-Pieter Katoen. Principles of model checking. MIT press, 2008. [30] Stanley Bak, Changliu Liu, and Taylor Johnson. “The second international verification of neural networks competition (vnn-comp 2021): Summary and results”. In: arXiv preprint arXiv:2109.00498 (2021). [31] Stanley Bak and Hoang-Dung Tran. “Neural Network Compression of ACAS Xu Early Prototype Is Unsafe: Closed-Loop Verification Through Quantized State Backreachability”. In: NASA Formal Methods Symposium. Springer. 2022, pp. 280–298. [32] Alexey Bakhirkin, Thomas Ferrère, Oded Maler, and Dogan Ulus. “On the Quantitative Semantics of Regular Expressions over Real-Valued Signals”. In: Formal Modeling and Analysis of Timed Systems - 15th International Conference, FORMATS 2017, Berlin, Germany, September 5-7, 2017, Proceedings. 2017, pp. 189–206. [33] Alexey Bakhirkin, Thomas Ferrère, Dejan Nickovic, Oded Maler, and Eugene Asarin. “Online Timed Pattern Matching Using Automata”. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer. 2018, pp. 215–232. [34] Andrew R Barron. “Universal approximation bounds for superpositions of a sigmoidal function”. In: IEEE Transactions on Information theory 39.3 (1993), pp. 930–945. [35] Ezio Bartocci, Roderick Bloem, Benedikt Maderbacher, Niveditha Manjunath, and Dejan Ničković. “Adaptive testing for specification coverage in CPS models”. In: IFAC-PapersOnLine 54.5 (2021), pp. 229–234. [36] Ezio Bartocci, Luca Bortolussi, Laura Nenzi, and Guido Sanguinetti. 
“On the robustness of temporal properties for stochastic models”. In: Proc. Int. Workshop Hybrid Syst. Biology. Taormina, Italy, Sept. 2013, pp. 3–19. [37] Ezio Bartocci, Luca Bortolussi, Laura Nenzi, and Guido Sanguinetti. “System design of stochastic models using robustness of temporal properties”. In: Theoret. Comp. Science 587 (2015), pp. 3–25. [38] Ezio Bartocci, Luca Bortolussi, and Guido Sanguinetti. “Data-Driven Statistical Learning of Temporal Logic Properties”. In: Proc. of FORMATS. 2014, pp. 23–37. 171 [39] Ezio Bartocci, Jyotirmoy Deshmukh, Alexandre Donzé, Georgios Fainekos, Oded Maler, Dejan Ničković, and Sriram Sankaranarayanan. “Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications”. In: Lectures on Runtime Verification: Introductory and Advanced Topics (2018), pp. 135–175. [40] Ezio Bartocci, Jyotirmoy Deshmukh, Felix Gigler, Cristinel Mateis, Dejan Ničković, and Xin Qin. “Mining shape expressions from positive examples”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39.11 (2020), pp. 3809–3820. [41] Ezio Bartocci, Thomas Ferrère, Niveditha Manjunath, and Dejan Ničković. “Localizing faults in Simulink/Stateflow models with STL”. In: Proc. of HSCC. 2018, pp. 197–206. [42] Ezio Bartocci, Niveditha Manjunath, Leonardo Mariani, Cristinel Mateis, and Dejan Nickovic. “Automatic Failure Explanation in CPS Models”. In: Proc. of SEFM 2019: the 17th International Conference on Software Engineering and Formal Methods. Vol. 11724. LNCS. Springer, 2019, pp. 69–86. doi: 10.1007/978-3-030-30446-1. [43] Andreas Bauer, Martin Leucker, and Christian Schallhart. “Runtime verification for LTL and TLTL”. In: ACM Transactions on Software Engineering and Methodology (TOSEM) 20.4 (2011), pp. 1–64. [44] Richard Bellman and Robert Roth. “Curve fitting by segmented straight lines”. In: Journal of the American Statistical Association 64.327 (1969), pp. 1079–1084. 
[45] Calin Belta, Boyan Yordanov, and Ebru Aydin Gol. Formal methods for discrete-time dynamical systems. Vol. 15. Springer, 2017. [46] Gaoang Bian and Alessandro Abate. “On the relationship between bisimulation and trace equivalence in an approximate probabilistic context”. In: Foundations of Software Science and Computation Structures: 20th International Conference, FOSSACS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings 20. Springer. 2017, pp. 321–337. [47] Andrea Bianco and Luca de Alfaro. “Model checking of probabilistic and nondeterministic systems”. In: International Conference on Foundations of Software Technology and Theoretical Computer Science. Springer. 1995, pp. 499–513. [48] Sebastian Biewer, Rayna Dimitrova, Michael Fries, Maciej Gazda, Thomas Heinze, Holger Hermanns, and Mohammad Reza Mousavi. “Conformance relations and hyperproperties for doping detection in time and space”. In: arXiv preprint arXiv:2012.03910 (2020). [49] Christopher M Bishop. Pattern recognition and machine learning. springer, 2006. [50] Roderick Bloem, Krishnendu Chatterjee, and Barbara Jobstmann. “Graph games and reactive synthesis”. In: Handbook of Model Checking. Springer, 2018, pp. 921–962. [51] Giuseppe Bombara and Calin Belta. “Online Learning of Temporal Logic Formulae for Signal Classification”. In: Proc. of the 2018 European Control Conference. 2018, pp. 2057–2062. 172 [52] Giuseppe Bombara and Calin Belta. “Signal Clustering Using Temporal Logics”. In: RV. 2017, pp. 121–137. [53] Giuseppe Bombara, Cristian-Ioan Vasile, Francisco Penedo, Hirotoshi Yasuoka, and Calin Belta. “A Decision Tree Approach to Data Classification Using Signal Temporal Logic”. In: HSCC. 2016, pp. 1–10. [54] Luca Bortolussi, Francesca Cairoli, Nicola Paoletti, Scott A Smolka, and Scott D Stoller. “Neural predictive monitoring”. In: International Conference on Runtime Verification. Springer. 
2019, pp. 129–147. [55] Luca Bortolussi, Francesca Cairoli, Nicola Paoletti, Scott A Smolka, and Scott D Stoller. “Neural predictive monitoring and a comparison of frequentist and Bayesian approaches”. In: International Journal on Software Tools for Technology Transfer 23.4 (2021), pp. 615–640. [56] Dimitrios Boursinos and Xenofon Koutsoukos. “Assurance monitoring of cyber-physical systems with machine learning components”. In: arXiv preprint arXiv:2001.05014 (2020). [57] Dimitrios Boursinos and Xenofon Koutsoukos. “Assurance monitoring of learning-enabled cyber-physical systems using inductive conformal prediction based on distance learning”. In: AI EDAM 35.2 (2021), pp. 251–264. [58] Dimitrios Boursinos and Xenofon Koutsoukos. “Trusted Confidence Bounds for Learning Enabled Cyber-Physical Systems”. In: 2020 IEEE Security and Privacy Workshops (SPW). 2020, pp. 228–233. doi: 10.1109/SPW50608.2020.00053. [59] Sara Bufo, Ezio Bartocci, Guido Sanguinetti, Massimo Borelli, Umberto Lucangelo, and Luca Bortolussi. “Temporal Logic Based Monitoring of Assisted Ventilation in Intensive Care Patients”. In: Proc. of ISoLA. 2014, pp. 391–403. [60] Feiyang Cai and Xenofon Koutsoukos. “Real-time out-of-distribution detection in learning-enabled cyber-physical systems”. In: 2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS). IEEE. 2020, pp. 174–183. [61] Francesca Cairoli, Luca Bortolussi, and Nicola Paoletti. “Neural predictive monitoring under partial observability”. In: International Conference on Runtime Verification. Springer. 2021, pp. 121–141. [62] Francesca Cairoli, Nicola Paoletti, and Luca Bortolussi. “Conformal Quantitative Predictive Monitoring of STL Requirements for Stochastic Processes”. In: arXiv preprint arXiv:2211.02375 (2022). [63] Ian Cassar, Adrian Francalanza, Luca Aceto, and Anna Ingólfsdóttir. “A survey of runtime monitoring instrumentation techniques”. In: arXiv preprint arXiv:1708.07229 (2017). 
[64] Maxime Cauchois, Suyash Gupta, Alnur Ali, and John C Duchi. “Robust validation: Confident predictions even when distributions shift”. In: arXiv preprint arXiv:2008.04267 (2020). 173 [65] Andrea Censi, Konstantin Slutsky, Tichakorn Wongpiromsarn, Dmitry Yershov, Scott Pendleton, James Fu, and Emilio Frazzoli. “Liability, Ethics, and Culture-Aware Behavior Specification using Rulebooks”. en. In: ICRA. 2019. [66] Pavol Cerny, Thomas A Henzinger, and Arjun Radhakrishna. “Quantitative abstraction refinement”. In: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 2013, pp. 115–128. [67] Margaret P Chapman, Riccardo Bonalli, Kevin M Smith, Insoon Yang, Marco Pavone, and Claire J Tomlin. “Risk-sensitive safety analysis using Conditional Value-at-Risk”. In: IEEE Transactions on Automatic Control (2021). [68] Margaret P. Chapman, Jonathan Lacotte, Aviv Tamar, Donggun Lee, Kevin M. Smith, Victoria Cheng, Jaime F. Fisac, Susmit Jha, Marco Pavone, and Claire J. Tomlin. “A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems”. In: 2019 American Control Conference (ACC). 2019, pp. 2958–2963. doi: 10.23919/ACC.2019.8815169. [69] Yushan Chen, Jana Tumova, Alphan Ulusoy, and Calin Belta. “Temporal logic robot control based on automata learning of environmental dynamics”. In: The International Journal of Robotics Research 32.5 (2013), pp. 547–565. doi: 10.1177/0278364912473168. [70] Glen Chou, Necmiye Ozay, and Dmitry Berenson. “Incremental Segmentation of ARX Models”. In: IFAC-PapersOnLine 51.15 (2018). 18th IFAC Symposium on System Identification SYSID 2018, pp. 587–592. issn: 2405-8963. doi: https://doi.org/10.1016/j.ifacol.2018.09.222. [71] Yi Chou, Hansol Yoon, and Sriram Sankaranarayanan. “Predictive runtime monitoring of vehicle models using Bayesian estimation and reachability analysis”. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 
2020, pp. 2111–2118. [72] E. M. Clarke, J. R. Faeder, C. J Langmead, L. A. Harris, S. K. Jha, and A. Legay. “Statistical model checking in biolab: Applications to the automated analysis of t-cell receptor signaling pathway”. In: CMSB. Springer. 2008, pp. 231–250. [73] Edmund M Clarke. “Model checking”. In: International Conference on Foundations of Software Technology and Theoretical Computer Science. Springer. 1997, pp. 54–56. [74] Matthew Cleaveland, Lars Lindemann, Radoslav Ivanov, and George J Pappas. “Risk verification of stochastic systems with neural network controllers”. In: Artificial Intelligence 313 (2022), p. 103782. [75] Anthony Corso, Peter Du, Katherine Driggs-Campbell, and Mykel J Kochenderfer. “Adaptive stress testing with reward augmentation for autonomous vehicle validatio”. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE. 2019, pp. 163–168. [76] Anthony Corso, Robert Moss, Mark Koren, Ritchie Lee, and Mykel Kochenderfer. “A survey of algorithms for black-box safety validation of cyber-physical systems”. In: Journal of Artificial Intelligence Research 72 (2021), pp. 377–428. 174 [77] Thao Dang, Oded Maler, and Romain Testylier. “Accurate hybridization of nonlinear systems”. In: Proc. of HSCC 2010: the 13th ACM International Conference on Hybrid Systems: Computation and Control. ACM, 2010, pp. 11–20. [78] Thao Dang and Romain Testylier. “Hybridization domain construction using curvature estimation”. In: Proc. of HSCC 2011: the 14th International Conference on Hybrid Systems: computation and control. ACM, 2011, pp. 123–132. [79] Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/$\sim$eamonn/time_series_data_2018/. Oct. 2018. [80] Florent Delgrange, Ann Nowe, and Guillermo A Pérez. 
“Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees”. In: arXiv preprint arXiv:2303.12558 (2023). [81] Jyotirmoy Deshmukh, Xiaoqing Jin, Rupak Majumdar, and Vinayak Prabhu. “Parameter optimization in control software using statistical fault localization techniques”. In: Proc. of ICCPS. IEEE. 2018, pp. 220–231. [82] Jyotirmoy V Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin, Garvit Juniwal, and Sanjit A Seshia. “Robust online monitoring of signal temporal logic”. In: Formal Methods in System Design 51.1 (2017), pp. 5–30. [83] Jyotirmoy V Deshmukh, Rupak Majumdar, and Vinayak S Prabhu. “Quantifying conformance using the Skorokhod metric”. In: Computer Aided Verification: 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part II 27. Springer. 2015, pp. 234–250. [84] Jyotirmoy V Deshmukh, Rupak Majumdar, and Vinayak S Prabhu. “Quantifying conformance using the skorokhod metric”. In: Computer Aided Verification: 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part II 27. Springer. 2015, pp. 234–250. [85] Jyotirmoy V Deshmukh and Sriram Sankaranarayanan. “Formal techniques for verification and testing of cyber-physical systems”. In: Design Automation of Cyber-Physical Systems. Springer, 2019, pp. 69–105. [86] Rayna Dimitrova, Maciej Gazda, Mohammad Reza Mousavi, Sebastian Biewer, and Holger Hermanns. “Conformance-based doping detection for cyber-physical systems”. In: International Conference on Formal Techniques for Distributed Objects, Components, and Systems. Springer. 2020, pp. 59–77. [87] Rayna Dimitrova and Rupak Majumdar. “Deductive control synthesis for alternating-time logics”. In: 2014 International Conference on Embedded Software (EMSOFT). IEEE. 2014, pp. 1–10. [88] Wenhao Ding, Chejian Xu, Mansur Arief, Haohong Lin, Bo Li, and Ding Zhao. 
“A survey on safety-critical driving scenario generation—A methodological perspective”. In: IEEE Transactions on Intelligent Transportation Systems (2023). 175 [89] Alexandre Donzé and Oded Maler. “Robust Satisfaction of Temporal Logic over Real-Valued Signals”. en. In: Formal Modeling and Analysis of Timed Systems. Vol. 6246. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 92–106. isbn: 978-3-642-15296-2 978-3-642-15297-9. [90] Alexandre Donzé and Oded Maler. “Robust satisfaction of temporal logic over real-valued signals”. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer. 2010, pp. 92–106. [91] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. “CARLA: An Open Urban Driving Simulator”. In: Proceedings of the 1st Annual Conference on Robot Learning. 2017, pp. 1–16. [92] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. “CARLA: An open urban driving simulator”. In: Conference on robot learning. PMLR. 2017, pp. 1–16. [93] Tommaso Dreossi, Alexandre Donzé, and Sanjit A Seshia. “Compositional falsification of cyber-physical systems with machine learning components”. In: Journal of Automated Reasoning 63 (2019), pp. 1031–1053. [94] Tommaso Dreossi, Daniel J Fremont, Shromona Ghosh, Edward Kim, Hadi Ravanbakhsh, Marcell Vazquez-Chanlatte, and Sanjit A Seshia. “Verifai: A toolkit for the formal design and analysis of artificial intelligence-based systems”. In: International Conference on Computer Aided Verification. Springer. 2019, pp. 432–442. [95] Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. “Learning and verification of feedback control systems using feedforward neural networks”. In: IFAC-PapersOnLine 51.16 (2018), pp. 151–156. [96] Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. “The Daikon System for Dynamic Detection of Likely Invariants”. In: Sci. 
Comput. Program. 69.1–3 (2007), pp. 35–45. issn: 0167-6423. doi: 10.1016/j.scico.2007.01.015. [97] Georgios E Fainekos and George J Pappas. “Robustness of temporal logic specifications for continuous-time signals”. In: Theoretical Computer Science 410.42 (2009), pp. 4262–4291. [98] Chuchu Fan, Bolun Qi, Sayan Mitra, and Mahesh Viswanathan. “Dryvr: Data-driven verification and compositional reasoning for automotive systems”. In: CAV. 2017, pp. 441–461. [99] Norm Ferns, Prakash Panangaden, and Doina Precup. “Bisimulation Metrics for Continuous Markov Decision Processes”. In: SIAM Journal on Computing 40.6 (2011), pp. 1662–1714. [100] Angelo Ferrando and Giorgio Delzanno. “Incrementally Predictive Runtime Verification”. In: CILC. 2021, pp. 92–106. [101] Thomas Ferrère, Dejan Nickovic, Alexandre Donzé, Hisahiro Ito, and James Kapinski. “Interface-aware signal temporal logic”. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 2019, pp. 57–66. 176 [102] Bernd Finkbeiner, Christopher Hahn, Marvin Stenger, and Leander Tentrup. “Monitoring hyperproperties”. In: Formal Methods in System Design 54.3 (2019), pp. 336–363. [103] Matteo Fontana, Gianluca Zeni, and Simone Vantini. “Conformal prediction: A unified review of theory and new challenges”. In: Bernoulli 29.1 (2023), pp. 1–23. [104] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Vol. 1. 10. Springer series in statistics New York, 2001. [105] Jie Fu, Herbert G. Tanner, Jeffrey Heinz, and Jane Chandlee. “Adaptive Symbolic Control for Finite-State Transition Systems With Grammatical Inference”. In: IEEE Trans. Automat. Contr. 59.2 (2014), pp. 505–511. doi: 10.1109/TAC.2013.2272885. [106] Jie Fu and Ufuk Topcu. “Probably approximately correct MDP learning and control with temporal logic constraints”. In: arXiv preprint arXiv:1404.7073 (2014). [107] Marta Garnelo and Murray Shanahan. 
“Reconciling deep learning with symbolic artificial intelligence: representing objects and relations”. In: Current Opinion in Behavioral Sciences 29 (2019). SI: 29: Artificial Intelligence (2019), pp. 17–23. issn: 2352-1546. doi: https://doi.org/10.1016/j.cobeha.2018.12.010. [108] Pierre Geurts. “Pattern extraction for time series classification”. In: European Conference on Principles of Data Mining and Knowledge Discovery. Springer. 2001, pp. 115–127. [109] Shromona Ghosh, Dorsa Sadigh, Pierluigi Nuzzo, Vasumathi Raman, Alexandre Donzé, Alberto L Sangiovanni-Vincentelli, S Shankar Sastry, and Sanjit A Seshia. “Diagnosis and repair for synthesis from signal temporal logic specifications”. In: Proc. Int. Conf. Hybrid Syst.: Comp. Control. ACM. 2016, pp. 31–40. [110] Antoine Girard and George J Pappas. “Approximate bisimulation: A bridge between computer science and control theory”. In: European Journal of Control 17.5-6 (2011), pp. 568–578. [111] Antoine Girard and George J. Pappas. “Approximate Bisimulation: A Bridge Between Computer Science and Control Theory”. en. In: European Journal of Control 17.5 (Jan. 2011), pp. 568–578. issn: 0947-3580. doi: 10.3166/ejc.17.568-578. [112] Rafal Goebel, Ricardo G Sanfelice, and Andrew R Teel. “Hybrid dynamical systems”. In: IEEE control systems magazine 29.2 (2009), pp. 28–93. [113] Rafal Goebel, Ricardo G Sanfelice, and Andrew R Teel. Hybrid Dynamical Systems: modeling, stability, and robustness. 1st ed. Princeton, NJ: Princeton University Press, 2012. isbn: 9781400842636. [114] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals”. In: Circulation 101.23 (2000), e215–e220. [115] Robert M Gray. Entropy and information theory. Springer Science & Business Media, 2011. 
177 [116] Luis Gressenbuch and Matthias Althoff. “Predictive monitoring of traffic rules”. In: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE. 2021, pp. 915–922. [117] R. Grosu, G. Batt, F. Fenton, J. Glimm, C. Le Guernic, S. A. Smolka, and E. Bartocci. “From Cardiac Cells to Genetic Regulatory Networks”. In: Proc. of CAV 2011: the 14th International Conference on Computer Aided Verification. Vol. 6806. LNCS. Springer Berlin / Heidelberg, 2011, pp. 396–411. isbn: 978-3-642-22109-5. [118] Radu Grosu, Sayan Mitra, Pei Ye, Emilia Entcheva, I. V. Ramakrishnan, and Scott A. Smolka. “Learning Cycle-Linear Hybrid Automata for Excitable Cells”. In: Proc. of HSCC 2007: the 10th International Workshop on Hybrid Systems: Computation and Control. Vol. 4416. Lecture Notes in Computer Science. Springer, 2007, pp. 245–258. [119] Xiaozhe Gu and Arvind Easwaran. “Towards safe machine learning for cps: infer uncertainty from training data”. In: Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems. 2019, pp. 249–258. [120] Sofie Haesaert and Sadegh Soudjani. “Robust dynamic programming for temporal logic control of stochastic systems”. In: IEEE Transactions on Automatic Control 66.6 (2020), pp. 2496–2511. [121] Christopher Hahn. “Algorithms for monitoring hyperproperties”. In: International Conference on Runtime Verification. Springer. 2019, pp. 70–90. [122] E. M. Hahn, M. Perez, S. Schewe, F. Somenzi, A. Trivedi, and D. Wojtczak. Reward Shaping for Reinforcement Learning with Omega-Regular Objectives. 2020. arXiv: 2001.05977 [cs.LO]. [123] Hans Hansson and Bengt Jonsson. “A logic for reasoning about time and reliability”. In: Formal aspects of computing 6.5 (1994), pp. 512–535. [124] Peter Heidlauf, Alexander Collins, Michael Bolender, and Stanley Bak. “Verification Challenges in F-16 Ground Collision Avoidance and Other Automated Maneuvers”. In: ARCH@ ADHS. 2018, pp. 208–217. [125] Colin de la Higuera. 
Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, 2010. doi: 10.1017/CBO9781139194655. [126] Sepp Hochreiter and Jürgen Schmidhuber. “Long short-term memory”. In: Neural computation 9.8 (1997), pp. 1735–1780. [127] L. Jeff Hong, Zhaolin Hu, and Guangwu Liu. “Monte Carlo Methods for Value-at-Risk and Conditional Value-at-Risk: A Review”. In: ACM Trans. Model. Comput. Simul. 24.4 (Nov. 2014). issn: 1049-3301. doi: 10.1145/2661631. [128] Bardh Hoxha, Adel Dokhanchi, and Georgios E. Fainekos. “Mining parametric temporal logic properties in model-based design for cyber-physical systems”. In: STTT 20.1 (2018), pp. 79–93. doi: 10.1007/s10009-017-0447-4. 178 [129] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. “A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability”. In: Computer Science Review 37 (2020), p. 100270. [130] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. “Safety verification of deep neural networks”. In: International Conference on Computer Aided Verification. Springer. 2017, pp. 3–29. [131] “IEEE standard on pulse ment and analysis by objective techniques”. In: IEEE Std. 181-1977 (1977). [132] Malte Isberner, Falk Howar, and Bernhard Steffen. “The TTT Algorithm: A Redundancy-Free Approach to Active Automata Learning”. In: RV. 2014, pp. 307–322. [133] Radoslav Ivanov, Taylor J Carpenter, James Weimer, Rajeev Alur, George J Pappas, and Insup Lee. “Case study: verifying the safety of an autonomous racing car with a neural network controller”. In: Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control. 2020, pp. 1–7. [134] Radoslav Ivanov, James Weimer, Rajeev Alur, George J Pappas, and Insup Lee. “Verisig: verifying safety properties of hybrid systems with neural network controllers”. 
In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 2019, pp. 169–178.
[135] John Jackson, Luca Laurenti, Eric Frew, and Morteza Lahijanian. “Formal verification of unknown dynamical systems via Gaussian process regression”. In: arXiv preprint arXiv:2201.00655 (2021).
[136] Manfred Jaeger, Kim G Larsen, and Alessandro Tibo. “From statistical model checking to run-time monitoring using a Bayesian network approach”. In: International Conference on Runtime Verification. Springer. 2020, pp. 517–535.
[137] Stefan Jakšić, Ezio Bartocci, Radu Grosu, Thang Nguyen, and Dejan Ničković. “Quantitative Monitoring of STL with Edit Distance”. In: Formal Methods in System Design 53.1 (Aug. 2018), pp. 83–112. issn: 1572-8102. doi: 10.1007/s10703-018-0319-x.
[138] Susmit Jha, Ashish Tiwari, Sanjit A. Seshia, Tuhin Sahai, and Natarajan Shankar. “TeLEx: learning signal temporal logic from positive examples using tightness metric”. In: Formal Methods in System Design 54.3 (2019), pp. 364–387. doi: 10.1007/s10703-019-00332-1.
[139] Xiaoqing Jin, Jyotirmoy V Deshmukh, James Kapinski, Koichi Ueda, and Ken Butts. “Benchmarks for model transformations and conformance checking”. In: 1st International Workshop on Applied Verification for Continuous and Hybrid Systems (ARCH). 2014.
[140] Xiaoqing Jin, Alexandre Donzé, Jyotirmoy V. Deshmukh, and Sanjit A. Seshia. “Mining Requirements From Closed-Loop Control Models”. In: IEEE TCAD 34.11 (2015), pp. 1704–1717.
[141] A Agung Julius and George J Pappas. “Approximations of stochastic hybrid systems”. In: IEEE Transactions on Automatic Control 54.6 (2009), pp. 1193–1203.
[142] A Agung Julius and AJ Van Der Schaft. “Bisimulation as congruence in the behavioral setting”. In: Proceedings of the 44th IEEE Conference on Decision and Control. IEEE. 2005, pp. 814–819.
[143] Agung A Julius, Antoine Girard, and George J Pappas. “Approximate bisimulation for a class of stochastic hybrid systems”.
In: 2006 American Control Conference. IEEE. 2006, 6 pp.
[144] Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. “Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks”. In: CAV. Ed. by Rupak Majumdar and Viktor Kunčak. 2017, pp. 97–117. isbn: 978-3-319-63387-9.
[145] Eamonn J Keogh and Michael J Pazzani. “An Enhanced Representation of Time Series Which Allows Fast and Accurate Classification, Clustering and Relevance Feedback.” In: KDD. Vol. 98. 1998, pp. 239–243.
[146] Eamonn J Keogh and Michael J Pazzani. “Scaling up dynamic time warping for datamining applications”. In: KDD. 2000, pp. 285–289.
[147] Narges Khakpour and Mohammad Reza Mousavi. “Notions of conformance testing for cyber-physical systems: Overview and roadmap”. In: 26th International Conference on Concurrency Theory (CONCUR 2015). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. 2015.
[148] Z. Kong, A. Jones, and C. Belta. “Temporal Logics for Learning and Detection of Anomalous Behavior”. In: IEEE Trans. Aut. Control 62.3 (Mar. 2017), pp. 1210–1222. issn: 0018-9286. doi: 10.1109/TAC.2016.2585083.
[149] Mark Koren, Saud Alsaif, Ritchie Lee, and Mykel J Kochenderfer. “Adaptive stress testing for autonomous vehicles”. In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2018, pp. 1–7.
[150] Markus Koschi, Christian Pek, Mona Beikirch, and Matthias Althoff. “Set-based prediction of pedestrians in urban environments considering formalized traffic rules”. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). IEEE. 2018, pp. 2704–2711.
[151] Marta Kwiatkowska, Gethin Norman, and David Parker. “PRISM 4.0: Verification of probabilistic real-time systems”. In: International Conference on Computer Aided Verification. Springer. 2011, pp. 585–591.
[152] Panagiotis Kyriakis, Jyotirmoy V. Deshmukh, and Paul Bogdan. “Specification Mining and Robust Design under Uncertainty: A Stochastic Temporal Logic Approach”.
In: ACM TECS 18.5s (2019), 96:1–96:21. doi: 10.1145/3358231.
[153] Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. “Building machines that learn and think like people”. In: Behavioral and Brain Sciences 40 (2017), e253. doi: 10.1017/S0140525X16001837.
[154] Josephine Lamp, Simone Silvetti, Marc Breton, Laura Nenzi, and Lu Feng. “A Logic-Based Learning Approach to Explore Diabetes Patient Behaviors”. In: Proc. of CMSB 2019: the 17th International Conference on Computational Methods in Systems Biology. Vol. 11773. LNCS. Springer, 2019, pp. 188–206. doi: 10.1007/978-3-030-31304-3.
[155] Sander JJ Leemans and Artem Polyvyanyy. “Stochastic-aware conformance checking: An entropy-based approach”. In: Advanced Information Systems Engineering: 32nd International Conference, CAiSE 2020, Grenoble, France, June 8–12, 2020, Proceedings 32. Springer. 2020, pp. 217–233.
[156] Sander JJ Leemans, Anja F Syring, and Wil MP van der Aalst. “Earth movers’ stochastic conformance checking”. In: Business Process Management Forum: BPM Forum 2019, Vienna, Austria, September 1–6, 2019, Proceedings 17. Springer. 2019, pp. 127–143.
[157] Axel Legay, Benoît Delahaye, and Saddek Bensalem. “Statistical model checking: An overview”. In: International Conference on Runtime Verification. Springer. 2010, pp. 122–135.
[158] Axel Legay, Anna Lukina, Louis Marie Traonouez, Junxing Yang, Scott A Smolka, and Radu Grosu. “Statistical model checking”. In: Computing and Software Science. Springer, 2019, pp. 478–504.
[159] Axel Legay and Mahesh Viswanathan. “Statistical model checking: challenges and perspectives”. In: International Journal on Software Tools for Technology Transfer 17 (2015), pp. 369–376.
[160] Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman. “Distribution-free predictive inference for regression”. In: Journal of the American Statistical Association 113.523 (2018), pp. 1094–1111.
[161] Jing Lei and Larry Wasserman.
“Distribution-free prediction bands for non-parametric regression”. In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76.1 (2014), pp. 71–96.
[162] Kendra Lesser, Meeko Oishi, and R Scott Erwin. “Stochastic reachability for control of spacecraft relative motion”. In: 52nd IEEE Conference on Decision and Control. IEEE. 2013, pp. 4705–4712.
[163] Martin Leucker and Christian Schallhart. “A brief account of runtime verification”. In: The Journal of Logic and Algebraic Programming 78.5 (2009), pp. 293–303.
[164] Karen Leung, Nikos Aréchiga, and Marco Pavone. “Backpropagation for parametric STL”. In: 2019 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2019, pp. 185–192.
[165] Lars Lindemann, Matthew Cleaveland, Gihyun Shim, and George J Pappas. “Safe Planning in Dynamic Environments using Conformal Prediction”. In: arXiv preprint arXiv:2210.10254 (2022).
[166] Lars Lindemann, Lejun Jiang, Nikolai Matni, and George J Pappas. “Risk of stochastic systems for temporal logic specifications”. In: ACM Transactions on Embedded Computing Systems 22.3 (2023), pp. 1–31.
[167] Lars Lindemann, Xin Qin, Jyotirmoy V Deshmukh, and George J Pappas. “Conformal prediction for STL runtime verification”. In: Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023). 2023, pp. 142–153.
[168] Lars Lindemann, Alexander Robey, Lejun Jiang, Stephen Tu, and Nikolai Matni. “Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations”. In: arXiv preprint arXiv:2111.09971 (2021).
[169] Lars Lindemann, Alena Rodionova, and George Pappas. “Temporal Robustness of Stochastic Signals”. In: 25th ACM International Conference on Hybrid Systems: Computation and Control. 2022, pp. 1–11.
[170] Jason Lines, Luke M Davis, Jon Hills, and Anthony Bagnall. “A shapelet transform for time series classification”. In: KDD. ACM. 2012, pp. 289–297.
[171] Zachary C Lipton, John Berkowitz, and Charles Elkan.
“A critical review of recurrent neural networks for sequence learning”. In: arXiv preprint arXiv:1506.00019 (2015).
[172] Jun Liu, Necmiye Ozay, Ufuk Topcu, and Richard M Murray. “Synthesis of reactive switching protocols from temporal logic specifications”. In: IEEE Transactions on Automatic Control 58.7 (2013), pp. 1771–1785.
[173] Anna Lukina, Christian Schilling, and Thomas A Henzinger. “Into the unknown: Active monitoring of neural networks”. In: International Conference on Runtime Verification. Springer. 2021, pp. 42–61.
[174] Rachel Luo, Shengjia Zhao, Jonathan Kuck, Boris Ivanovic, Silvio Savarese, Edward Schmerling, and Marco Pavone. “Sample-efficient safety assurances using conformal prediction”. In: arXiv preprint arXiv:2109.14082 (2021).
[175] James MacQueen et al. “Some methods for classification and analysis of multivariate observations”. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. 14. Oakland, CA, USA. 1967, pp. 281–297.
[176] Anirudha Majumdar and Marco Pavone. “How should a robot assess risk? Towards an axiomatic theory of risk in robotics”. In: Robotics Research: The 18th International Symposium ISRR. Springer. 2020, pp. 75–84.
[177] Oded Maler and Dejan Nickovic. “Monitoring properties of analog and mixed-signal circuits”. In: STTT 15.3 (2013), pp. 247–268. doi: 10.1007/s10009-012-0247-9.
[178] Oded Maler and Dejan Nickovic. “Monitoring temporal properties of continuous signals”. In: FORMATS. Springer, 2004, pp. 152–166.
[179] Oded Maler and Dejan Nickovic. “Monitoring temporal properties of continuous signals”. In: International Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems. Springer. 2004, pp. 152–166.
[180] Konstantinos Mamouras, Mukund Raghothaman, Rajeev Alur, Zachary G Ives, and Sanjeev Khanna. “StreamQRE: Modular specification and efficient evaluation of quantitative queries over streaming data”. In: ACM SIGPLAN Notices. Vol. 52. 6. ACM. 2017, pp.
693–708.
[181] Chiara Dalla Man, Francesco Micheletto, Dayu Lv, Marc Breton, Boris Kovatchev, and Claudio Cobelli. “The UVA/PADOVA type 1 diabetes simulator: new features”. In: Journal of Diabetes Science and Technology 8.1 (2014), pp. 26–34.
[182] Pascal Massart. “The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality”. In: The Annals of Probability (1990), pp. 1269–1283.
[183] Mathworks R2020a. Train DQN agent for Lane Keep Assist. https://www.mathworks.com/help/reinforcement-learning/ug/train-dqn-agent-for-lane-keepingassist.html.
[184] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. “Playing Atari with Deep Reinforcement Learning”. In: NIPS. arXiv: 1312.5602. 2013. url: http://arxiv.org/abs/1312.5602 (visited on 08/16/2019).
[185] Abdullah Mueen, Eamonn Keogh, and Neal Young. “Logical-shapelets: an expressive primitive for time series classification”. In: KDD. ACM. 2011, pp. 1154–1162.
[186] Daniel Neider and Ivan Gavran. “Learning Linear Temporal Properties”. In: Proc. of FMCAD 2018: the 2018 Formal Methods in Computer Aided Design. IEEE, 2018, pp. 1–10. doi: 10.23919/FMCAD.2018.8603016.
[187] Laura Nenzi, Simone Silvetti, Ezio Bartocci, and Luca Bortolussi. “A Robust Genetic Algorithm for Learning Temporal Specifications from Data”. In: QEST. Vol. 11024. Lecture Notes in Computer Science. Springer, 2018, pp. 323–338. doi: 10.1007/978-3-319-99154-2.
[188] Luan Viet Nguyen, James Kapinski, Xiaoqing Jin, Jyotirmoy V. Deshmukh, Ken Butts, and Taylor T. Johnson. “Abnormal Data Classification Using Time-Frequency Temporal Logic”. In: HSCC. ACM, 2017, pp. 237–242. doi: 10.1145/3049797.
[189] Dejan Nickovic, Xin Qin, Thomas Ferrère, Cristinel Mateis, and Jyotirmoy V. Deshmukh. “Shape Expressions for Specifying and Extracting Signal Features”. In: RV. 2019, pp. 292–309.
[190] Robert T Olszewski. Generalized feature extraction for structural pattern recognition in time-series data.
Tech. rep. Carnegie-Mellon Univ. School of Computer Science, 2001.
[191] Necmiye Ozay. “An exact and efficient algorithm for segmentation of ARX models”. In: 2016 American Control Conference (ACC). IEEE. 2016, pp. 38–41.
[192] Fabrizio Pastore, Daniela Micucci, and Leonardo Mariani. “Timed k-Tail: Automatic Inference of Timed Automata”. In: Proc. of ICST 2017: the 2017 IEEE International Conference on Software Testing, Verification and Validation. IEEE Computer Society, 2017, pp. 401–411. doi: 10.1109/ICST.2017.43.
[193] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32. Ed. by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett. Curran Associates, Inc., 2019, pp. 8024–8035.
[194] Theodosios Pavlidis. “Linguistic analysis of waveforms”. In: SEN Report Series Software Engineering. Vol. 2. Elsevier, 1971, pp. 203–225.
[195] Theodosios Pavlidis and Steven L. Horowitz. “Segmentation of Plane Curves”. In: IEEE Trans. Comput. 23.8 (1974), pp. 860–870. issn: 0018-9340. doi: 10.1109/T-C.1974.224041.
[196] Giulia Pedrielli, Tanmay Khandait, Surdeep Chotaliya, Quinn Thibeault, Hao Huang, Mauricio Castillo-Effen, and Georgios Fainekos. “Part-X: A family of stochastic algorithms for search-based test generation with probabilistic guarantees”. In: arXiv preprint arXiv:2110.10729 (2021).
[197] Andy D Pimentel. “Exploring exploration: A tutorial introduction to embedded systems design space exploration”. In: IEEE Design & Test 34.1 (2016), pp. 77–90.
[198] Srinivas Pinisetty, Thierry Jéron, Stavros Tripakis, Yliès Falcone, Hervé Marchand, and Viorel Preoteasa. “Predictive runtime verification of timed properties”. In: Journal of Systems and Software 132 (2017), pp. 353–365.
[199] Amir Pnueli. “The temporal logic of programs”. In: Proc. Annual Symp. Found. Comp. Sci. Washington, DC, Oct. 1977, pp. 46–57.
[200] Athanasios S Polydoros and Lazaros Nalpantidis. “Survey of model-based reinforcement learning: Applications on robotics”. In: Journal of Intelligent & Robotic Systems 86.2 (2017), pp. 153–173.
[201] Xin Qin, Nikos Aréchiga, Andrew Best, and Jyotirmoy Deshmukh. “Automatic Testing With Reusable Adversarial Agents”. In: arXiv preprint arXiv:1910.13645 (2021).
[202] Xin Qin and Jyotirmoy V Deshmukh. “Clairvoyant Monitoring for Signal Temporal Logic”. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer. 2020, pp. 178–195.
[203] Xin Qin, Navid Hashemi, Lars Lindemann, and Jyotirmoy V. Deshmukh. “Conformance Testing for Stochastic Cyber-Physical Systems”. In: arXiv preprint arXiv:2308.06474 (2023).
[204] Xin Qin, Yuan Xia, Aditya Zutshi, Chuchu Fan, and Jyotirmoy V Deshmukh. “Statistical verification of cyber-physical systems using surrogate models and conformal inference”. In: 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE. 2022, pp. 116–126.
[205] Xin Qin, Yuan Xia, Aditya Zutshi, Chuchu Fan, and Jyotirmoy V Deshmukh. “Statistical verification using surrogate models and conformal inference and a comparison with risk-aware verification”. In: ACM Transactions on Cyber-Physical Systems (2024).
[206] Nijat Rajabli, Francesco Flammini, Roberto Nardone, and Valeria Vittorini. “Software verification and validation of safe autonomous cars: A systematic literature review”. In: IEEE Access 9 (2020), pp. 4797–4819.
[207] Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. “Searching and mining trillions of time series subsequences under dynamic time warping”. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM. 2012, pp. 262–270.
[208] Vasumathi Raman, Alexandre Donzé, Dorsa Sadigh, Richard M Murray, and Sanjit A Seshia. “Reactive synthesis from signal temporal logic specifications”. In: Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control. 2015, pp. 239–248.
[209] Urs Ramer. “An iterative procedure for the polygonal approximation of plane curves”. In: Computer Graphics and Image Processing 1.3 (1972), pp. 244–256. issn: 0146-664X. doi: 10.1016/S0146-664X(72)80017-0.
[210] Carl Edward Rasmussen. “Gaussian processes in machine learning”. In: Summer School on Machine Learning. Springer. 2003, pp. 63–71.
[211] R. Tyrrell Rockafellar and Stanislav Uryasev. “Optimization of Conditional Value-at-Risk”. In: Journal of Risk 2 (2000), pp. 21–41.
[212] R. Tyrrell Rockafellar and Stanislav Uryasev. “Conditional value-at-risk for general loss distributions”. In: Journal of Banking & Finance 26.7 (2002), pp. 1443–1471. issn: 0378-4266. doi: 10.1016/S0378-4266(02)00271-6.
[213] Alena Rodionova, Ezio Bartocci, Dejan Nickovic, and Radu Grosu. “Temporal Logic as Filtering”. In: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control - HSCC ’16 (2016), pp. 11–20. arXiv: 1510.08079.
[214] Alëna Rodionova, Lars Lindemann, Manfred Morari, and George J. Pappas. “Temporal Robustness of Temporal Logic Specifications: Analysis and Control Design”. In: ACM Trans. Embed. Comput. Syst. (July 2022).
[215] Hendrik Roehm, Jens Oehlerking, Matthias Woehrle, and Matthias Althoff. “Model conformance for cyber-physical systems: A survey”.
In: ACM Transactions on Cyber-Physical Systems 3.3 (2019), pp. 1–26.
[216] Yaniv Romano, Evan Patterson, and Emmanuel Candes. “Conformalized quantile regression”. In: NeurIPS. 2019, pp. 3538–3548.
[217] Nima Roohi, Yu Wang, Matthew West, Geir E Dullerud, and Mahesh Viswanathan. “Statistical verification of the Toyota powertrain control verification benchmark”. In: Proceedings of the 20th International Conference on Hybrid Systems: Computation and Control. 2017, pp. 65–70.
[218] Stéphane Ross and Drew Bagnell. “Efficient reductions for imitation learning”. In: Proceedings of the International Conference on Artificial Intelligence and Statistics. Sardinia, Italy, May 2010, pp. 661–668.
[219] Peter J Rousseeuw. “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis”. In: Journal of Computational and Applied Mathematics 20 (1987), pp. 53–65.
[220] Ivan Ruchkin, Matthew Cleaveland, Radoslav Ivanov, Pengyuan Lu, Taylor Carpenter, Oleg Sokolsky, and Insup Lee. “Confidence Composition for Monitors of Verification Assumptions”. In: 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE. 2022, pp. 1–12.
[221] Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M Kitani, Dariu M Gavrila, and Kai O Arras. “Human motion trajectory prediction: A survey”. In: The International Journal of Robotics Research 39.8 (2020), pp. 895–935.
[222] John Rushby. “Partitioning for safety and security: Requirements, mechanisms, and assurance”. In: AFRL-IF-RS-TR-2002-85 (2002), p. 9.
[223] Sadra Sadraddini and Calin Belta. “Robust temporal logic model predictive control”. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE. 2015, pp. 772–779.
[224] Ali Salamati, Sadegh Soudjani, and Majid Zamani. “Data-driven verification of stochastic linear systems with signal temporal logic constraints”. In: Automatica 131 (2021), p. 109781.
[225] Ali Salamati, Sadegh Soudjani, and Majid Zamani.
“Data-Driven Verification under Signal Temporal Logic Constraints”. In: IFAC-PapersOnLine 53.2 (2020), pp. 69–74.
[226] Adam Santoro, Felix Hill, David G. T. Barrett, Ari S. Morcos, and Timothy P. Lillicrap. “Measuring abstract reasoning in neural networks”. In: Proc. of the 35th International Conference on Machine Learning, ICML 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 4477–4486.
[227] Wilhelmus HA Schilders, Henk A Van der Vorst, and Joost Rommes. Model order reduction: theory, research aspects and applications. Vol. 13. Springer, 2008.
[228] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal policy optimization algorithms”. In: arXiv preprint arXiv:1707.06347 (2017).
[229] Daniel Selvaratnam, Michael Cantoni, JM Davoren, and Iman Shames. “MITL Verification Under Timing Uncertainty”. In: arXiv preprint arXiv:2204.10493 (2022).
[230] Koushik Sen, Mahesh Viswanathan, and Gul Agha. “Statistical Model Checking of Black-Box Probabilistic Systems”. In: Computer Aided Verification. Ed. by Rajeev Alur and Doron A. Peled. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 202–215. isbn: 978-3-540-27813-9.
[231] Glenn Shafer and Vladimir Vovk. “A Tutorial on Conformal Prediction”. In: Journal of Machine Learning Research 9.3 (2008).
[232] Mary Sheeran, Satnam Singh, and Gunnar Stålmarck. “Checking safety properties using induction and a SAT-solver”. In: International Conference on Formal Methods in Computer-Aided Design. Springer. 2000, pp. 127–144.
[233] Yasser Shoukry, Pierluigi Nuzzo, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, George J Pappas, and Paulo Tabuada. “SMC: Satisfiability modulo convex optimization”. In: Proceedings of the 20th International Conference on Hybrid Systems: Computation and Control. 2017, pp. 19–28.
[234] A Prasad Sistla, Miloš Žefran, and Yao Feng. “Runtime monitoring of stochastic cyber-physical systems with hybrid state”.
In: International Conference on Runtime Verification. Springer. 2011, pp. 276–293.
[235] Kamile Stankeviciute, Ahmed M Alaa, and Mihaela van der Schaar. “Conformal time-series forecasting”. In: Advances in Neural Information Processing Systems 34 (2021), pp. 6216–6228.
[236] Bernhard Steffen, Falk Howar, and Malte Isberner. “Active Automata Learning: From DFAs to Interface Programs and Beyond”. In: Proc. of ICGI 2012: the Eleventh International Conference on Grammatical Inference. Vol. 21. JMLR Proceedings. JMLR.org, 2012, pp. 195–209.
[237] Brian L Stevens, Frank L Lewis, and Eric N Johnson. Aircraft control and simulation: dynamics, controls design, and autonomous systems. John Wiley & Sons, 2015.
[238] Xiaowu Sun, Haitham Khedr, and Yasser Shoukry. “Formal verification of neural network controlled autonomous systems”. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 2019, pp. 147–156.
[239] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: an introduction. Second edition. Adaptive computation and machine learning series. Cambridge, MA: The MIT Press, 2018. isbn: 978-0-262-03924-6.
[240] Minghu Tan, Hong Shen, Kang Xi, and Bin Chai. “Trajectory prediction of flying vehicles based on deep learning methods”. In: Applied Intelligence (2022), pp. 1–22.
[241] Quinn Thibeault, Jacob Anderson, Aniruddh Chandratre, Giulia Pedrielli, and Georgios Fainekos. “PSY-TaLiRo: A Python toolbox for search-based test generation for cyber-physical systems”. In: Formal Methods for Industrial Critical Systems: 26th International Conference, FMICS 2021, Paris, France, August 24–26, 2021, Proceedings 26. Springer. 2021, pp. 223–231.
[242] Ryan J Tibshirani, Rina Foygel Barber, Emmanuel Candes, and Aaditya Ramdas. “Conformal prediction under covariate shift”. In: Advances in Neural Information Processing Systems 32 (2019).
[243] Hoang-Dung Tran, Feiyang Cai, Diego Manzanas Lopez, Patrick Musau, Taylor T Johnson, and Xenofon Koutsoukos. “Safety verification of cyber-physical systems with reinforcement learning control”. In: ACM Transactions on Embedded Computing Systems (TECS) 18.5s (2019), pp. 1–22.
[244] Hoang-Dung Tran, Diego Manzanas Lopez, Patrick Musau, Xiaodong Yang, Luan Viet Nguyen, Weiming Xiang, and Taylor T. Johnson. “Star-Based Reachability Analysis of Deep Neural Networks”. In: Formal Methods – The Next 30 Years. Ed. by Maurice H. ter Beek, Annabelle McIver, and José N. Oliveira. Cham: Springer International Publishing, 2019, pp. 670–686. isbn: 978-3-030-30942-8.
[245] Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T Johnson. “NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems”. In: International Conference on Computer Aided Verification. Springer. 2020, pp. 3–17.
[246] Cumhur Erkan Tuncali and Georgios Fainekos. “Rapidly-exploring random trees-based test generation for autonomous vehicles”. In: arXiv preprint arXiv:1903.10629 (2019).
[247] Cumhur Erkan Tuncali, Georgios Fainekos, Hisahiro Ito, and James Kapinski. “Simulation-based adversarial test generation for autonomous vehicles with machine learning components”. In: 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE. 2018, pp. 1555–1562.
[248] Dogan Ulus. “Montre: A Tool for Monitoring Timed Regular Expressions”. In: Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I. 2017, pp. 329–335.
[249] Dogan Ulus, Thomas Ferrère, Eugene Asarin, and Oded Maler. “Online Timed Pattern Matching Using Derivatives”.
In: Tools and Algorithms for the Construction and Analysis of Systems - 22nd International Conference, TACAS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings. 2016, pp. 736–751.
[250] Dogan Ulus, Thomas Ferrère, Eugene Asarin, and Oded Maler. “Timed Pattern Matching”. In: Formal Modeling and Analysis of Timed Systems (FORMATS). 2014, pp. 222–236.
[251] Stanislav Uryasev. “Conditional value-at-risk: Optimization algorithms and applications”. In: Proceedings of the IEEE/IAFE/INFORMS 2000 Conference on Computational Intelligence for Financial Engineering (CIFEr) (Cat. No. 00TH8520). IEEE. 2000, pp. 49–57.
[252] Prashant Vaidyanathan, Rachael Ivison, Giuseppe Bombara, Nicholas A. DeLateur, Ron Weiss, Douglas Densmore, and Calin Belta. “Grid-based temporal logic inference”. In: Proc. of CDC. IEEE, 2017, pp. 5354–5359. doi: 10.1109/CDC.2017.8264452.
[253] Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, and Swarat Chaudhuri. “Programmatically Interpretable Reinforcement Learning”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR, 2018, pp. 5052–5061.
[254] Roman Vershynin. High-dimensional probability: An introduction with applications in data science. Vol. 47. Cambridge University Press, 2018.
[255] Abraham P Vinod, Joseph D Gleason, and Meeko MK Oishi. “SReachTools: a MATLAB stochastic reachability toolbox”. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 2019, pp. 33–38.
[256] Abraham P Vinod and Meeko MK Oishi. “Affine controller synthesis for stochastic reachability via difference of convex programming”. In: 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE. 2019, pp. 7273–7280.
[257] Michael P Vitus and Claire J Tomlin.
“On feedback design and risk allocation in chance constrained control”. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference. IEEE. 2011, pp. 734–739.
[258] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer Science & Business Media, 2005.
[259] Masaki Waga and Ichiro Hasuo. “Moore-Machine Filtering for Timed and Untimed Pattern Matching”. In: IEEE Trans. on CAD of Integrated Circuits and Systems 37.11 (2018), pp. 2649–2660.
[260] Masaki Waga, Ichiro Hasuo, and Kohei Suenaga. “Efficient Online Timed Pattern Matching by Automata-Based Skipping”. In: Formal Modeling and Analysis of Timed Systems - 15th International Conference, FORMATS 2017, Berlin, Germany, September 5-7, 2017, Proceedings. 2017, pp. 224–243.
[261] Masaki Waga, Ichiro Hasuo, and Kohei Suenaga. “MONAA: A Tool for Timed Pattern Matching with Automata-Based Acceleration”. In: 3rd Workshop on Monitoring and Testing of Cyber-Physical Systems, MT@CPSWeek 2018, Porto, Portugal, April 10, 2018. 2018, pp. 14–15.
[262] Ying Wang and Fuqing Gao. “Deviation inequalities for an estimator of the conditional value-at-risk”. In: Operations Research Letters 38.3 (2010), pp. 236–239.
[263] Yu Wang, Mojtaba Zarei, Borzoo Bonakdarpour, and Miroslav Pajic. “Probabilistic conformance for cyber-physical systems”. In: Proceedings of the ACM/IEEE 12th International Conference on Cyber-Physical Systems. 2021, pp. 55–66.
[264] Yu Wang, Mojtaba Zarei, Borzoo Bonakdarpour, and Miroslav Pajic. “Statistical verification of hyperproperties for cyber-physical systems”. In: ACM Transactions on Embedded Computing Systems (TECS) 18.5s (2019), pp. 1–23.
[265] Martin Weiglhofer and Bernhard K. Aichernig. “Unifying Input Output Conformance”. In: Unifying Theories of Programming. Ed. by Andrew Butterfield. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 181–201. isbn: 978-3-642-14521-6.
[266] Florian Wenig, Peter Klanatsky, Christian Heschl, Cristinel Mateis, and Dejan Nickovic. “Exponential pattern recognition for deriving air change rates from CO2 data”. In: 26th IEEE International Symposium on Industrial Electronics, ISIE 2017, Edinburgh, United Kingdom, June 19-21, 2017. 2017, pp. 1507–1512.
[267] Cristina M Wilcox and Brian C Williams. “Runtime verification of stochastic, faulty systems”. In: International Conference on Runtime Verification. Springer. 2010, pp. 452–459.
[268] Jinyu Xie. Simglucose v0.2.1. https://github.com/jxx123/simglucose. 2018.
[269] Z. Xu and A. A. Julius. “Census Signal Temporal Logic Inference for Multiagent Group Behavior Analysis”. In: IEEE Transactions on Automation Science and Engineering 15.1 (2018), pp. 264–277. doi: 10.1109/TASE.2016.2611536.
[270] Shakiba Yaghoubi and Georgios Fainekos. “Gray-box adversarial testing for control systems with machine learning components”. In: HSCC. 2019, pp. 179–184.
[271] Lexiang Ye and Eamonn Keogh. “Time series shapelets: a new primitive for data mining”. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009, pp. 947–956.
[272] Hansol Yoon, Yi Chou, Xin Chen, Eric Frew, and Sriram Sankaranarayanan. “Predictive runtime monitoring for linear stochastic systems and applications to geofence enforcement for UAVs”. In: International Conference on Runtime Verification. Springer. 2019, pp. 349–367.
[273] Hansol Yoon and Sriram Sankaranarayanan. “Predictive runtime monitoring for mobile robots using logic-based Bayesian intent inference”. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2021, pp. 8565–8571.
[274] Håkan LS Younes and Reid G Simmons. “Probabilistic verification of discrete event systems using acceptance sampling”. In: International Conference on Computer Aided Verification. Springer. 2002, pp. 223–235.
[275] Håkan LS Younes and Reid G Simmons.
“Statistical probabilistic model checking with a focus on time-bounded properties”. In: Information and Computation 204.9 (2006), pp. 1368–1409.
[276] Xinyi Yu, Weijie Dong, Xiang Yin, and Shaoyuan Li. “Model Predictive Monitoring of Dynamic Systems for Signal Temporal Logic Specifications”. In: arXiv preprint arXiv:2209.12493 (2022).
[277] Xinyi Yu, Weijie Dong, Xiang Yin, and Shaoyuan Li. “Online Monitoring of Dynamic Systems for Signal Temporal Logic Specifications with Model Information”. In: arXiv preprint arXiv:2203.16267 (2022).
[278] Mojtaba Zarei, Yu Wang, and Miroslav Pajic. “Statistical verification of learning-based cyber-physical systems”. In: Proceedings of the 23rd International Conference on Hybrid Systems: Computation and Control. 2020, pp. 1–7.
[279] Kuize Zhang and Majid Zamani. “Infinite-step opacity of nondeterministic finite transition systems: A bisimulation relation approach”. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE. 2017, pp. 5615–5619.
[280] Zhenya Zhang, Paolo Arcaini, and Ichiro Hasuo. “Hybrid System Falsification Under (In)Equality Constraints via Search Space Transformation”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2020).
[281] Zhenya Zhang, Ichiro Hasuo, and Paolo Arcaini. “Multi-Armed Bandits for Boolean Connectives in Hybrid System Falsification”. In: Computer Aided Verification. Ed. by Isil Dillig and Serdar Tasiran. Cham: Springer International Publishing, 2019, pp. 401–420. isbn: 978-3-030-25540-4.
[282] Yang Zhao and Kristin Y. Rozier. “Probabilistic model checking for comparative analysis of automated air traffic control systems”. In: 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 2014, pp. 690–695. doi: 10.1109/ICCAD.2014.7001427.
[283] Jun Zhou, R. Ramanathan, Weng-Fai Wong, and P. S. Thiagarajan. “Automated property synthesis of ODEs based bio-pathways models”. In: Proc. of CMSB 2017. 2017, pp. 265–282.
[284] Yuan Zhou, Yang Sun, Yun Tang, Yuqi Chen, Jun Sun, Christopher M Poskitt, Yang Liu, and Zijiang Yang. “Specification-based Autonomous Driving System Testing”. In: IEEE Transactions on Software Engineering (2023). [285] Paolo Zuliani, André Platzer, and Edmund M Clarke. “Bayesian statistical model checking with application to simulink/stateflow verification”. In: HSCC. 2010, pp. 243–252. 191
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Learning logical abstractions from sequential data
Assume-guarantee contracts for assured cyber-physical system design under uncertainty
Verification, learning and control in cyber-physical systems
Theoretical foundations for modeling, analysis and optimization of cyber-physical-human systems
Side-channel security enabled by program analysis and synthesis
Towards the efficient and flexible leveraging of distributed memories
Differential verification of deep neural networks
Improving binary program analysis to enhance the security of modern software systems
Security-driven design of logic locking schemes: metrics, attacks, and defenses
Understanding dynamics of cyber-physical systems: mathematical models, control algorithms and hardware incarnations
Sample-efficient and robust neurosymbolic learning from demonstrations
Constraint-based program analysis for concurrent software
Utilizing user feedback to assist software developers to better use mobile ads in apps
Theoretical foundations and design methodologies for cyber-neural systems
Defending industrial control systems: an end-to-end approach for managing cyber-physical risk
Automatic detection and optimization of energy optimizable UIs in Android applications using program analysis
Tensor learning for large-scale spatiotemporal analysis
Detection, localization, and repair of internationalization presentation failures in web applications
Theoretical and computational foundations for cyber-physical systems design
Dynamic graph analytics for cyber systems security applications
Asset Metadata
Creator: Qin, Xin (author)
Core Title: Data-driven and logic-based analysis of learning-enabled cyber-physical systems
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Degree Conferral Date: 2024-12
Publication Date: 09/09/2024
Defense Date: 05/31/2024
Publisher: Los Angeles, California (original); University of Southern California (original); University of Southern California. Libraries (digital)
Tag: cyber-physical systems, formal methods, OAI-PMH Harvest, testing, Verification
Format: theses (aat)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Deshmukh, Jyotirmoy Vinay (committee chair); Bogdan, Paul (committee member); Chattopadhyay, Souti (committee member); Liu, Yan (committee member); Wang, Chao (committee member)
Creator Email: qxin23@gmail.com, xinqin@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC11399AILZ
Unique Identifier: UC11399AILZ
Identifier: etd-QinXin-13501.pdf (filename)
Legacy Identifier: etd-QinXin-13501
Document Type: Dissertation
Rights: Qin, Xin
Internet Media Type: application/pdf
Type: texts
Source: 20240909-usctheses-batch-1209 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis, or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu