Learning logical abstractions from sequential data

by Sara Mohammadinejad

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

December 2022

Copyright 2023 Sara Mohammadinejad

Acknowledgements

I would like to express my deepest appreciation to my PhD advisor, Dr. Jyotirmoy Deshmukh, for his tremendous support of my research. It was an honor to have Jyotirmoy as my advisor. Jyotirmoy is a wonderful person. He always gave me the opportunity to choose the projects that best suited my interests, and he was a great help with my projects from start to finish. Jyotirmoy also supported me in landing a well-suited job position and allowed me to start that position during my PhD, which was a significant help to my career. Special thanks to my faculty collaborator and committee member Dr. Jesse Thomason, who helped me start learning and carrying out research in the field of Natural Language Processing (NLP). Jesse was my best NLP teacher and showed me the most fascinating aspects of the field. I would like to thank my dissertation committee, Professors Chao Wang, Mukund Raghothaman, and Paul Bogdan, for their brilliant guidance. Finally, my family deserves endless gratitude for always being there for me.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Motivation
  1.2 Summary of existing work
    1.2.1 Learning from time-series data
    1.2.2 Mining environment assumptions for CPS
    1.2.3 Learning from spatio-temporal data
    1.2.4 Learning from natural language
  1.3 Hypotheses and Insights
  1.4 Contributions
    1.4.1 Learning from time-series data
    1.4.2 Mining environment assumptions for CPS
    1.4.3 Learning from spatio-temporal data
    1.4.4 Learning from natural language
  1.5 Layout

Chapter 2: Background

Chapter 3: Enumerative Learning of STL formulas
  3.1 Introduction
  3.2 Methods
    3.2.1 Learning Parameters of PSTL formulas
    3.2.2 Learning Structure of PSTL formulas
      3.2.2.1 Signature-based Optimization
  3.3 Experimental Results
    3.3.1 Maritime Surveillance
    3.3.2 Linear System
    3.3.3 Cruise Control of Train
  3.4 User Study
  3.5 Summary

Chapter 4: Mining environment assumptions for CPS
  4.1 Introduction
  4.2 Methods
    4.2.1 Learning Parameters of PSTL formulas
    4.2.2 Environment Assumption Mining
  4.3 Experimental Results
    4.3.1 Benchmarking our DT-based parameter inference
      4.3.1.1 Maritime Surveillance
      4.3.1.2 Linear System
      4.3.1.3 Cruise Control of Train
    4.3.2 Benchmarking environment assumption mining
      4.3.2.1 Synthetic model
      4.3.2.2 Automatic Transmission Controller
      4.3.2.3 Abstract Fuel Control Model
  4.4 Summary

Chapter 5: Learning Spatio-Temporal logic formulae
  5.1 Introduction
  5.2 Methods
    5.2.1 Constructing a Spatial Model
    5.2.2 Learning STREL formulas from data
  5.3 Experimental Results
    5.3.1 BSS data from the city of Edinburgh
    5.3.2 COVID-19 data from LA County
    5.3.3 Outdoor Air Quality data from California
    5.3.4 Food Court Building
  5.4 Summary

Chapter 6: Learning from Natural Language and Demonstrations using Signal Temporal Logic
  6.1 Introduction
  6.2 Methods
    6.2.1 DIALOGUESTL: Learning PSTL Candidates
    6.2.2 DIALOGUESTL: Selecting Correct STL
    6.2.3 Learning optimal policies
  6.3 Experimental Results
    6.3.1 Results Across Description Types
    6.3.2 Comparison with DeepSTL
  6.4 Summary

Chapter 7: Related Work
  7.1 Related Work on learning from time-series data
  7.2 Related Work on mining environment assumptions for CPS
  7.3 Related Work on learning from spatio-temporal data
  7.4 Related Work on learning from natural language

Chapter 8: Conclusion and Future Work
  8.1 Summary
  8.2 Future work
    8.2.1 Learning from time-series data
    8.2.2 Learning from spatio-temporal data
    8.2.3 Learning from natural language

Bibliography

Appendices
  C Boolean and Quantitative Semantics of STREL
  D Monotonicity Proofs for spatial operators

List of Tables

3.1 Optimization using signature.
3.2 Results of our enumerative solver.
5.1 Summary of results.
5.2 Run time of the learning algorithm (in seconds) and number of isolated nodes in the spatial model (the time-out threshold is set to 30 minutes).
6.1 DIALOGUESTL results on sample natural language inputs across 142 GPT-3 paraphrases of the inputs, for a fixed number of user demonstrations per row (#Ds). We report the most frequent STL prediction and whether it is correct. The task types are (C)onstraint, (S)ingle, se(Q)uence, (M)ultiple-choice, and con(D)itional. Note that "≥ 0" is removed from all atoms for brevity.
6.2 DIALOGUESTL performance on sample natural language inputs across 142 GPT-3 paraphrases of the inputs, for a fixed number of user demonstrations per row. We report the average number of enumerated formulas (#EFs), the average number of user interactions (#UIs) to select a final formula, the success rate (SR) of finding the exact-match correct formula, and the average runtime in seconds (RT). The task types are (C)onstraint, (S)ingle, se(Q)uence, (M)ultiple-choice, and con(D)itional.
6.3 Success rate (SR) and accuracy (ACC) comparison of DialogueSTL and DeepSTL on natural language inputs across 142 GPT-3-generated paraphrases.

List of Figures

1.1 A pacemaker is an example of a medical cyber-physical system (CPS) that generates ECG signals [66].
1.2 Supervised learning using formal parametric logic.
1.3 Active learning using formal parametric logic.
1.4 Unsupervised learning using formal parametric logic.
1.5 Human-in-the-loop learning using formal parametric logic.
2.1 The robot tries to reach the lamp placed at location (0,0) within 15 seconds while avoiding wall (black) and water (blue) tiles. Both the green and red demonstrations satisfy the formula F_[0,15](robotAt(0,0) ≥ 0): in the next 15 seconds, the robot should eventually reach location (0,0).
2.2 Illustration of the notion of robust satisfaction value, or robustness. The robustness value approximates the signed distance between a trace and the set of traces satisfying or violating an STL formula. Consider the formulas ϕ₁ = G_[0,10](x ≤ 3) and ϕ₂ = F_[0,10](x < −3). The robustness of the blue trace with respect to ϕ₁ is the signed distance between the blue trace and the red (violating) trace. Similarly, the robustness of the blue trace with respect to ϕ₂ is the signed distance between the blue trace and the green (satisfying) trace.
2.3 Illustration of the notion of robust satisfaction value, or robustness, for the STL formula ϕ = G_[50,100)(x ≤ 3).
3.1 The high-level flow of our enumerative learning technique.
3.2 Illustration of the method to recursively approximate the satisfaction boundary of a PSTL formula to arbitrary precision. Green arrows indicate the monotonicity direction (both decreasing).
3.3 Change of F1 score with respect to the number of random sample parameters (m) for a fixed number of random traces (n = 10).
3.4 Change of F1 score with respect to the number of random traces (n) for a fixed number of sample parameters (m = 10).
3.5 Naval surveillance data set [16] (green traces: normal trajectories; red and blue traces: two kinds of anomalous trajectories; dashed lines: the thresholds of the STL formula learned by the enumerative solver (= 36.3260, 28.8430)).
3.6 Simulation results of the linear system (green traces: normal operation of the system; red traces: anomalous behavior of the system; dashed line: the threshold of the STL formula learned by the enumerative solver (= 0.9736)).
3.7 Simulation results of cruise control of the train (green traces: normal operation; red traces: anomalous behavior; dashed line: the threshold of the STL formula learned by the enumerative solver (= 35.8816)).
3.8 User study RQ1: Choosing the best natural language description for a given STL formula.
3.9 User study RQ2: Choosing the best STL formula for a given trace.
3.10 User study RQ3: Choosing the best STL formula that describes the two videos v1 and v2.
3.11 User study RQ4: Repairing the incorrect STL formula.
3.12 User study RQ5: The distribution of interpretability scores of STL formulas based on the users' opinions. The score ranges from 1 to 10, where 10 indicates the highest degree of interpretability.
3.13 User study RQ6: Comparison of STL operators in terms of difficulty to interpret.
3.14 User study RQ8: Asking users to sort the STL formulas from easiest to most difficult.
3.15 User study RQ8: Longer STL formulas potentially hamper interpretability.
3.16 User study RQ8: As the number of nested operators increases, the formula becomes more difficult to interpret.
4.1 The LHS illustrates our STL-based classification framework (Chapter 3), and the RHS shows the high-level flow of our environment assumption mining technique, which leverages our classification tool.
4.2 Grid sampling of time parameters for the formula G_[τ₁,τ₂](x(t) > c). Since τ₁ should be less than τ₂, the area above the τ₁ = τ₂ line (green) is sampled.
4.3 Example tree returned by the TrainDecisionTree function in Algo. 4.
4.4 The Simulink model of the automatic transmission controller [49]. Inputs of the system are throttle and brake. RPM, gear, and speed are outputs of the system.
4.5 Violating traces for the automatic transmission controller.
5.1 Interpretable clusters automatically identified by our technique. (a) Clusters learned; (b) BSS locations in Edinburgh.
5.2 Different approaches for constructing the spatial model for the BSS. (a) shows an (∞, d_hvrsn)-connectivity spatial model, where d_hvrsn is the Haversine distance between locations. (b) shows a (δ, d_hvrsn)-connectivity spatial model with δ = 1 km; observe that the spatial model is disconnected. (c) shows an MST spatial model. (d) shows an (α, d_hvrsn) enhanced MSG spatial model with α = 2; observe that this spatial model is sparse even compared to the (δ, d_hvrsn)-connectivity spatial model.
5.3 Illustration of clustering on the BSS locations. (a) ϕ_yellow = ϕ(τ₂,d₂) ∧ ¬ϕ_pink ∧ ¬ϕ_blue. (b) The result of applying the decision tree algorithm on the labeled parameter valuations shown in Fig. 5.1.
5.4 Using the KMeans approach from the tslearn library with the Dynamic Time Warping (DTW) metric to cluster BSS spatio-temporal traces.
5.5 Using the KShape approach from the tslearn library to cluster BSS spatio-temporal traces.
5.6 Procedure to learn STREL formulas from COVID-19 data. (a) The learned hyper-boxes before pruning the DT. (b) The learned hyper-boxes after pruning the DT. (c) Red points: hot spots.
5.7 Change of clustering with respect to time for the COVID-19 data and the PSTREL formula ϕ(c,d) = ⟐_[0,d](F_[0,10](x > c)), where ⟐ denotes the STREL somewhere operator. The plots confirm the rapid spread of the COVID-19 virus in LA County from April 2020 to September 2020.
5.8 Clustering experiments on the California Air Quality data. (a) The learned hyper-boxes from air quality data. (b) Red and orange points: high density of PM2.5.
5.9 Results of the Food Court Building case study.
6.1 We infer an STL formula and an optimal policy from a given natural language description, a few demonstrations, and questions to the user.
6.2 We infer an STL formula ϕ_best = F_[0,15](lampOn ≥ 0 ∧ F_[0,10](itemOnRobot(purpleCube) ≥ 0)) and an optimal policy from the natural language description "Turn on the lamp and pick up the cube", a demonstration, and interactions with the user. The NL splitter extracts the components of the natural language input (i.e., "Turn on the lamp", "and", "pick up the cube"). Each component is mapped to an atom or operator using the Atom and Operator Predictors ({"Turn on the lamp": lampOn ≥ 0, "and": ∧, "pick up the cube": itemOnRobot ≥ 0}). Next, candidate PSTL formulas are generated from the predicted atoms and operators. Asking the user questions helps learn the parameters of the PSTL formulas and, therefore, the best STL formula. Finally, deep RL techniques are employed to learn an optimal policy from the learned STL formula.
6.3 The learned policy from ϕ_task = F_[0,15](lampOn ∧ F_[0,10](itemOnRobot(purpleCube))).

Abstract

Sequential data refers to data where the order between successive data points is important. Time-series data and spatio-temporal data are examples of sequential data. A time-series datum is an ordered sequence of data values, where each data point has a unique time-stamp. A spatio-temporal datum can be viewed as a set of time-series data in which each time-series datum is associated with a unique spatial location. Cyber-physical system applications such as autonomous vehicles, wearable devices, and avionic systems generate a large volume of time-series data. The Internet-of-Things, complex sensor networks, and multi-agent cyber-physical systems are all examples of spatially distributed systems that continuously evolve in time, and such systems generate huge amounts of spatio-temporal data. Designers often look for tools to extract high-level information from such data. Traditional machine learning (ML) techniques for sequential data offer several solutions to these problems; however, the artifacts trained by these algorithms often lack interpretability. For instance, Recurrent Neural Networks (RNNs) are among the most popular models used by machine learning techniques for solving various learning problems. However, RNNs generate black-box models that may be difficult to interpret due to their high dimensionality, highly nonlinear nature, and lack of immediate connection to visible patterns in the data. These complex ML models demonstrate good performance in various learning tasks; however, in many application settings, analysts require an understanding of why the model produces a particular output. For example, in a classification task, it is important to understand which features in the data led to the data being assigned a particular class label. A definition of interpretability by Biran and Cotton [15] is: models are interpretable if their decisions can be understood by humans.
Formal parametric logics, such as Signal Temporal Logic (STL) and Spatio-temporal Reach and Escape Logic (STREL), are seeing increasing adoption in the formal methods and industrial communities as go-to specification languages for sequential data. Formal parametric logics are machine-checkable, human-understandable abstractions for sequential data, and they can be used to tackle a variety of learning problems that include, but are not limited to, classification, clustering, and active learning. The use of formal parametric logics in the context of machine learning tasks has seen a considerable amount of interest in recent years. We make several significant contributions to this growing body of literature. This dissertation makes five key contributions towards learning formal parametric logic from sequential data. (1) We develop a new technique for learning STL-based classifiers from time-series data and provide a way to systematically explore the space of all STL formulas. (2) We conduct a user study to investigate whether STL formulas are indeed understandable to humans. (3) As an application of our STL-based learning framework, we investigate the problem of mining environment assumptions for cyber-physical system models. (4) We develop the first set of algorithms for logical unsupervised learning of spatio-temporal data and show that our method generates STREL formulas of bounded description complexity. (5) We design an explainable and interactive learning approach to learn from natural language and demonstrations using STL. Finally, we showcase the effectiveness of our approaches on case studies that include, but are not limited to, urban transportation, automotive, robotics, and air quality monitoring.

Chapter 1: Introduction

This chapter discusses the motivation behind our work, the limitations of existing work, our hypotheses for resolving those limitations, and finally the core contributions of this dissertation.

1.1 Motivation

Cyber-Physical Systems (CPS) integrate computation and control with physical processes [58], and have several applications in today's world. Self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices are examples of safety-critical CPS, and ensuring the correct design of such systems is of paramount importance. To achieve this goal, designers are interested in understanding and learning the properties of the entire system. Such systems consist of heterogeneous components, and each of these components could itself be quite complex. Understanding the dynamical behavior of individual components can be quite difficult, and thus engineers are interested in extracting the high-level behavior of such systems. Since CPS generate huge amounts of sequential data, such as time-series data, learning from such data may help understand the high-level behavior of the CPS.

An implantable pacemaker is a medical device for delivering timely electrical pulses to the heart [105], and the main function of a pacemaker is to ensure safe and efficient cardiac output [79] (see Fig. 1.1). A cardiologist might be interested in analyzing the traces generated by a pacemaker to make sure it will not cause any danger to patients. Classifying patients' electrocardiogram (ECG) traces as healthy or unhealthy might be another problem of interest for a cardiologist, because learning such a classifier can automate the process of detecting unhealthy ECG signals.

Figure 1.1: A pacemaker is an example of a medical cyber-physical system (CPS) that generates ECG signals [66].
Consider a bike-sharing system (BSS) as another example. Here, the system consists of a number of bike stations that use sensors to detect the number of bikes present at a station, and use incentives to encourage users to return bikes to stations that are running low. Clustering the spatio-temporal data generated by a BSS may help understand the behavior and limitations of different bike stations. For instance, stations that are far away from other stations should always have bikes available within a short wait time; otherwise, customers have to walk a long distance to nearby stations to rent a bike. Next, we explain in more detail the motivation behind the following works, all of which tackle the problem of learning from sequential data generated by CPS.

Learning from time-series data. Cyber-physical systems (CPS) generate large amounts of data due to a proliferation of sensors responsible for monitoring various aspects of the system. Designers are typically interested in extracting high-level information from such data, but due to its large volume, manual analysis is not feasible. Traditional machine learning (ML) techniques for time-series data offer several solutions to these problems; however, the artifacts trained by these algorithms often lack interpretability. Existing interpretable ML techniques such as linear regression and decision trees do not always produce interpretable results when applied to the problem of learning from sequential data. On the other hand, Signal Temporal Logic (STL) is a popular formalism for expressing properties of time-series data in several application contexts that include, but are not limited to, automotive systems [51, 48, 10], analog circuits [65], biology [89], and robotics [101]. STL is a logic over Boolean and temporal combinations of signal predicates, which arguably allows for human-interpretable specification of system requirements. For instance, in the automotive domain, STL can be used to formulate properties such as "the car successfully stops before hitting an obstacle" [54]. In contrast to traditional machine learning approaches that may use large feature sets, thus resulting in uninterpretable artifacts, we propose to learn classifiers that can be expressed as STL formulas. This gives us the advantage that the classifier can be interpreted by the analyst. Formally, given a set of labeled traces X and a labeling function L : X → [0, ℓ], the goal is to learn STL formula classifiers ϕ₁, ϕ₂, ..., ϕ_ℓ such that, for each trace x ∈ X, x ⊨ ϕ_i ⟺ L(x) = i, and x ⊭ ϕ_j for all j ≠ i (see Fig. 1.2). Here x ⊨ ϕ_i means that x satisfies ϕ_i, and x ⊭ ϕ_j means that x does not satisfy ϕ_j.

Figure 1.2: Supervised learning using formal parametric logic.

Mining environment assumptions for CPS. Autonomous cyber-physical systems such as self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices can often be modeled as a system consisting of heterogeneous components. Each of these components could itself be quite complex, and understanding the high-level behavior of such components at an abstract, behavioral level is thus a significant challenge. We assume that the correctness of each component can be specified as a formal specification over the signals or time-series data output by the component. Specifically, we target specifications/requirements expressed using Signal Temporal Logic (STL).
We then hypothesize that a large subset of input signals for which the corresponding output signals satisfy the output requirement can also be compactly described using an STL formula that we call an environment assumption. We propose an algorithm to mine such an environment assumption using an active learning approach. Active learning is a special case of machine learning in which a learning algorithm can interactively query an oracle to label data with the desired outputs [88]. In our method, the oracle is a falsification tool implemented in Breach [30], and it is asked whether the learned environment assumption is correct. If the oracle answers yes, the result is returned; otherwise, the oracle provides a counterexample, which is used to refine the learned formula (see Fig. 1.3).

Figure 1.3: Active learning using formal parametric logic.

Learning from spatio-temporal data. Spatially distributed systems such as multi-agent CPS generate huge amounts of spatio-temporal data, and system designers are often interested in analyzing and discovering structure within the data. There has been considerable interest in learning causal and logical properties of temporal data using logics such as Signal Temporal Logic (STL); however, there is limited work on discovering such relations in spatio-temporal data. We propose the first set of algorithms for unsupervised learning from spatio-temporal data using Spatio-temporal Reach and Escape Logic (STREL). STREL is a logic that was introduced in [11] as a formalism for monitoring spatially distributed cyber-physical systems; it extends STL with a set of spatial operators. Formally, given a set of unlabeled spatio-temporal data X, the goal is to group X into k clusters C₁, C₂, ..., C_k and learn STREL formulas ϕ₁, ϕ₂, ..., ϕ_k such that C_i ⊨ ϕ_i and C_i ⊭ ϕ_j for all j ≠ i (see Fig. 1.4).

Figure 1.4: Unsupervised learning using formal parametric logic.

Learning from natural language. Natural language (NL) can be used to express information about several aspects of autonomous CPS, including safety specifications, performance requirements, and task objectives. NL sentences in this context can themselves be viewed as a form of sequential data. While NL is ambiguous, real-world tasks and their safety requirements need to be communicated unambiguously. Signal Temporal Logic (STL) is a formal logic that can serve as a versatile, expressive, and unambiguous formal language to describe robotic tasks. On one hand, existing work on using STL in the robotics domain typically requires end-users to express task specifications in STL, which is a challenge for non-expert users. On the other hand, translating from NL to STL specifications is currently restricted to specific subsets of STL. For example, the work in [39] allows the conjunction and disjunction of only two atomic propositions, and some nested formulas are not supported. In this work, we propose DIALOGUESTL, a human-in-the-loop learning approach for learning correct and concise STL formulas from (often) ambiguous NL descriptions, aided by a small number of user demonstrations. Human-in-the-loop learning is similar to active learning except that there is no oracle that knows the correct formula. In fact, there is no mechanical way to verify that the learned formula is the "correct" formula, as the users themselves may not know the unambiguous temporal logic specification for their desired task objective.
In this case, the user can observe the policies generated by a suitable decision-making algorithm (one that guarantees satisfaction of the task objective) and then indicate approval or disapproval of the learned formula. In our setup, we require the users to provide additional data in case there are ambiguities in their task objectives (see Fig. 1.5).

Figure 1.5: Human-in-the-loop learning using formal parametric logic.

1.2 Summary of existing work

In order to outline our key contributions, we briefly summarize the existing work on each sub-problem tackled in this dissertation and identify certain limitations that we address in subsequent chapters.

1.2.1 Learning from time-series data

Traditional ML approaches for time-series learning. Existing ML approaches for time-series learning combine techniques such as k-nearest neighbors [22, 103, 21], support vector machines (SVM) [87, 20, 86], k-means clustering [42], and hierarchical and agglomerative clustering [24] with similarity metrics between time-series data such as the Euclidean distance, the dynamic time-warping (DTW) distance, and statistical measures that include, but are not limited to, the mean, median, and correlation. Recent work on the idea of "shapelets" relies on automatically identifying distinguishing shapes in the time-series data [106]; such shapelets serve as features for ML tasks and can provide interpretable results [105]. Most of these approaches are based on shape similarity, which might be useful in some applications; however, in applications where the user is interested in mining temporal information from the data, dissimilar traces might be clustered in the same group [96]. The work by Vazquez-Chanlatte et al. [96] gives an example of a clustering problem in which dissimilar signals with similar logical properties are clustered in the same group.

Deep learning approaches for time-series classification have recently received a lot of attention [43, 52, 44]. For example, Recurrent Neural Networks (RNNs) are specifically designed for learning from sequential data and have applications in a variety of domains such as speech recognition, machine translation, and natural language processing. While RNNs achieve good accuracy compared to the state-of-the-art techniques for time-series classification, they lack interpretability due to their highly nonlinear nature, high dimensionality, and lack of immediate connection to the input patterns. While existing interpretable ML techniques such as linear regression, logistic regression, and decision trees improve human readability, such techniques are not specialized for learning temporal properties of timed traces. A naïve application of a decision tree to timed traces would treat every time instant in the trace as a decision variable, leading to deep trees that lose interpretability [70]. We empirically demonstrate this in Chapter 4.

STL-based approaches for time-series learning. There has been keen recent interest in learning STL specifications from data. Some of this work has focused on supervised learning (where, given labeled traces, an STL formula is learned to distinguish positively labeled traces from negatively labeled traces) [16, 54, 74, 37], clustering (where signals are clustered together based on whether they satisfy similar STL formulas) [96, 97], or anomaly detection (where STL is utilized for recognizing anomalous behavior of an embedded system) [50]. There are two main approaches to learning STL formulas from data: template-based learning and template-free learning.
In template-based methods, the user provides a template in the form of a parametric STL (PSTL) formula, and the learning algorithm infers only the values of the parameters from data [96, 97, 49, 47, 46]. Without a good understanding of the application domain, choosing an appropriate PSTL formula can be challenging. Furthermore, in template-based methods the users may provide very specific templates, which may make it difficult to derive new knowledge from data [16]. Template-free methods learn both the structure of the underlying STL formula and its parameters. While these techniques have proven effective for many diverse applications [50, 54, 16, 74, 37], they may produce long and complicated STL classifiers that hamper interpretability; we empirically demonstrate this in Chapter 3. Furthermore, some of the existing work on template-free learning focuses on STL fragments that do not allow properties such as concurrent eventuality, repeated patterns, and persistence to be expressed [54].

1.2.2 Mining environment assumptions for CPS

Many complex cyber-physical systems such as general-purpose robots, unmanned aerial vehicles, self-driving cars, and medical devices can be modeled as heterogeneous components interacting with each other in real time. As such components are quite complex, understanding their high-level behavior is a great challenge. Contract-based reasoning [76, 60] is a potential approach for compositional reasoning about such complex component-based CPS models. Here, a design component C is modeled in terms of environment assumptions, i.e., assumptions on the timed input traces to C, and output guarantees, i.e., properties satisfied by the corresponding model outputs. A big challenge is that designers often do not articulate such assumptions and guarantees using logical, machine-checkable formalisms [102]. There are existing works such as [49, 23, 41] that use formal logic to learn output requirements; however, we are interested in mining environment assumptions rather than output requirements, i.e., in an active learning procedure that separates input traces that lead to outputs satisfying an output requirement from those that lead to outputs violating it.

1.2.3 Learning from spatio-temporal data

There has been considerable interest in learning causal and logical properties of temporal data using logics such as Signal Temporal Logic (STL) [96, 49, 71, 70, 74]; however, there is limited work on discovering such relations in spatio-temporal data. There are a few hurdles in applying existing STL-based learning techniques to spatio-temporal data. First, in [96], the authors assume a monotonic fragment of PSTL; no such fragment has been identified in the literature for the spatio-temporal logic STREL. Second, in [96], the authors assume that clusters in the parameter space can be separated by axis-aligned hyper-boxes, which is a restrictive assumption for some case studies; for example, the points in Fig. 5.6a cannot be separated using axis-aligned hyper-boxes. Third, given spatio-temporal data, we have different choices for imposing the edge relation on nodes, which can affect the formula we learn; this challenge does not exist for temporal data.

In the past, analytic models based on partial differential equations (e.g., diffusion equations) [35] have been used to express the spatio-temporal evolution of these systems. While such formalisms are incredibly powerful, they may not be as interpretable as logical formulas.
Traditional machine learning (ML) approaches have also been used to uncover the structure of such spatio-temporal systems. However, in Chapter 5, we empirically demonstrate that these techniques may suffer from a lack of interpretability.

Our proposed method draws on a recently proposed logic known as Spatio-Temporal Reach and Escape Logic (STREL) [11]. Recent research on STREL has focused on efficient algorithms for runtime verification and monitoring of STREL specifications [11, 12]. However, there is no existing work on mining STREL specifications. MoonLight [12], a recent tool for monitoring STREL properties, uses a naïve method for creating a spatial model that has several issues, including disconnectivity and distance overestimation (see Fig. 5.2b). We resolve these issues with our new method for creating a spatial graph; the details are explained in Chapter 5. While there is much work on monitoring spatio-temporal logics, to the best of our knowledge there is no work on automatically inferring spatio-temporal logic formulas from data, which we address in this dissertation.

1.2.4 Learning from natural language

The state of the art for learning STL formulas from natural language is DeepSTL [39], which translates informal English requirements to STL by training a sequence-to-sequence transformer on synthetic data generated from a hand-defined STL-to-English grammar; DeepSTL is restricted to this specific fragment of STL. For example, it allows the conjunction and disjunction of only two atomic propositions, and some nested formulas are not supported. In Chapter 6, we empirically show that our work achieves better accuracy compared to DeepSTL and is significantly faster to train. Furthermore, DeepSTL is a black-box learning algorithm that can lack transparency about which parts of the STL formula are learned from which parts of the NL sentence; in contrast, our approach generates explanation dictionaries that explain how different parts of a natural language sentence are mapped to different components of the predicted STL formula.

1.3 Hypotheses and Insights

Here, we propose a set of hypotheses to address the limitations of existing techniques for learning from sequential data. We start with hypotheses for learning short and understandable STL-based classifiers from time-series data.

Hypothesis A1: Enumerative learning can generate shorter STL-based classifiers compared to existing approaches.

According to Hypothesis A1, enumeration-based techniques generate the shortest possible STL-based classifiers because such approaches investigate STL formulas in increasing order of their length. Prior work on learning STL formulas from time-series data [16, 50, 74] may generate long STL formulas; in comparison, we empirically show that our tool can learn shorter STL formulas.

Hypothesis A2: STL formulas are understandable by humans.

Hypothesis A3: Longer STL formulas potentially hamper interpretability.

We performed a user study to test Hypotheses A2 and A3; the details of the user study are given in Section 3.4.

Hypothesis B: A large subset of input signals for which the corresponding output signals satisfy the output requirement can also be compactly described using an STL formula.

Hypothesis B suggests a technique for learning environment assumptions for a CPS model. The intuition behind this hypothesis is that input traces that lead to satisfying outputs should share a similar temporal behavior that can be captured by an STL formula.
Hypothesis C: Spatial regions satisfying similar temporal properties can be clustered using STREL formulas.

Hypothesis C proposes a new technique for clustering spatio-temporal data using STREL formulas. Intuitively, spatio-temporal traces with similar properties satisfy similar STREL formulas, and thus STREL formulas can be used for clustering such data.

Hypothesis D1: NL can be mapped to STL.

Hypothesis D2: Interactions with the user can help convert an ambiguous natural language sentence into an unambiguous formula, such as an STL formula, for describing robotic tasks.

We test Hypotheses D1 and D2 by providing an interactive approach for learning unambiguous STL formulas from ambiguous natural language descriptions of robotic tasks.

1.4 Contributions

This dissertation empirically tests the aforementioned hypotheses by proposing four novel techniques for interpretable learning from sequential data. The first method carries out the idea behind Hypotheses A1-A3 and tries to build concise and understandable STL-based classifiers from time-series data. The second work seeks to learn interpretable STL-based environment assumptions for CPS models, which is the approach proposed by Hypothesis B. The concept proposed by Hypothesis C is put into practice in the third study, which deals with the clustering of spatio-temporal data. Finally, Hypotheses D1 and D2 are the main ideas behind the last piece of work, which aims to learn STL formulas from natural language and robot demonstrations. To test our hypotheses, we compare each technique to the best available alternatives. The methods and findings that constitute this dissertation's main contributions are outlined below.

1.4.1 Learning from time-series data

To address the limitations of some of the state-of-the-art methods, we propose an enumerative technique for template-free learning of STL classifiers. An enumerative technique takes as input an ordered grammar and then systematically applies production rules to enlist valid sentences/formulas in the grammar. Our method checks whether the enumerated formula is "suitable" and stops if it is. Our enumerative technique can thus generate STL classifiers that are shorter and hence more interpretable. Formally, given a set of positively labeled traces X₁ and a set of negatively labeled traces X₀, our enumerative technique aims to learn an STL formula that evaluates to true for most of the traces in X₁ and to false for most of the traces in X₀, with a minimal misclassification rate. (A trace is misclassified if it is in X₀ but the formula evaluates to true, or vice versa.)

We combine enumeration of PSTL formulas with a parameter inference technique based on binary search to check whether the enumerated PSTL formula is suitable. We also use an optimization approach to avoid enumerating semantically equivalent formulas. Our proposed method is evaluated on a variety of benchmarks from the automotive and transportation domains. The results show that our tool can learn simpler STL classifiers compared to the state of the art (at a small price in misclassification rate).

1.4.2 Mining environment assumptions for CPS

Next, as an application of our classification framework, we investigate the problem of mining environment assumptions for CPS models. Many complex CPS such as self-driving cars, unmanned aerial vehicles, general-purpose robots, and medical devices can be modeled as heterogeneous components interacting with each other in real time.
Each of these components could itself be quite complex, and understanding the high-level behavior of such components at an abstract, behavioral level is thus a significant challenge. The complexity of individual components makes compositional reasoning about global properties a difficult task. We assume that the correctness of each component can be specified as a requirement satisfied by the output signals of the component, and that such an output guarantee is expressed in a real-time temporal logic such as STL. We propose an algorithm to mine an environment assumption using our previously developed supervised learning technique. Formally, we consider the following problem: given an output requirement ϕ_out, what are the assumptions on the model environment, i.e., on the input traces to the model, that guarantee that the corresponding output traces satisfy ϕ_out? We propose a counterexample-guided inductive synthesis algorithm that combines systematic enumeration of PSTL formulas with a falsification procedure to mine the assumptions on the model inputs. The falsification procedure checks whether there exists an input trace to the model that satisfies ϕ_in while the corresponding output does not satisfy ϕ_out. To summarize, we propose a new algorithm to automatically mine environment assumptions expressed in STL. As our algorithm systematically increases the syntactic complexity of the PSTL formulas, it follows the Occam's razor principle when learning environment assumptions, i.e., it attempts to learn STL classifiers that are short, and hence simple and more interpretable. We also propose a new technique based on decision-tree learning for inferring parameter valuations that lead to a good classifier. Finally, we demonstrate the capability of our assumption mining algorithm on a few benchmark models.

1.4.3 Learning from spatio-temporal data

We propose the first set of algorithms for logical unsupervised learning of spatio-temporal data. We introduce parametric spatio-temporal reach and escape logic (PSTREL) by treating threshold constants in signal predicates, time bounds in temporal operators, and distance bounds in spatial operators as parameters. Our method performs automatic feature extraction from the spatio-temporal data by projecting it onto the parameter space of a PSTREL formula. We propose an agglomerative hierarchical clustering technique designed to guarantee that each cluster satisfies a distinct STREL formula. We also explore the space of implied edge relations between spatial nodes, proposing an algorithm to define the most suitable graph. Our method generates STREL formulas of bounded description complexity using a novel decision-tree approach that generalizes previous unsupervised learning techniques for Signal Temporal Logic [96]. We demonstrate the effectiveness of our approach on case studies from diverse domains such as urban transportation, epidemiology, green infrastructure, and air quality monitoring.

1.4.4 Learning from natural language

We propose DIALOGUESTL, an explainable and interactive approach for learning correct and concise STL formulas from (often) ambiguous NL descriptions. We use a combination of semantic parsing, pre-trained transformer-based language models, and user-in-the-loop clarifications, aided by a small number of user demonstrations, to predict the best STL formula encoding an NL task description.
An advantage of mapping NL to STL is that there has been considerable recent work on the use of reinforcement learning (RL) to identify control policies for robots using STL. We show that we can use deep Q-learning techniques (chosen for their scalability to environments with a large number of states) to learn optimal policies from the learned STL specifications. We demonstrate that DIALOGUESTL is efficient and explainable, and has high accuracy in predicting the correct STL formula with a small number of demonstrations and a few interactions with an oracle user.

1.5 Layout

The structure of this dissertation is as follows. Chapter 2 provides background on several categories of sequential data and on the formal parametric logics used to learn from such data. The primary contributions of this dissertation are presented in the following four chapters. Chapter 3 presents our new technique for enumerative learning of STL formulas, which was published in the main track of the IEEE/ACM International Conference on Hybrid Systems: Computation and Control (HSCC) [71]. Our environment assumption mining method for CPS is presented in Chapter 4 and was published in the main track of the IEEE/ACM International Conference on Cyber-Physical Systems (ICCPS) [70]. Chapter 5 presents our technique for learning spatio-temporal logic formulae and was published in the main track of the International Symposium on Automated Technology for Verification and Analysis (ATVA) [69]. Chapter 6 presents our technique for learning from natural language and will be submitted to the NASA Formal Methods Symposium (NFM). We analyze the related work in Chapter 7. Finally, we summarize this dissertation's contributions in Chapter 8 and offer potential areas for further research.

Chapter 2: Background

In this dissertation, we aim to learn formal parametric logic from sequential data. Sequential data refers to data where the order between successive data points is important; time-series data, or traces, are an example of sequential data. Below, we formally define time-series data and formal parametric logic.

Definition 1 (Time-Series, Traces, Signals). A trace x is a mapping from a time domain T to a value domain D, x : T → D, where T ⊆ ℝ≥0, D ⊆ ℝⁿ, T ≠ ∅, and the variable n denotes the trace dimension. For instance, if T = {t₁, t₂, ..., t_T} is the time domain and D = {x₁, x₂, ..., x_T} is the value domain, an example trace is x = {(t₁,x₁), (t₂,x₂), ..., (t_T,x_T)}, where t_i is the i-th time instant, x_i is the value of the trace at time instant t_i, and i ∈ [1, T]. Moreover, T is the number of time instants, t_i ∈ ℝ≥0, and x_i ∈ ℝⁿ. In this work, we use the terms time-series data, traces, and signals interchangeably (see Fig. 2.2 for examples of sinusoidal traces).
Definition 2 (Formal Parametric Logic). A formal parametric logic is specified using a grammar that provides an interpretation to the predicate symbols and temporal operator symbols. The syntax of a formal parametric logic specifies the well-formed formulas in the logic.

Let S be a set of sequential data variables, i.e., each variable s_j in S is a function from T (a time domain) to D (a value domain). Let F be a set of predicate symbols over the value domain, where each symbol f in F has a positive arity. Let H be a set of temporal operators; a temporal operator h takes as input a time in the time domain and a set of sequential data variables and returns a Boolean value. Let P_V be a set of parameter variables that take values over the value domain, and let P_T be a set of parameter variables that take values over the time domain. A parameterized predicate is a predicate symbol in F where at least one of the predicate arguments is a parameter in P_V. A parameterized temporal operator is a temporal operator that takes at least one additional time parameter as an argument.

In Chapter 3, we want to learn a logical formula that separates two sets of labeled traces. The logical formalism that we choose for this purpose is called Signal Temporal Logic (STL).

Signal Temporal Logic (STL). Temporal logics were introduced in the late 70s [80] to formally reason about the temporal behaviors of reactive systems, which were originally input-output systems with Boolean, discrete-time signals. Temporal logics such as Timed Propositional Temporal Logic [3] and Metric Temporal Logic (MTL) [55] were introduced later to reason about dense real-time signals. More recently, Signal Temporal Logic [34] was proposed in the context of analog and mixed-signal circuits as a specification language for reasoning about properties of real-valued signals. The simplest properties or constraints can be expressed in the form of atomic predicates. An atomic predicate is formulated as f(x) ∼ c, where x is a trace, f is a scalar-valued function over the trace x, ∼ ∈ {≥, ≤, =}, and c ∈ ℝ. For instance, x ≥ 2 is an atomic predicate, where f(x) = x, ∼ is ≥, and c = 2. Temporal properties involve temporal operators such as G (always), F (eventually), and U (until). For example, G(x > 2) means that the signal x is always greater than 2. Each temporal operator is indexed by an interval I := (a,b) | (a,b] | [a,b) | [a,b], where a, b ∈ T. The syntax of STL is formally defined as follows:

  ϕ := true | f(x) ∼ c | ¬ϕ | ϕ₁ ∧ ϕ₂ | ϕ₁ U_I ϕ₂    (2.1)

where c ∈ ℝ. The G and F operators are special instances of the U operator, defined for formula simplification: F_I ϕ ≜ true U_I ϕ and G_I ϕ ≜ ¬F_I ¬ϕ. The semantics of STL are defined over a discrete-time signal x over some time domain T, as follows:

  (x,t) ⊨ f(x) ∼ c   ⟺  f(x(t)) ∼ c is true
  (x,t) ⊨ ¬ϕ         ⟺  (x,t) ⊭ ϕ
  (x,t) ⊨ ϕ₁ ∧ ϕ₂    ⟺  (x,t) ⊨ ϕ₁ and (x,t) ⊨ ϕ₂
  (x,t) ⊨ ϕ₁ U_I ϕ₂  ⟺  ∃ t₁ ∈ t ⊕ I : (x,t₁) ⊨ ϕ₂ and ∀ t₂ ∈ [t,t₁) : (x,t₂) ⊨ ϕ₁

Here x ⊨ ϕ is shorthand for (x,0) ⊨ ϕ, and ⊕ denotes the Minkowski sum (t ⊕ [a,b] = [t+a, t+b]). Note that we only include atomic predicates of the form f(x) ≥ c, as any other atomic signal predicate can be expressed using predicates of this form, negations, and conjunctions.

Example. The signal x satisfies f(x) > 0 at time t (where t ≥ 0) if f(x(t)) > 0. It satisfies ϕ = G_[0,2)(x > 0) if x(t) > 0 for all 0 ≤ t < 2, and it satisfies ϕ = F_[0,1)(x > 0) if there exists t such that 0 ≤ t < 1 and x(t) > 0. The signal x satisfies the formula ϕ = (x > 0) U_[0,2] (x < 1) if there exists some time t₁ with 0 ≤ t₁ ≤ 2 and x(t₁) < 1, and for all t₂ ∈ [0,t₁), x(t₂) > 0. We can create higher-level STL formulas by combining two or more of the operators. For instance, a signal x satisfies ϕ = F_[0,1] G_[0,2] (x(t) > 1) iff there exists t₁ with 0 ≤ t₁ ≤ 1 such that for all t₁ ≤ t ≤ t₁ + 2, x(t) > 1.
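Since the until semantics above are the subtlest part of the definition, the following minimal Python sketch renders them directly over a discrete-time trace. It is an illustration only: the trace encoding and function names are ours, not part of the solver introduced in Chapter 3.

```python
# A trace is a list of (time, value) pairs with strictly increasing times.
# sat1 and sat2 decide the subformulas phi_1 and phi_2 at a sample time.

def sat_until(trace, t, a, b, sat1, sat2):
    """(x, t) |= phi_1 U_[a,b] phi_2: some t1 in t + [a, b] satisfies
    phi_2, and phi_1 holds at every earlier sample t2 in [t, t1)."""
    times = [s for s, _ in trace]
    for t1 in times:
        if t + a <= t1 <= t + b and sat2(trace, t1):
            if all(sat1(trace, t2) for t2 in times if t <= t2 < t1):
                return True
    return False

# Example from the text: (x > 0) U_[0,2] (x < 1).
trace = [(0.0, 2.0), (0.5, 1.8), (1.0, 1.5), (1.5, 0.9), (2.0, 0.3)]
holds = sat_until(trace, 0.0, 0.0, 2.0,
                  sat1=lambda tr, s: dict(tr)[s] > 0,
                  sat2=lambda tr, s: dict(tr)[s] < 1)
print(holds)  # True: x first drops below 1 at t1 = 1.5, with x > 0 before
```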
Another example of sequential data is a robot trajectory, and STL can also be used to describe properties of a robot trajectory. Two robot trajectories (green and red) are illustrated in Fig. 2.1. Next, we define what a demonstration is and then give an example of using STL to describe a robot demonstration.

Figure 2.1: The robot tries to reach the lamp placed at location (0,0) within 15 seconds while avoiding wall (black) and water (blue) tiles. Both the green and red demonstrations satisfy the formula F_[0,15](robotAt(0,0) ≥ 0): in the next 15 seconds, the robot should eventually reach location (0,0).

Definition 3 (Demonstration). A demonstration is a finite sequence of state-action pairs. Formally, d = {(s₀,a₀), (s₁,a₁), ..., (s_ℓ,a_ℓ)} defines a demonstration of length ℓ, where s_i ∈ S and a_i ∈ A. Here, S is the set of all possible states and A is the set of all possible actions in an environment.

Example. Consider the grid-world environment illustrated in Fig. 2.1. While both demonstrations reach the lamp, only the green demonstration satisfies the formulas G(¬(robotAtWall ≥ 0)) and G(¬(robotAtWater ≥ 0)). The formula G(¬(robotAtWall ≥ 0)) means that the robot should never climb walls, and the formula G(¬(robotAtWater ≥ 0)) means that the robot should not step in water. The red demonstration intersects both wall and water tiles.

In addition to the Boolean semantics, the quantitative semantics of STL quantify the degree to which a particular trace satisfies a formula, known as its robustness [33, 32]. Intuitively, an STL formula with a large positive robustness is far from violation, and one with a large negative robustness is far from satisfaction. If the robustness is a small positive number, a very small perturbation might make it negative and lead to violation of the property. Formally, the robustness value approximates the signed distance of a trace from the set of traces that marginally satisfy or violate the given formula. Fig. 2.3 illustrates the notion of robustness for the STL formula ϕ = G_[50,100)(x ≤ 3); the LHS and RHS of Fig. 2.3 show traces satisfying and violating ϕ, respectively, and the distance to satisfaction or violation is the robustness value. Technically, in [32] the authors define a robustness signal ρ that maps an STL formula ϕ and a trace x to a number at each time t, denoting an approximation of the signed distance of the suffix of x starting at time t with respect to traces satisfying or violating ϕ. The convention is to call the value at time 0 of the robustness signal of the top-level STL formula the robustness value. This definition has the property that a trace with positive robustness value satisfies the top-level formula, and a trace with negative robustness value violates it. The quantitative semantics of STL are formally defined as follows:

  ρ(f(x) ≥ c, x, t) = f(x(t)) − c
  ρ(¬ϕ, x, t)       = −ρ(ϕ, x, t)
  ρ(ϕ₁ ∧ ϕ₂, x, t)  = min(ρ(ϕ₁, x, t), ρ(ϕ₂, x, t))
  ρ(G_I ϕ, x, t)    = inf_{t′ ∈ t⊕I} ρ(ϕ, x, t′)
  ρ(F_I ϕ, x, t)    = sup_{t′ ∈ t⊕I} ρ(ϕ, x, t′)
  ρ(ϕ₁ U_I ϕ₂, x, t) = sup_{t′ ∈ t⊕I} min( ρ(ϕ₂, x, t′), inf_{t″ ∈ [t,t′)} ρ(ϕ₁, x, t″) )

Example. Consider a signal x and the STL formulas ϕ₁ = G_[0,10)(x ≤ 3) and ϕ₂ = F_[0,10](x < −3). Consider a timed trace of x where x(t) = sin(2πt) (for some discrete set of time instants t ∈ [0,50]). This trace satisfies ϕ₁ because sin(2πt) never exceeds 3, and it violates ϕ₂ since sin(2πt) ≥ −3 for all t. The robustness value of ϕ₁ with respect to x(t) is the minimum of 3 − x(t) over [0,10), which is 2. The robustness value of ϕ₂ with respect to x(t) is the maximum of −3 − x(t) over [0,10], which is −2 (see Fig. 2.2).

Figure 2.2: Illustration of the notion of robust satisfaction value (robustness) for ϕ₁ = G_[0,10](x ≤ 3) and ϕ₂ = F_[0,10](x < −3).

Figure 2.3: Illustration of the notion of robust satisfaction value (robustness) for the STL formula ϕ = G_[50,100)(x ≤ 3).
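The quantitative semantics can be rendered in a few lines of code for the core operators. The sketch below is a direct transcription of the ρ equations above (the names and trace encoding are ours, not any of the monitoring tools cited); it reproduces the robustness values 2 and −2 from the sine-wave example.

```python
import math

# A trace is a list of (time, value) pairs with increasing time stamps.

def rho_pred(trace, t, f, c):
    """rho(f(x) >= c, x, t) = f(x(t)) - c."""
    return f(dict(trace)[t]) - c

def window(trace, t, a, b):
    """Sample times of the trace falling inside t + [a, b]."""
    return [s for s, _ in trace if t + a <= s <= t + b]

def rho_G(trace, t, a, b, rho_sub):
    """rho(G_[a,b] phi, x, t): infimum of sub-robustness over t + [a, b]."""
    return min(rho_sub(trace, s) for s in window(trace, t, a, b))

def rho_F(trace, t, a, b, rho_sub):
    """rho(F_[a,b] phi, x, t): supremum of sub-robustness over t + [a, b]."""
    return max(rho_sub(trace, s) for s in window(trace, t, a, b))

# Example from the text: x(t) = sin(2*pi*t), sampled over [0, 50].
trace = [(k / 100, math.sin(2 * math.pi * k / 100)) for k in range(5001)]

# phi_1 = G_[0,10](x <= 3): rewrite x <= 3 as -x >= -3. Robustness = 2.
print(rho_G(trace, 0, 0, 10, lambda tr, s: rho_pred(tr, s, lambda v: -v, -3)))

# phi_2 = F_[0,10](x < -3): rewrite as -x >= 3. Robustness = -2.
print(rho_F(trace, 0, 0, 10, lambda tr, s: rho_pred(tr, s, lambda v: -v, 3)))
```

The sign of the returned value recovers the Boolean semantics: a positive result means the trace satisfies the formula, and a negative result means it violates the formula.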
The main difficulty in systematic enumeration for STL is that both time-bounds on temporal operators and predicates over real-valued signals involve real numbers; there is therefore an uncountably infinite number of STL formulas of any given length. One solution is to apply the enumerative approach to PSTL, which uses parameters in place of numeric constants. A PSTL formula is an extension of an STL formula in which real numbers are replaced by parameters. For example, the property "always between time 0 and some unspecified time τ, the signal x is less than some value c" can be expressed by the Parametric STL (PSTL) formula G_[0,τ](x < c), where the unspecified values τ and c are referred to as parameters. Next, we give the formal definition of PSTL.

Figure 2.2: Illustration of the notion of robust satisfaction value or robustness. The robustness value is an approximation of the signed distance between a trace and the set of traces satisfying or violating an STL formula. Consider the formulas ϕ_1 = G_[0,10](x ≤ 3) and ϕ_2 = F_[0,10](x < −3). The robustness of the blue trace with respect to ϕ_1 is the signed distance between the blue trace and the red (violating) trace. Similarly, the robustness of the blue trace with respect to ϕ_2 is the signed distance between the blue trace and the green (satisfying) trace.

Figure 2.3: Illustration of the notion of robust satisfaction value or robustness for the STL formula ϕ = G_[50,100)(x ≤ 3).

Parametric Signal Temporal Logic (PSTL). A PSTL [7] formula is an extension of an STL formula in which constants are replaced by parameters; the associated STL formula is obtained by assigning a value to each parameter variable using a valuation function. Let P be the set of parameters, V the value domain of the parameter variables P_V, and T the time domain of the parameter variables P_T. Then P is the set containing the two disjoint sets P_V and P_T, where at least one of the sets is non-empty. A valuation function ν maps a parameter to a value in its domain; applied to a vector of parameter variables p, it yields a tuple of respective values over V or T. Hence, we obtain the parameter space D_P ⊆ V^|P_V| × T^|P_T|. An STL formula is obtained by pairing a PSTL formula with a valuation function that assigns a value to each parameter variable. For example, consider the PSTL formula ϕ(c,τ) = F_[0,τ](x > c) with parameters c and τ. The STL formula F_[0,5](x > −2.3), an instance of ϕ, is obtained with the valuation ν = {τ ↦ 5, c ↦ −2.3}.

In Chapter 5, we cluster spatio-temporal data using STREL, which extends STL with a set of spatial operators. We therefore introduce the notation and terminology for spatial models, spatio-temporal traces, and Spatio-Temporal Reach and Escape Logic (STREL).

Definition 4 (Spatial Model). A spatial model S is defined as a pair ⟨L,W⟩, where L is a set of nodes or locations and W ⊆ L × R × L is a nonempty relation associating each distinct pair ℓ_1, ℓ_2 ∈ L with a label w ∈ R (also denoted ℓ_1 −w→ ℓ_2).

There are many possible choices for the proximity relation W; for example, W could be defined such that the edge-weights indicate spatial proximity or communication-network connectivity. Given a set of locations, unless there is a user-specified W, there are several graphs (and associated edge-weights) that we can use to express spatial models; we explore these possibilities in Section 5.2.1. For the rest of this section, we assume that W is defined using the notion of a (δ,d)-connectivity graph, as given in Definition 5.

Definition 5 ((δ,d)-connectivity spatial model).
Given a compact metric space M with distance metric d : M × M → R≥0, a set of locations L that is a finite subset of M, and a fixed δ ∈ R, δ > 0, a (δ,d)-connectivity spatial model is defined as ⟨L,W⟩, where (ℓ_1, w, ℓ_2) ∈ W iff d(ℓ_1,ℓ_2) = w and w < δ.

Example. In the BSS, each bike station is a node/location in the spatial model, where locations are assumed to lie on the metric space defined by the 3D spherical manifold of the earth's surface; each location is defined by its latitude and longitude, and the distance metric is the Haversine distance (the Haversine formula gives the minimum distance between any two points on a sphere from their latitudes and longitudes). Fig. 5.2b shows the δ-connectivity graph of the Edinburgh BSS with δ = 1 km.

Definition 6 (Route). For a spatial model S = ⟨L,W⟩, a route τ is an infinite sequence ℓ_0 ℓ_1 ⋯ ℓ_k ⋯ such that for any i ≥ 0, ℓ_i −w_i→ ℓ_{i+1}.

For a route τ, τ[i] denotes the i-th node ℓ_i in τ, τ[i..] denotes the suffix route ℓ_i ℓ_{i+1} ⋯, and τ(ℓ) denotes min{ i | τ[i] = ℓ }, i.e., the index of the first occurrence of ℓ in τ; τ(ℓ) = ∞ if τ[i] ≠ ℓ for all i. We use T(S) to denote the set of routes in S, and T(S,ℓ) to denote the set of routes in S starting from ℓ ∈ L. Routes let us define distances between locations in the spatial model as follows.

Definition 7 (Route Distance and Spatial Model Induced Distance). Given a route τ, the route distance along τ up to a location ℓ, denoted d_S^τ(ℓ), is defined as Σ_{i=0}^{τ(ℓ)} w_i. The spatial model induced distance between locations ℓ_1 and ℓ_2, denoted d_S(ℓ_1,ℓ_2), is defined as:

    d_S(ℓ_1,ℓ_2) = min_{τ ∈ T(S,ℓ_1)} d_S^τ(ℓ_2).

Note that by the above definition, d_S^τ(ℓ) = 0 if τ[0] = ℓ, d_S^τ(ℓ) = ∞ if ℓ is not a part of the route (i.e., τ(ℓ) = ∞), and d_S(ℓ_1,ℓ_2) = ∞ if there is no route from ℓ_1 to ℓ_2.

Spatio-temporal Time-Series. A spatio-temporal trace associates each location in a spatial model with a time-series trace. Formally, a time-series trace x is a mapping from a time domain T to some bounded and non-empty set known as the value domain V. Given a spatial model S = ⟨L,W⟩, a spatio-temporal trace σ is a function from L × T to V. We denote the time-series trace at location ℓ by σ(ℓ).

Example. Consider a spatio-temporal trace σ of the BSS defined such that for each location ℓ and at any given time t, σ(ℓ,t) is (B(t), S(t)), where B(t) and S(t) are respectively the number of bikes and empty slots at time t.
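The following is a small Python sketch of these definitions: it builds a (δ,d)-connectivity spatial model over latitude/longitude locations using the Haversine distance, and computes the spatial-model-induced distance d_S with Dijkstra's algorithm (since edge-weights are nonnegative, the minimum route distance to a location coincides with the shortest-path distance). The station coordinates below are hypothetical placeholders, not the actual Edinburgh BSS data.

    # A minimal sketch of a (delta, d)-connectivity spatial model and the
    # induced distance d_S (hypothetical coordinates, not real BSS data).
    import heapq, math

    def haversine_km(p, q):
        (la1, lo1), (la2, lo2) = [(math.radians(a), math.radians(b))
                                  for a, b in (p, q)]
        h = (math.sin((la2 - la1) / 2) ** 2 +
             math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(h))

    def connectivity_model(locs, delta):
        # W contains (l1, w, l2) iff d(l1, l2) = w and w < delta.
        W = {l: [] for l in locs}
        for l1 in locs:
            for l2 in locs:
                if l1 != l2:
                    w = haversine_km(locs[l1], locs[l2])
                    if w < delta:
                        W[l1].append((w, l2))
        return W

    def induced_distance(W, src):
        # d_S(src, .): minimum route distance, via Dijkstra.
        dist, pq = {src: 0.0}, [(0.0, src)]
        while pq:
            d, l = heapq.heappop(pq)
            if d > dist.get(l, math.inf):
                continue
            for w, m in W[l]:
                if d + w < dist.get(m, math.inf):
                    dist[m] = d + w
                    heapq.heappush(pq, (d + w, m))
        return dist  # unreachable locations are at distance infinity

    stations = {"A": (55.953, -3.188), "B": (55.957, -3.186),
                "C": (55.946, -3.202)}
    W = connectivity_model(stations, delta=1.0)
    print(induced_distance(W, "A"))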
Spatio-Temporal Reach and Escape Logic (STREL). STREL was introduced in [11] as a formalism for monitoring spatially distributed cyber-physical systems. It extends Signal Temporal Logic [64] with two spatial operators, reach and escape, from which it is possible to derive three other spatial modalities: everywhere, somewhere and surround.

Syntax. The syntax of STREL is given by:

    ϕ ::= true | µ | ¬ϕ | ϕ_1 ∧ ϕ_2 | ϕ_1 U_I ϕ_2 | ϕ_1 R_D ϕ_2 | E_D ϕ.

Here, µ is an atomic predicate (AP) over the value domain V. Negation ¬ and conjunction ∧ are the standard Boolean connectives, while U_I is the temporal operator until, with I a non-singular interval over the time domain T. The operators R_D and E_D are spatial operators, where D denotes an interval over the distances induced by the underlying spatial model, i.e., an interval over R≥0.

Semantics. A STREL formula is evaluated piecewise over each location and each time appearing in a given spatio-temporal trace. We write (σ,ℓ) ⊨ ϕ if the formula ϕ holds true at location ℓ for the given spatio-temporal trace σ. The interpretation of atomic predicates, Boolean operations and temporal operators follows the standard semantics of Signal Temporal Logic: e.g., for a given location ℓ and a given time t, the formula ϕ_1 U_I ϕ_2 holds at ℓ iff there is some time t′ in t ⊕ I where ϕ_2 holds, and for all times t″ in [t,t′), ϕ_1 holds. Here the ⊕ operator shifts the interval by adding t to both end-points. We use the standard abbreviations F_I ϕ = true U_I ϕ and G_I ϕ = ¬F_I ¬ϕ for the eventually and globally operators. The reach (R_D) and escape (E_D) operators are spatial operators. The formula ϕ_1 R_D ϕ_2 holds at a location ℓ if there is a route τ starting at ℓ that reaches a location ℓ′ satisfying ϕ_2, with a route distance d_S^τ(ℓ′) that lies in the interval D, such that ϕ_1 holds at all preceding locations, including ℓ. The escape formula E_D ϕ holds at a location ℓ if there exist a location ℓ′ whose induced distance d_S(ℓ,ℓ′) lies in the interval D and a route starting at ℓ and reaching ℓ′ that consists only of locations satisfying ϕ. We define two other operators for notational convenience: the somewhere operator, denoted ◇_[0,d] ϕ, is defined as true R_[0,d] ϕ, and the everywhere operator, denoted □_[0,d] ϕ, is defined as ¬◇_[0,d] ¬ϕ, where d is a positive real value; their meaning is illustrated in the next example.

Example. In the BSS, we use the atomic predicates S > 0 and B > 10. The formula G_[0,3hours] ◇_[0,1km] (B > 10) is true at a location ℓ if, always within the next 3 hours, there is some location ℓ′ at most 1 km from ℓ where the number of available bikes exceeds 10. Similarly, the formula □_[0,1km] G_[0,30min] (S > 0) is true at a location ℓ if, for all locations within 1 km, there is at least one empty slot available for the next 30 mins.

Chapter 3

Enumerative Learning of STL formulas

3.1 Introduction

A crucial question is: how do we automatically identify logical structure or relations within CPS data? Machine learning (ML) algorithms are typically not specialized to learn the logical structure underlying time-series data [96], and usually require users to hand-craft features of interest in the underlying time-series signals. These methods then try to learn discriminators over feature spaces to cluster or classify data. Such feature spaces can be quite large, and ML algorithms may choose a subset of these features in an ad hoc fashion. This results in an artifact (e.g., a discriminator or a clustering mechanism) that is not human-interpretable. In fact, such ML techniques focus only on solving the classification problem and offer no further insight into the target system [16].

Signal Temporal Logic (STL) is a popular formalism for expressing properties of time-series data in several application contexts, including automotive systems [51, 48, 10], analog circuits [65], biology [89] and robotics [101]. STL is a logic over Boolean and temporal combinations of signal predicates that allows human-interpretable specification of continuous system requirements. For instance, in the automotive domain, STL can be used to formulate properties such as "the car successfully stops before hitting an obstruction" [54].

Significant work has been done on learning STL specifications from data. The two primary methods for learning STL formulas from data are template-free learning and template-based learning. Template-free methods learn both the structure of the underlying STL formula and its parameter values.
While these methods have proved successful in a wide range of applications [50, 54, 16, 74, 37], they typically explore only a fragment of STL and may produce long and complicated STL classifiers. In template-based methods, the user provides a template parametric STL formula (PSTL formula), and the learning algorithm infers only the values of the parameters from data [96, 97, 49, 47, 46]. Selecting the right PSTL formula can be difficult without a thorough understanding of the application area.

Syntax-guided synthesis is a recent paradigm in "learning from examples", where the user provides a grammar for expressions and the learning algorithm tries to learn a concise expression that explains a given set of examples. In [92], systematic enumeration is used to generate candidate solutions, and on medium-sized benchmarks the systematic enumeration algorithm, in spite of its simplicity, surprisingly outperforms several other learning approaches [2]. Inspired by the idea of learning expressions from grammars, in this work we consider the problem of learning STL formulas to classify a given labeled time-series dataset, with a focus on the monotonic fragment of STL. A high-level view of our proposed solution is illustrated in Fig. 3.1. The key challenge in systematic enumeration for STL is that predicates over real-valued signals and time-bounds on temporal operators both involve real numbers; even for a fixed length, there are infinitely many STL formulas of that length. One solution is to apply the enumerative approach to PSTL, which uses parameters instead of numbers. The inference problem then reduces to learning parameter values that separate the labeled data into distinct classes. Parameter-valuation inference procedures are typically efficient, but over a large dataset the cost of enumeration followed by parameter inference can add up. As a result, we explore an optimization that skips formulas heuristically determined to be equivalent to previously enumerated ones.

Figure 3.1: The high-level flow of our enumerative learning technique.

As a concrete application of enumerative search, we consider the problem of learning an STL-based classifier with a minimal misclassification rate for the given labeled dataset. Our tool generates the shortest possible STL formula. In the ML community, there is a preference for classifiers with smaller descriptive complexity, in keeping with the Occam's Razor principle; cf. [59] for results showing that shorter decision trees are favorable. In the context of rule learning, shorter rules are considered more interpretable and are preferred by designers [45]. Our main hypothesis is that shorter STL formulas are more human-interpretable. This is in contrast to ML methods, where a classifier may be a linear combination of features such as statistical moments of the signal, frequency-domain coefficients, or other ad hoc user-defined features. We also conduct a user study showing that longer STL formulas are more difficult for humans to understand.

To summarize, our key contributions are as follows:

• We extend the work in [96, 97, 49, 47, 46] by learning the structure or template of PSTL formulas automatically. The enumerative solver furthers the Occam's razor principle in learning (simplest explanations are preferred), and thus produces simpler STL formulas than existing template-free methods [50, 54, 16, 74].
• We introduce the notion of a formula signature as a heuristic to prevent enumeration of equivalent formulas.
• We bridge formal methods and machine learning by employing STL, a language for the formal specification of continuous and discrete system behaviors [64]. We use Boolean satisfaction of STL formulas as the formal basis for measuring the misclassification rate.
• We showcase our technique on real-world data from several domains, including the automotive and transportation domains.
• We perform a user study to measure the extent to which STL formulas are interpretable by humans.

The remainder of this chapter is structured as follows. We present an approach for learning the parameters of a PSTL formula in Section 3.2.1. We explain our enumerative technique for learning the structure of PSTL formulas, together with our signature-based optimization, in Section 3.2.2. We elaborate on our experimental results in Section 3.3. Our user study on the interpretability of STL is described in Section 3.4. Finally, we give a summary in Section 3.5.

3.2 Methods

3.2.1 Learning Parameters of PSTL formulas

In this section, we propose an algorithm that learns the parameters of a monotonic PSTL formula so that the resulting STL formula serves as a good classifier. The algorithm is based on learning the validity domain boundary of the PSTL formula with respect to a given set of traces. We first define a monotonic PSTL formula and the validity domain of a PSTL formula with respect to a set of traces.

Definition 8 (Monotonic PSTL). A parameter p_i is said to be monotonically increasing, or have positive polarity, in a PSTL formula ϕ if condition (3.1) holds for all x; it is said to be monotonically decreasing, or have negative polarity, if condition (3.2) holds for all x; and it is monotonic if it is either monotonically increasing or decreasing [7].

    ν(p_i) ≤ ν′(p_i) ⟹ ( x ⊨ ϕ(ν(p_i)) ⟹ x ⊨ ϕ(ν′(p_i)) )    (3.1)
    ν(p_i) ≥ ν′(p_i) ⟹ ( x ⊨ ϕ(ν(p_i)) ⟹ x ⊨ ϕ(ν′(p_i)) )    (3.2)

Example. In the formula F_[0,τ](x > c), the polarity of τ is positive (the formula is monotonically increasing with respect to τ), and the polarity of c is negative (the formula is monotonically decreasing with respect to c). If a trace satisfies F_[0,1](x > 0) (i.e., there exists a time instant t′ ∈ [0,1] where x(t′) > 0), it will certainly satisfy F_[0,2](x > 0) (t′ ∈ [0,2]).

Definition 9 (Validity domain). The validity domain for a given set of parameters P is the set of all valuations such that, for the given set of traces X, each trace satisfies the STL formula obtained by instantiating the given PSTL formula with a parameter valuation in the set. The boundary of the validity domain is the set of valuations where the robustness value of the resulting STL formula with respect to at least one trace in X is 0.

Example. Consider a set of traces x that are all bounded above by 1. For the formula G_[0,10](x < c), the validity domain is the set (1,∞), and the validity domain boundary is the single point c = 1.

Essentially, the validity domain boundary serves as a classifier that separates the traces satisfying the STL formula from those violating it. Our boundary-based method for learning the parameters of a PSTL formula is formalized in Algo. 1: it learns parameters of a PSTL formula ϕ(p) that yield an STL formula ϕ(ν(p)) that is a good classifier for X_0 and X_1.

Theorem. For a given PSTL formula ϕ(p), labeled traces X_0 and X_1, a threshold ε on the MCR (defined in Eq. (3.3)), and a maximum number of recursions maxRecurs, Algo. 1 will either (1) terminate with an STL formula ϕ such that the MCR is ≤ ε, or (2) terminate with ∅, indicating that the current PSTL formula is not a suitable classifier.
The proof follows from the structure of the algorithm. First, the algorithm obtains a point on the validity domain boundary of the enumerated formula ϕ(p) that yields an accurate STL classifier; the variable p denotes the parameters of the PSTL formula ϕ. To explain this procedure further, we first recall the algorithm for approximating the satisfaction boundary of a given PSTL formula ϕ(p) from [97].

Algorithm 1: Learning parameters of PSTL formulas using Pareto fronts
Input: ϕ(p), X_0, X_1, ε, maxRecurs
Output: ϕ, MCR
1  Function LearnPSTLParamsPF(ϕ(p), X_0, X_1, ε, maxRecurs):
2    numRecurs ← 0;
3    while numRecurs < maxRecurs do
       // Select a candidate point on the satisfaction boundary [63]
4      ν(p) ← pointOnBoundary(ϕ(p), X_1);
       // Substitute the candidate point into ϕ(p)
5      ϕ ← ϕ(ν(p));
       // Compute Boolean satisfaction for traces with labels 0 and 1
6      falsePos ← |{x | x ∈ X_0 ∧ x ⊨ ϕ}|;
7      falseNeg ← |{x | x ∈ X_1 ∧ x ⊭ ϕ}|;
8      MCR ← (falsePos + falseNeg) / (|X_0| + |X_1|);
9      if MCR ≤ ε then
10       return ϕ, MCR;
       // Otherwise, try a boundary point in another region
11     numRecurs ← numRecurs + 1;
12   return ∅;

The validity domain boundary of a formula ϕ(p) with respect to a set of traces X is essentially the set of parameter valuations ν(p) for which the robustness value of ϕ(ν(p)) with respect to at least one trace is 0, i.e., the formula is marginally satisfied by at least one trace in X. In general, computing the validity domain boundary for arbitrary PSTL formulas is difficult; however, for formulas in the monotonic fragment of PSTL [97], there is an efficient procedure for approximating it [63]. The procedure in [63] recursively approximates the validity domain boundary to an arbitrary precision by performing binary search on the diagonals of sub-regions within the parameter space. The idea is that, in an n-dimensional parameter space of a monotonic PSTL formula, a parameter valuation on the diagonal may correspond to a formula with zero robustness value. This point subdivides the parameter space into 2^n distinct regions: one in which all valuations correspond to formulas that are valid over all traces, one in which all valuations correspond to formulas that are invalid, and 2^n − 2 regions where satisfaction is unknown. The algorithm then proceeds to search along the diagonals of these remaining regions. This approximation results in a series of overlapping axis-aligned hyper-rectangles guaranteed to include the satisfaction boundary [97]; more details can be found in [63]. We visualize an instance of the method in Fig. 3.2.

Figure 3.2: Illustration of the method that recursively approximates the satisfaction boundary of a PSTL formula to an arbitrary precision. Green arrows indicate the monotonicity direction (both decreasing).

Our contribution in this work is to combine the procedure from [63] with a classification algorithm to learn STL formula classifiers, as follows. In each recursive iteration of the multi-dimensional binary search, the algorithm identifies a point on the satisfaction boundary of ϕ(p) with respect to the 1-labeled traces X_1. Let this point be denoted ν(p), and the resulting STL formula ϕ(ν(p)) be denoted ϕ for short. We then check the Boolean satisfaction of ϕ on the traces in X_0. The misclassification rate (MCR) is computed as follows:

    MCR = ( |{x | x ∈ X_0 ∧ x ⊨ ϕ}| + |{x | x ∈ X_1 ∧ x ⊭ ϕ}| ) / ( |X_0| + |X_1| ).    (3.3)

If the MCR is less than the specified threshold (ε = 0.1 in our implementation), the algorithm terminates and ϕ is returned as the binary STL classifier for the traces X_0 and X_1. Otherwise, the algorithm proceeds to the next recursive computation of a boundary point (in another region of the parameter space). If the number of recursions (denoted by the variable numRecurs in Algo. 1) exceeds the user-specified maxRecurs, the algorithm returns control to Algo. 2, which proceeds to enumerate the next PSTL formula for consideration.
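The following is a simplified, single-parameter instance of the boundary search, not the multi-dimensional diagonal procedure of [63]: for the monotonically increasing parameter c in G(x < c), binary search finds the point where the worst-case robustness over the 1-labeled traces crosses zero. The synthetic traces are placeholders.

    # A simplified single-parameter sketch of the validity-boundary search:
    # find the smallest c such that every trace in X1 satisfies G(x < c).
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = [rng.normal(0.0, 1.0, 200) for _ in range(50)]   # synthetic traces

    def robustness(c, x):
        return float(np.min(c - x))     # rho(G(x < c), x, 0) = min_t (c - x(t))

    def boundary_point(X1, lo=-10.0, hi=10.0, tol=1e-4):
        # Monotonicity in c lets binary search bracket the zero crossing
        # of the worst-case robustness over all 1-labeled traces.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if min(robustness(mid, x) for x in X1) >= 0:
                hi = mid               # valid for all traces: move down
            else:
                lo = mid               # violated by some trace: move up
        return hi

    c_star = boundary_point(X1)
    print(c_star)    # ~ the maximum of x(t) over all traces and times

The returned c_star lies on the validity domain boundary: at c_star every trace in X_1 satisfies the formula, while any smaller c is violated by at least one trace.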
Theorem. The worst-case time-complexity of computing boundary points in Algo. 1 is O((2^numParams − 2)^numRecurs) · O(log ℓ) · O(|ϕ| · |X_1| · T), where numParams is the number of parameters of the PSTL formula ϕ, numRecurs is the number of recursive iterations of Algo. 1, ℓ is the size of the diagonal of the parameter space, |ϕ| is the length of the formula ϕ, |X_1| is the number of traces in X_1, and T is the number of time instants in each of the X_1 traces.

Proof. To compute boundary points of the validity domain, we investigate (2^numParams − 2)^numRecurs regions in total. To find the boundary point in each region, we perform binary search on its diagonal; if we denote the diagonal size by ℓ, the complexity of binary search is O(log ℓ). At each step of the binary search, we compute the Boolean satisfaction of ϕ with respect to all the traces in X_1 for the candidate point obtained by the search. Assuming piecewise-constant interpolation for a trace, the Boolean satisfaction of an STL formula with respect to the trace can be computed in time linear in the length of the formula |ϕ| and the size of the trace (the number of points in the trace, T) [31]. Thus, the overall complexity is O((2^numParams − 2)^numRecurs) · O(log ℓ) · O(|ϕ| · |X_1| · T).

3.2.2 Learning Structure of PSTL formulas

In this section, we introduce systematic PSTL enumeration for learning STL formula classifiers from time-series data; the enumeration procedure is formalized in Algo. 2.
The ApplyBinaryOps iteratively applies each binary operator from the ordered list binaryOps to a pair of formulas of lengths a and b, where a,b∈[1,ℓ− 2] and a+ b=ℓ− 1. We use atomIt, unOpIt, binaryOpIt, argIt, lhsIt, rhsIt as iterators (indices) on the lists atoms, unaryOps, binaryOps, DB ψ (ℓ− 1), and lhsArgs and rhsArgs, respectively. For each PSTL formulaψ generated by Algo. 2, we apply the procedure LearnPST LParamsPF (Algo. 1) to learn its parameters which results in an STL formula ϕ that serves as a good classifier. If ϕ is a good 34 classifier (small misclassification rate), all loops terminate and ϕ is returned. Otherwise, the procedure con- tinues to generate new PSTL formulas. Next, in order to avoid enumerating LearnPST LParamsPF (Algo. 1) for equivalent PSTL formulas, we use the idea of formula signatures to heuristically detect equivalent PSTL formulas. Theorem. The time-complexity of Algo. 2 is equal to O(a+ u· a+ ℓ max ∑ ℓ=3 (u ℓ− 1 · a+ ℓ− 2 ∑ i=1 b· n i · n ℓ− i− 1 )), where a is the number of atoms, u is the number of unary operators, b is the number of binary operators, and n i denotes the number of enumerated formulas of length= i. The space-complexity of Algo. 2 is equal to O(a+ 2· u· a+ ℓ max ∑ ℓ=3 ℓ· (u ℓ− 1 · a+ ℓ− 2 ∑ i=1 b· n i · n ℓ− i− 1 )). Proof. The number of formulas of length= 1 is equal to n 1 = a (number of atoms), and the number of formulas of length= 2 is equal to n 2 = u· a as we apply unary operators on atoms. Generally, to obtain formulas of lengthℓ forℓ≥ 3 , we apply unary operator on formulas of lengthℓ− 1 or binary operators on formulas of lengthℓ 1 andℓ 2 such thatℓ 1 +ℓ 2 =ℓ− 1. Thus, the number of formulas of length=ℓ denoted as n ℓ forℓ≥ 3 is computed as follows: n ℓ = u ℓ− 1 · a+ ℓ− 2 ∑ i=1 (b· n i · n ℓ− i− 1 ). Hence, the total number of formulas of up to lengthℓ max is equal to: n 1 +...+ n ℓ max = a+ u· a+ ℓ max ∑ ℓ=3 (u ℓ− 1 · a+ ℓ− 2 ∑ i=1 b· n i · n ℓ− i− 1 ). 35 Algorithm 2: Formula enumeration algorithm Input: atoms, unaryOps, binaryOps,ℓ max ,DB ψ Output: ψ 1 Init:ℓ← 1; 2 whileℓ≤ ℓ max do // Enumerate formulas of length = 1 3 ifℓ= 1 then EnumAtoms() ; // Apply unary operators on formulas of length = 1 4 else ifℓ= 2 then ApplyUnaryOps() ; // Apply unary and binary operators on previous formulas 5 else ApplyUnaryOps(); ApplyBinaryOps() 6 ; 7 ℓ← ℓ+ 1; 8 Function EnumAtoms(): 9 atomIt← 1; 10 while atomIt≤| atoms| do 11 ψ← get(atoms,atomIt); // Add to the data base of formulas 12 add(DB ψ ,ℓ,ψ); 13 atomIt← atomIt+ 1; 14 Function ApplyUnaryOps(): 15 argIt← 1, unOpIt← 1 ; 16 while unOpIt≤| unaryOps| do 17 op← get(unaryOps,unOpIt); 18 while argIt<|DB ψ (ℓ− 1)| do 19 unaryArg← get(DB ψ (ℓ− 1),argIt); 20 ψ← op(unaryArg) ; // Add to the data base of formulas 21 Add(DB ψ ,ℓ,ψ); 22 argIt← argIt+ 1; 23 unOpIt← unOpIt+ 1 ; 24 Function ApplyBinOps(): 25 lhsIt, rhsIt, binaryOpIt← 1 ; 26 while binaryOpIt≤| binaryOps| do 27 op← get(binaryOps,binaryOpIt); 28 for i← 1 toℓ− 2 do 29 while lhsIt<|DB ψ (i)| do 30 while rhsIt<|DB ψ (ℓ− i− 1)| do 31 lhs← get(DB ψ (i),lhsIt) ; 32 rhs← get(DB ψ (ℓ− i− 1),rhsIt); 33 ψ← op(lhs,rhs) ; // Add to the data base of formulas 34 Add(DB ψ ,ℓ,ψ); 35 rhsIt← rhsIt+ 1 ; 36 lhsIt← lhsIt+ 1; 37 binaryOpIt← binaryOpIt+ 1 ; 36 As the time-complexity of formula enumeration algorithm is linear with respect to the total number of enumerated formulas, the claim is proved. If we assume we need ℓ memory units to save a formula of length=ℓ, then the total required memory to save all formulas is 1· n 1 +2· n 2 +...+ℓ max · n ℓ max . 
3.2.2.1 Signature-based Optimization

A naive procedure would perform systematic enumeration followed by parameter inference for every enumerated formula. While parameter inference procedures are typically efficient, over a large dataset the cost of enumeration followed by parameter inference can add up. One optimization that speeds up this process is to avoid parameter inference for semantically equivalent but syntactically different PSTL formulas. However, checking the equivalence of PSTL formulas is in general undecidable [49], and even restricted to a fragment of PSTL, equivalence checking is computationally demanding. Thus, we use signatures to avoid enumerating logically equivalent formulas. A signature is an approximation of a PSTL formula; inspired by polynomial identity testing [83], we use the notion of a signature to check the equivalence of two PSTL formulas. Let X_n ⊆ X be a randomly chosen subset of the traces X of cardinality n, and let D_P^m = {ν_1, ..., ν_m} be a finite subset of the parameter space D_P. The signature S of a formula ψ maps ψ to a matrix of real numbers, defined as:

    S_ψ(i,j) = ρ(ψ(ν_j(p)), X_n(i), 0)

The (i,j)-th element of the matrix is the robustness of the i-th trace X_n(i) with respect to the j-th STL formula ψ(ν_j(p)). To check the satisfaction of an STL specification by a trace we use Breach [30], a toolbox for verification and parameter synthesis of hybrid systems. This procedure is implemented in the computeSignature function of Algo. 3, which is used to detect whether an enumerated PSTL formula ψ is new or an equivalent formula has already been enumerated.

Consider the two PSTL formulas F_[0,τ_1](G_[0,τ_2](x(t) > c)) and ¬(G_[0,τ_1](F_[0,τ_2](x(t) ≤ c))). We illustrate the working of the computeSignature function using 4 randomly chosen traces from the case studies (i.e., n = 4) and 5 random parameter samples (i.e., m = 5) from the time domain T = [0,3] and the value domain V = [0.8,1.2]. For both PSTL formulas, the resulting signature is the same 4 × 5 matrix (hence, the two PSTL formulas are deemed equivalent):

    S_ψ = [ 0.8786  0.9294  −0.4984  0.7571  0.9404
            0.3317  0.2296  −0.1853  0.9455  0.2218
            0.4033  0.2785  −0.5105  0.8429  0.2890
            0.1742  0.1816  −0.6257  0.1873  0.1738 ]

Algorithm 3: Signature-based optimization
Input: ψ, X_0, X_1, DB_s
Output: true/false
1  Function isNew(ψ):
2    S_ψ ← computeSignature(ψ);
     // Compare with the formula signatures in the hash map DB_s
3    if S_ψ ∈ DB_s then return false;
4    else
       // Add the signature to the database
5      Add(DB_s, ψ, S_ψ);
6      return true;
7  Function computeSignature(ψ):
     // Randomly choose n traces
8    X_n ← selectRandom(X_0, X_1, n);
     // Randomly sample m parameter values from the parameter space
9    D_P^m ← sampleRandom(D_P, m);
10   for j ← 1 to m do
11     ψ_m ← setParams(ψ, D_P^m(j));
12     for i ← 1 to n do
13       S_ψ(i,j) ← ρ(ψ_m, X_n(i), 0);

Remark. The computeSignature function in Algo. 3 only compares the signatures of formulas that share the same parameters. For instance, the formulas G_[0,τ](x(t) > c_1) and ¬(F_[0,τ](x(t) ≤ c_2)) are semantically equivalent, but computeSignature does not detect them as such; we plan to address this limitation in future work.
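A minimal runnable sketch of the signature idea follows. It hard-codes robustness evaluators for the two formula templates of the example above (F_[0,k1] G_[0,k2] (x > c) and its equivalent ¬G_[0,k1] F_[0,k2] (x ≤ c), with discrete-step windows), then checks whether their rounded signature matrices collide in a hash set; the traces, valuations and rounding tolerance are illustrative assumptions, not values from the case studies.

    # A minimal sketch of signature-based equivalence detection (Algo. 3).
    import numpy as np

    def window_inf(r, k):                   # inf over the window [t, t+k]
        return np.array([r[t:min(t + k + 1, len(r))].min()
                         for t in range(len(r))])

    def window_sup(r, k):
        return -window_inf(-r, k)

    def rho_FG(x, nu):                      # F_[0,k1] G_[0,k2] (x > c)
        k1, k2, c = nu
        return float(window_sup(window_inf(x - c, k2), k1)[0])

    def rho_nGF(x, nu):                     # ~ G_[0,k1] F_[0,k2] (x <= c)
        k1, k2, c = nu
        return float(-window_inf(window_sup(-(x - c), k2), k1)[0])

    def signature(rho_fn, traces, valuations, ndigits=6):
        S = np.array([[rho_fn(x, nu) for nu in valuations] for x in traces])
        return tuple(np.round(S, ndigits).ravel())   # hashable matrix

    rng = np.random.default_rng(3)
    traces = [rng.normal(1.0, 0.2, 50) for _ in range(4)]        # n = 4
    vals = [(int(rng.integers(1, 10)), int(rng.integers(1, 10)),
             float(rng.uniform(0.8, 1.2))) for _ in range(5)]    # m = 5

    DB_s = set()
    for rho_fn in (rho_FG, rho_nGF):
        sig = signature(rho_fn, traces, vals)
        print("new" if sig not in DB_s else "equivalent found")
        DB_s.add(sig)
    # Prints "new" then "equivalent found": the two syntactically different
    # formulas yield identical signature matrices.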
Remark. Checking the equivalence of two PSTL formulas ψ_1 and ψ_2 can be reduced to checking the satisfiability of the formula ψ = ψ_1 ≢ ψ_2: if ψ is unsatisfiable for all traces x, then ψ_1 and ψ_2 are equivalent. As the satisfiability of STL formulas is undecidable in general [49], checking whether two arbitrary STL formulas are equivalent is also undecidable. Even if we restrict our attention to MITL formulas (where satisfiability is decidable), equivalence checking is computationally expensive (EXPSPACE-complete). A recent tool checks the satisfiability of MITL formulas [14]; for formulas of length below 20, it takes about 10 seconds. The problem at hand is harder still, as checking the equivalence of PSTL formulas requires an additional quantifier over the parameter space.

We evaluate the signature-based optimization using two common performance metrics, precision and recall, in addition to computation time. Precision measures how well the technique avoids declaring two inequivalent PSTL formulas equivalent; recall measures how well it detects all equivalent enumerated PSTL formulas:

    Precision = |{ψ | ψ ≡ ψ′ ∧ ψ ≡_sign ψ′}| / ( |{ψ | ψ ≡ ψ′ ∧ ψ ≡_sign ψ′}| + |{ψ | ψ ≢ ψ′ ∧ ψ ≡_sign ψ′}| ),
    Recall = |{ψ | ψ ≡ ψ′ ∧ ψ ≡_sign ψ′}| / ( |{ψ | ψ ≡ ψ′ ∧ ψ ≡_sign ψ′}| + |{ψ | ψ ≡ ψ′ ∧ ψ ≢_sign ψ′}| ),
    F_1 score = 2 · Precision · Recall / (Precision + Recall),

where ψ ≡_sign ψ′ means Signature(ψ) = Signature(ψ′), and the F_1 score is the harmonic mean of precision and recall. To evaluate our technique on these metrics, we enumerated 500 formulas using our enumerative solver. For computing signatures, traces were chosen randomly from the case studies; we added white Gaussian noise with SNR = 2 to make the traces expressive and discriminative enough for detecting equivalent PSTL formulas. The results show that recall is always equal to 1, meaning that the signature detects all equivalent PSTL formulas. Precision, however, varies with m and n. The change of the F_1 score with respect to m for fixed n = 10 is shown in Fig. 3.3: the F_1 score mostly increases with m, with dips at m = 5 and m = 9 due to the nature of the added white Gaussian noise. Similarly, the change of the F_1 score with respect to n for fixed m = 10 is illustrated in Fig. 3.4; the overall trend is increasing, except at n = 9, where the F_1 score decreases, again because of the added noise.

Figure 3.3: Change of the F_1 score with respect to the number of random parameter samples (m) for a fixed number of random traces (n = 10).

Using signatures reduced the computation time in the targeted case studies; the results are summarized in Table 3.1. The difference in computation time before and after the optimization is noticeable, yet at best 30%. The reason is that the enumerative solver produces simple PSTL classifiers in these case studies, and until those classifiers are reached, only a few formulas with equivalent signatures are enumerated; for more complicated classifiers, the difference would be larger. The run time of the signature-based optimization grows with m and n. In future work, we plan to exploit (obvious) syntactic structural equivalences to further prune the space of equivalent formulas; in general, this will require a canonical representation of an STL formula, which can be an expensive step.
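For concreteness, the following is a small sketch of the precision/recall/F_1 computation for the signature heuristic, assuming ground-truth equivalence labels are available for a sample of enumerated formula pairs; the example pairs are fabricated for illustration only.

    # A small sketch of the precision/recall/F1 metrics for the signature
    # heuristic, given (truly_equivalent, signature_equivalent) labels.
    def prf1(pairs):
        tp = sum(e and s for e, s in pairs)           # equivalent, detected
        fp = sum((not e) and s for e, s in pairs)     # spurious collision
        fn = sum(e and (not s) for e, s in pairs)     # missed equivalence
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    print(prf1([(True, True), (True, True), (False, True), (True, True)]))
    # -> precision 0.75, recall 1.0 (the heuristic misses no equivalences)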
Figure 3.4: Change of the F_1 score with respect to the number of random traces (n) for a fixed number of parameter samples (m = 10).

Theorem. The time-complexity of computing the signature S_ψ of a PSTL formula ψ and comparing it with the signatures saved in the database DB_s is O(|ψ| · n_sigtraces · n_samples · T) + O(|DB_s| · n_sigtraces · n_samples), where |ψ| is the length of the formula ψ, n_sigtraces is the number of rows of S_ψ, n_samples is the number of columns of S_ψ, T is the number of time instants in each trace, and |DB_s| is the number of signatures saved in DB_s. The space-complexity of saving the signatures of f formulas in the database is O(f · n_sigtraces · n_samples).

Table 3.1: Optimization using signatures
Case                                     Before optimization (s)   After optimization (s)
Maritime Surveillance: 1st classifier          3903.02                   2699.80
Maritime Surveillance: 2nd classifier            53.78                     48.73
Maritime Surveillance: 3rd classifier            28.19                     23.17
Linear system                                    44.25                     39.05
Cruise control of train                          35.54                     30.45

Proof. Each signature is an n_sigtraces × n_samples matrix of robustness values. Since the time-complexity of computing the robustness degree of a trace with T instants with respect to ψ is O(|ψ| · T), the overall complexity of computing the signature matrix is O(|ψ| · n_sigtraces · n_samples · T). The time-complexity of comparing an n_sigtraces × n_samples matrix with all such matrices saved in DB_s is O(|DB_s| · n_sigtraces · n_samples). Thus, the overall time-complexity is O(|ψ| · n_sigtraces · n_samples · T) + O(|DB_s| · n_sigtraces · n_samples). Saving f matrices of size n_sigtraces × n_samples requires f · n_sigtraces · n_samples units of memory; hence, the overall space-complexity is O(f · n_sigtraces · n_samples).

3.3 Experimental Results

To evaluate our supervised learning framework, we apply our method to various case studies from the automotive and transportation domains. The first case study is an anomalous-trajectory detection problem in a maritime environment; the second and third detect the behavior of two CPS models under attack. All three case studies were chosen from previous works [16, 50] to allow a fair comparison. We ran the experiments on an Intel Core-i7 MacBook Pro with 2.7 GHz processors and 16 GB RAM. In comparison with the previous template-free techniques [16, 50], our technique learns simpler STL formulas with slight (≤ 3%) accuracy degradation. While there are platform differences between our experiments and [16, 50], our results show comparable or favorable runtimes.

3.3.1 Maritime Surveillance

We first consider the maritime surveillance problem presented in [16] to compare our inference technique with their decision-tree tool (DTL4STL) [16]. The synthetic data set employed in [16] is 2-dimensional and consists of one normal and two types of anomalous trajectories. For a fair comparison, we used the exact data set from [16], illustrated in Fig. 3.5. We used 100 traces for training, and 100 traces were reserved for testing. Our enumerative solver learned three STL formulas to classify the three kinds of trajectories. The results of our approach are shown in Table 3.2, where

    ϕ_green = (y(t) ≥ 19.73) U_[0,49.22] (x(t) ≤ 24.86),
    ϕ_blue = G_[0,60] (x(t) ≥ 36.32),
    ϕ_red = F_[0,10.90] (y(t) ≤ 28.84).

ϕ_green is the STL formula that separates the green traces from the others; ϕ_blue and ϕ_red separate the blue and red traces from the others, respectively.
Table 3.2: Results of our enumerative solver.
STL classifier   train MCR   test MCR   runtime before optimization (s)   runtime after optimization (s)
ϕ_green            0.01        0.05              3903.02                        2699.80
ϕ_red              0.01        0.02                53.78                          48.73
ϕ_blue             0.00        0.02                28.19                          23.17

Figure 3.5: Naval surveillance data set [16]. Green traces: normal trajectories; red and blue traces: two kinds of anomalous trajectories; dashed lines: the thresholds of the STL formulas learned by the enumerative solver (= 36.3260, 28.8430).

The simplest STL formula learned by the DTL4STL tool [16] to separate the green traces from the others is:

    ϕ = (ϕ_1 ∧ (¬ϕ_2 ∨ (ϕ_2 ∧ ¬ϕ_3))) ∨ (¬ϕ_1 ∧ (ϕ_4 ∧ ϕ_5)), where
    ϕ_1 = G_[199.70,297.27)(F_[0.00,0.05)(x(t) ≤ 23.60))
    ϕ_2 = G_[4.47,16.64)(F_[0.00,198.73)(y(t) ≤ 24.20))
    ϕ_3 = G_[34.40,52.89)(F_[0.00,61.74)(y(t) ≤ 19.62))
    ϕ_4 = G_[30.96,37.88)(F_[0.00,250.37)(x(t) ≤ 36.60))
    ϕ_5 = G_[62.76,253.23)(F_[0.00,41.07)(y(t) ≤ 29.90))

with an average misclassification rate of 0.007. This STL formula is long and complicated compared to the formula ϕ_green learned by our framework. Thus, [16] achieves a better MCR at the price of a highly uninterpretable formula. Our technique considers the space of all PSTL formulas in increasing order of their complexity, which results in simple and interpretable STL classifiers; the DTL4STL tool [16] is restricted to eventually and globally PSTL templates, which is the reason it generates long and complicated STL classifiers. STL formula classifiers for higher-dimensional signals may be more complicated and hence less interpretable; in future work, we therefore plan to design dimensionality-reduction techniques based on the robustness of STL formulas to first identify the important dimensions. Intuitively, signals with closer robustness values are likely to be similar. An inherent advantage of systematic enumeration is that we encounter formulas in increasing order of complexity, thereby learning from as few features as possible. For instance, in this case study, three of the learned formulas use only 1 of the 2 available features.

3.3.2 Linear System

We now compare our technique with another supervised learning technique; for a fair comparison, we use the same models as [50]. In [50], the authors use the following dynamical system model to benchmark the performance of their supervised learning procedure:

    ẋ =  0.03·x + w,   attack = 0 (normal)
    ẋ = −0.03·x + w,   attack = 1 (anomaly)    (3.5)

Here, y(t) = x(t) is the observation and w(t) is white noise with variance 0.04. The system was modeled in Simulink® and interfaced with Breach [30] to simulate the data. We generated 100 time-series traces of the system for each of the two system modes, resulting in a total of 200 traces. Fig. 3.6 shows the simulation results: green traces represent normal behaviors (absence of attack), and red traces represent the behavior of the system under attack. 50% of the data was used for training (50 normal and 50 anomalous traces), and the remaining data was reserved for testing. The enumerative solver was trained and tested on this data to extract an STL formula. The dashed blue line in Fig. 3.6 shows the threshold (= 0.9736) of the formula learned by our approach: G_[0,3](x(t) ≥ 0.9736). Our procedure takes 39.05 seconds with signature-based pruning and 44.25 seconds without, with training MCR = 0 and testing MCR = 0.02.
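A minimal sketch of this benchmark can be written as a forward-Euler simulation of Eq. (3.5). The noise variance 0.04 and the ±0.03 dynamics are from the text; the initial condition, step size and horizon are illustrative assumptions, not the Simulink configuration.

    # A minimal Euler-integration sketch of the attack benchmark (Eq. 3.5):
    # x' = +0.03*x + w (normal) or x' = -0.03*x + w (attack), Var(w) = 0.04.
    import numpy as np

    def simulate(attack, T=100, dt=1.0, x0=1.0, seed=None):
        rng = np.random.default_rng(seed)
        sign = -1.0 if attack else 1.0
        x = np.empty(T)
        x[0] = x0
        for k in range(1, T):
            w = rng.normal(0.0, np.sqrt(0.04))
            x[k] = x[k - 1] + dt * (sign * 0.03 * x[k - 1] + w)
        return x

    normal = [simulate(False, seed=i) for i in range(100)]         # label 0
    anomaly = [simulate(True, seed=100 + i) for i in range(100)]   # label 1
    # The learned classifier G_[0,3](x >= 0.9736) inspects the first samples:
    print(min(normal[0][:4]), min(anomaly[0][:4]))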
Figure 3.6: Simulation results of the linear system. Green traces: normal operation of the system; red traces: anomalous behavior; dashed line: the threshold of the STL formula learned by the enumerative solver (= 0.9736).

The STL formula learned by our framework is simpler than the one obtained in [50], which is F_[0,3.0)(G_[0.5,2.0)(y(t) > 0.9634)), with training and testing MCRs of 0 and 0.01, respectively. Hence, we learn simpler formulas with slight accuracy degradation.

3.3.3 Cruise Control of Train

We also benchmark an example of cruise control in a train from [50]. The system consists of a 3-car train, where each car has its own electronically-controlled pneumatic (ECP) braking mechanism. The velocity of the entire train is modeled as a single system; hence, there are a total of 4 models (1 for velocity + 3 for the brakes of each car). The train system is constructed as a hybrid system/automaton, with continuous and discrete transitions for the velocity system; similarly, the ECP braking system of each car is modeled as a hybrid system. The parameters used in the train cruise control models are defined as follows:

• t represents a time instant.
• v(t) is the velocity at time t.
• c_1 and c_2 are clock variables that maintain a counter.
47 Figure 3.7: Simulation results of cruise control of the train (Green traces: normal operation, red traces: anomalous behavior and dash line: the threshold of the STL learned by enumerative solver (= 35.8816)). The STL formula obtained by our enumerative solver is simpler compared to the one extracted in [50], which is: F [0,100) (F [10,69) (y(t)< 24.9)∧ F [13.9,44.2) (y(t)> 17.66))) with MCRs= 0 for both training and testing. 3.4 User Study We conducted a user study to investigate whether STL formulas are indeed understandable to humans. Our study tests the following hypotheses: Hypothesis H1: STL formulas are understandable by humans. Hypothesis H2: Longer STL formulas potentially hamper interpretability. To test the aforementioned hypotheses, the study focuses on answering the following research questions: • RQ1: Can users map an STL formula to its corresponding natural language description? • RQ2: Can users map one or more time-series data to the best STL formula that describes the data? • RQ3: Can users map one or more robot trajectories to a correct STL formula? 48 • RQ4: Can users identify a faulty STL formula and repair it? • RQ5: Do the users find STL formulas understandable? • RQ6: Which STL operators are more difficult to interpret? • RQ7: What are the challenges in understanding an STL formula? • RQ8: Is it difficult for users to understand longer STL formulas? We recruited 26 graduate students with engineering backgrounds from US universities. 73% of the participants did not have any familiarity with STL, and the rest had some degree of familiarity with STL by taking a course, conducting a project, or reading a few publications related to STL in the past. None of the participants had expertise in the field of STL. This user study was approved by the Institution Review Board (IRB) of the University of Southern California. At the beginning of the study we provided a brief introduction (16 minutes) to time-series data and Signal Temporal Logic (STL) formulas † . Then, we asked the participants to answer a quiz. The purpose of the quiz was to gauge the level of attention and preliminary understanding of STL that the participants had acquired through the tutorial. As a prerequisite for our study is at least a basic familiarity with the structure of STL formulas, and the meaning of temporal and Boolean operators, the purpose of the quiz was to filter out participants who lacked this basic familiarity after the tutorial (for example, if a participant was not attentive when the tutorial was presented). All of the participants passed the quiz. For the study, we randomly divided participants into four groups. The only difference between these four groups was that the STL formulas used in the study were different for each group. The purpose was to include a variety of STL formulas in the study. We used our own expertise to categorize the STL formulas considered for the study into groups of “easy”, “medium” and “difficult/challenging” formulas. The lengths of STL formulas chosen for this study ranges from 2 to 10. † Tutorial link. 49 Figure 3.8: User study RQ1: Choosing the best natural language description for a given STL formula. To investigate RQ1, for each group of users, for each STL formula presented to that group, we provided four different natural language descriptions of the formula and asked the users to select the best NL descrip- tion. 88% of the users could identify the correct natural language description (see Fig. 3.8 as an example). 
In average, it took 1.11 minutes for users to answer this question. For RQ2, for each group of users, for a set of time-series data presented to that group, we provided four different STL formulas and asked the users to select the best STL formula. 88% of the users chose a correct STL formula but only 65% of them could choose the best STL formula. For example, in Fig. 3.9, both option 1 and 3 are correct, but option 1 is a more nuanced description of the trace. Option 1 describes the global behavior of the trace but option 3 only describes a local behavior of the trace, and option 1 subsumes option 3. In average, it took 2.39 minutes for users to finish this task. To answer RQ3, we showed participants one or two robot behaviors as videos and asked the users to choose the STL formula that correctly describes the videos. For instance, for the videos v1 and v2, we asked the users to choose one of the STL formulas illustrated in Fig. 3.10. 50% of the participants could identify the correct STL formulas among the given 5 options. Based on the qualitative feedback we received from the users at the end of the study, they had difficulties in identifying the relation between two videos. Plus, 50 Figure 3.9: User study RQ2: Choosing the best STL formula for a given trace. Figure 3.10: User study RQ3: Choosing the best STL formula that describes the two videos v1 and v2. they believed that the simulation environment was not clear enough, e.g., items picked up by robots were shown below the screen, and some users missed that (See video). In average, it took 3.43 minutes for users to do this task, which was the longest compared to other tasks. For RQ4, we showed a set of traces and their corresponding STL formulas to participants and asked them whether the given STL formula is correct or not. If the participants think the STL formula is incorrect, we ask them to repair the STL formula. All users could identify the incorrect STL formula and 85% of them could correctly repair the faulty STL formula (See Fig. 3.11). In average, the users could finish this task in 2.59 minutes. 51 Figure 3.11: User study RQ4: Repairing the incorrect STL formula. To answer RQ5, we asked users to give a score from 1 to 10 to the degree that they find STL formulas understandable. The distribution of the answers is illustrated in Fig. 3.12. In average, the users gave a score of 7.92 from 10 to interpretability of STL formulas. The minimum score was 3 and the maximum score was 10. For RQ6, we asked users to sort STL operators from easiest to most difficult to interpret. We used the index of each operator in the sorted least as its difficulty score given by each user. We computed average score for each operator and the results are illustrated in Fig. 3.13. Based on the results, the order of operators from easiest to most difficult is: not, or, and, always, eventually, implies, until. For RQ7, 96% of the users think the challenge in understanding STL formulas is nested operators. The other 4% think the main challenge is in understanding the Until operator. Finally for RQ8, we showed users a set of STL formulas with different lengths (Fig. 3.14) and asked them to sort the formulas from easiest to most difficult. We considered the index of each formula in the sorted list as its difficulty score. We computed the average score for each formula based on all participants’ 52 Figure 3.12: User study RQ5: The distribution of interpretability score of STL formulas based on the users’ opinions. 
Figure 3.13: User study RQ6: Comparison of STL operators in terms of difficulty to interpret.

Figure 3.14: User study RQ8: Asking users to sort the STL formulas from easiest to most difficult.

Fig. 3.15 shows the difficulty scores of the formulas with respect to their lengths. Roughly, as formula length increases, the formula becomes more difficult for users to interpret. Two formulas of the same length 4 have different difficulty scores; the reason is that one of them includes the until operator, which is the most challenging operator according to the users. Thus, long STL formulas potentially hamper interpretability, but other factors, such as the operators used and the level of nesting of temporal operators, also affect the interpretability of STL formulas. Fig. 3.16 shows the relation between the nesting level and the difficulty degree of STL formulas: as the number of nested operators increases, the STL formula becomes more difficult to interpret.

3.5 Summary

In this chapter, we proposed a new technique for multi-class classification of time-series data using Signal Temporal Logic formulas. The key idea is to combine an algorithm for the systematic enumeration of PSTL formulas with an algorithm for estimating the satisfaction boundary of each enumerated PSTL formula. We also proposed a new optimization technique based on formula signatures that avoids the enumeration of equivalent PSTL formulas. Finally, we showcased our technique on a number of case studies on real-world data from different domains; the results show that our enumerative solver has a number of advantages compared to existing approaches. We also conducted a user study to investigate whether STL formulas are indeed understandable to humans.

Figure 3.15: User study RQ8: Longer STL formulas potentially hamper interpretability.

Figure 3.16: User study RQ8: As the number of nested operators increases, the formula becomes more difficult to interpret.

Chapter 4

Mining environment assumptions for CPS

4.1 Introduction

Autonomous cyber-physical systems such as self-driving cars, unmanned aerial vehicles, general-purpose robots and medical devices can often be modeled as systems consisting of heterogeneous components. Each of these components can itself be quite complex: for example, a component may contain design elements that include model predictive controllers, deep neural networks, rule-based control, and high-dimensional lookup tables for identifying the operating regime. Understanding the high-level behavior of such components at an abstract, behavioral level is thus a significant challenge, and the complexity of individual components makes compositional reasoning about global properties difficult.

Recently, there has been considerable momentum toward expressing the formal requirements of design components in real-time temporal logics such as Signal Temporal Logic (STL) [48, 84, 40, 51, 13, 19]. Typical STL requirements express families of excitation patterns on the model inputs, or designer-specified pre-conditions that guarantee desirable behavior of the model outputs [34]. In this work, we consider the dual problem: given an output requirement ϕ_out, what are the assumptions on the model environment, i.e., on the input traces to the model, that guarantee that the corresponding output traces satisfy ϕ_out? Drawing on the terminology of [49, 41], we call this the assumption mining problem.
Figure 4.1: The LHS illustrates our STL-based classification framework (Chapter 3), and the RHS shows the high-level flow of our environment assumption mining technique, which leverages our classification tool.

We propose an approach that reduces the assumption mining problem to supervised learning. We assume that input traces can be labeled desirable or undesirable according to whether the corresponding output traces satisfy or violate ϕ_out. A potential approach is then to use off-the-shelf supervised learning methods for time-series data from the machine learning (ML) community. However, such techniques typically train discriminators in high-dimensional spaces that may not be human-interpretable [50]. Interpretability is an important factor for safety-critical applications: components are usually developed by independent design teams, and articulating the assumptions and guarantees in an interpretable format can reduce downstream bugs introduced during system integration.

In this work, we assume that environment assumptions can be expressed in STL. The use of STL to express such assumptions has been explored before in [34, 48]; however, there is no existing work on automatically inferring such assumptions from component models. The primary contribution of this work is a new algorithm for mining environment assumptions (expressed in STL). We leverage the STL-based classification tool presented in Chapter 3 to learn environment assumptions for a black-box model. Our environment assumption mining technique is illustrated on the RHS of Fig. 4.1; its input consists of a black-box model and an STL formula serving as the output specification, and it produces an STL formula, the environment assumption, as the final output. The algorithm iterates between two main steps: (1) generation of an STL classifier that separates good and bad inputs with a low misclassification rate, where input traces are labeled by whether their corresponding outputs satisfy the output specification (the generated STL classifier is the candidate environment assumption); and (2) falsification-based assumption refinement, in which a falsification procedure checks whether there exists an input trace that satisfies the learned environment assumption while the corresponding output violates the output specification. If such a trace exists, our algorithm returns to the first step and searches for a more accurate STL classifier.
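The following runnable toy illustrates this mine-then-refine loop. The model, output requirement, classifier and falsifier are all stand-ins chosen for the example (a scalar gain, a simple threshold "assumption", and random-search falsification); they are not the actual enumerative learner of Chapter 3 or a falsifier such as Breach.

    # A runnable toy sketch of the two-step assumption mining loop.
    import numpy as np

    rng = np.random.default_rng(1)

    def model(u):                     # toy component: output scales the input
        return 2.0 * u

    def satisfies_phi_out(y):         # phi_out: G(y < 4)
        return y.max() < 4.0

    def learn_threshold(good, bad):   # toy "classifier": assumption G(u < c)
        return min(u.max() for u in bad) if bad else np.inf

    def falsify(c, trials=200):       # random search for a counterexample
        for _ in range(trials):
            u = rng.uniform(0, 3, 5)
            if u.max() < c and not satisfies_phi_out(model(u)):
                return u
        return None

    inputs = [rng.uniform(0, 3, 5) for _ in range(100)]
    good = [u for u in inputs if satisfies_phi_out(model(u))]
    bad = [u for u in inputs if not satisfies_phi_out(model(u))]
    c = learn_threshold(good, bad)              # step (1): mine assumption
    while (cex := falsify(c)) is not None:      # step (2): refine
        bad.append(cex)
        c = learn_threshold(good, bad)
    print("mined assumption: G(u <", c, ")")

Here falsification drives the candidate threshold toward the true assumption (any input staying below 2 guarantees the output stays below 4), mirroring how counterexamples refine the learned STL classifier in our framework.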
We elaborate on our experimental results in Section 4.3. Finally, we provide a summary in Section 4.4.
4.2 Methods
4.2.1 Learning Parameters of PSTL formulas
Here, we propose a new algorithm based on decision trees to learn parameter valuations of a PSTL formula that yield an STL formula serving as a good classifier. A key limitation of the validity domain boundary based approach (presented in Chapter 3) is that it only works for monotonic PSTL formulas, and when the number of parameters is high, computing the validity domain boundary can be time-consuming. Instead, here we consider an approach based on sampling the parameter space, obtaining robustness values for a given set of "seed" traces at each of the sampled points, and using these values as features in a decision-tree based classification algorithm. We now explain each of these steps in detail.
Decision trees are a non-parametric supervised learning method used for classification and regression. Learned trees can also be represented as sets of if-then-else rules that are understandable by humans. The depth of a decision tree is the length of the longest path from the root to the leaf nodes, and the size of a decision tree is the number of nodes in the tree. A binary decision tree is a tree in which every non-terminal node has at most two children. Decision trees represent a disjunction of conjunctions of the constraints attached to the nodes of the tree: each path from the tree root to a leaf corresponds to a conjunction of constraints, while the tree itself is a disjunction of these conjunctions [67]. While decision trees improve human readability [67], they are not specialized in learning temporal properties of timed traces. A naïve application of a decision tree to timed traces would treat every time instant in the trace as a decision variable, leading to deep trees that lose interpretability.
Example. We applied decision trees to a 2-dimensional synthetic data set. The data set consists of two sets of traces corresponding to signals x and y. In both sets, y(t) = x(t − d), where d represents the delay between x and y. For label-1 traces d < 20, and for label-0 traces d > 30. Each node in the decision tree corresponds to the value of the x or y signal at a single point in time. The decision tree failed to classify the data set properly: the resulting tree has 179 nodes, and the training accuracy is 50%, which is the accuracy of random classification. On the other hand, this data set can be easily classified using the STL formula ϕ = G_[0,100](x(t) ≥ 0.1 ⇒ F_[0,20)(y(t) ≥ 0.1)). A naïve use of decision trees thus does not provide the same dynamic richness as many temporal logic formulas.
In our work, we use robustness values of a given PSTL formula at different parameter valuations as features for decision trees. For a PSTL formula containing only one parameter this is unnecessary, as we can simply determine the validity domain boundary (corresponding to a robustness value of 0) by a simple binary search. However, for PSTL formulas with multiple independent parameters, random samples of the parameter space can be informative about the validity domain boundary and hence serve as features for our decision-tree based learning algorithm. Formally, Algo. 4 assumes that we are given sets of traces X_0 and X_1, and a PSTL formula ψ with parameter space D_P.
D_P is computed using upper and lower bounds on the values appearing in the input traces (e.g., in Fig. 3.5, for time instants in [0,60], D_P = [0,60] × [0,80] × [15,45]).
Theorem. For a given PSTL formula ψ, labeled traces X_0, X_1, and a threshold ε on the MCR, Algo. 4 will either (1) terminate with an STL formula ϕ such that the MCR is ≤ ε, or (2) terminate with ∅, indicating that the current PSTL formula is not a suitable classifier.
The proof follows from the structure of the algorithm.
Algorithm 4: Learning parameters of PSTL formulas using a decision tree
Input: ψ, X_0, X_1, ε
Output: ϕ, MCR
1  Function LearnPSTLParamsDT(ψ, X_0, X_1, ε):
2      X_train, X_test ← split(X_0 ∪ X_1, 0.7)  // split data into training and test sets
3      µ_train ← computeFeatures(X_train, ψ, D_P(ψ))  // compute robustness values as features for training
4      foreach x(t) ∈ (X_test ∪ X_train) do
5          if x(t) ∈ X_1 then
6              ℓ(x(t)) = 1
7          else
8              ℓ(x(t)) = 0
9      ϒ[ψ] ← TrainDecisionTree(µ, ℓ(µ))  // train a decision tree using the computed features
10     foreach x(t) ∈ X_test do  // compute accuracy
11         µ_test(x(t)) ← computeFeatures(X_test, ψ, D_P(ψ))
12         ℓ*(x(t)) ← ϒ[ψ](µ_test(x(t)))
13     MCR ← computeMCR(ℓ, ℓ*)
14     ϕ = GetSTL(ϒ[ψ])
15     if MCR ≤ ε then
16         return ϕ, MCR
17     else
18         return ∅
19 Function computeFeatures(X, ψ, D_P(ψ)):
20     D_P^n ← gridSample(D_P, n)  // sample n parameter values
21     foreach x(t) ∈ X do
22         for i ∈ [1, n] do
23             ψ_i ← ψ(ν(D_P^n(i)))
24             µ(x(t))[i] ← ρ(ψ_i, x, 0)
25     return µ
In Line 2, we split the given set of traces into training and test sets; 0.7 is an arbitrary heuristic indicating the ratio of the size of the training set to the total number of traces. In Line 3, we invoke the function computeFeatures. Essentially, this function maps each trace x(t) in the set X_train to an n-element feature vector µ(x(t)). To produce this vector, we obtain n samples of the parameter space along a user-defined grid. In principle, we could use n random samples of the parameter space D_P; however, in our experiments we found that random sampling may miss parameter values crucial to obtaining high accuracy. In some sense, grid sampling covers the parameter space more evenly, leading to better classification accuracy. Note that the grid sampling procedure also checks the validity of a parameter sample; e.g., if τ_1 and τ_2 are parameters belonging to the same time interval [τ_1, τ_2], then it imposes τ_1 < τ_2. See Fig. 4.2 for an example of grid sampling.
Figure 4.2: Grid sampling of time parameters for the formula G_[τ_1,τ_2](x(t) > c). Since τ_1 must be less than τ_2, the area above the τ_1 = τ_2 line (green line) is sampled.
Each sample in the parameter space corresponds to a valuation of the parameters in the PSTL formula ψ, and applying the i-th valuation yields the STL formula ψ_i (Line 23). We then use the robustness value of x(t) w.r.t. ψ_i as the i-th element of the feature vector, i.e., µ(x(t))[i] (Line 24). Each trace in the sets X_train and X_test is assigned label 1 if it belongs to X_1, and 0 otherwise (Line 8). In Line 9, we invoke the decision tree procedure on the feature vectors and the label sets. The edge between any node in the decision tree ϒ[ψ] and its children is annotated with a constraint of the form ρ(ψ_i, x, 0) < c for the left child, and its negation for the right child, where c is some real number. Next, we compute the MCR of the decision tree by computing the labels of the traces in the test set X_test and comparing them to their ground-truth labels. The function computeMCR simply computes the ratio |{x(t) | ℓ(x(t)) ≠ ℓ*(x(t))}| / |X_test|.
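To make the mechanics of Algo. 4 concrete, the following Python sketch implements its two functions with scikit-learn. It is a minimal illustration under stated assumptions, not our actual implementation: the robustness monitor robustness(trace, valuation) for the fixed PSTL template ψ, the parameter bounds, the grid resolution, and the validity filter are all assumed to be supplied by the user.

import itertools
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def grid_sample(bounds, n_per_dim, valid=lambda v: True):
    # Evenly spaced grid over D_P; `valid` filters out samples violating
    # constraints such as tau_1 < tau_2 for interval parameters.
    axes = [np.linspace(lo, hi, n_per_dim) for (lo, hi) in bounds]
    return [v for v in itertools.product(*axes) if valid(v)]

def learn_pstl_params_dt(traces, labels, robustness, bounds, eps):
    valuations = grid_sample(bounds, n_per_dim=5)
    # Feature vector of a trace: its robustness at every sampled valuation.
    mu = np.array([[robustness(x, v) for v in valuations] for x in traces])
    X_tr, X_te, y_tr, y_te = train_test_split(mu, labels, train_size=0.7)
    tree = DecisionTreeClassifier().fit(X_tr, y_tr)
    mcr = float(np.mean(tree.predict(X_te) != y_te))  # misclassification rate
    return (tree, mcr) if mcr <= eps else None

A tree trained on these features splits on tests of the form ρ(ψ_i, x, 0) < c, exactly the edge constraints described above.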
Extracting Interpretable STL formulas. In Line 9 of Algo. 4, the function TrainDecisionTree returns a decision tree ϒ[ψ] of the form shown in Fig. 4.3. We note that the edge labels correspond to inequality tests over STL formulas ψ_i corresponding to the same PSTL formula ψ, but with different valuations for the parameters. Each path from the root of the tree to a leaf node represents a conjunction of the edge labels, and the disjunction over all paths leading to the same label represents the symbolic condition for mapping a given trace to that label.
Figure 4.3: Example tree returned by the TrainDecisionTree function in Algo. 4; edges are labeled with constraints such as ρ(ψ_1, x, 0) < c_1 and ρ(ψ_1, x, 0) ≥ c_1, and leaves carry labels ℓ = 0 or ℓ = 1.
We now show that, given a decision tree of this form, it is always possible to extract an STL formula from the symbolic condition that the decision tree represents.
Lemma. For any STL formula ϕ, any trace x(t), any time instant t, and any ∼ ∈ {<, ≥, >, ≤}, any constraint of the form ρ(ϕ, x, t) ∼ c can be transformed into the satisfaction or violation of a formula ϕ̂ by x(t), where ρ(ϕ, x, t) denotes the robustness of ϕ with respect to the trace x(t) at time t, and ϕ̂ can be obtained from ϕ and c using simple transformations (shifts in space parameters).
Proof. We prove the above lemma using structural induction on the syntax of STL. The base case is for atomic predicates. Suppose ϕ = f(x) > c_0. Then, if ρ(ϕ, x, t) > c, by the definition of the robustness value, f(x(t)) − (c_0 + c) > 0. Let ϕ̂ = f(x) > (c_0 + c). Then, ρ(ϕ, x, t) > c implies that at time t, x(t) ⊨ ϕ̂. The proof for atomic predicates with other kinds of inequalities is similar.
The inductive hypothesis is that the above lemma holds for all proper subformulas of ϕ, and in the inductive step we show that if this is true, then the lemma holds for ϕ.
(1) Let ϕ = ¬ψ. Then, ρ(¬ψ, x, t) > c implies that −ρ(ψ, x, t) > c, or ρ(ψ, x, t) < −c. Let c′ = −c; then, by the inductive hypothesis, there is a formula ψ̂ such that ρ(ψ̂, x, t) < 0, and hence ρ(¬ψ̂, x, t) > 0.
(2) Let ϕ = ψ_1 ∧ ψ_2. If ρ(ψ_1 ∧ ψ_2, x, t) > c, then min(ρ(ψ_1, x, t), ρ(ψ_2, x, t)) > c, which implies that ρ(ψ_1, x, t) > c and ρ(ψ_2, x, t) > c. Again, by the inductive hypothesis, this implies that there are formulas ψ̂_1 and ψ̂_2 such that ρ(ψ̂_1, x, t) > 0 and ρ(ψ̂_2, x, t) > 0. This implies that min(ρ(ψ̂_1, x, t), ρ(ψ̂_2, x, t)) > 0, or ρ(ψ̂_1 ∧ ψ̂_2, x, t) > 0.
(3) Let ϕ = ψ_1 ∨ ψ_2. An argument similar to (2) can be used to show that we can obtain ψ̂_1 and ψ̂_2 such that ρ(ψ̂_1 ∨ ψ̂_2, x, t) > 0.
(4) Let ϕ = G_I ψ. ρ(G_I ψ, x, t) > c implies that ∀t′ ∈ t ⊕ I, ρ(ψ, x, t′) > c. Following similar reasoning as in (2), and using the inductive hypothesis, we can show that there exists an STL formula ψ̂ such that the above is equivalent to ρ(G_I ψ̂, x, t) > 0.
(5) For ϕ = F_I ψ and ϕ = ψ_1 U_I ψ_2, similar reasoning as in (4) can be used. We omit the details for brevity.
Finally, we can give a similar proof for any constraint of the form ρ(ϕ, x, t) < c. For example, consider ϕ = ψ_1 ∧ ψ_2. ρ(ϕ, x, t) < c implies that min(ρ(ψ_1, x, t), ρ(ψ_2, x, t)) < c, which in turn implies that ρ(ψ_1, x, t) < c or ρ(ψ_2, x, t) < c. By the inductive hypothesis, we can obtain ψ̂_1 and ψ̂_2 such that ρ(ψ̂_1, x, t) < 0 or ρ(ψ̂_2, x, t) < 0, which implies that ρ(ψ̂_1 ∧ ψ̂_2, x, t) < 0.
As we are able to prove the inductive step for every STL operator, and for all types of constraints on the robustness value, by combining the different cases we conclude that the lemma holds for an arbitrary STL formula.
Theorem. Given a decision tree ϒ[ψ] whose edge labels denote constraints of the form ρ(ψ_i, x, 0) > c_i, we can obtain an STL formula that is satisfied by all input traces that are labeled 1 by the decision tree.
Proof. The proof follows from the proof of Lemma 25. Essentially, each constraint corresponding to an edge label can be transformed into an equivalent STL formula, and each path is a conjunction of edge labels; so each path gives us an STL formula representing the conjunction of the formulas corresponding to its edge labels. Finally, a disjunction over all paths corresponds to a disjunction over the formulas corresponding to each path.
Remark. We note that the above procedure does not require the PSTL formula to be monotonic. If the chosen PSTL formula is monotonic, then it is possible to simplify the formula further. Essentially, along any path, we can retain only those formulas corresponding to parameter valuations that are incomparable according to the order imposed by monotonicity. Furthermore, each of these valuations corresponds to a point on the validity domain boundary, as the robustness value for these valuations is close to zero. We also remark that Lemma 25 gives us a constructive approach to building an STL formula from the decision tree: we simply follow the recursive rules to push the constants appearing in the inequalities on the robustness values into the atomic predicates.
Theorem. The worst-case time-complexity of learning the decision tree ϒ[ψ] is O(|X_train| · |ψ| · n_grid · T) + O(|X_train| · n_grid²), where |X_train| is the number of training traces, |ψ| is the length of ψ, n_grid is the number of grid samples, and T is the number of time instants in each trace.
Proof. We need to compute the robustness of ψ with respect to the traces in X_train for all n_grid grid samples, resulting in a time-complexity of O(|X_train| · |ψ| · n_grid · T). The time-complexity of decision tree learning on |X_train| data points, each with n_grid features, is O(|X_train| · n_grid²)†. Thus, the overall time-complexity of learning is O(|X_train| · |ψ| · n_grid · T) + O(|X_train| · n_grid²).
† A typical decision tree learning algorithm, such as C4.5, has a time complexity of O(m · n²), where m is the size of the training data and n is the number of features [90].
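The GetSTL step of Algo. 4 can be realized by a straightforward traversal of the fitted tree. The sketch below, written against scikit-learn's tree internals, collects the path constraints for each label as strings; pushing the thresholds into the atomic predicates, as licensed by Lemma 25, is then a purely syntactic rewrite. The helper name and output format are ours, for illustration only.

def extract_rules(clf, names):
    # names[i] is a printable name for feature i, i.e., the STL formula psi_i
    # obtained from the i-th sampled parameter valuation.
    t = clf.tree_
    rules = {}
    def walk(node, path):
        if t.children_left[node] == -1:          # leaf node
            label = int(t.value[node].argmax())  # majority class at the leaf
            rules.setdefault(label, []).append(" and ".join(path) or "true")
            return
        f, c = t.feature[node], t.threshold[node]
        walk(t.children_left[node],  path + [f"rho({names[f]}, x, 0) <= {c:.4g}"])
        walk(t.children_right[node], path + [f"rho({names[f]}, x, 0) > {c:.4g}"])
    walk(0, [])
    # The classifier for a label is the disjunction of its path conjunctions.
    return {lbl: " or ".join(f"({p})" for p in paths) for lbl, paths in rules.items()}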
4.2.2 Environment Assumption Mining
In this section, we describe our overall approach to mining environment assumptions for a CPS component. The central idea in our approach is a counterexample-guided inductive synthesis (CEGIS) algorithm combined with our previously developed supervised learning technique (Chapter 3) to mine environment assumptions. The key steps of this process are shown in Algo. 5. First, we define the dynamical model of a CPS component as follows:
Definition 10 (Dynamical Model of a CPS component). A dynamical model M_C of a CPS component C is defined as a machine containing a set of input signals (i.e., input trace variables) u, output signals y, and state signals x. We assume that the domains of u, y and x are U, Y and X, respectively. Let x(0) denote the initial valuation of the state variables. The dynamical model M_C takes an input trace u(t) and an initial state valuation x(0), and produces an output trace y(t), denoted y(t) = M_C(u(t), x(0)).
We note that, typically, there may be a state trace x(t) denoting a system trajectory that evolves according to certain dynamical equations that depend on x(τ) for τ < t and on u(t). Further, y(t) is usually a function of x(t) and u(t). However, for the purpose of this work, we are only concerned with the input/output behavior of C, and do not explicitly reason over x(t). We also assume that the initial valuation of the state variables is fixed‡. Further, if the component C under test is obvious from the context, we drop the subscript. Thus, we simply write y(t) = M(u(t)) to denote the simplified view that the model M is a function over traces that maps input traces to output traces.
‡ This is not limiting, as we can simply have an input variable that is used to set an initial valuation for x(t) at time 0 and is ignored for all future time points.
Next, we formalize the notions of output requirements, support variables of a formula, and environment assumptions as follows:
Definition 11 (Output Requirement). An output requirement ϕ_out is an STL formula that is satisfied by the output traces of the system whose behavior is desirable, and violated otherwise.
Definition 12 (Support Variables of a Formula). Given an STL formula ϕ, the support variables of ϕ are the set of signals appearing in atomic predicates in any subformula. We denote the support of ϕ by supp(ϕ).
Definition 13 (Environment Assumption). Given a dynamical component model M_C = (u, x, y), an output requirement is an STL formula ϕ_out with supp(ϕ_out) = y. Given an output requirement ϕ_out, an STL formula ϕ_in is called an environment assumption if:
1. supp(ϕ_in) = u,
2. ∀u(t): (u(t) ⊨ ϕ_in) ⇒ (M(u(t)) ⊨ ϕ_out).
Essentially, an environment assumption is an STL property on the input traces to the model that guarantees that the corresponding output traces satisfy the output requirement ϕ_out.
Example. Consider a simple model M that delays a given input signal by 1 second, i.e., the value of the output at time 1 is the value of the input signal at time 0 (and the values of the output during [0,1) are defined by some default output trace value). If the output requirement is G_[1,100](y > 0), then the property G_[0,99](u > 0) is a valid environment assumption for the model.
In software verification parlance, the environment assumption can be viewed as a pre-condition over the input trace to the model that guarantees an assertion on the output trace. We assume that the user provides a description of the input signal domain U (i.e., upper and lower bounds on the values appearing in the input traces), as well as a set of time instants on which input traces are expected to be defined (i.e., T(u)). Initially, we randomly sample input traces (Line 1) and label them as good or bad (Lines 3 and 4, respectively) depending on whether their corresponding outputs satisfy the given ϕ_out.
Algorithm 5: Environment Assumption Mining Algorithm
Input: Input signal domain U, output requirement ϕ_out, input signal time domain T(u), model M = (u, y), simulation budget N for falsification, formula length limit ℓ_max, classification accuracy 1 − ε
Output: Environment assumption ϕ_in
1  T = sample input traces from U using time instants from T(u)
2  foreach u(t) ∈ T do
3      if M(u(t)) ⊨ ϕ_out then T_good = T_good ∪ {u(t)}
4      else T_bad = T_bad ∪ {u(t)}
5  ψ_proposed = EnumerateNextPSTL()  // Algo. 2
6  while |ψ_proposed| < ℓ_max do
7      (ϕ_in^proposed, MCR) = LearnPSTLParamsDT(ψ_proposed, T_good, T_bad)  // Algo. 4
8      accuracy = 1 − MCR
9      if accuracy > 1 − ε then
10         cex(t) = Falsify(y ⊨ ϕ_out, N)
11             subject to u(t) ⊨ ϕ_in^proposed
12             y(t) = M(u(t))
13         if cex(t) ≠ ∅ then T_bad = T_bad ∪ {cex(t)}
14         else return ϕ_in^proposed
15     else
16         ψ_proposed = EnumerateNextPSTL()
At the beginning of the while-loop, we assume that there is a PSTL formula ψ_proposed that is being considered as a candidate environment assumption. The first time the loop body is executed, this enumeration occurs in Line 5; otherwise, a new PSTL formula is obtained in the loop in Line 16. Once we have a candidate PSTL formula ψ_proposed, we use Algo. 1 or Algo. 4 to learn the parameters of ψ_proposed, which results in an interpretable STL formula ϕ_in^proposed. If ϕ_in^proposed does not give a high classification accuracy for the given set of good/bad traces§, we move to the next PSTL formula to be enumerated, until we reach a user-defined upper bound on the maximum formula length. If we exceed this bound, our procedure fails to find an accurate environment assumption.
§ Initially, it is possible that we do not obtain any bad traces by random sampling. In this case, we can replace the decision tree based classifier by a procedure that infers tight parameter valuations from only the positive examples using approaches such as [7, 49]. A potential drawback is that we may learn an environment assumption that is narrowly applicable only to the good traces and does not generalize well.
We note that the candidate formula ϕ_in^proposed, while being accurate in classifying the set of traces in T_good and T_bad, may be too permissive. This means that it may allow input traces not present in T_good for which the corresponding output traces do not satisfy ϕ_out. We wish to constrain the environment assumption to exclude such signals. Thus, we invoke an off-the-shelf falsification technique through the Falsify function to refine the synthesized environment assumption. There are many promising falsification tools, such as [4, 27, 30], that our technique could use. The falsifier uses a global optimizer to identify an input trace u(t) satisfying ϕ_in^proposed for which M(u(t)) ⊭ ϕ_out (Line 10). Typical falsifiers parameterize the input trace using a finite number of control points, i.e., time points at which the signal value is deemed to be an optimization variable. At all other time points, the intermediate signal values are obtained through a user-specified interpolation scheme. Let û denote the control point vector used by the falsifier to generate the input trace u(t). Then, consider an optimizer that tries to minimize the following cost function:
cost(û) = (max(0, −ρ(ϕ_in^proposed, u, 0)) + 1)^(2k) − 1 + ρ(ϕ_out, y, 0)
Essentially, this cost function is highly positive if the input trace does not satisfy ϕ_in^proposed, thus favoring input control point vectors that lead to traces satisfying ϕ_in^proposed. The constant k is a positive integer chosen so that the first term overpowers the maximum negative robustness that can result from the output trace y(t) violating ϕ_out. If the input satisfies ϕ_in^proposed, the first term is simply 0, and we only look for outputs that violate ϕ_out. If such an input trace is found, we add it to the list of bad traces (Line 13) and restart the enumerative solver from the last formula that it had enumerated. If no counterexample is found, the algorithm terminates with an STL formula representing the environment assumption. Note that our algorithm automatically learns the structure of the environment assumption as well as the parameter values. In the next section, we propose a few benchmark models to showcase the working of our environment assumption mining framework.
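The cost function above is easy to realize on top of any robustness monitor. The sketch below is a schematic Python rendering under stated assumptions: rho_in, rho_out, simulate, and interpolate are assumed hooks into a monitor, the model under test, and the falsifier's control-point interpolation scheme, respectively.

def falsification_cost(u_hat, rho_in, rho_out, simulate, interpolate, k=4):
    u = interpolate(u_hat)                # control points -> input trace u(t)
    y = simulate(u)                       # y(t) = M(u(t))
    penalty = max(0.0, -rho_in(u)) + 1.0  # equals 1 iff u satisfies the candidate assumption
    # Violating the candidate assumption blows the first term up; otherwise the
    # cost reduces to rho(phi_out, y, 0), which the optimizer drives negative.
    return penalty ** (2 * k) - 1.0 + rho_out(y)

A global optimizer minimizes this cost; any û whose input satisfies the candidate assumption and whose cost is negative yields the counterexample trace cex(t) of Line 10.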
4.3 Experimental Results
4.3.1 Benchmarking our DT-based Parameter Inference
To evaluate our DT-based parameter inference framework, we apply our method to the case studies presented in Chapter 3 and compare its performance with our boundary based approach (Algo. 1). Detecting abnormal trajectories in a maritime setting is the subject of the first case study. The second and third case studies detect the behavior of two CPS models under attack. We ran the experiments on an Intel Core-i7 MacBook Pro with a 2.7 GHz processor and 16 GB RAM. The results show that our decision tree based technique performs faster than our boundary based approach; however, our boundary based technique may learn simpler formulas. The accuracies of the two techniques are comparable.
4.3.1.1 Maritime Surveillance
We start with the maritime surveillance problem presented in [16]. The synthetic data set is 2-dimensional and consists of one normal and two types of anomalous trajectories (Fig. 3.5). We used 100 traces for training, and 100 traces were reserved for testing. Our enumerative solver learned three STL formulas to classify the three kinds of trajectories. The results of using our decision tree based approach are as follows:
ϕ_green = ¬G_[15,30](x(t) ≤ 40.49) ∧ G_[30,45](x(t) ≤ 42.05),
ϕ_blue = ¬G_[30,45](x(t) ≤ 42.05),
ϕ_red = G_[15,30](x(t) ≤ 40.49) ∧ G_[30,45](x(t) ≤ 42.05).
The training MCR is 0, and the testing MCR is 0.03, with a training time of 44.17 seconds. The results of our validity domain boundary based approach are shown in Table 3.2 in Chapter 3. Thus, the decision tree based approach is faster than the validity domain boundary based method, but the formulas learned by the decision tree based approach (ϕ_green and ϕ_red) are longer. The MCRs are comparable. The simplest STL formula learned by the DTL4STL tool [16] to classify green traces from the others is:
ϕ = (ϕ_1 ∧ (¬ϕ_2 ∨ (ϕ_2 ∧ ¬ϕ_3))) ∨ (¬ϕ_1 ∧ (ϕ_4 ∧ ϕ_5))
ϕ_1 = G_[199.70,297.27)(F_[0.00,0.05)(x(t) ≤ 23.60))
ϕ_2 = G_[4.47,16.64)(F_[0.00,198.73)(y(t) ≤ 24.20))
ϕ_3 = G_[34.40,52.89)(F_[0.00,61.74)(y(t) ≤ 19.62))
ϕ_4 = G_[30.96,37.88)(F_[0.00,250.37)(x(t) ≤ 36.60))
ϕ_5 = G_[62.76,253.23)(F_[0.00,41.07)(y(t) ≤ 29.90))
with an average misclassification rate of 0.007. This STL formula is long and complicated compared to the formula ϕ_green learned by our framework. Thus, [16] achieves a better MCR at the price of a highly uninterpretable formula.
4.3.1.2 Linear System
Here, we consider the second case study (Fig. 3.6) presented in Chapter 3. The enumerative solver was trained and tested on this data to extract an STL formula. The STL formula learned using our decision tree based method is G_[0,3](x(t) ≥ 0.968), with training and testing MCRs of 0 and 0.03, respectively. The training time is 12.99 seconds. G_[0,3](x(t) ≥ 0.9736) is the STL formula learned using our boundary based approach. That procedure takes 39.05 seconds, with a training MCR of 0 and a testing MCR of 0.02. Therefore, the decision tree based algorithm performs faster, while the accuracies of the two approaches are comparable. The STL formulas learned using our framework are simpler than the one obtained in [50], which is F_[0,3.0)(G_[0.5,2.0)(y(t) > 0.9634)), with training and testing MCRs of 0 and 0.01, respectively. Hence, we learn simpler formulas with a slight accuracy degradation.
4.3.1.3 Cruise Control of Train
We also benchmark an example of cruise control in a train from [50], which is the third case study in Chapter 3 (see Fig. 3.7).
Our decision tree based method learns the STL classifier G_[0,42.8571](x(t) < 39.0586), with both training and test MCRs of 0. The time taken by our decision tree based tool to learn the STL formula is 27.23 seconds. We also applied our boundary based solver, which extracts the STL formula G_[0,100](x(t) < 35.8816), where 35.8816 is the threshold learned by the solver (shown by the dashed blue line in the figure). The MCRs for training and testing are 0.01 and 0.02, respectively. The execution time of this approach is 30.45 s. The STL formulas obtained by our two approaches are simpler than the one extracted in [50], which is F_[0,100)(F_[10,69)(y(t) < 24.9) ∧ F_[13.9,44.2)(y(t) > 17.66)), with MCRs of 0 for both training and testing.
4.3.2 Benchmarking Environment Assumption Mining
In this section, we benchmark our environment assumption mining framework on a few case studies. The first case study is a synthetic model that we hand-crafted to demonstrate learning STL assumptions. The second case study is a model of an automatic transmission system. A model of abstract fuel control is considered as the third case study.
4.3.2.1 Synthetic model
Simulink® is a visual block diagram language commonly used in industrial settings to model component-based CPS designs. To evaluate our environment assumption mining framework, we created a Simulink® model whose behavior we know exactly. Our synthetic model is an oscillator component that has two input signals u_1 and u_2 and an output signal y. The model has an internal flag that is turned on if, within 3 seconds of the input value u_1 falling below 0, the input value u_2 also falls below 0. When the flag is turned on, the oscillator outputs a sinusoidal wave with amplitude 5 units, and it outputs a sine wave with amplitude 1 unit otherwise. We imagine a scenario where a downstream component requires the output of the oscillator component to be bounded by [−1, 1], i.e., we require that the output y satisfies the STL requirement ϕ_out = G(−1 ≤ y(t) ≤ 1).
In this example, we generate 100 input traces using a Simulink®-based signal generator. We pick a small subset of these traces that includes input traces both leading to outputs satisfying ϕ_out (i.e., the good traces) and violating ϕ_out (the bad traces). We use our supervised learning framework to learn an STL classifier ϕ_in. We then invoke the counterexample-guided refinement step of Algo. 5 to improve ϕ_in. We learned the environment assumption ϕ_in = G_[0,20](u_1(t) < 0 ⇒ G_[0,5](u_2(t) ≥ 0)). This means that whenever input u_1 becomes negative, u_2 should stay non-negative for the next 5 seconds; otherwise, the output will violate ϕ_out. The time taken to learn this ϕ_in is 6084 seconds, and both the training and testing MCRs are 0. We note that, in this case, the learned formula ϕ_in is stronger than the theoretical environment assumption that we had in mind when designing the model. This discrepancy is likely because our training set did not include trajectories where u_1(t_1) < 0 and u_2(t_2) < 0 occurred with 3 < t_2 − t_1 < 5.
4.3.2.2 Automatic Transmission Controller
We consider the automatic transmission controller, a built-in model in Simulink®, shown in Fig. 4.4. This model consists of modules representing the engine, the transmission, and the vehicle, and a shift logic block to control the transmission ratio. User inputs to the model are throttle and brake torque. Engine speed, gear, and vehicle speed are the outputs of the system.
We are interested in the following signals: the throttle, the vehicle speed, and the engine speed measured in RPM (rotations per minute). We wish to mine the environment assumptions on the throttle that ensure that the engine speed never exceeds 4500 RPM and that the vehicle never drives faster than 120 mph. In other words, we want to mine the STL specification ϕ_in on the input of the system (the throttle) that results in meeting the following output requirement: ϕ_out = G(RPM ≤ 4500) ∧ G(speed ≤ 120).
Figure 4.4: The Simulink model of the automatic transmission controller [49]. Inputs of the system are throttle and brake. RPM, gear, and speed are outputs of the system.
Figure 4.5: Violating traces for the automatic transmission controller.
A set of traces that violate this requirement is shown in Fig. 4.5. We applied our assumption mining method to 600 throttle traces (300 for training and 300 for testing). The formula produced by our framework is ϕ_in = G_[240,480](x(t) < 40.4281), with training and testing MCRs of 0 and 0.02, respectively. This formula implies that if the throttle stays below 40.4281 during the time interval [240, 480], the engine and vehicle speeds will meet the requirement; otherwise, the engine or vehicle speed violates the specification and exceeds the specified threshold. It is difficult to discover such behaviors by manually inspecting input and output traces of the system, and our technique mines such assumptions on the input automatically. The training time for learning the STL formula is 28.18 seconds.
4.3.2.3 Abstract Fuel Control Model
In [48], the authors provide a Simulink® model for a powertrain control system. The model takes as input the throttle angle and engine speed, and outputs the air-to-fuel (A/F) ratio. As specified in [48], it is important for the A/F ratio to stay within 10% of the nominal value. The output requirements in [48] are applicable only in the normal mode of operation of the model; when the throttle angle exceeds a certain threshold, the model switches into the power mode, in which the A/F ratio is allowed to be lower. We wanted to extract the assumptions on the throttle angle that lead to significant excursions from the stoichiometric A/F value. We were able to learn the formula G_[0,100](x(t) < 61.167), which confirms that the excursions happen when the model goes into the power mode. We were able to synthesize the environment assumptions in 13.40 seconds, with both training and testing MCRs of 0.
4.4 Summary
In this Chapter, we considered the problem of mining environment assumptions for CPS components using Signal Temporal Logic formulas. An input trace satisfying the mined environment assumption is expected to produce an output that meets the requirement on the output traces of the component. Our algorithm systematically enumerates parametric STL formulas, uses a decision tree based classification procedure to learn both the structure and the precise numeric constants of an STL formula representing the environment assumption, and refines the result using a falsification procedure. Finally, we demonstrated our technique on a few benchmark CPS models.
Chapter 5
Learning Spatio-Temporal logic formulae
5.1 Introduction
Due to rapid improvements in sensing and communication technologies, embedded systems are now often spatially distributed.
Such spatially distributed systems (SDS) consist of heterogeneous components embedded in a specific topological space, whose time-varying behaviors evolve according to complex mutual interdependence relations [73]. In the formal methods community, tremendous advances have been made in the verification and analysis of distributed systems. However, most formal techniques abstract away the specific spatial aspects of distributed systems, which can be of crucial importance in certain applications. For example, consider the problem of developing a bike-sharing system (BSS) in a "sharing economy"; each BSS consists of a number of bike stations, which can be at arbitrary locations in a city. The design of an effective BSS requires reasoning about the distance to nearby locations and the time-varying demand or supply at each location. For instance, the property "there is always a bike and a slot available at distance d from a bike station" depends on the distance of the bike station to its nearby stations. Evaluating whether the BSS functions correctly is a verification problem where the specification is a spatio-temporal logic formula. Similarly, consider the problem of coordinating the movements of multiple mobile robots, or an HVAC controller that activates heating or cooling in parts of a building based on occupancy. Given spatio-temporal execution traces of nodes in such systems, we may be interested in analyzing the data to solve several classical formal methods problems such as fault localization, debugging, invariant generation, or specification mining. It is increasingly urgent to formulate methods that enable reasoning about spatially distributed systems in a way that explicitly incorporates their spatial topology.
In this work, we focus on one specific aspect of spatio-temporal reasoning: mining interpretable logical properties from data in an SDS. We model an SDS as a directed or undirected graph where individual compute nodes are vertices, and edges model either the connection topology or spatial proximity. Traditional machine learning (ML) approaches have also been used to uncover the structure of such spatio-temporal systems, but these techniques suffer from a lack of interpretability. Our proposed method draws on a recently proposed logic known as Spatio-Temporal Reach and Escape Logic (STREL) [11]. Recent research on STREL has focused on efficient algorithms for runtime verification and monitoring of STREL specifications [11, 12]. However, there is no existing work on mining STREL specifications.
Mined STREL specifications can be useful in many different contexts in the design of spatially distributed systems; an incomplete list of usage scenarios includes the following applications: (1) mined STREL formulas can serve as spatio-temporal invariants that are satisfied by the computing nodes; (2) STREL formulas can be used by developers to characterize the properties of a deployed spatially distributed system, which can then be used to monitor any subsequent updates to the system; and (3) clustering nodes that satisfy similar STREL formulas can help debug possible bottlenecks and violations of communication protocols in such distributed systems. To address the shortcomings of previous techniques, we introduce PSTREL, obtained by treating threshold constants in signal predicates, time bounds in temporal operators, and distance bounds in spatial operators as parameters.
We then identify a monotonic fragment of PSTREL, and propose a multi-dimensional binary-search based procedure to infer tight parameter valuations for the given PSTREL formula. We also explore the space of implied edge relations between spatial nodes, proposing an algorithm to define the most suitable graph. After defining a projection operator that maps a given spatio-temporal signal to parameter values of the given PSTREL formula, we use an agglomerative hierarchical clustering technique to cluster spatial locations into hyperboxes in the parameter space. We improve the method of [96] by introducing a decision-tree based approach to systematically split overlapping hyperbox clusters. Our method produces axis-aligned hyperbox clusters that can be compactly described by an STREL formula whose length is proportional to the number of parameters in the given PSTREL formula (and independent of the number of clusters). Finally, we give human-interpretable meanings for each cluster. We show the usefulness of our approach on four benchmarks: COVID-19 data from LA County, outdoor air quality data, BSS data, and movements of customers in a food court.
Running Example: A Bike Sharing System (BSS). To ease the exposition of the key ideas in this work, we use the example of a BSS deployed in the city of Edinburgh, UK. The BSS consists of a number of bike stations distributed over a geographic area. Each station has a fixed number of bike slots. Users can pick up a bike, use it for a while, and then return it to another station in the area. The data that we analyze are the number of bikes (B) and empty slots (S) at each time step at each bike station. With the advent of electric bikes, BSS have become an important aspect of urban mobility, and such systems make use of embedded devices for diverse purposes such as tracking bike usage, billing, and displaying information about availability to users over apps. Fig. 5.1b shows the map of Edinburgh with the bike stations. Different colors of the nodes represent the different learned clusters shown in Fig. 5.1a. For example, using our approach, we learn that stations in the orange cluster have a long wait time, and stations in the red cluster are the most undesirable, as they have a long wait time and do not have nearby stations with bike availability. If we look at the actual locations of the red points in Fig. 5.1b, they are indeed far-away stations.
The remainder of the Chapter is structured as follows. We present several approaches to constructing a spatial model in Section 5.2.1. Next, we explain our STREL learning technique in Section 5.2.2. We elaborate on our experimental results in Section 5.3. Finally, we provide a summary in Section 5.4.
Figure 5.1: Interpretable clusters automatically identified by our technique. (a) Clusters learned, (b) BSS locations in Edinburgh.
5.2 Methods
5.2.1 Constructing a Spatial Model
In this section, we present four approaches to constructing a spatial model (Fig. 5.2), and discuss the pros and cons of each approach.
1. (∞,d)-connectivity spatial model: This spatial model corresponds to the (δ,d)-connectivity spatial model presented in Definition 5, where we set δ = ∞. We note that this gives us a fully connected graph, i.e., |W| is O(|L|²). We remark that our learning algorithm uses monitoring of STREL formulas as a subroutine, and from the following lemma we can see that, as the complexity of monitoring a STREL formula is linear in |W|, a fully connected graph is undesirable (Fig. 5.2a).
Lemma. Let ⟨L,W⟩ be a spatial model where d_S^min is the minimum distance between two locations, and let H_[d_1,d_2] be a STREL formula where H is an arbitrary spatial operator. Then, the complexity of monitoring the formula is O(k² · |L| · |W|), where k = min{i | i · d_S^min > d_2}.
Figure 5.2: Different approaches for constructing the spatial model for the BSS. (a) shows an (∞,d_hvrsn)-connectivity spatial model, where d_hvrsn is the Haversine distance between locations. (b) shows a (δ,d_hvrsn)-connectivity spatial model with δ = 1 km; observe that the spatial model is disconnected. (c) shows an MST spatial model. (d) shows an (α,d_hvrsn)-Enhanced MSG spatial model with α = 2; observe that this spatial model is sparse even compared to the (δ,d_hvrsn)-connectivity spatial model.
2. (δ,d)-connectivity spatial model: This is the model presented in Definition 5, where δ is heuristically chosen in an application-dependent fashion. Typically, the δ we choose is much smaller than the distance between the farthest nodes in the given topological space. This gives us a W that is sparse, and thus a lower monitoring cost; however, a small δ can lead to a disconnected spatial model, which can affect the accuracy of the learned STREL formulas. Furthermore, this approach may overestimate the spatial model induced distance between two nodes (as in Definition 7) that are not connected by a direct edge. For instance, in Fig. 5.2b, nodes 1 and 8 are connected through the route 1 → 9 → 8, and the sum of the edge weights along this route is larger than the actual (metric) distance between 1 and 8.
Then, we introduce a projection function π that maps a spatio-temporal trace to a valuation in the parameter space of a given PSTREL formula. We then cluster the 83 Algorithm 6: Algorithm to create an(α,d hvrsn )-Enhanced MSG Spatial Model Input: A set of locations L (vertices of the graph), Longitudes and Latitudes of the locations, factor α> 1 Output:S 1 S = minSpanningTree(L,Longitudes,Latitudes)// Prim’s algorithm 2 3 for i← 1 to|L| do 4 for j← i+ 1 to|L| do // Length of the shortest path between i and j 5 shortestPathMST i j = length(shortestPath(S,i, j)) // Compute the Haversine distance between i and j 6 directDistance i j = d hvrsn (longitudes[i, j],latitudes[i, j]) 7 if shortestPathMST i j >α· directDistance i j then addEdge(S,i, j) 8 returnS trace-projections using Agglomerative Hierarchical Clustering, and finally learn a compact STREL formula for each cluster using Decision Tree techniques. Parametric STREL (PSTREL). Parametric STREL (PSTREL) is a logic obtained by replacing one or more numeric constants appearing in STREL formulas by parameters; parameters appearing in atomic pred- icates are called magnitude parametersP V , and those appearing in temporal and spatial operators are called timingP T and spatial parametersP d S respectively. Each parameter inP V take values from V , those in P T take values fromT , and those inP d S take values fromR ≥ 0 (i.e. the set of values that the d S metric can take for a given spatial model). We define a valuation function ν that maps all parameters in a PSTREL formula to their respective values. Example. Consider the PSTREL formula ϕ(p τ ,p d ,p c ) = G [0,p τ ] 3 [0,p d ] (B> p c ). The valuation ν: p τ 7→ 3hours, p d 7→ 1km, and p c 7→ 10 returns the STREL formulaϕ = G [0,3hours] 3 [0,1km] (B> 10). Definition 14 (Parameter Polarity, Monotonic PSTREL). A polarity function sgn maps a parameter to an element of{+,−} , and is defined as follows: sgn(p)=+ def = ν ′ (p)>ν(p)∧(σ,ℓ)|=ϕ(ν(p))⇒(σ,ℓ)|=ϕ(ν ′ (p)) sgn(p)=− def = ν ′ (p)<ν(p)∧(σ,ℓ)|=ϕ(ν(p))⇒(σ,ℓ)|=ϕ(ν ′ (p)) 84 The monotonic fragment of PSTREL consists of PSTREL formulas where all parameters have either positive or negative polarity. In simple terms, the polarity of a parameter p is positive if it is easier to satisfy ϕ as we increase the value of p and is negative if it is easier to satisfy ϕ as we decrease the value of p. The notion of polarity for PSTL formulas was introduced in [7], and we extend this to PSTREL and spatial operators. The polarity for PSTREL formulasϕ(d 1 ,d 2 ) of the form3 [d 1 ,d 2 ] ψ,ψ 1 R [d 1 ,d 2 ] ψ 2 , andE [d 1 ,d 2 ] ψ are sgn(d 1 )=− and sgn(d 2 )=+, i.e. if a spatio-temporal trace satisfies ϕ(ν(d 1 ),ν(d 2 )), then it also satisfies any STREL formula over a strictly larger spatial model induced distance interval, i.e. by decreasingν(d 1 ) and increasing ν(d 2 ). For a formula□ [d 1 ,d 2 ] ψ, sgn(d 1 )=+ and sgn(d 2 )=− , i.e. the formula obtained by strictly shrinking the distance interval. The proofs are simple, and provided in Appendix for completeness. Definition 15 (Validity Domain, Boundary). LetD P = V |P V | × T |P T | × (R ≥ 0 ) |P d S | denote the space of parameter valuations, then the validity domainV() of a PSTREL formula at a locationℓ with respect to a set of spatio-temporal tracesT is defined as follows: V(()ϕ(p),ℓ,T)={ν(p)| p∈D P ,σ∈T,(σ,ℓ)|= ϕ(ν(p))} The validity domain boundary ∂V(()ϕ(ϕ),ℓ,T) is defined as the intersection of V(()ϕ,ℓ,T) with the closure of its complement. Spatio-Temporal Trace Projection. 
We now explain how a monotonic PSTREL formula ϕ(p) can be used to automatically extract features from a spatio-temporal trace. The main idea is to define a total order > P on the parameters p (i.e. parameter priorities) that allows us to define a lexicographic projection of the spatio-temporal traceσ at each locationℓ to a parameter valuationν(p) (this is similar to assumptions made in [49, 96]). We briefly remark how we can relax this assumption later. Let ν j denote the valuation of the j th parameter. 85 Definition 16 (Parameter Space Ordering, Projection). A total order on parameter indices j 1 >...> j n imposes a total order≺ lex on the parameter space defined as: ν(p)≺ lex ν ′ (p)⇔∃ j k s.t. sgn(p j k )=+⇒ν j k <ν ′ j k sgn(p j k )=−⇒ ν j k >ν ′ j k and∀m< P k,ν m =ν ′ m . Given above total order,π lex (σ,ℓ)= inf ≺ lex {ν(p)∈∂V(()ϕ(p),{σ}}. In simple terms, given a total order on the parameters, the lexicographic projection maps a spatio- temporal trace to valuations that are least permissive w.r.t. the parameter with the greatest priority, then among those valuations, to those that are least permissive w.r.t. the parameter with the next greater priority, and so on. Finding a lexicographic projection can be done by sequentially performing binary search on each parameter dimension [96]. It is easy to show thatπ lex returns a valuation on the validity domain boundary. The method for finding the lexicographic projection is formalized in Algo. 7. The algorithm begins by setting the lower and upper bounds of valuations for each parameter. Then, each parameter is set to a pa- rameter valuation that results in the most permissive STREL formula (based on the monotonicity direction). Next, for each parameter in the defined order > P we perform bisection search to learn a tight satisfying parameter valuation. After completion of bisection search, we return the upper (lower) bound of the search interval for parameters with positive (negative) polarity. Remark. The order of parameters is assumed to be provided by the user and is important as it affects the unsupervised learning algorithms for clustering that we apply next. Intuitively, the order corresponds to what the user deems as more important. For example, consider the formula G [0,3hours] 3 [0,d] (B> c). Note that sgn(d)=+, and sgn(c)=− . Now if the user is more interested in the radius around each station where the number of bikes exceeds some threshold (possibly 0) within 3hours, then the order is d> P c. If she is more interested in knowing what is the largest number of bikes available in any radius (possibly∞) always within 3hours, then c> P d. 
86 Algorithm 7: Lexicographic projection of spatio-temporal traces using multi-dimensional bisec- tion search Input: A traceσ(ℓ), a spatial modelS , a PSTREL formulaϕ(p), a parameter setP, monotonicity directionsγ(p), defined order on parameters > P ,δ > 0 // π lex (σ,ℓ) is the projection of σ(ℓ) to a point in the parameter space of ϕ Output: π lex (σ,ℓ) // Lower and upper bounds of each parameter 1 ν l (p)← inf(P),ν u (p)← sup(P) ; // Initialize each parameter with a value in the parameter space that results in the most permissive formula (based on the monotonicity direction of each parameter) 2 for i← 1 to|P| do 3 ifγ(p i )==+ thenν(p i )← ν u (p i ); 4 elseν(p i )← ν l (p i ); // Optimize each parameter in the defined order > P 5 for i← 1 to|P > | do 6 while|ν u (p i )− ν l (p i )|≥ δ i do 7 ν(p i )= 1 2 (ν l (p i )+ν u (p i ))// The middle point 8 ; // Compute robustness of the middle point 9 ρ =ρ(ϕ(ν(p i )),S,σ,ℓ,0); 10 ifρ≥ 0 &γ(p i )==+ then ν u (p i )← ν(p i ) ; 11 else ifρ≥ 0 &γ(p i )==− then ν l (p i )← ν(p i ) ; 12 else ifρ< 0 &γ(p i )==+ then ν l (p i )← ν(p i ) ; 13 else ν u (p i )← ν(p i ) ; 14 ifγ(p i )==+ thenν f inal (p i )← v u (p i ); 15 elseν f inal (p i )← ν l (p i ) ; 16 ; 17 returnπ lex (σ,ℓ)← ν f inal (p) Remark. Similar to [97], we can compute an approximation of the validity domain boundary for a given trace, and then apply a clustering algorithm on the validity domain boundaries. This does not require the user to specify parameter priorities. In all our case studies, the parameter priorities were clear from the domain knowledge, and hence we will investigate this extension in the future. Clustering. The projection operator π lex (σ,ℓ) maps each location to a valuation in the parameter space. These valuation points serve as features for off-the-shelf clustering algorithms. In our experiments, we use the Agglomerative Hierarchical Clustering (AHC) technique [26] to automatically cluster similar valuations. AHC is a bottom-up approach that starts by assigning each point to a single cluster, and then merging clusters 87 in a hierarchical manner based on a similarity criteria * . An important hyperparameter for any clustering algorithm is the number of clusters to choose. In some case studies, we use domain knowledge to decide the number of clusters. Where such knowledge is not available, we use the Silhouette metric to compute the optimal number of clusters. Silhouette is a ML method to interpret and validate consistency within clusters by measuring how well each point has been clustered. The silhouette metric ranges from− 1 to+1, where a high silhouette value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters [85]. Example. Fig. 5.1a shows the results of projecting the spatio-temporal traces from BSS through the PSTREL formulaϕ(τ,d) shown in Eq. (5.1). ϕ(τ,d)= G [0,3] (ϕ wait (τ)∨ϕ walk (d)) (5.1) In the above formula, ϕ wait (τ) is defined as F [0,τ] (B≥ 1)∧ F [0,τ] (S≥ 1), and ϕ walk (d) is3 [0,d] (B≥ 1)∧ 3 [0,d] (S≥ 1). ϕ(τ,d) means that for the next 3 hours, either ϕ wait (τ) or ϕ walk (d) is true. Locations with large values ofτ have long wait times and with large d values are typically far from a location with bike/slot availability (and are thus undesirable). Locations with small τ,d are desirable. Each point in Fig. 5.1a showsπ lex (σ,ℓ) applied to each location and the result of applying AHC with 3 clusters. Let K be the number of clusters obtained after applying AHC to the parameter valuations. 
Let C denote the labeling function mapping π lex (σ,ℓ) to{1,...,K}. The next step after clustering is to represent each cluster in terms of an easily interpretable STREL formula. Next, we propose a decision tree-based approach to learn an interpretable STREL formula from each cluster. Learning STREL Formulas from Clusters. The main goal of this subsection is to obtain a compact STREL formula to describe each cluster identified by AHC. We argue that bounded length formulas tend * We used complete-linkage criteria which assumes the distance between clusters equals the distance between those two ele- ments (one in each cluster) that are farthest away from each other. 88 to be human-interpretable, and show how we can automatically obtain such formulas using a decision-tree approach. Decision-trees (DTs) are a non-parametric supervised learning method used for classification and regression [67]. Given a finite set of points X⊆ R m and a labeling functionL that maps each point x∈ X to some labelL(x), the DT learning algorithm creates a tree whose non-leaf nodes n j are annotated with constraintsφ j , and each leaf node is associated with some label in the range ofL . Each path n 1 ,...,n i ,n i+1 from the root node to a leaf node corresponds to a conjunction V i j=1 h j , where h j =¬φ j if h j+1 is the left child of h j andφ j otherwise. Each label thus corresponds to the disjunction over the conjunctions corresponding to each path from the root node to the leaf node with that label. Recall that after applying the AHC procedure, we get one valuationπ lex (σ,ℓ) for each location, and its associated cluster label. We apply a DT learning algorithm to each point π lex (σ,ℓ), and each DT node is associated with aφ j of the form p j ≥ v j for some p j ∈ p. Lemma. Any path in the DT corresponds to a STREL formula of length that is O((|P|+ 1)·|ϕ|). Proof. Any path in the DT is a conjunction over a number of formulas of the kind p j ≥ v j or its negation. Because ϕ(p) is monotonic in each of its parameters, if we are given a conjunction of two conjuncts of the type p j ≥ v j and p j ≥ v ′ j , then depending on sgn(p j ), one inequality implies the other, and we can discard the weaker inequality. Repeating this procedure, for each parameter, we will be left with at most 2 inequalities (one specifying a lower limit and the other an upper limit on p j ). Thus, each path in the DT corresponds to an axis-aligned hyperbox in the parameter space. Due to monotonicity, an axis-aligned hyperbox in the parameter space can be represented by a formula that is a conjunction of|P|+ 1 STREL formulas (negations of formulas corresponding to the|P| vertices connected to the vertex with the most permissive STREL formula, and the most permissive formula itself) [96] (see Fig. 5.3a for an example in a 2D parameter space). Thus, each path in the DT can be described by a formula of length O((|P|+ 1)·|ϕ|), where|ϕ| is the length ofϕ. 89 Example. The result of applying the DT algorithm to the clusters identified by AHC (shown in dotted lines in Fig. 5.1a) is shown as the axis-aligned hyperboxes. Using the meaning ofϕ(τ,d) as defined in Eq. (5.1), we learn the formula¬ϕ(17.09,2100)∧¬ϕ(50,1000.98)∧ϕ(50,2100) for the red cluster. The last of these conjuncts is essentially the formula true, as this formula corresponds to the most permissive formula over the given parameter space. 
Thus, the formula we learn is: ϕ red =¬G [0,3] (ϕ wait (17.09)∨ϕ walk (2100))∧¬G [0,3] (ϕ wait (50)∨ϕ walk (1000.98)) The first of these conjuncts is associated with a short wait time and the second is associated with short walking distance. As both are not satisfied, these locations are the least desirable. Pruning the Decision Tree. If the decision tree algorithm produces several disjuncts for a given label (e.g., see Fig. 5.6a), then it can significantly increase the length and complexity of the formula that we learn for a label. This typically happens when the clusters produced by AHC are not clearly separable using axis-aligned hyperplanes. We can mitigate this by pruning the decision tree to a maximum depth, and in the process losing the bijective mapping between cluster labels and small STREL formulas. We can still recover an STREL formula that is satisfied by most points in a cluster using a k-fold cross validation approach (Algo. 9). The idea is to loop over the maximum depth permitted from 1 to N, where N is user provided, and for each depth performing k-fold cross validation to characterize the accuracy of classification at that depth. If the accuracy is greater than a threshold (90% in our experiments), we stop and return the depth as a limit for the decision tree. Fig. 5.6b illustrates the hyper-boxes obtained using this approach. For this example, we could decrease the number of hyper-boxes from 11 to 4 by miss-classifying only a few data points (less than 10% of the data). 90 Algorithm 8: K-fold cross validation approach to determine the best maximum depth of the De- cision Tree Input: The learned projectionsπ, labeling function C, threshold on the maximum depth of the Decision Tree N, parameter K for k-fold cross validation method, threshold on the accuracy ACC th Output: The learned decision tree DT // Loop over the maximum depth of Decision Tree 1 for d← 1 to N do // Compute the cross validation accuracy for maxDepth= d 2 ACC d = k f oldCrossValidation(π,C(π),maxDepth= d,K); // Choose the max depth that gives the best cross validation accuracy 3 if ACC d > ACC th then // Train a Decision Tree with the chosen max depth 4 DT = f itDecisionTree(π,C(π),d); 5 return DT ; 6 return / 0; 7 Function k f oldCrossValidation(X,Y,maxDepth,K) // Shuffle the data 8 X,Y = Shu f f le(X,Y); // Devide the data into K subsets 9 X(1 : K),Y(1 : K)= DivideToKSubsets(X,Y); 10 sumACC= 0; // Train on K-1 subsets and test on 1 subset 11 for i← 1 to K do 12 X train ,Y train =[X(1 : i− 1),X(i+ 1 : K)],[Y(1 : i− 1),Y(i+ 1 : K)]; 13 X test ,Y test = X(i),Y(i); 14 DT = f itDecisionTree(X train ,Y train ,maxDepth); 15 ACC= predictDecisionTree(DT,X(i),Y(i)); 16 sumACC= sumACC+ ACC; // Return the average accuracy 17 return sumACC/K ; 91 Algorithm 9: K-fold cross validation approach to determine the best maximum depth of the De- cision Tree Input: The learned projectionsπ, labeling function C, threshold on the maximum depth of the Decision Tree N, parameter K for k-fold cross validation method, threshold on the accuracy ACC th Output: The learned decision tree DT // Loop over the maximum depth of Decision Tree 1 for d← 1 to N do // Compute the cross validation accuracy for maxDepth= d 2 ACC d = k f oldCrossValidation(π,C(π),maxDepth= d,K); // Choose the max depth that gives the best cross validation accuracy 3 if ACC d > ACC th then // Train a Decision Tree with the chosen max depth 4 DT = f itDecisionTree(π,C(π),d); 5 return DT ; 6 return / 0; 7 Function k f oldCrossValidation(X,Y,maxDepth,K) // Shuffle 
Figure 5.3: Illustration of clustering on the BSS locations. (a) ϕ_yellow = ϕ(τ_2, d_2) ∧ ¬ϕ_pink ∧ ¬ϕ_blue. (b) The result of applying the decision tree algorithm on the labeled parameter valuations shown in Fig. 5.1.

5.3 Experimental Results

We now present the results of applying the clustering techniques developed above on four benchmarks: (1) BSS data from the city of Edinburgh [56] (running example), (2) COVID-19 data from Los Angeles County, USA [53], (3) Outdoor Air Quality data from California, and (4) a synthetic dataset for tracking movements of people in a food court building. All experiments were performed on an Intel Core-i7 Macbook Pro with a 2.7 GHz processor and 16 GB RAM. We use the existing monitoring tool MoonLight [12] in Matlab for computing the robustness of STREL formulas. For the Agglomerative Hierarchical Clustering and Decision Tree techniques, we use the scikit-learn library in Python and the Statistics and Machine Learning Toolbox in Matlab. A summary of the computational aspects of the results is provided in Table 5.1. The numbers indicate that our methods scale to spatial models containing hundreds of locations while still learning interpretable STREL formulas for the clusters.

Case         |L|   |W|   run-time (secs)   K   |ϕ_cluster|
BSS           61    91        681.78       3   2·|ϕ| + 4
COVID-19     235   427        813.65       3   3·|ϕ| + 4
Air Quality  107    60        136.02       8   5·|ϕ| + 7
Food Court    20    35         78.24       8   3·|ϕ| + 4

Table 5.1: Summary of results.

The run-time of our learning approach on different types of graphs for our case studies is illustrated in Table 5.2, along with the number of isolated nodes in each graph. We treat any run-time greater than 30 minutes as a time-out. The results demonstrate that the (∞, d_hvrsn)-connectivity spatial model usually results in a time-out because of its large number of edges. While the (δ, d_hvrsn)-connectivity spatial model has a better run-time, it suffers from isolated nodes, i.e., disconnectivity. The MST spatial model neither times out nor suffers from disconnectivity; however, it overestimates distances between nodes. While our approach, the (α, d_hvrsn)-Enhanced MSG spatial model, has a slightly worse run-time than the MST spatial model, it mitigates the distance over-approximation issue.

Case         (∞,d_hvrsn)-connectivity   (δ,d_hvrsn)-connectivity   MST           (α,d_hvrsn)-Enhanced MSG
BSS          time-out, 0                934.41s, 17                519.30s, 0    681.78s, 0
COVID-19     time-out, 0                1007.44s, 75               600.89s, 0    813.65s, 0
Air Quality  time-out, 0                111.36s, 46                119.94s, 0    136.02s, 0
Food Court   170.62s, 0                 84.25s, 0                  73.53s, 0     78.24s, 0

Table 5.2: Run-time of the learning algorithm (seconds) and number of isolated nodes in the spatial model (threshold for time-out set to 30 minutes).

5.3.1 BSS data from the city of Edinburgh

Bike-Sharing Systems (BSS) are an affordable and popular form of transportation that has been introduced to many cities, such as Edinburgh, in recent years [56]. Each BSS consists of a number of bike stations distributed over a geographic area, and each station has a fixed number of bike slots.
Users can pick up a bike, use it for a while, and then return it to the same or another station in the area. It is important for a BSS to satisfy the demand of its users; for example, "within a short time, there should always be a bike and an empty slot available in each station or its adjacent stations". Otherwise, the user either has to wait a long time at the station or walk a long way to a faraway station that has bike/slot availability. Understanding the behavior of a BSS can help users decide which station to use, and can also help operators choose the optimal number of bikes/empty slots for each station. In this work, we use STREL formulas to understand the behavior of a BSS. In particular, we consider the BSS in the city of Edinburgh, which consists of 61 stations (excluding stations with more than 15% missing values). We construct the spatial model using the (α, d_hvrsn)-Enhanced MSG approach with α = 2, resulting in a total of 91 edges in the spatial model.

A BSS should have at least one of two important properties to satisfy the demand of its users. The first property is: "a bike/empty slot should be available in the station within a short amount of time". The second property is: "if there is no bike/empty slot available in a station, there should be a bike/empty slot available in the nearby stations". The first property can be described using the PSTREL formula ϕ_wait(τ) = (F_[0,τ] Bikes ≥ 1) ∧ (F_[0,τ] Slots ≥ 1), which means that at some time within the next τ minutes, there will be at least one bike and one empty slot available in the station. Stations with small values of τ have short wait times, which is desirable for users, and stations with large values of τ have long wait times, which is undesirable. For the second case, users might prefer to walk a short distance to nearby stations instead of waiting a long time at the current station for a bike/slot to become available; this is related to the second important property of a BSS. The associated PSTREL formula for the second property is ϕ_walk(d) = (⋄_[0,d] Bikes ≥ 1) ∧ (⋄_[0,d] Slots ≥ 1), which means that "at some station within radius d of the current station, there is at least one bike and one empty slot available". If the value of d for a station is large, the user has to walk a long way to a distant station to find an available bike/empty slot.

We combine the two PSTREL formulas into ϕ(τ, d) = G_[0,3]{ϕ_wait(τ) ∨ ϕ_walk(d)}, which means that always within the next 3 hours, at least one of the properties ϕ_wait(τ) or ϕ_walk(d) should hold. We then learn tight values of τ and d for each station; stations with small values of τ and d are desirable. We first apply Algo. 7 with the order d >_P τ to learn tight parameter valuations, and then apply our clustering followed by a Decision Tree algorithm to learn separating hyperboxes for the clusters. The results are illustrated in Fig. 5.1a. The green points, which have small values of τ and d, are desirable stations. The orange point is associated with a station with a long wait time (around 35 minutes). The red points are the most undesirable stations, as they have a long wait time and no nearby stations with bike/empty-slot availability. Fig. 5.1b shows the location of each station on the map, which confirms that the stations associated with the red points are far from other stations; this explains their large values of d. Learning the clusters takes 681.78 seconds.
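The tight valuations used above come from the bisection search underlying Algo. 7. A minimal sketch of the core step for a single parameter, assuming a hypothetical robustness oracle rho(tau, d, loc) that is monotonically increasing in τ and satisfiable at the upper end of the search interval (in our experiments, MoonLight plays the role of this oracle):

    def tight_tau(rho, d, loc, lo=0.0, hi=180.0, eps=0.01):
        # Smallest tau in [lo, hi] with rho(tau, d, loc) >= 0. Correct only
        # because the PSTREL formula is monotonic in tau: enlarging the
        # F_[0,tau] window can only make the formula easier to satisfy.
        while hi - lo > eps:
            mid = 0.5 * (lo + hi)
            if rho(mid, d, loc) >= 0:
                hi = mid  # satisfied: the tight value is at or below mid
            else:
                lo = mid  # violated: the tight value is above mid
        return hi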
Next, we learn an interpretable STREL formula for each cluster, based on the monotonicity directions of τ and d, as follows:

ϕ_green = ϕ(17.09, 2100) ∧ ¬ϕ(0, 2100) ∧ ¬ϕ(17.09, 0) = ϕ(17.09, 2100)
ϕ_orange = ϕ(50, 1000.98) ∧ ¬ϕ(17.09, 1000.98) ∧ ¬ϕ(50, 0) = ϕ(50, 1000.98) ∧ ¬ϕ(17.09, 1000.98)
ϕ_red = ϕ(50, 2100) ∧ ¬ϕ(17.09, 2100) ∧ ¬ϕ(50, 1000.98) = ¬ϕ(17.09, 2100) ∧ ¬ϕ(50, 1000.98)

By replacing ϕ(τ, d) with G_[0,3]{ϕ_wait(τ) ∨ ϕ_walk(d)} in the above STREL formulas, we get:

ϕ_green = G_[0,3]{ϕ_wait(17.09) ∨ ϕ_walk(2100)}
ϕ_orange = G_[0,3]{ϕ_wait(50) ∨ ϕ_walk(1000.98)} ∧ ¬G_[0,3]{ϕ_wait(17.09) ∨ ϕ_walk(1000.98)}
ϕ_red = ¬G_[0,3]{ϕ_wait(17.09) ∨ ϕ_walk(2100)} ∧ ¬G_[0,3]{ϕ_wait(50) ∨ ϕ_walk(1000.98)}

The intuition behind the learned STREL formula for the green cluster is that, always within the next 3 hours, the wait time for bike/slot availability is less than 17.09 minutes, or the walking distance to nearby stations with bike/slot availability is less than 2100 meters. Fig. 5.1a shows that the actual walking distance for the green points is at most 1000.98 meters; the reason for learning 2100 meters is that the Decision Tree learns robust, relaxed boundaries for each class. The STREL formula ϕ_orange means that for the next 3 hours, at least one of the properties ϕ_wait(50) or ϕ_walk(1000.98) holds for the orange stations; however, for the smaller wait time of 17.09 minutes, at least once in the next 3 hours, both ϕ_wait(17.09) and ϕ_walk(1000.98) fail to hold. The intuition is that the orange stations have a long wait time: they satisfy G_[0,3]{ϕ_wait(50) ∨ ϕ_walk(1000.98)} but falsify G_[0,3]{ϕ_wait(17.09) ∨ ϕ_walk(1000.98)}. The red points falsify both G_[0,3]{ϕ_wait(17.09) ∨ ϕ_walk(2100)}, which is associated with a short wait time, and G_[0,3]{ϕ_wait(50) ∨ ϕ_walk(1000.98)}, which is associated with a short walking distance. Therefore, the red points are the most undesirable stations.

Comparison with Traditional ML approaches. To compare our framework with traditional ML approaches, we apply KMeans clustering from the tslearn library [91] on the BSS data, using Dynamic Time Warping (DTW) as the similarity metric (tslearn is a Python library for the analysis of time-series data). The results of the clustering are shown in Fig. 5.4. We also apply k-Shape clustering from tslearn on the same dataset; the results are illustrated in Fig. 5.5. k-Shape is a partitional clustering algorithm that preserves the shapes of time series. While the KMeans and k-Shape clustering approaches run faster than our approach (both in less than 5 seconds), the artifacts they train often lack interpretability, unlike our approach (for example, they cannot identify stations with long wait times or faraway stations).

Figure 5.4: Using the KMeans approach from the tslearn library with the Dynamic Time Warping (DTW) metric to cluster BSS spatio-temporal traces. (a), (b) Clusters learned from BSS data. (c) Regions in Edinburgh associated with the learned clusters.

Figure 5.5: Using the k-Shape approach from the tslearn library to cluster BSS spatio-temporal traces. (a), (b) Clusters learned from BSS data. (c) Regions in Edinburgh associated with the learned clusters.
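For reference, the baseline above takes only a few lines of tslearn. A minimal sketch, assuming the BSS traces are stacked into an array X of shape (n_stations, n_timesteps, 1); the placeholder data and the cluster count 3 (matching the K learned for this case study) are illustrative:

    import numpy as np
    from tslearn.clustering import TimeSeriesKMeans, KShape

    X = np.random.rand(61, 24, 1)  # placeholder for the real BSS traces

    # KMeans over raw traces with the DTW similarity metric
    labels_km = TimeSeriesKMeans(n_clusters=3, metric="dtw").fit_predict(X)

    # k-Shape: partitional clustering that preserves time-series shape
    labels_ks = KShape(n_clusters=3).fit_predict(X)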
Furthermore, since our learning approach treats each node/location independently of the others, its run-time can be improved through parallelization.

5.3.2 COVID-19 data from LA County

Understanding the spread pattern of COVID-19 in different areas is vital to stopping the spread of the disease. While this example is not related to a software system, it nevertheless shows the versatility of our approach on spatio-temporal data. The PSTREL formula ϕ(c, d) = ⋄_[0,d]{F_[0,τ](x > c)} allows us to measure the number of cases exceeding a threshold c within τ = 10 days in a neighborhood of size d around a given location. We fix τ to 10 days and focus on learning the values of c and d for each location. Locations with a small value of d and a large value of c are unsafe, as there is a large number of new positive cases within a small radius around them. We illustrate the clustering results in Fig. 5.6. Each location in Fig. 5.6a is associated with a geographic region in LA County (shown in Fig. 5.6c), and the red cluster corresponds to hot spots (small d and large c). Applying the DT classifier to the learned clusters (shown in Fig. 5.6a) produces 11 hyperboxes, some of which contain only a few points. Hence, we apply our DT pruning procedure to obtain the smallest maximum depth that gives at least 90% accuracy. Fig. 5.6b shows the results after pruning the Decision Tree. We learn the following formula:

ϕ_red = ⋄_[0,4691.29](F_[0,10](x > 3180)) ∨ ⋄_[0,15000](F_[0,10](x > 5611.5))

This formula means that within 4691.29 meters of any red location, within 10 days, the number of new positive cases exceeds 3180 (or, within 15000 meters, it exceeds 5611.5). The COVID-19 data we used is from September 2020. In Fig. 5.7, we show the results of STREL clustering for 3 different months in 2020, which confirms the rapid spread of the COVID-19 virus in LA County from April 2020 to September 2020. Furthermore, we can clearly see the spread of the virus around the hot spots over time, a further validation of our approach.

5.3.3 Outdoor Air Quality data from California

We next consider Air Quality data from California gathered by the US Environmental Protection Agency (EPA). Among the reported pollutants we focus on the PM2.5 contaminant, and we learn patterns in the amount of PM2.5 in the air using STREL formulas. For a mobile sensing network of UAVs monitoring pollution, such a STREL formula could be used to characterize locations that need increased monitoring.

Figure 5.6: Procedure to learn STREL formulas from COVID-19 data. (a) The learned hyperboxes before pruning the DT. (b) The learned hyperboxes after pruning the DT. (c) Red points: hot spots.

Figure 5.7: Change of clustering over time for the COVID-19 data and the PSTREL formula ϕ(c, d) = ⋄_[0,d]{F_[0,10](x > c)}. (a)-(c) Clusters learned from the COVID-19 data for April, June, and September 2020. (d)-(f) Regions in LA County associated with the learned clusters for the respective months. The plots confirm the rapid spread of the COVID-19 virus in LA County from April 2020 to September 2020.

We use the PSTREL formula ϕ(c, d) = G_[0,10](E_[d,16000](PM2.5 < c)) and project each location in California onto the parameter space of (c, d).
A location ℓ satisfies this property if it is always true, within the next 10 days, that there exists a location ℓ′ at a distance of more than d, and a route τ starting from ℓ and reaching ℓ′, such that all the locations along the route satisfy PM2.5 < c. Hence, it might be possible to escape to a location at a distance greater than d while always staying in locations satisfying PM2.5 < c. The results are shown in Fig. 5.8. Cluster 8 is the best cluster, as it has a small value of c and a large value of d, which means that there exists a long route with low PM2.5 density from the locations in cluster 8. Cluster 3 is the worst, as it has a large value of c and a small value of d. The formula for cluster 3 is ϕ_3 = ϕ(500, 0) ∧ ¬ϕ(500, 2500) ∧ ¬ϕ(216, 0). ϕ_3 holds in locations where, over the next 10 days, PM2.5 is always less than 500, but on at least one day PM2.5 reaches 216, and there is no safe route (i.e., one along which all locations have PM2.5 < 500) of length at least 2500.

Figure 5.8: Clustering experiments on the California Air Quality data. (a) The learned hyperboxes from the Air Quality data. (b) Red and orange points: high density of PM2.5.

5.3.4 Food Court Building

Here, we consider a synthetic dataset that simulates the movements of customers in different areas of a food court. Due to the COVID-19 pandemic, we are interested in identifying crowded areas in the food court, to help its managers arrange the facilities better (for example, keeping the popular restaurants separated) and thereby prevent crowding. To synthesize the dataset, we divide the food court into 20 regions, designating one region as the entrance and three as popular restaurants. We create the dataset using the following steps: (1) we make the simplifying assumption that all customers enter the food court at time 0, i.e., we set the initial location of each customer to the entrance; (2) every 10 minutes, we choose a random destination for each customer. The destination can be the customer's current location (in which case the customer does not move), one of the popular areas (chosen with probability 0.8), or another area of the food court. After choosing the destination for each customer, we simulate the customer moving toward the destination at a speed of 1.4 m/s, the average walking speed. To create the spatial model, we treat the center of each of the 20 regions as a node and connect the 20 nodes using the (α, d_hvrsn)-Enhanced MSG approach with α = 2, resulting in 35 edges in the graph.

The PSTREL formula ϕ(c, d) = ⋄_[0,d]{F_[0,3](numPeople > c)} means that somewhere within radius d of a location, at least once in the next 3 hours, the number of people exceeds the threshold c. Using this property, we can identify the crowded areas and take the necessary actions to mitigate the spread of COVID-19 in the food court. We apply our method to this synthetic dataset, learning tight values of d and c for each location, followed by clustering and our Decision Tree classification approach. The run-time of our learning approach is 78.24 seconds, and the results are illustrated in Fig. 5.9a. Cluster 4, which has a large value of c and a small value of d, is associated with the most crowded area. We show the locations in the food court associated with each cluster in Fig. 5.9b.
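Before turning to the learned formulas, we note that the two data-generation steps above are straightforward to reproduce. A minimal sketch, where the region coordinates, the entrance index, and the popular-restaurant indices are hypothetical inputs:

    import random

    def move_towards(p, q, max_dist):
        # Move from point p toward point q by at most max_dist meters.
        dx, dy = q[0] - p[0], q[1] - p[1]
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= max_dist:
            return q
        f = max_dist / dist
        return (p[0] + f * dx, p[1] + f * dy)

    def simulate(regions, entrance, popular, n_customers=100, horizon_min=180):
        # Step (1): every customer starts at the entrance at time 0.
        pos = [regions[entrance]] * n_customers
        trace = [list(pos)]
        # Step (2): every 10 minutes, pick a random destination per customer.
        for _ in range(0, horizon_min, 10):
            for i in range(n_customers):
                if random.random() < 0.8:  # a popular area with probability 0.8
                    dest = regions[random.choice(popular)]
                else:  # stay put, or any other area, chosen uniformly
                    dest = regions[random.randrange(len(regions))]
                # Walk toward the destination at 1.4 m/s for 10 minutes.
                pos[i] = move_towards(pos[i], dest, 1.4 * 60 * 10)
            trace.append(list(pos))
        return trace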
The results show larger values of c for the entrance and the popular restaurants, which confirms that these areas are more crowded than other locations in the food court. Next, we learn STREL formulas for cluster 4 (the most crowded locations) and cluster 5 (the emptiest locations) as follows:

Figure 5.9: Results of the Food Court Building case study. (a) Clusters learned from the food court data. (b) Locations in the food court associated with the learned clusters.

ϕ_4 = ϕ(323, 37.5) ∧ ¬ϕ(323, 10) ∧ ¬ϕ(420, 37.5) = ϕ(323, 37.5) = ⋄_[0,37.5]{F_[0,3](numPeople > 323)}
ϕ_5 = ϕ(0, 70) ∧ ¬ϕ(71.5, 70) ∧ ¬ϕ(0, 37.5) = ⋄_[0,70]{F_[0,3](numPeople > 0)} ∧ □_[0,70]{G_[0,3](numPeople ≤ 71.5)} ∧ □_[0,37.5]{G_[0,3](numPeople = 0)}

The STREL formula ϕ_4 means that somewhere within a radius of 37.5 meters of the location, at least once within the next 3 hours, the number of people exceeds 323, indicating a crowded area. The learned formula for cluster 5 means that for the next 3 hours there will be no one within a radius of 37.5 meters of the location; between radii 37.5 and 70, there will be some people, but their number does not exceed 72.

5.4 Summary

In this chapter, we proposed a technique to learn interpretable STREL formulas from spatio-temporal time-series data for Spatially Distributed Systems. First, we introduced the notion of monotonicity for a PSTREL formula, proving the monotonicity of each of the spatial operators. We proposed a new method for creating a spatial model with a restricted number of edges that preserves the connectivity of the spatial model. We leveraged the quantitative semantics of STREL, combined with multi-dimensional bisection search, to extract features for spatio-temporal time-series clustering. We applied Agglomerative Hierarchical Clustering to the extracted features, followed by a Decision Tree-based approach to learn an interpretable STREL formula for each cluster. We then illustrated on a number of benchmarks how this technique can be used and the kinds of insights it can provide. The results show that while our method is slower than traditional ML approaches, it is more interpretable and provides better insight into the data. For future work, we will study extensions of this approach to supervised and active learning.

Chapter 6
Learning from Natural Language and Demonstrations using Signal Temporal Logic

6.1 Introduction

Natural language is an easy and end-user-friendly way for humans to convey their intended tasks to robots. However, natural language descriptions are usually ambiguous and can have multiple meanings. For example, "pick up the door key and open the door" implicitly carries urgency to a human if there is a fire in the room. A robot system, however, needs to infer timing constraints, as well as underspecified information such as which door is meant when there are multiple, and whether pausing between getting and using the key is acceptable. Signal Temporal Logic (STL) [64] has been used as a flexible, expressive, and unambiguous language to describe robotic tasks that involve time-series data and signals. For instance, STL can formulate properties such as "the robot should immediately extinguish fires but can accept delays in opening doors." From a grammar-based perspective, an STL formula can be viewed as atomic formulas combined with logical and temporal operators [71].
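To make this grammar view concrete, the following is a minimal sketch of such an abstract syntax in Python; the class names and the length measure are ours, purely for illustration, and do not denote DIALOGUESTL's internal representation:

    from dataclasses import dataclass

    @dataclass
    class Atom:
        name: str  # e.g., "lampOn >= 0"
        def length(self):
            return 1

    @dataclass
    class F:  # "eventually", over a time interval [lo, hi]
        lo: float
        hi: float
        arg: object
        def length(self):
            return 1 + self.arg.length()

    @dataclass
    class And:  # conjunction
        left: object
        right: object
        def length(self):
            return 1 + self.left.length() + self.right.length()

    # F_[0,15](lampOn >= 0 /\ F_[0,10](itemOnRobot >= 0)), of length 5
    phi = F(0, 15, And(Atom("lampOn >= 0"), F(0, 10, Atom("itemOnRobot >= 0"))))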
In this work, we match different components of a natural language description with atoms and operators to form candidate STL formulas, and we use dialogue with users to resolve ambiguities.

Figure 6.1: We infer an STL formula and an optimal policy from a given natural language description, a few demonstrations, and questions posed to the user.

Formalizing behavior using temporal logics such as STL requires the user to specify the correct temporal logic specification [9, 81], a difficult and error-prone task for untrained human users. Translating a sentence written in a natural and ambiguous language into a more general and concise formal language is an open challenge [39]. We propose an interactive and explainable approach, DIALOGUESTL. Fig. 6.1 shows the high-level flow of our framework, whose input consists of a natural language description (NL) and a few demonstrations (demos) of a robotic task. After clarifying ambiguities with the user via dialogue, our framework produces the optimal policy as its outcome. We demonstrate that our method is efficient and scalable, and we note that in most cases the user has to provide only a few demonstrations (often only one) of a successful behavior for our framework to arrive at the correct STL formula using dialogue interactions.

The remainder of the chapter is structured as follows. We present our approach for learning PSTL formulas from natural language in Section 6.2.1. Next, we present our technique for learning the best STL formula in Section 6.2.2. Our method for learning optimal policies is illustrated in Section 6.2.3. We present our experimental results in Section 6.3. Finally, we provide a summary in Section 6.4.

6.2 Methods

6.2.1 DIALOGUESTL: Learning PSTL Candidates

In this section, we propose an explainable and interactive approach to learn candidate Parametric STL (PSTL) formulas from the natural language description of a task or constraints provided by the user. The overall structure of our approach is shown in Algo. 10. As a running example, consider the command "turn on the lamp and pick up the cube" for the grid world environment shown in Fig. 2.1. The high-level view of our method for the running example is illustrated in Fig. 6.2, and the method for converting NL to candidate PSTL formulas is formalized in Algo. 10. The inputs of the algorithm consist of the NL description of the task (taskNL), a few manually generated samples for each atomic predicate (sampleAtoms) and operator (sampleOps), and a threshold ε on the confidence of the atom predictor. We first generate a dataset for the atom predictor (AtomPredictor) using a GPT-3-based paraphrase generator (GPT3ParaphGen) and learn a model for predicting likely atoms from individual verb phrases. We use the term verb phrase loosely, to mean a group of words containing a verb, such as "turn on the lamp", "if fire is on", and "open the door". For a given NL description, verb phrases, conjunctions, and adverbs are extracted and matched with atoms and operators, and candidate PSTL formulas with length in the range [l, u] are enumerated using the predicted atoms and operators. We now explain each part of Algo. 10 step by step.

Atom Predictor Data. We overcome the brittleness of a language-to-STL predictor based on hand-crafted grammar [39] by using a state-of-the-art paraphrase generation tool for data augmentation over a small manual set. GPT-3 [17] is a large, pretrained language model able to generate natural language texts that can be hard to distinguish from human-authored content [25].
We prime GPT-3 with a few examples such as "hi" and "hello" to establish the paraphrase task. Then, we input manually generated verb phrases corresponding to the STL atoms in our grid world, and we check the paraphrase quality of the GPT-3 outputs manually. For the grid world shown in Fig. 2.1, there are a total of 15 atoms, and we generate a total of 108 verb-phrase samples for them. For example, our GPT-3-based paraphrase generator takes "Turn off the fire" as input and generates "Extinguish the fire" as one output paraphrase; both can be paired with the atom fireOff as training data.

Figure 6.2: We infer the STL formula ϕ_best = F_[0,15](lampOn ≥ 0 ∧ F_[0,10](itemOnRobot(purpleCube) ≥ 0)) and an optimal policy from the natural language description "Turn on the lamp and pick up the cube", a demonstration, and interactions with the user. The NL splitter extracts the components of the natural language (i.e., "Turn on the lamp", "and", "pick up the cube"). Each component is mapped to an atom or operator using the Atom and Operator Predictors: {"Turn on the lamp": lampOn ≥ 0, "and": ∧, "pick up the cube": itemOnRobot ≥ 0}. Next, candidate PSTL formulas are generated from the predicted atoms and operators. Asking the user questions helps learn the parameters of the PSTL formulas and, therefore, the best STL formula. Finally, Deep RL techniques are employed to learn an optimal policy from the learned STL formula.

Algorithm 10: Natural Language to PSTL algorithm
Input: taskNL, sampleAtoms, sampleOps, ε = 0.5
Output: PSTLFormulas
// Generate data for the atom predictor
1 atoms := GPT3ParaphGen(sampleAtoms)
2 trainAtoms, testAtoms := TrainTestSplit(atoms)
3 AtomPredictor := train(trainAtoms) // Train
4 Accuracy := test(testAtoms) // Test
// Average embedding for each operator
5 opEmbeddings := computeOpsBertEmbeddings(sampleOps)
6 taggedTokens := partOfSpeechTagger(taskNL)
// Extract verb phrases, conjunctions and adverbs based on the tags
7 vPhrases, Conjs, Advbs := NLSplitter(taggedTokens)
8 for vPhrase ∈ vPhrases do
    // Find the best atoms
9     atom, confidence := AtomPredictor(vPhrase)
      if confidence ≤ ε then
10        vPhrase := ParaphrasedByUser(vPhrase)
11        atom, confidence := AtomPredictor(vPhrase)
// Find the best operators
12 ops := findBestOps(opEmbeddings, Conjs, Advbs)
// Bounds on the length of the PSTL formula
13 l, u := 2·|vPhrases| − 1, 2·|vPhrases| + |Conjs| + |Advbs|
// Enumerative, interactive PSTL synthesis
14 PSTLFormulas := genPSTLEnum(atoms, ops, l, u)

Atom Predictor. Given the set of verb phrases and their corresponding atomic formulas from the aforementioned data generation, we learn a model that, given a verb phrase, outputs the most similar atom. We reserve 80% of the data generated by GPT-3 for training and 20% for validation. We use DIET [18], a lightweight transformer-based architecture that outperforms fine-tuning BERT [28] and is about six times faster to train. We trained DIET for 100 epochs, which took less than 1 minute and resulted in a training accuracy of 100% and a test accuracy of 92%.

Operator Predictor. We compute an average word embedding for each operator. To do so, we map each operator to a few words in natural language that correspond to it; for example, the words "and" and "and then" correspond to the ∧ operator. The word embedding of each operator is computed as the average BERT embedding of the words corresponding to that operator. The computed word embeddings are used to match each conjunction or adverb with the most similar operator.
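The operator-matching step just described can be sketched with off-the-shelf BERT embeddings from the Hugging Face transformers library; the operator-to-word table below is abbreviated for illustration and is not the full mapping used by DIALOGUESTL:

    import torch
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")

    def embed(text):
        # Mean-pool the last hidden states into a single sentence vector.
        enc = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = bert(**enc).last_hidden_state
        return out.mean(dim=1).squeeze(0)

    # Average embedding per operator over its associated words
    op_words = {"AND": ["and", "and then"], "G": ["always"], "F": ["eventually"]}
    op_emb = {op: torch.stack([embed(w) for w in ws]).mean(0)
              for op, ws in op_words.items()}

    def best_op(word):
        # Cosine similarity between the word and each operator embedding
        e = embed(word)
        sims = {op: torch.cosine_similarity(e, v, dim=0).item()
                for op, v in op_emb.items()}
        return max(sims, key=sims.get)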
Natural Language Splitting. To extract verb phrases, we first run a part-of-speech tagger, Flair [1], a state-of-the-art tagger based on contextual string embeddings and neural character language models. We divide the language description based on the positions of the verbs, resulting in, for example, "turn on the lamp" and "pick up the cube". We extract conjunctions from the words that connect the verb phrases, such as "and". We also extract any adverbs, since, for example, "always" can correspond to the globally operator G. We match each verb phrase with an atom using the trained atom predictor, and each conjunction or adverb with an operator using the cosine similarity between the operators' and words' BERT embeddings. We always add F to the list of candidate operators because it is a commonly used operator.* For our running example, we extract atoms = [lampOn ≥ 0, itemOnRobot ≥ 0] and operators = [∧, F]. If the verb-phrase-to-atom correspondence confidence is low, we ask the user to paraphrase the word sequence that has low confidence.

* When the user wants the robot to eventually do a task, the user naturally does not explicitly say any word corresponding to the eventually operator F.

Explainability. The extracted verb phrases, conjunctions, and adverbs are mapped to atoms and operators using the trained Atom and Operator Predictors. This results in an explanation dictionary that gives transparency into the decisions of DIALOGUESTL. For our running example, {"Turn on the lamp": lampOn ≥ 0, "and": ∧, "pick up the cube": itemOnRobot ≥ 0} is the generated explanation dictionary. This dictionary can help us repair the tool if an incorrect STL formula is predicted. For example, if the atom lampOn ≥ 0 is incorrectly predicted as fireOn ≥ 0, we will need to improve the robustness of the Atom Predictor using data augmentation or other robustness-improvement techniques. This is one of the advantages of our tool compared to DeepSTL, which is a fully black-box model.

PSTL Generation. We bound the lengths of the possible PSTL formulas in order to enumerate them. We take the lower bound to be l = 2·|vPhrases| − 1, because n verb phrases need n − 1 connectors. We take u = 2·|vPhrases| + |Conjs| + |Advbs| as the upper bound. We multiply |vPhrases| by 2 because each verb phrase can require an F operator, and each conjunction or adverb might also be converted to an operator. Next, we use systematic PSTL enumeration [71] to generate candidate PSTL formulas with lengths between l and u from the extracted atoms and operators, in increasing order of length. We remove enumerated PSTL formulas that do not contain all the atoms or that contain more than one instance of an atom.

Causal and Temporal Dependency. We shrink the space of candidate PSTL formulas using the idea of causal or temporal dependency between atoms. Atom_2 is causally dependent on Atom_1 if Atom_1 should happen before Atom_2, which eliminates formulas such as F(Atom_1) ∧ F(Atom_2) and Atom_2 ∧ F(Atom_1). For our running example, we ask the user whether the tasks "turn on the lamp" and "pick up the cube" should run in sequence or not; alternatively, we could ask the user whether "turn on the lamp and then pick up the cube" is acceptable. If the answer is "Yes", the atom itemOnRobot ≥ 0 is causally dependent on lampOn ≥ 0, and hence the formulas F(lampOn ≥ 0) ∧ F(itemOnRobot ≥ 0) and itemOnRobot ≥ 0 ∧ F(lampOn ≥ 0) are omitted.
For our running example, the intuition behind this causal dependency is that if the room is dark, it is difficult for the robot to identify the cube and pick it up. Of the 9 generated PSTL formulas, 6 are removed, leaving 3 candidate PSTL formulas:

ϕ_1 = lampOn ≥ 0 ∧ itemOnRobot ≥ 0
ϕ_2 = lampOn ≥ 0 ∧ F_I(itemOnRobot ≥ 0)
ϕ_3 = F_I(lampOn ≥ 0 ∧ F_I(itemOnRobot ≥ 0))

6.2.2 DIALOGUESTL: Selecting Correct STL

We find the parameters of the PSTL candidates by searching the initial language description and by interacting with the user (Algo. 11). We use four question types: the order of tasks, atom parameters, operator parameters, and paraphrasing a verb phrase. DIALOGUESTL takes positive and negative demonstrations as input. Asking the user to generate many demonstrations makes for a tedious interface, so we take only a few demonstrations from the user and use them to automatically generate more negative demonstrations. To generate negative examples, we use the principle of "no excessive effort" [36]: any proper prefix of a demonstrated positive example is assumed insufficient and considered a negative example. Intuitively, if a prefix of d were already a good example, then the user would not have given the full demonstration d.

To convert each PSTL formula to its corresponding STL formula, all parameter values must be discovered. For instance, the operator F needs a time interval, and the atom itemOnRobot ≥ 0 requires the name of the item that should be picked up. We search for parameter values in the NL description, and any parameter value that cannot be found there is obtained by interaction with the user. We substitute the parameter valuations into the PSTL formulas, which yields the STL candidates. Finally, we choose the STL formula that satisfies the positive demos and does not satisfy the negative demos. In our running example, among the three candidate STL formulas, ϕ_3 is chosen as the best STL formula. If no STL formula can be found, it may be that the language description was not correctly split into its components, or that one or more of the predicted atoms and operators were incorrect. In the future, we can improve the splitting algorithm by learning from incorrect predictions, and use beam search in the enumeration to move on to the next highest-ranked atom or operator when the first set fails.

Algorithm 11: Learning the best STL formula
Input: taskNL, atoms, ops, PSTLFormulas, Demos d
Output: ϕ_best
// Generate more positive and negative demos
1 d+, d− := generateMoreDemos(d)
// Find the atoms' parameters
2 for atom ∈ atoms do
3     atomParams := findAtomParams(taskNL, atom)
4     if atomParams not found then
5         atomParams := getParamsByInteractionWithUser()
// Find the operators' parameters
6 for op ∈ ops do
7     opParams := findOpParams(taskNL, op)
8     if opParams not found then
9         opParams := getParamsByInteractionWithUser()
10 for ϕ(p) ∈ PSTLFormulas do
    // Set the parameters of the PSTL formula
11    ϕ(v(p)) := setParams(ϕ(p), atomParams, opParams)
    // Check whether ϕ(v(p)) satisfies the positive demos and falsifies the negative demos
12    if ϕ(v(p)) ⊨ d+ and ϕ(v(p)) ⊭ d− then
13        ϕ_best := ϕ(v(p))
14        return ϕ_best
15 return ∅

Figure 6.3: The learned policy from ϕ_task = F_[0,15](lampOn ∧ F_[0,10](itemOnRobot(purpleCube))).

6.2.3 Learning optimal policies

Previous works have used the robustness of an STL formula, i.e., the signed distance of a given trajectory from satisfying or violating the formula, as a reward to guide the RL algorithm [81, 9].
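A minimal sketch of this reward-shaping idea, assuming hypothetical env and policy interfaces and a robustness(phi, trajectory) oracle such as an off-the-shelf STL monitor:

    def run_episode(env, policy, phi, robustness, max_steps=200):
        # Roll out one episode; the per-step reward is the robustness of the
        # partial trajectory with respect to phi (positive iff the behavior
        # so far satisfies the formula, negative if it violates it).
        state = env.reset()
        trajectory, total_reward = [], 0.0
        for _ in range(max_steps):
            action = policy(state)
            state, done = env.step(action)  # hypothetical (state, done) interface
            trajectory.append(state)
            total_reward += robustness(phi, trajectory)
            if done:
                break
        return total_reward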
Here, we only provide an example of learning an optimal policy from a given STL formula using those existing techniques. We use Deep Q-Learning [68] because of its scalability to environments with a large number of states; in our grid world environment (Fig. 2.1), there are more than 8 billion states.† The algorithm takes the STL specification of the task ϕ_task, the goal state, and the hyper-parameters (i.e., M, C, γ) as input, and generates an optimal policy that respects ϕ_task as output. The main RL loop runs for |episodes| iterations. In each episode, the state is first set to the initial state and the partial trajectory is set to ∅. While the robot has not reached the final state and the maximum number of steps has not been reached, the robot explores the grid environment, and the reward is computed as the robustness of the partial trajectory with respect to ϕ_task. The robot's experiences are recorded in a replay memory to be used later for training the Q-network. Whenever the replay buffer size exceeds M, we start training the Q-network using the Bellman equation. We update the weights of the target action-value function Q̂ with the weights of Q every C episodes. For our running example, with ϕ_task = F_[0,15](lampOn ≥ 0 ∧ F_[0,10](itemOnRobot(purpleCube) ≥ 0)), the reward converges in fewer than 15000 episodes, and the learned policy is illustrated in Fig. 6.3.

† Each state is a tuple of 16 elements consisting of the positions of the robot and of each item (door key, green and purple cubes), the state of the lamp and of the fire (on or off), and the state of the door (open or closed).

6.3 Experimental Results

To evaluate DIALOGUESTL, we qualitatively and quantitatively examine its performance in our grid world (Fig. 2.1) on natural language specifications whose underlying STL formulas exhibit various qualities. The software to reproduce the results is available on GitHub.‡ Table 6.1 shows a set of manually curated natural language sentences describing a variety of tasks and user constraints, paired with the most frequent STL formulas ultimately predicted by DIALOGUESTL.

Type | Natural Language (pre-paraphrase) | #Ds | Most Frequent STL Prediction | Correct?
C | Always don't hit into walls. | 2 | ¬(F_[0,1000](robotAtWall)) | Yes
C | Always do not walk into water. | 2 | ¬(F_[0,1500](robotAtWater)) | Yes
S | Pick up the purple cube. | 1 | F_[0,12](itemOnRobot(purpleCube)) | Yes
S | Turn off the fire. | 1 | F_[0,5](fireOff) | Yes
Q | Open the door and then charge yourself. | 1 | F_[0,8](doorOpen ∧ F_[0,6](chargerPlugged)) | Yes
Q | Go to location (7, 4) and pick up the green cube. | 1 | F_[0,10](robotAt(7,4) ∧ F_[0,4](itemOnRobot(greenCube))) | Yes
Q | Turn on the lamp before picking up the purple cube. | 1 | F_[0,12](lampOn U_[0,8] itemOnRobot(purpleCube)) | Yes
Q | Open the gate before picking up the green cube. | 1 | F_[0,8](doorOpened U_[0,5] itemOnRobot(greenCube)) | Yes
M | Turn on the lamp or turn on the fire. | 2 | F_[0,12](lampOn ∨ fireOn) | Yes
M | Sit on the chair or pick up the purple cube. | 2 | F_[0,15](robotSittingOnChair ∨ itemOnRobot(purpleCube)) | Yes
D | If gate is open, close it. | 1 | F_[0,10](doorOpen ⟹ doorClosed) | No
D | If fire is on, turn off the fire, else pick up the key. | 1 | F_[0,10]((fireOff ⟹ fireOn) ⟹ itemOnRobot) | No

Table 6.1: DIALOGUESTL results on sample natural language inputs across 142 GPT-3 paraphrases of the inputs, with a fixed number of user demonstrations per row (#Ds). We report the most frequent STL prediction and whether it is correct. The task types are (C)onstraint, (S)ingle, se(Q)uence, (M)ultiple-choice, and con(D)itional. Note that "≥ 0" is omitted from all atoms for brevity.
We use the GPT-3-based paraphrase generator to generate 142 paraphrases of each of the sample sentences in Table 6.1. We also design an Oracle user to interact with DIALOGUESTL and answer the posed interaction questions. The Oracle user is a simple, rule-based program that provides the correct answer to any question about the true, underlying STL formula. Table 6.1 and Table 6.2 show the source language input before paraphrasing, the number of provided demonstrations, and DIALOGUESTL's average number of enumerated formulas, user interactions, success rate, and run-time.§

‡ https://github.com/saramohammadinejad/DialogueSTL
§ We run the experiments on an Intel Core-i7 Macbook Pro with 2.7 GHz processors and 16 GB RAM.

Type | Natural Language (pre-paraphrase) | #EFs | #UIs | SR | RT
C | Always don't hit into walls. | 7.81 | 1.0 | 90% | 5.36
C | Always do not walk into water. | 12.72 | 1.0 | 90% | 3.94
S | Pick up the purple cube. | 2.0 | 1.09 | 100% | 3.71
S | Turn off the fire. | 2.0 | 1.0 | 90% | 3.53
Q | Open the door and then charge yourself. | 3.0 | 3.18 | 100% | 4.12
Q | Go to location (7, 4) and pick up the green cube. | 4.16 | 2.5 | 58% | 4.13
Q | Turn on the lamp before picking up the purple cube. | 11.8 | 2.2 | 100% | 9.50
Q | Open the gate before picking up the green cube. | 10.44 | 2.11 | 88% | 8.56
M | Turn on the lamp or turn on the fire. | 7.0 | 1.0 | 100% | 4.65
M | Sit on the chair or pick up the purple cube. | 7.14 | 1.14 | 100% | 6.65
D | If gate is open, close it. | 32.44 | 1.0 | 0% | 6.68
D | If fire is on, turn off the fire, else pick up the key. | 1253.33 | 1.16 | 0% | 67.68

Table 6.2: DIALOGUESTL performance on sample natural language inputs across 142 GPT-3 paraphrases of the inputs, with a fixed number of user demonstrations per row. We report the average number of enumerated formulas (#EFs), the average number of user interactions (#UIs) needed to select a final formula, the success rate (SR) of finding the exact-match correct formula, and the average run-time in seconds (RT). The task types are (C)onstraint, (S)ingle, se(Q)uence, (M)ultiple-choice, and con(D)itional.

6.3.1 Results Across Description Types

User Constraints. A human (Oracle) provides a general constraint for a task, such as "Always don't hit into walls", along with one positive demonstration that satisfies the constraint and a negative one that violates it. The human user is asked "For how many seconds do you want the constraint to be satisfied?" and answers "1000 seconds", so the best STL formula is predicted as ϕ = ¬F_[0,1000](robotAtWall ≥ 0): "In the next 1000 seconds, the robot should not run into walls".

Single Tasks. A human user provides a single task, such as "Pick up the purple cube", and a positive demonstration of the task. Negative examples are generated from this positive example based on the principle of "no excessive effort". The user is asked about timing requirements, such as "In how many seconds should the robot complete the task?", and in this case answers "12 seconds". The formula ϕ = F_[0,12](itemOnRobot(purpleCube) ≥ 0) is predicted, which means "In the next 12 seconds, the purple cube should be picked up by the robot".

Sequence of Tasks. A sequential task, such as "Go to location (7, 4) and pick up the green cube", requires the robot to do one thing before another, i.e., there is a temporal dependency.
STL formulas that do not give such a guarantee can be eliminated from the candidates, and in this case DIALOGUESTL predicts F_[0,10](robotAt(7,4) ≥ 0 ∧ F_[0,4](itemOnRobot(greenCube) ≥ 0)), which means "In the next 10 seconds, the robot should reach location (7, 4), and after it reaches (7, 4), it should pick up the green cube within the next 4 seconds". Another example of a sequential task is "Turn on the lamp before picking up the purple cube". The word "before" implies the temporal dependency, and the predicted formula is F_[0,12]((lampOn ≥ 0) U_[0,8] (itemOnRobot(purpleCube) ≥ 0)), which means that "Turn on the lamp" happens before "pick up the purple cube".

Multiple-choice Tasks. "Turn on the lamp or turn on the fire" means that the robot is required to complete at least one of the two tasks. In such cases, the user can provide two positive demonstrations showcasing the alternative goals.

Conditional Tasks. "If gate is open, close it" is an example of a conditional task: the robot should accomplish a task only if a condition is satisfied. For conditional tasks, our tool fails to predict the correct STL formulas.

6.3.2 Comparison with DeepSTL

The main differences between DIALOGUESTL and DeepSTL are:
• DIALOGUESTL splits the formula and learns from its components, whereas DeepSTL learns from the entire formula.
• DIALOGUESTL needs a few demonstrations to find the best STL formula, whereas DeepSTL only needs the NL description of the task.
• DeepSTL is a fully black-box model, whereas DIALOGUESTL generates explanation dictionaries.

Here, we show that DIALOGUESTL outperforms DeepSTL in terms of training run-time and accuracy, at the cost of an increased test run-time. DIALOGUESTL's training data consists of a total of 108 verb phrases for the 15 atoms and 18 adverbs and conjunctions for the 7 operators. Since DeepSTL needs complete sentences for training, we use an enumerative approach that systematically applies production rules to enlist valid sentences from DIALOGUESTL's training data, resulting in 250000 sentences with their corresponding STL formulas.¶ We randomly sample 120000 instances (the dataset size used in the DeepSTL paper) and split them into 80% for training, 10% for validation, and 10% for testing. We train the DeepSTL tool for 60 epochs on our generated training data with the exact transformer structure and hyper-parameters used in the DeepSTL paper. We also tried different sets of hyper-parameters, but the defaults from the DeepSTL paper gave the best accuracy. The training process takes 23.64 hours, resulting in train, validation, and test accuracies of 97.7%, 97.6%, and 66.0%, respectively. It is possible to decrease the training time to several hours by using GPUs, but even then the training time is not comparable with DIALOGUESTL's, which is less than 1 minute. Next, we test the trained model on the GPT-3-generated paraphrases to compare with our tool. We use two metrics to compare performance: (1) success rate (SR), the percentage of correctly predicted STL formulas, and (2) accuracy (ACC), which measures how similar the predicted STL formula is to the ground-truth STL formula. DIALOGUESTL outperforms DeepSTL in both SR and ACC, at the cost of an increased test time. The detailed results are presented in Table 6.3.

¶ The code to generate training data for DeepSTL is available at https://github.com/saramohammadinejad/DialogueSTL.
Tool         Avg SR   Avg ACC   Avg test time (seconds)
DialogueSTL  72%      78%       5.9
DeepSTL      20%      54%       0.17

Table 6.3: Success rate (SR) and accuracy (ACC) comparison of DIALOGUESTL and DeepSTL on natural language inputs across 142 GPT-3-generated paraphrases.

The reason for the large test time is that DIALOGUESTL only learns from the components of formulas during training; the formula itself must be assembled during testing. Splitting the formula into its components, learning the formula parameters, enumerating candidate STL formulas, and choosing the one that satisfies the user's demonstrations are all steps that happen during testing.

6.4 Summary

In this chapter, we proposed DIALOGUESTL, an interactive approach to learn STL formulas from natural language descriptions of robotic tasks and from demonstrations. We use part-of-speech tagging to extract sentence components. Then, we generate data automatically, with the user in the loop, using the GPT-3 language model. Transformers are used to detect the best atoms and operators for generating candidate PSTL formulas. Finally, demonstrations provided by the user help shrink the space of possible STL formulas and learn the best STL formula for a given task. Our tool has a number of advantages over previous works, such as addressing the ambiguity of natural language through interaction with the user, considering the space of all STL formulas, explainability, and performance.

Chapter 7
Related Work

In this chapter, we review related work. We begin with work related to learning from timed traces, including traditional ML approaches, techniques based on shape expressions, automata learning, and STL-based learning. Then, we review relevant work on assumption mining for CPS, i.e., requirement mining and falsification-based approaches. Finally, we elaborate on work related to learning from spatio-temporal data and natural language.

7.1 Related Work on learning from time-series data

Time-series learning is a popular area in the domains of machine learning and data mining. Some techniques for time-series learning combine methods such as k-nearest neighbours [22, 103, 21], support vector machines (SVMs) [87, 20, 86], KMeans [42], and hierarchical or agglomerative clustering [24] with similarity metrics between time-series data such as the Euclidean distance, the dynamic time warping (DTW) distance, and statistical measures that include but are not limited to the mean, median, and correlation. Some recent works, such as those on shapelets, automatically identify distinguishing shapes in the time-series data [104, 38, 105]. Such shapelets serve as features for ML tasks and are human-interpretable. Model-based approaches such as naïve Bayes [78, 95] or Hidden Markov Models [77, 61, 100] investigate an underlying generative model for the time-series data. Deep learning approaches for time-series classification have recently received a lot of attention [43, 52, 44]. For example, Recurrent Neural Networks (RNNs) are specifically designed for learning from sequential data and have applications in a variety of domains such as speech recognition, machine translation, and natural language processing. While RNNs achieve good accuracy compared to state-of-the-art techniques for time-series classification, the artifacts trained by these approaches often lack interpretability.
Furthermore, most of these approaches are based on shape similarity, which can be useful in some applications; however, in applications where the user is interested in mining temporal information from the data, logically dissimilar traces might be clustered into the same group [96].

The work by Nickovic et al. [75] proposes shape expressions as a formalism for extracting rich temporal patterns from time-series data, together with a pattern-matching technique based on automata learning to recognize signals that are close to a specified shape. Shape expressions are regular expressions over parameterized signal shapes, such as linear, exponential, or sine segments, with additional parameter constraints. Learning shape expressions is partially motivated by the concept of shapelets; however, shape expressions provide a more supervised feature-extraction mechanism, as opposed to shapelets, which are extracted from unlabelled data.

Timed automata (automata equipped with clocks) have been used extensively as a formalism for reasoning about quantitative temporal aspects of systems [5]. Timed automata are equivalent in expressive power to timed regular expressions (TREs) [6], a popular formalism for specifying properties of CPS behaviors. TREs extend regular expressions to specify sets of dense-time, discrete-valued signals. [93] proposes an offline algorithm for matching TREs, and [94] extends this work to online pattern matching. Automata-based matching techniques for TREs have been introduced in [98, 99].

There has been considerable recent work on learning STL formulas from data for various applications such as supervised learning [16, 54, 74, 37], clustering [96, 97], and anomaly detection [50]. In [54], a fragment of PSTL (rPSTL, or reactive parametric signal temporal logic) is defined to capture causal relationships from data. However, some temporal properties, namely concurrent eventuality and nested always-eventually, cannot be described directly in rPSTL. In [50], the authors extend [54] by using a fragment of rPSTL, inference parametric STL (iPSTL), that does not require a causal structure. In that work, classical ML algorithms (one-class support vector machines) are applied to the unsupervised learning problem. In [16], a decision-tree-based method is employed to learn STL formulas; it creates a map between a restricted fragment of STL and a binary decision tree in order to build an STL classifier. While this seminal work has advanced research at the intersection of formal methods and machine learning, one disadvantage of these approaches is that they lead to long formulas, which can become an issue for interpretability. The work by Nenzi et al. [74] uses genetic algorithms to learn the STL formula structure, where longer formulas are obtained in a new generation by combining formulas from a previous generation. While this procedure is able to learn interesting formulas, a key difference is that it does not guarantee that the shortest formula will be learned, and it does not check for equivalent formulas (beyond what is encoded by the fitness function of the GP algorithm).

The closest work to our approach appears in [37], where the authors propose a similar systematic enumeration of all parametric formulas. In this recent work, the authors discuss the problem of online monitoring and formula synthesis for a past fragment of STL.
However, the work in [37] does not use signature-based optimization to avoid enumerating equivalent formulas, which is one of the key contributions of our work. Furthermore, [37] assumes a past-only fragment of STL, while our technique allows future operators. Finally, [37] does not provide details on how parameter valuations are explored, beyond language indicating a grid-based parameter-space exploration; we instead use a systematic and efficient procedure based on the validity domain boundary or a decision tree.

Especially relevant to our decision-tree-based approach is the seminal work in [16], where the authors propose learning the structure and parameters of STL formulas using decision trees. In contrast to our technique, where a single STL formula is used throughout the decision tree, in [16] each node is associated with a primitive PSTL formula. The technique then makes use of impurity measures to rank the primitives according to how accurately they label the set of traces (compared to the ground truth). The primitives come from a fragment of PSTL containing formulas with only top-level F, G, FG, or GF operators. We observe that the STL formulas generated by this approach can become long and complicated, especially because each node in the decision tree can potentially be a different STL formula; the resulting decision trees splice local deductions over traces together into a bigger formula.

In template-based techniques, a fixed PSTL template is provided by the user, and the techniques only learn the values of the parameters associated with the PSTL formula. In [96], a total ordering on the parameter space of PSTL specifications is used to obtain feature vectors for learning logical specifications; unfortunately, recognizing the best total ordering is not straightforward for users. In [97], the authors eliminate this additional burden on the user by proposing a method that maps timed traces to a surface in the parameter space of the formula and then employs these curves as features. In [49], the input to the algorithm is a requirement template expressed in PSTL, where the traces are actively generated from a model of the system. TeLEx [47] addresses the problem of learning STL formulas from only positive example traces: it automatically learns the structure of the STL formula by incrementally constructing a more complex formula, guided by the robustness metric of its subformulas. However, despite its systematic enumeration technique, this method does not guarantee learning the simplest STL formulas. Our proposed technique, which uses systematic enumeration, can produce smaller formulas that may be more human-interpretable, with higher accuracy (≥ 95% in all investigated case studies).

7.2 Related Work on mining environment assumptions for CPS

In [49, 23, 41], the authors address the problem of mining (output) requirements. They assume that the structure of the PSTL formula representing an output requirement is provided by the user. The technique then uses counterexample-guided inductive synthesis to infer formula parameters that result in an STL formula satisfied by all observed model outputs. The key differences from our method are: (1) we are interested in mining environment assumptions rather than output requirements, and (2) we use a supervised learning procedure that separates input traces leading to outputs satisfying the output requirement from those violating it.
The work in [62] focuses on mining temporal patterns from traces of digital circuits and uses automata-based ideas that are quite different from the work presented here.

The seminal work by Ferrère et al. [34] extends STL with support for defining input-output interfaces. A new measure of input vacuity and output robustness is defined that better reflects the nature of the system and the specification intent. The robustness computation is then adapted to exploit the input-output interface and, consequently, provide deeper insight into system behaviors. The connection to our technique is that we also look at input-output relations using STL specifications: our method mines an STL formula on inputs that guarantees satisfaction of a desired requirement on outputs. It would be interesting to extend the methods developed in this work to the problem of mining interface-aware STL requirements; the latter are more expressive, as they include predicates that combine input and output variables, which our current work does not address.

In [29], the authors analyze the falsifying traces of a cyber-physical system. Concretely, they seek to understand the properties or parts of the inputs to a system model that result in a counterexample, using sensitivity analysis. They use learning methods (such as statistical hypothesis testing) over repeated simulations of the system under test: tornado diagrams are used to find the values up to which no violation occurs, while SMT solvers are used to find the falsifying intervals. Our work can be used to solve a similar problem by mining environment assumptions for ¬ϕ_out for a given output requirement. A key difference in our technique is that we seek to explain falsifying input traces using an STL formula, while the work in [29] formulates explanations directly in terms of the input traces.

7.3 Related Work on learning from spatio-temporal data

There is a considerable amount of recent work, such as [11, 12], on monitoring spatio-temporal properties. For instance, the work in [11] introduces the spatio-temporal logic STREL, which enables monitoring of spatio-temporal properties over the execution of spatially distributed CPS. In particular, MoonLight [12] is a recent tool for monitoring STREL properties, and in our current work we use MoonLight to compute the robustness of spatio-temporal data with respect to STREL formulas. MoonLight uses the (δ, d)-connectivity approach for creating a spatial model, which has several issues, including disconnectivity and distance overestimation. We resolve these issues with our new method for creating the spatial graph, which we call the Enhanced MSG. While there are many works on monitoring spatio-temporal logics, to the best of our knowledge there is no prior work on automatically inferring spatio-temporal logic formulas from data, which is the problem we address in this work.

7.4 Related Work on learning from natural language

Prior work has used STL for reinforcement learning applications. The quantitative semantics of STL can be used as reward functions for evaluating robotic behaviors [9]. STL formulas can also be used to rank the quality of demonstrations in robotic domains and to compute the reward for RL problems [81]. However, those works put the burden of specifying the correct STL formulas on users, and can require 3× more demonstrations than our work (DIALOGUESTL), despite using a similar environment [81].
There has been a tremendous effort in learning temporal logic formulas from natural human language [36, 39, 72, 82, 57, 8]. These works variously assume a particular format for the natural language input, are limited to a specific fragment of formal logic, or have scalability and robustness issues. In particular, we explain the limitations of [36] and [39].

The seminal work in [36] considers the problem of interactive synthesis of robot policies from natural language specifications. The authors consider environments similar to ours, but expect specifications to be provided in structured English that is then translated to LTL formulas. The main objective is to explore the space of specifications using user demonstrations and constraint-based methods to identify the precise specifications, which are then used to synthesize robot policies using reactive synthesis methods. In our work, the emphasis is on directly learning the structure of the task objective using modern natural language processing tools and employing a dialogue-based method to refine the specification. Furthermore, we use recently developed RL methods to learn optimal policies from STL instead of LTL synthesis approaches. The work in [36] performs only a basic keyword search over the natural language description, whereas our work uses word embeddings of natural language, which is a more robust approach. Another limitation of [36] is that nested temporal operators are not allowed.

DeepSTL [39], which is the state of the art for translating informal English requirements to STL formulas, trains a sequence-to-sequence transformer on synthetic data. DeepSTL is a black-box approach, whereas our framework generates explanation dictionaries that help users understand the predictions of the model. Furthermore, our tool outperforms DeepSTL in terms of accuracy and training time. Our work addresses these shortcomings by operating over the space of all possible STL formulas, leveraging interaction with the user to repair ambiguities, providing explanations to the users, and scaling to a larger state space.

Chapter 8

Conclusion and Future Work

The primary contributions of this dissertation are outlined in this chapter, along with several promising areas for further research.

8.1 Summary

Traditional ML techniques for learning from sequential data do not result in human-interpretable models. This dissertation focuses on learning interpretable logical abstractions from sequential data, including time-series data, spatio-temporal data, robot trajectories, and natural language. We proposed four cutting-edge methods towards this end.

In Chapter 3, we proposed a novel approach for multi-class classification of time-series data using Signal Temporal Logic formulas. The key idea is to combine an algorithm for systematic enumeration of PSTL formulas with an algorithm for estimating the satisfaction boundary of each enumerated PSTL formula. We also investigated an optimization using formula signatures to avoid enumerating equivalent PSTL formulas. We then illustrated this technique with a number of case studies on real-world data from different domains. The results show that the enumerative solver has a number of advantages compared to existing approaches. We also performed a user study to measure the degree to which STL formulas are understandable to humans.

In Chapter 4, we addressed the problem of mining environment assumptions for CPS components and representing them using Signal Temporal Logic.
An input trace satisfying the mined environment assumption should, ideally, be guaranteed to produce an output that meets the component requirement. We used a counterexample-guided procedure that systematically enumerates parametric STL formulas and a decision-tree-based classification procedure to learn both the structure and the precise numeric constants of an STL formula representing the environment assumption. We demonstrated our technique on a few benchmark CPS models.

In Chapter 5, we proposed a technique to learn interpretable STREL formulas from spatio-temporal time-series data for Spatially Distributed Systems. We proposed a new method for creating a spatial model that preserves its connectivity. We leveraged a multi-dimensional bisection search to extract features for spatio-temporal time-series clustering. We applied existing ML techniques to the extracted features to cluster the data and learn an interpretable STREL formula for each cluster. We then illustrated on a number of benchmarks that while our method runs slower than traditional ML approaches, it is more interpretable and provides better insight into the data.

In Chapter 6, we proposed an interactive and explainable approach, DIALOGUESTL, to learn STL formulas from natural language descriptions of robotic tasks and demonstrations. We used part-of-speech tagging to extract sentence components and used the GPT-3 language model to automatically generate data given a small sample of manually generated data. We then used transformer models to detect the best atoms and operators for generating candidate PSTL formulas. Finally, demonstrations provided by the user help learn the best STL formula for a given task. Our tool has a number of advantages compared to previous works, such as addressing the ambiguity of natural language through interaction with the user, considering the space of all STL formulas, and explainability.

8.2 Future work

Here, we outline possible extensions of our work.

8.2.1 Learning from time-series data

As future work, we will extend this approach to unsupervised and semi-supervised learning. We will also investigate other optimization techniques to make the enumerative solver faster and more memory-efficient. Our current approach only uses the time domain of the data. However, in some applications, the frequency domain might contain more valuable information than the time domain; in such cases, using frequency-domain information can result in learning shorter and hence more interpretable formulas. While enumerative learning could potentially serve as a template-free learning method, a criticism of enumerative learning is that it is a brute-force method that does not impose any structure on the space of formulas. We plan to explore techniques based on automata learning to search over the space of formulas more efficiently.

8.2.2 Learning from spatio-temporal data

In this work, we assume that a template of the logical abstraction is provided by the user, which is limiting and error-prone, especially for untrained users. In the future, we wish to explore template-free techniques for learning from spatio-temporal data. We will also investigate extensions of this approach to supervised and active learning.

8.2.3 Learning from natural language

DIALOGUESTL is a concrete first step towards translating natural language descriptions to STL formulas by leveraging dialogue-based clarifications. There are exciting, immediate avenues for improving and building on DIALOGUESTL.
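To give a concrete flavor of the operator-detection step mentioned above, here is a minimal, hypothetical sketch that scores a phrase against prototype descriptions of temporal operators using cosine similarity of averaged word vectors. The prototype strings and the word-vector table are illustrative assumptions; DIALOGUESTL itself uses trained transformer models for this step.

import numpy as np

# Illustrative prototype descriptions for a few temporal operators.
OPERATOR_PROTOTYPES = {
    "F": "eventually finally at some point",
    "G": "always globally at all times",
    "U": "until before another event",
}

def embed(text, word_vectors):
    # Crude sentence embedding: the average of known word vectors.
    # `word_vectors` is an assumed pre-trained lookup table (e.g., GloVe-style).
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    assert vecs, "phrase shares no words with the vector table"
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def best_operator(phrase, word_vectors):
    # Return the operator whose prototype description is most similar
    # to the input phrase in embedding space.
    p = embed(phrase, word_vectors)
    return max(OPERATOR_PROTOTYPES,
               key=lambda op: cosine(p, embed(OPERATOR_PROTOTYPES[op], word_vectors)))

With reasonable word vectors, a phrase such as "keep the door closed at all times" would typically map to G, which is the kind of robustness to paraphrase that pure keyword search lacks.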
For the conditional task “If fire is on, turn off the fire, else pick up the key”, DIALOGUESTL fails to predict a satisfying formula. The ground-truth STL formula for this sentence is G((fireOn ≥ 0 =⇒ F(fireOff ≥ 0)) ∧ (fireOff ≥ 0 =⇒ F(itemOnRobot(key) ≥ 0))). DIALOGUESTL fails to discover this formula because there are no keywords in the sentence to imply the G and ∧ operators. In the future, gathering such failure cases to create augmented data may fill such gaps. Further, user questions could be designed to recover from such missing-operator corner cases. Relatedly, for the task “if fire is off, turn it on”, DIALOGUESTL incorrectly predicts F_[0,8](fireOff ≥ 0 =⇒ lampOn ≥ 0). The atom lampOn ≥ 0 is selected instead of fireOn ≥ 0 because “turn it on” does not explicitly say “turn the fire on”. Co-reference resolution steps may mitigate this issue.

Bibliography

[1] Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, and Roland Vollgraf. “FLAIR: An easy-to-use framework for state-of-the-art NLP”. In: NAACL 2019, 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). 2019, pp. 54–59.

[2] Rajeev Alur, Rastislav Bodik, Garvit Juniwal, Milo MK Martin, Mukund Raghothaman, Sanjit A Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. “Syntax-guided synthesis”. In: 2013 Formal Methods in Computer-Aided Design. IEEE. 2013, pp. 1–8.

[3] Rajeev Alur and Thomas A Henzinger. “A really temporal logic”. In: Journal of the ACM (JACM) 41.1 (1994), pp. 181–203.

[4] Yashwanth Annapureddy, Che Liu, Georgios E Fainekos, and Sriram Sankaranarayanan. “S-TaLiRo: A Tool for Temporal Logic Falsification for Hybrid Systems.” In: TACAS. Vol. 6605. Springer. 2011, pp. 254–257.

[5] Eugene Asarin, Paul Caspi, and Oded Maler. “A Kleene theorem for timed automata”. In: Proceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science. IEEE. 1997, pp. 160–171.

[6] Eugene Asarin, Paul Caspi, and Oded Maler. “Timed regular expressions”. In: Journal of the ACM 49.2 (2002), pp. 172–206.

[7] Eugene Asarin, Alexandre Donzé, Oded Maler, and Dejan Nickovic. “Parametric identification of temporal properties”. In: International Conference on Runtime Verification. Springer. 2011, pp. 147–160.

[8] Marco Autili, Lars Grunske, Markus Lumpe, Patrizio Pelliccione, and Antony Tang. “Aligning qualitative, real-time, and probabilistic property specification patterns using a structured English grammar”. In: IEEE Transactions on Software Engineering 41.7 (2015), pp. 620–638.

[9] Anand Balakrishnan and Jyotirmoy V Deshmukh. “Structured Reward Shaping using Signal Temporal Logic specifications”. In: International Conference on Intelligent Robots and Systems (IROS) (2019).

[10] Ayca Balkan, Paulo Tabuada, Jyotirmoy V Deshmukh, Xiaoqing Jin, and James Kapinski. “Underminer: A framework for automatically identifying nonconverging behaviors in black-box system models”. In: ACM Transactions on Embedded Computing Systems (TECS) 17.1 (2018), p. 20.

[11] Ezio Bartocci, Luca Bortolussi, Michele Loreti, and Laura Nenzi. “Monitoring mobile and spatially distributed cyber-physical systems”. In: Proc. of MEMOCODE. 2017.

[12] Ezio Bartocci, Luca Bortolussi, Michele Loreti, Laura Nenzi, and Simone Silvetti. “Moonlight: A lightweight tool for monitoring spatio-temporal properties”. In: Proc. of RV. 2020.

[13] Ezio Bartocci, Luca Bortolussi, and Guido Sanguinetti.
“Learning temporal logical properties discriminating ECG models of cardiac arrhytmias”. In: arXiv preprint arXiv:1312.7523 (2013).

[14] Marcello Maria Bersani, Matteo Rossi, and Pierluigi San Pietro. “Deciding the satisfiability of MITL specifications”. In: arXiv preprint arXiv:1307.4469 (2013).

[15] Or Biran and Courtenay Cotton. “Explanation and justification in machine learning: A survey”. In: IJCAI-17 workshop on explainable AI (XAI). Vol. 8. 1. 2017, pp. 8–13.

[16] Giuseppe Bombara, Cristian-Ioan Vasile, Francisco Penedo, Hirotoshi Yasuoka, and Calin Belta. “A decision tree approach to data classification using signal temporal logic”. In: Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control. ACM. 2016, pp. 1–10.

[17] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. “Language models are few-shot learners”. In: arXiv preprint arXiv:2005.14165 (2020).

[18] Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, and Alan Nichol. “Diet: Lightweight language understanding for dialogue systems”. In: arXiv preprint arXiv:2004.09936 (2020).

[19] Fraser Cameron, Georgios Fainekos, David M Maahs, and Sriram Sankaranarayanan. “Towards a verified artificial pancreas: Challenges and solutions for runtime verification”. In: Runtime Verification. Springer. 2015, pp. 3–17.

[20] Li-Juan Cao and Francis Eng Hock Tay. “Support vector machine with adaptive parameters in financial time series forecasting”. In: IEEE Transactions on neural networks 14.6 (2003), pp. 1506–1518.

[21] Nina Marie Caraway, James Lucian McCreight, and Balaji Rajagopalan. “Multisite stochastic weather generation using cluster analysis and k-nearest neighbor time series resampling”. In: Journal of hydrology 508 (2014), pp. 197–213.

[22] Wanpracha Art Chaovalitwongse, Ya-Ju Fan, and Rajesh C Sachdeo. “On the time series k-nearest neighbor classification of abnormal brain activity”. In: IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 37.6 (2007), pp. 1005–1016.

[23] Gang Chen, Zachary Sabato, and Zhaodan Kong. “Active learning based requirement mining for cyber-physical systems”. In: 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE. 2016, pp. 4586–4593.

[24] Germán Cobo, David García-Solórzano, Eugènia Santamaría, Jose Antonio Morán, Javier Melenchón, and Carlos Monzo. “Modeling students’ activity in online discussion forums: a strategy based on time series and agglomerative hierarchical clustering”. In: Educational data mining. 2010.

[25] Robert Dale. “GPT-3: What’s it good for?” In: Natural Language Engineering 27.1 (2021), pp. 113–118.

[26] William HE Day and Herbert Edelsbrunner. “Efficient algorithms for agglomerative hierarchical clustering methods”. In: Journal of classification 1.1 (1984), pp. 7–24.

[27] Jyotirmoy Deshmukh, Xiaoqing Jin, James Kapinski, and Oded Maler. “Stochastic local search for falsification of hybrid systems”. In: International Symposium on Automated Technology for Verification and Analysis. Springer. 2015, pp. 500–517.

[28] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “Bert: Pre-training of deep bidirectional transformers for language understanding”. In: arXiv preprint arXiv:1810.04805 (2018).

[29] Ram Das Diwakaran, Sriram Sankaranarayanan, and Ashutosh Trivedi. “Analyzing Neighborhoods of Falsifying Traces in Cyber-Physical Systems”. In: Intl. Conference on Cyber-Physical Systems (ICCPS).
ACM Press, 2017, pp. 109–119.

[30] Alexandre Donzé. “Breach, a toolbox for verification and parameter synthesis of hybrid systems”. In: International Conference on Computer Aided Verification. Springer. 2010, pp. 167–170.

[31] Alexandre Donzé, Thomas Ferrere, and Oded Maler. “Efficient robust monitoring for STL”. In: International Conference on Computer Aided Verification. Springer. 2013, pp. 264–279.

[32] Alexandre Donzé and Oded Maler. “Robust Satisfaction of Temporal Logic over Real-Valued Signals”. In: Formal Modeling and Analysis of Timed Systems - 8th International Conference, FORMATS 2010, Klosterneuburg, Austria, September 8-10, 2010. Proceedings. 2010, pp. 92–106.

[33] Georgios E Fainekos and George J Pappas. “Robustness of temporal logic specifications for continuous-time signals”. In: Theoretical Computer Science 410.42 (2009), pp. 4262–4291.

[34] Thomas Ferrère, Dejan Nickovic, Alexandre Donzé, Hisahiro Ito, and James Kapinski. “Interface-aware signal temporal logic”. In: Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. ACM. 2019, pp. 57–66.

[35] Bernold Fiedler and Arnd Scheel. “Spatio-temporal dynamics of reaction-diffusion patterns”. In: Trends in nonlinear analysis (2003), pp. 23–152.

[36] Ivan Gavran, Eva Darulova, and Rupak Majumdar. “Interactive synthesis of temporal specifications from examples and natural language”. In: Proceedings of the ACM on Programming Languages 4.OOPSLA (2020), pp. 1–26.

[37] Ebru Aydin Gol. “Efficient online monitoring and formula synthesis with past stl”. In: 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT). IEEE. 2018, pp. 916–921.

[38] Josif Grabocka, Nicolas Schilling, Martin Wistuba, and Lars Schmidt-Thieme. “Learning time-series shapelets”. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014, pp. 392–401.

[39] Jie He, Ezio Bartocci, Dejan Ničković, Haris Isakovic, and Radu Grosu. “From English to Signal Temporal Logic”. In: arXiv preprint arXiv:2109.10294 (2021).

[40] Bardh Hoxha, Houssam Abbas, and Georgios Fainekos. “Benchmarks for Temporal Logic Requirements for Automotive Systems”. In: ARCH14-15. 1st and 2nd International Workshop on Applied veRification for Continuous and Hybrid Systems. Ed. by Goran Frehse and Matthias Althoff. Vol. 34. EPiC Series in Computing. EasyChair, 2015, pp. 25–30.

[41] Bardh Hoxha, Adel Dokhanchi, and Georgios Fainekos. “Mining parametric temporal logic properties in model-based design for cyber-physical systems”. In: International Journal on Software Tools for Technology Transfer 20.1 (2018), pp. 79–93.

[42] Xiaohui Huang, Yunming Ye, Liyan Xiong, Raymond YK Lau, Nan Jiang, and Shaokai Wang. “Time series k-means: A new k-means type smooth subspace clustering for time series data”. In: Information Sciences 367 (2016), pp. 1–13.

[43] Michael Hüsken and Peter Stagge. “Recurrent neural networks for time series classification”. In: Neurocomputing 50 (2003), pp. 223–235.

[44] Roberto Interdonato, Dino Ienco, Raffaele Gaetano, and Kenji Ose. “DuPLO: A DUal view Point deep Learning architecture for time series classificatiOn”. In: ISPRS journal of photogrammetry and remote sensing 149 (2019), pp. 91–104.

[45] Hisao Ishibuchi and Takashi Yamamoto. “Interpretability issues in fuzzy genetics-based machine learning for linguistic modelling”. In: Modelling with Words. Springer, 2003, pp. 209–228.
[46] Susmit Jha, Ashish Tiwari, Sanjit A Seshia, Tuhin Sahai, and Natarajan Shankar. “TeLEx: learning signal temporal logic from positive examples using tightness metric”. In: Formal Methods in System Design (2019), pp. 1–24.

[47] Susmit Jha, Ashish Tiwari, Sanjit A Seshia, Tuhin Sahai, and Natarajan Shankar. “Telex: Passive STL learning using only positive examples”. In: International Conference on Runtime Verification. Springer. 2017, pp. 208–224.

[48] Xiaoqing Jin, Jyotirmoy V Deshmukh, James Kapinski, Koichi Ueda, and Ken Butts. “Powertrain control verification benchmark”. In: Proceedings of the 17th international conference on Hybrid systems: computation and control. ACM. 2014, pp. 253–262.

[49] Xiaoqing Jin, Alexandre Donzé, Jyotirmoy V Deshmukh, and Sanjit A Seshia. “Mining requirements from closed-loop control models”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34.11 (2015), pp. 1704–1717.

[50] Austin Jones, Zhaodan Kong, and Calin Belta. “Anomaly detection in cyber-physical systems: A formal methods approach”. In: 53rd IEEE Conference on Decision and Control. IEEE. 2014, pp. 848–853.

[51] James Kapinski, Xiaoqing Jin, Jyotirmoy Deshmukh, Alexandre Donze, Tomoya Yamaguchi, Hisahiro Ito, Tomoyuki Kaga, Shunsuke Kobuna, and Sanjit Seshia. ST-Lib: a library for specifying and classifying model behaviors. Tech. rep. SAE Technical Paper, 2016.

[52] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Shun Chen. “LSTM fully convolutional networks for time series classification”. In: IEEE access 6 (2017), pp. 1662–1669.

[53] Mehrdad Kiamari, Gowri Ramachandran, Quynh Nguyen, Eva Pereira, Jeanne Holm, and Bhaskar Krishnamachari. “COVID-19 Risk Estimation using a Time-varying SIR-model”. In: Proc. of the 1st ACM SIGSPATIAL International Workshop on Modeling and Understanding the Spread of COVID-19. 2020, pp. 36–42.

[54] Zhaodan Kong, Austin Jones, Ana Medina Ayala, Ebru Aydin Gol, and Calin Belta. “Temporal logic inference for classification and prediction from data”. In: Proceedings of the 17th international conference on Hybrid systems: computation and control. ACM. 2014, pp. 273–282.

[55] Ron Koymans. “Specifying real-time properties with metric temporal logic”. In: Real-time systems 2.4 (1990), pp. 255–299.

[56] Justin Noah Kreikemeyer, Jane Hillston, and Adelinde Uhrmacher. “Probing the Performance of the Edinburgh Bike Sharing System using SSTL”. In: Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 2020, pp. 141–152.

[57] Hadas Kress-Gazit, Georgios E Fainekos, and George J Pappas. “Translating structured English to robot controllers”. In: Advanced Robotics 22.12 (2008), pp. 1343–1359.

[58] Hyun Jung La and Soo Dong Kim. “A service-based approach to designing cyber physical systems”. In: 2010 IEEE/ACIS 9th International Conference on Computer and Information Science. IEEE. 2010, pp. 895–900.

[59] Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. “Interpretable decision sets: A joint framework for description and prediction”. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM. 2016, pp. 1675–1684.

[60] Jiwei Li, Pierluigi Nuzzo, Alberto Sangiovanni-Vincentelli, Yugeng Xi, and Dewei Li. “Stochastic contracts for cyber-physical system design under probabilistic requirements”. In: Proceedings of the 15th ACM-IEEE International Conference on Formal Methods and Models for System Design. ACM. 2017, pp. 5–14.
[61] Sheng-Tun Li and Yi-Chung Cheng. “A stochastic HMM-based forecasting model for fuzzy time series”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40.5 (2009), pp. 1255–1266.

[62] Wenchao Li, Alessandro Forin, and Sanjit A. Seshia. “Scalable Specification Mining for Verification and Diagnosis”. In: Proceedings of the Design Automation Conference (DAC). June 2010, pp. 755–760.

[63] Oded Maler. “Learning Monotone Partitions of Partially-Ordered Domains (Work in Progress)”. In: (2017).

[64] Oded Maler and Dejan Nickovic. “Monitoring temporal properties of continuous signals”. In: Formal Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.

[65] Oded Maler and Dejan Ničković. “Monitoring properties of analog and mixed-signal circuits”. In: International Journal on Software Tools for Technology Transfer 15.3 (2013), pp. 247–268.

[66] Rahul Mangharam, Houssam Abbas, Madhur Behl, Kuk Jang, Miroslav Pajic, and Zhihao Jiang. “Three challenges in cyber-physical systems”. In: 2016 8th International Conference on Communication Systems and Networks (COMSNETS). IEEE. 2016, pp. 1–8.

[67] Thomas M. Mitchell. Machine Learning. 1st ed. New York, NY, USA: McGraw-Hill, Inc., 1997. ISBN: 0070428077, 9780070428072.

[68] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. “Human-level control through deep reinforcement learning”. In: nature 518.7540 (2015), pp. 529–533.

[69] Sara Mohammadinejad, Jyotirmoy V Deshmukh, and Laura Nenzi. “Mining interpretable spatio-temporal logic properties for spatially distributed systems”. In: International Symposium on Automated Technology for Verification and Analysis. Springer. 2021, pp. 91–107.

[70] Sara Mohammadinejad, Jyotirmoy V Deshmukh, and Aniruddh G Puranic. “Mining Environment Assumptions for Cyber-Physical System Models”. In: arXiv preprint arXiv:2005.08435 (2020).

[71] Sara Mohammadinejad, Jyotirmoy V Deshmukh, Aniruddh G Puranic, Marcell Vazquez-Chanlatte, and Alexandre Donzé. “Interpretable classification of time-series data using efficient enumerative techniques”. In: Proc. of HSCC. 2020.

[72] Rani Nelken and Nissim Francez. “Automatic translation of natural language system specifications into temporal logic”. In: International Conference on Computer Aided Verification. Springer. 1996, pp. 360–371.

[73] Laura Nenzi, Luca Bortolussi, Vincenzo Ciancia, Michele Loreti, and Mieke Massink. “Qualitative and Quantitative Monitoring of Spatio-Temporal Properties with SSTL”. In: LMCS 14.4 (2018).

[74] Laura Nenzi, Simone Silvetti, Ezio Bartocci, and Luca Bortolussi. “A robust genetic algorithm for learning temporal specifications from data”. In: International Conference on Quantitative Evaluation of Systems. Springer. 2018, pp. 323–338.

[75] Dejan Ničković, Xin Qin, Thomas Ferrère, Cristinel Mateis, and Jyotirmoy Deshmukh. “Shape Expressions for Specifying and Extracting Signal Features”. In: International Conference on Runtime Verification. Springer. 2019, pp. 292–309.

[76] Pierluigi Nuzzo, Antonio Iannopollo, Stavros Tripakis, and Alberto L Sangiovanni-Vincentelli. From relational interfaces to assume-guarantee contracts. Tech. rep. UC Berkeley, 2014.

[77] Tim Oates, Laura Firoiu, and Paul R Cohen. “Using dynamic time warping to bootstrap HMM-based clustering of time series”. In: Sequence Learning. Springer, 2000, pp. 35–52.
[78] S Padmavathi and E Ramanujam. “Naïve Bayes classifier for ECG abnormalities using multivariate maximal time series motif”. In: Procedia Computer Science 47 (2015), pp. 222–228.

[79] Miroslav Pajic, Zhihao Jiang, Insup Lee, Oleg Sokolsky, and Rahul Mangharam. “From verification to implementation: A model translation tool and a pacemaker case study”. In: 2012 IEEE 18th real time and embedded technology and applications symposium. IEEE. 2012, pp. 173–184.

[80] Amir Pnueli. “The temporal logic of programs”. In: 18th Annual Symposium on Foundations of Computer Science (sfcs 1977). IEEE. 1977, pp. 46–57.

[81] Aniruddh Puranic, Jyotirmoy Deshmukh, and Stefanos Nikolaidis. “Learning from Demonstrations using Signal Temporal Logic”. In: Proceedings of the 2020 Conference on Robot Learning. 2021.

[82] Aarne Ranta. “Translating between language and logic: what is easy and what is difficult”. In: International Conference on Automated Deduction. Springer. 2011, pp. 5–25.

[83] Ran Raz and Amir Shpilka. “Deterministic polynomial identity testing in non-commutative models”. In: Computational Complexity 14.1 (2005), pp. 1–19.

[84] Hendrik Roehm, Rainer Gmehlich, Thomas Heinz, Jens Oehlerking, and Matthias Woehrle. “Industrial Examples of Formal Specifications for Test Case Generation”. In: Workshop on Applied veRification for Continuous and Hybrid Systems, ARCH@CPSWeek 2015. 2015, pp. 80–88.

[85] Peter J Rousseeuw. “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis”. In: Journal of computational and applied mathematics 20 (1987), pp. 53–65.

[86] Ginés Rubio, Héctor Pomares, Ignacio Rojas, and Luis Javier Herrera. “A heuristic method for parameter selection in LS-SVM: Application to time series prediction”. In: International Journal of Forecasting 27.3 (2011), pp. 725–739.

[87] Stefan Rüping. SVM kernels for time series analysis. Tech. rep. 2001.

[88] Burr Settles. “Active learning literature survey”. In: (2009).

[89] Szymon Stoma, Alexandre Donzé, François Bertaux, Oded Maler, and Gregory Batt. “STL-based analysis of TRAIL-induced apoptosis challenges the notion of type I/type II cell line classification”. In: PLoS computational biology 9.5 (2013), e1003056.

[90] Jiang Su and Harry Zhang. “A fast decision tree learning algorithm”. In: AAAI. Vol. 6. 2006, pp. 500–505.

[91] Romain Tavenard, Johann Faouzi, Gilles Vandewiele, Felix Divo, Guillaume Androz, Chester Holtz, Marie Payne, Roman Yurchak, Marc Rußwurm, Kushal Kolar, and Eli Woods. “Tslearn, A Machine Learning Toolkit for Time Series Data”. In: JMLR 21.118 (2020), pp. 1–6.

[92] Abhishek Udupa, Arun Raghavan, Jyotirmoy V Deshmukh, Sela Mador-Haim, Milo MK Martin, and Rajeev Alur. “TRANSIT: specifying protocols with concolic snippets”. In: ACM SIGPLAN Notices 48.6 (2013), pp. 287–296.

[93] Dogan Ulus. “Montre: a tool for monitoring timed regular expressions”. In: International Conference on Computer Aided Verification. Springer. 2017, pp. 329–335.

[94] Dogan Ulus, Thomas Ferrère, Eugene Asarin, and Oded Maler. “Online timed pattern matching using derivatives”. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer. 2016, pp. 736–751.

[95] Maarten Van der Heijden, Marina Velikova, and Peter JF Lucas. “Learning Bayesian networks for clinical time series analysis”. In: Journal of biomedical informatics 48 (2014), pp. 94–105.

[96] Marcell Vazquez-Chanlatte, Jyotirmoy V Deshmukh, Xiaoqing Jin, and Sanjit A Seshia.
“Logical clustering and learning for time-series data”. In: International Conference on Computer Aided Verification. Springer. 2017, pp. 305–325.

[97] Marcell Vazquez-Chanlatte, Shromona Ghosh, Jyotirmoy V Deshmukh, Alberto Sangiovanni-Vincentelli, and Sanjit A Seshia. “Time-Series Learning Using Monotonic Logical Properties”. In: International Conference on Runtime Verification. Springer. 2018, pp. 389–405.

[98] Masaki Waga, Ichiro Hasuo, and Kohei Suenaga. “Efficient online timed pattern matching by automata-based skipping”. In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer. 2017, pp. 224–243.

[99] Masaki Waga, Ichiro Hasuo, and Kohei Suenaga. “MONAA: a tool for timed pattern matching with automata-based acceleration”. In: 2018 IEEE Workshop on Monitoring and Testing of Cyber-Physical Systems (MT-CPS). IEEE. 2018, pp. 14–15.

[100] Min Xu, Ling-Yu Duan, Jianfei Cai, Liang-Tien Chia, Changsheng Xu, and Qi Tian. “HMM-based audio keyword generation”. In: Pacific-Rim Conference on Multimedia. Springer. 2004, pp. 566–574.

[101] Zhe Xu, Calin Belta, and Agung Julius. “Temporal logic inference with prior information: An application to robot arm movements”. In: IFAC-PapersOnLine 48.27 (2015), pp. 141–146.

[102] Tomoya Yamaguchi, Tomoyuki Kaga, Alexandre Donzé, and Sanjit A Seshia. “Combining requirement mining, software model checking and simulation-based verification for industrial automotive systems”. In: 2016 Formal Methods in Computer-Aided Design (FMCAD). IEEE. 2016, pp. 201–204.

[103] Kiyoung Yang and Cyrus Shahabi. “An efficient k nearest neighbor search for multivariate time series”. In: Information and Computation 205.1 (2007), pp. 65–98.

[104] Lexiang Ye and Eamonn Keogh. “Time series shapelets: a new primitive for data mining”. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 2009, pp. 947–956.

[105] Lexiang Ye and Eamonn Keogh. “Time series shapelets: a novel technique that allows accurate, interpretable and fast classification”. In: Data mining and knowledge discovery 22.1-2 (2011), pp. 149–182.

[106] Jesin Zakaria, Abdullah Mueen, and Eamonn Keogh. “Clustering time series using unsupervised-shapelets”. In: 2012 IEEE 12th International Conference on Data Mining. IEEE. 2012, pp. 785–794.

Appendices

C Boolean and Quantitative Semantics of STREL

STREL is equipped with both a Boolean and a quantitative semantics: a Boolean semantics, (S,σ,ℓ,t) |= ϕ, with the meaning that the spatio-temporal trace σ, in location ℓ, at time t, with spatial model S, satisfies the formula ϕ; and a quantitative semantics, ρ(ϕ,S,σ,t), that can be used to measure the quantitative level of satisfaction of a formula for a given trajectory and space model. The function ρ is also called the robustness function. The satisfaction of the whole trajectory corresponds to the satisfaction at time 0, i.e., ρ(ϕ,S,σ) = ρ(ϕ,S,σ,0). The semantics for Boolean and temporal operators remain the same as in STL [64]. We describe below the quantitative semantics of the spatial operators; the Boolean semantics can be derived by substituting min, max with ∧, ∨ and considering Boolean satisfaction instead of ρ.
Reach. The quantitative semantics of the reach operator is:

\[
\rho(\varphi_1\,\mathcal{R}_{[d_1,d_2]}\,\varphi_2,\mathcal{S},\sigma,\ell,t)=
\max_{\tau\in T(\mathcal{S},\ell)}\;
\max_{\ell'\in\tau\,:\,d^{\tau}_{\mathcal{S}}(\ell')\in[d_1,d_2]}
\min\Big(\rho(\varphi_2,\mathcal{S},\sigma,\ell',t),\;
\min_{j<\tau(\ell')}\rho(\varphi_1,\mathcal{S},\sigma,\tau[j],t)\Big)
\]

That is, (S,σ,ℓ,t), a spatio-temporal trace σ, in location ℓ, at time t, with a spatial model S, satisfies ϕ1 R_[d1,d2] ϕ2 iff it satisfies ϕ2 in a location ℓ′ reachable from ℓ through a route τ, with a length d^τ_S(ℓ′) ∈ [d1,d2], and such that τ[0] = ℓ and all its elements with index less than τ(ℓ′) satisfy ϕ1. Intuitively, the reachability operator ϕ1 R_[d1,d2] ϕ2 describes the behavior of reaching a location satisfying property ϕ2, passing only through locations that satisfy ϕ1, and such that the distance from the initial location to the final one belongs to the interval [d1,d2].

Escape. The quantitative semantics of the escape operator is:

\[
\rho(\mathcal{E}_{[d_1,d_2]}\,\varphi,\mathcal{S},\sigma,\ell,t)=
\max_{\tau\in T(\mathcal{S},\ell)}\;
\max_{\ell'\in\tau\,:\,d^{\tau}_{\mathcal{S}}[\ell,\ell']\in[d_1,d_2]}\;
\min_{i\le\tau(\ell')}\rho(\varphi,\mathcal{S},\sigma,\tau[i],t).
\]

That is, (S,σ,ℓ,t), a spatio-temporal trace σ, in location ℓ, at time t, with a spatial model S, satisfies E_[d1,d2] ϕ if and only if there exists a route τ and a location ℓ′ ∈ τ such that τ[0] = ℓ, d^τ_S[τ[0],ℓ′] ∈ [d1,d2], and all elements τ[0],…,τ[k] (with τ(ℓ′) = k) satisfy ϕ. Practically, the escape operator E_[d1,d2] ϕ describes the possibility of escaping from a certain region, passing only through locations that satisfy ϕ, via a route with a distance that belongs to the interval [d1,d2].

Somewhere. ◇_[0,d] ϕ := true R_[0,d] ϕ holds for (S,σ,ℓ,t) iff there exists a location ℓ′ in S such that (S,σ,ℓ′,t) satisfies ϕ and ℓ′ is reachable from ℓ via a route τ with length d^τ_S[ℓ′] ≤ d.

Everywhere. □_[0,d] ϕ := ¬◇_[0,d] ¬ϕ holds for (S,σ,ℓ,t) iff all the locations ℓ′ reachable from ℓ via a path, with length d^τ_S[ℓ′] ≤ d, satisfy ϕ.

Surround. ϕ1 ⃝_[0,d] ϕ2 := ϕ1 ∧ ¬(ϕ1 R_[0,d] ¬(ϕ1 ∨ ϕ2)) ∧ ¬(E_[d,∞](ϕ1)) holds for (S,σ,ℓ,t) iff there exists a ϕ1-region that contains ℓ, and all locations in that region satisfy ϕ1 and are reachable from ℓ through a path with length less than d. Furthermore, all the locations that do not belong to the ϕ1-region but are directly connected to a location in the ϕ1-region must satisfy ϕ2 and be reachable from ℓ via a path with length less than d. Intuitively, the surround operator indicates the notion of being surrounded by a ϕ2-region while being in a ϕ1-region, with some added constraints. The idea is that one cannot escape from a ϕ1-region without passing through a node that satisfies ϕ2, and, in any case, one has to reach a ϕ2-node at a distance between d1 and d2.

D Monotonicity Proofs for spatial operators

Lemma. The polarities for PSTREL formulas ϕ(d1,d2) of the form ψ1 R_[d1,d2] ψ2, E_[d1,d2] ψ, ◇_[d1,d2] ψ and ψ1 ⃝_[d1,d2] ψ2 are sgn(d1) = − and sgn(d2) = +, i.e., if a spatio-temporal trace satisfies ϕ(ν(d1),ν(d2)), then it also satisfies any STREL formula over a strictly larger spatial-model-induced distance interval, i.e., obtained by decreasing ν(d1) and increasing ν(d2). For a formula □_[d1,d2] ψ, sgn(d1) = + and sgn(d2) = −, i.e., the trace also satisfies the formula obtained by strictly shrinking the distance interval.

Proof. To prove the above lemma, we first define an ordering on intervals. For intervals I = [a,b] and I′ = [a′,b′],

\[
I' \ge I \iff a' \le a \text{ and } b' \ge b.
\]
Following the defined ordering on intervals, we have

\[
I' \ge I \implies \max_{x\in I'} f(x) \ge \max_{x\in I} f(x), \tag{D.1}
\]
\[
I' \ge I \implies \min_{x\in I'} f(x) \le \min_{x\in I} f(x). \tag{D.2}
\]

Assuming d′1 ≤ d1 and d′2 ≥ d2, from the quantitative semantics of the Reach operator and equation (D.1) we get:

\[
\rho(\varphi_1\,\mathcal{R}_{[d'_1,d_2]}\,\varphi_2,\mathcal{S},\sigma,\ell,t)\ \ge\ \rho(\varphi_1\,\mathcal{R}_{[d_1,d_2]}\,\varphi_2,\mathcal{S},\sigma,\ell,t),
\]
\[
\rho(\varphi_1\,\mathcal{R}_{[d_1,d'_2]}\,\varphi_2,\mathcal{S},\sigma,\ell,t)\ \ge\ \rho(\varphi_1\,\mathcal{R}_{[d_1,d_2]}\,\varphi_2,\mathcal{S},\sigma,\ell,t),
\]

which proves that the Reach operator is monotonically decreasing with respect to d1 and monotonically increasing with respect to d2. The proofs for the other spatial operators are similar, and we skip them for brevity.
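As a numeric sanity check of this monotonicity argument, the following toy sketch (our own construction, restricted to simple routes; not a STREL monitor such as MoonLight) computes the reach robustness over a small weighted graph and illustrates that widening [d1, d2] can only increase the value. The graph, edge weights, and per-location robustness values for the subformulas are illustrative assumptions.

# Toy spatial model: weighted adjacency lists, and static robustness values
# of subformulas phi1 and phi2 at each location.
EDGES = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: []}
RHO1 = {0: 0.5, 1: 0.3, 2: -0.2}
RHO2 = {0: -1.0, 1: 0.1, 2: 0.8}

def reach_robustness(start, d1, d2):
    # Enumerate simple routes from `start`; a route reaching l' with total
    # length in [d1, d2] contributes min(rho2(l'), min of rho1 over the
    # strict prefix), matching the reach semantics above.
    best = float("-inf")
    stack = [(start, 0.0, [start])]
    while stack:
        loc, dist, path = stack.pop()
        if d1 <= dist <= d2:
            prefix = min((RHO1[p] for p in path[:-1]), default=float("inf"))
            best = max(best, min(RHO2[loc], prefix))
        for nxt, w in EDGES[loc]:
            if nxt not in path and dist + w <= d2:
                stack.append((nxt, dist + w, path + [nxt]))
    return best

print(reach_robustness(0, 1.0, 1.0))  # 0.1, via the route 0 -> 1
print(reach_robustness(0, 0.0, 2.0))  # 0.3, via 0 -> 1 -> 2: a wider interval only helps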
Abstract
Sequential data refers to data where the order between successive data points is important. Time-series data and spatio-temporal data are examples of sequential data. A time-series datum is an ordered sequence of data values in which each data point is associated with a unique time-stamp. A spatio-temporal datum can be viewed as a set of time-series data in which each time-series datum is associated with a unique spatial location. Cyber-physical system applications such as autonomous vehicles, wearable devices, and avionic systems generate a large volume of time-series data. The Internet-of-Things, complex sensor networks, and multi-agent cyber-physical systems are all examples of spatially distributed systems that continuously evolve in time, and such systems generate huge amounts of spatio-temporal data. Designers often look for tools to extract high-level information from such data. Traditional machine learning (ML) techniques for sequential data offer several solutions to these problems; however, the artifacts trained by these algorithms often lack interpretability. For instance, Recurrent Neural Networks (RNNs) are among the most popular models used by machine learning techniques for solving various learning problems. However, RNNs generate black-box models that may be difficult to interpret due to their high dimensionality, highly nonlinear nature, and lack of an immediate connection to visible patterns in the data. These complex ML models demonstrate good performance in various learning tasks; however, in many application settings, analysts require an understanding of why the model produces a particular output. For example, in a classification task, it is important to understand which features in the data led to the data being assigned a particular class label. A definition of interpretability by Biran and Cotton is: models are interpretable if their decisions can be understood by humans.
Formal parametric logics, such as Signal Temporal Logic (STL) and Spatio-temporal Reach and Escape Logic (STREL), are seeing increasing adoption in the formal methods and industrial communities as go-to specification languages for sequential data. Formal parametric logics are machine-checkable, human-understandable abstractions for sequential data, and they can be used to tackle a variety of learning problems that include, but are not limited to, classification, clustering, and active learning. The use of formal parametric logics in the context of machine learning tasks has seen a considerable amount of interest in recent years. We make several significant contributions to this growing body of literature. This dissertation makes five key contributions towards learning formal parametric logics from sequential data. (1) We develop a new technique for learning STL-based classifiers from time-series data and provide a way to systematically explore the space of all STL formulas. (2) We conduct a user study to investigate whether STL formulas are indeed understandable to humans. (3) As an application of our STL-based learning framework, we investigate the problem of mining environment assumptions for cyber-physical system models. (4) We develop the first set of algorithms for logical unsupervised learning of spatio-temporal data and show that our method generates STREL formulas of bounded description complexity. (5) We design an explainable and interactive learning approach to learn from natural language and demonstrations using STL. Finally, we showcase the effectiveness of our approaches on case studies that include, but are not limited to, urban transportation, automotive, robotics, and air quality monitoring.