SCALING CONTROL SYNTHESIS AND VERIFICATION IN AUTONOMY USING
NEURO-SYMBOLIC METHODS
by
Navid Hashemi
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2024
Copyright 2025 Navid Hashemi
Dedication
To my father and mother
Acknowledgements
I am profoundly grateful to my PhD advisor, Professor Jyotirmoy V. Deshmukh, for his invaluable guidance,
steadfast support, and commitment to achieving meaningful research results with a foundation of formal
rigor. Jyo, thank you for allowing me the freedom to explore my own path and for always providing timely
and insightful feedback. I have learned so much from you. Your support, both financial and emotional, has meant the world to me, especially during my lowest moments, when you helped me regain my footing. I am also deeply thankful to Dr. Georgios Fainekos and Professor Lars Lindemann for all of their support and invaluable advice throughout my research journey; their expertise in computer science was tremendously helpful to me. I also want to thank Dr. Danil Prokhorov, Dr. Bardh Hoxha, and Dr. Tomoya Yamaguchi for the
opportunity to conduct research within the Toyota community and for their invaluable advice throughout
my journey. My sincere thanks go as well to Professor Justin Ruths, whose sponsorship made it possible for
me to join the academic community in the United States and who guided me for four years in my previous
PhD program in Texas, teaching me how to plan and proceed in research as a professional researcher. Justin,
your support during my stressful time at UTD was a true lifeline. I hope I can come to Texas someday and visit you again. Many thanks to Professor Mahyar Fazlyab for giving me the opportunity to work on many
projects like incremental quadratic constraints for neural networks.
I am also grateful to Xin Qin, Samuel Williams, and Vidisha Kudalkar for our partnership and collaboration on the data-driven reachability analysis, LB4TL, and scenario generation projects.
To the academic authors whose work has inspired me: thank you. Notable papers that have deeply
impacted my thinking include:
• Robustness of Temporal Logic Specifications for Continuous-Time Signals by Georgios E. Fainekos
and George J. Pappas
• NNV: the neural network verification tool for deep neural networks and learning-enabled cyber-physical systems by Tran, Hoang-Dung, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau,
Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T. Johnson.
• A tutorial on conformal prediction by Shafer, Glenn, and Vladimir Vovk
• Robust validation: Confident predictions even when distributions shift by Cauchois, Maxime, Suyash
Gupta, Alnur Ali, and John C. Duchi
• Dropout: a simple way to prevent neural networks from overfitting by Srivastava, N., Hinton, G.,
Krizhevsky, A., Sutskever, I., & Salakhutdinov, R.
• Deep networks with stochastic depth by Huang, Gao, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian
Q. Weinberger.
I am thankful to Professor Chao Wang and Professor Bhaskar Krishnamachari for serving as my thesis opponents, and to Professor Gaurav Sukhatme, Professor Pierluigi Nuzzo, and Professor Mihailo R. Jovanovic for serving on my qualification exam and thesis proposal committees.
Lastly, my deepest gratitude goes to my family. To my parents, words cannot express my gratitude.
Your love and support have been the foundation of all I have achieved. I cherish every moment spent with
you and look forward to many more.
Navid Hashemi
December 2024
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Thesis Organization & List of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Chapter 2: Deterministic Formal Verification Framework for STL . . . . . . . . . . . . . . . . . . . 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 STL Robustness as a Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 STL Verification Using Reachability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.1 Verification for STL Properties on DNN Models . . . . . . . . . . . . . . . . . . . . 15
2.4.2 End-to-End Neural Feedback Network (ENFN) . . . . . . . . . . . . . . . . . . . . . 16
2.4.3 STL Verification for DNN Models Using Reachability Analysis . . . . . . . . . . . . 16
2.4.4 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.5 Verification for STL Properties on ODE Models . . . . . . . . . . . . . . . . . . . . 23
2.4.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5 STL Verification of DNN Models Using Sampling . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Experimental Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Chapter 3: Learning Based Neurosymbolic Algorithm for STL Control Synthesis . . . . . . . . . . 36
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.1 Organization and Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Contribution of STL2NN in Learning Based Synthesis . . . . . . . . . . . . . . . . . . . . . 43
3.4 Training Neural Network Control Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Numerical Evaluation for Training with LB4TL . . . . . . . . . . . . . . . . . . . . . . . . 50
3.6 The Challenge of Exploding Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.7 Extension to Longer Horizon Temporal Tasks & Higher Dimensional Systems . . . . . . . 57
3.7.1 Sampling-Based Gradient Approximation Technique . . . . . . . . . . . . . . . . . 58
3.7.2 Including the Critical Predicate in Time Sampling . . . . . . . . . . . . . . . . . . . 61
3.7.3 Safe Re-Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.7.4 Computing the Sampled Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.7.5 A Detailed Discussion on Training Algorithm . . . . . . . . . . . . . . . . . . . . . 66
3.8 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.8.1 12-Dimensional Quad-rotor (Nested 3-Future Formula) . . . . . . . . . . . . . . . . 69
3.8.2 Multi-Agent: Network of Dubins Cars (Nested Formula) . . . . . . . . . . . . . . . 71
3.8.3 6-Dimensional Quadrotor & Moving Platform: Landing a Quadrotor . . . . . . . . 72
3.8.3.1 Influence of Waypoint Function, Critical Predicate and Time Sampling
on Algorithm 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.8.4 Dubins Car: Growing Task Horizon for Dubins Car (Ablation Study on Time
Sampling) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.8.5 Statistical Verification of Synthesized Controllers . . . . . . . . . . . . . . . . . . . 79
3.9 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Chapter 4: Online Convex Optimization-based Policy Modification . . . . . . . . . . . . . . . . . . 85
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3 Policy Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.3.1 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Chapter 5: Learning-based Statistical Reachability Analysis . . . . . . . . . . . . . . . . . . . . . . 103
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Problem Statement and Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 Learning A Surrogate Model Suitable for Probabilistic Reachability Analysis . . . . . . . . 114
5.4 Scalable Data-Driven Reachability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.6 Extension to Longer Horizon Trajectories & Improving Accuracy . . . . . . . . . . . . . . 133
5.6.1 Scaling Training Strategy for Reachability . . . . . . . . . . . . . . . . . . . . . . . 135
5.6.2 PCA Based Inflating Hypercube . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.7 Numerical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.7.1 12-Dimensional Quadcopter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.7.1.1 Experiment 1:[Comparison with Section 5.4] . . . . . . . . . . . . . . . . 144
5.7.1.2 Experiment 2: [Sequential Goal Reaching Task] . . . . . . . . . . . . . . 144
5.7.2 27-Dimensional Powertrain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.7.2.1 Experiment 3: [Reachability with Distribution Shift] . . . . . . . . . . . . 145
Chapter 6: Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Appendix A: Conservatism for Exact Reachability . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Appendix B: Lipschitz Constant Analysis for ENFN . . . . . . . . . . . . . . . . . . . . . . . . . 164
Appendix C: Generic Computational Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Appendix D: Brief Summary of [58] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
List of Tables
2.1 Quantitative Semantics of STL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Shows the result of verification utilizing the reachability analysis on ENFN. In each case
study we consider, both the plant model and the controller are ReLU-FFNNs. We use
the abbreviations A for Approximate star-set-based reachability, and E for the Exact
star-set-based technique. No parallel computing is used and no set partitioning is applied. 18
2.3 Verifying φ8 against NNCS in Eq. (2.14) utilizing exact-star reachability on ENFN. Initial
state set I = {(x, y)|x ∈ [−50, −40], y ∈ [85, 95]}. The trajectory encoding has 180
layers and STL2NN has 8 layers. No parallel computing is utilized. . . . . . . . . . . . . . 24
3.1 Quantitative Semantics of STL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Description of the case studies and the training results. Both used tanh activation. . . . . 51
3.3 Comparison between policy training with LB4TL and the other smooth robustness
semantics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.4 Results on different case studies. Here, b is the hyper-parameter we utilized to generate
LB4TL in [76]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.5 Ablation studies for picking different options for the optimization process. This table shows
the results of the training algorithm in case study 3.8.3.1. We indicate that the training does
not result in positive robustness within 300 gradient steps by DNF (did not finish) with the
value of robustness in iteration 300 in brackets. The table represents an ablation study, where
we disable the various heuristic optimizations in Algorithm 4 in different combinations and
report the extent of reduction in efficiency. We use ✓, × to respectively indicate a heuristic
being included or excluded. The time-sampling technique is utilized in all the experiments. . . 77
3.6 Ablation study. We mark the experiment with DNF[.] if it is unable to provide a positive
robustness within 8000 iterations, and the value inside brackets is the maximum value
of robustness it finds. We magnify the environment proportional to the horizon. All
experiments for K = 10, 50, 100 use a unique guess for initial parameter values, and all
the experiments for K = 500, 1000 use another unique initial guess. Here, we utilized
critical predicate module in both cases of Algorithm 4 (columns 3 & 4). . . . . . . . . . . . 78
5.1 Shows the details of our computation process to provide probabilistically guaranteed flowpipes. The time horizon for experiments 1, 5, 6 is K = 50 time-steps and for experiments 2, 3, 4 is K = 100 time-steps. The sampling times for the quadcopter and TRVDP are 0.05 and 0.02 seconds, respectively. We examine the results with a valid distribution shift (explained in detail in Table 5.2) that is less than the maximum specified distribution shift in terms of total variation. This shift is estimated through the comparison between 300,000 trajectories from D^real_{S,K} and D^sim_{S,K}. We also utilize 10,000 trajectories (number of trials) from this specific distribution D^real_{S,K} to examine the coverage of the flowpipes, and 300,000 trajectories to examine the coverage level for R*_{δ,τ} (i.e., ∆̃, δ̃). To evaluate the contribution of robust conformal inference, we also solve for the flowpipes again neglecting the distribution shift, i.e., ϵ̄ = ϵ, and show that the coverage guarantees for R*_{δ,τ} and the flowpipes may get violated (δ̃ < δ or ∆̃ < δ) in case the shifted (deployment) distribution is considered. The runtimes we report for reachability assume no parallel computing. . . . . . . 128
5.2 Initial state distribution and added Gaussian noise (mean:0, covariance:Σ) for the training
and the shifted environments; uni(I) denotes the uniform distribution over I. . . . . . . . 130
5.3 Shows details of the experiments. The models are trained in parallel with 18 CPU workers.
Thus, the average training runtime may vary by selecting different number of workers. The
words E, and A represent exact-star and approx-star, respectively. . . . . . . . . . . . . . . 143
List of Figures
2.1 The structure of ENFN (that encodes the computation of the trajectory σs0 starting from initial state s0) composed with STL2NN (that encodes the computation of the robustness of the STL formula φ w.r.t. σs0). . . . . . . 15
2.2 Trajectories for the model NFC-2d. The NN-controller is required to drive trajectories to
visit region P3 within time k, where k ∈ [75, 100]. The controller should also avoid unsafe
sets P1, P2 at all times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Trajectories for the model NFC-3d. The NN-controller is required to drive the model to the
region P3 within time k, where k ∈ [35, 50], while avoiding the unsafe sets P1, P2 at all
times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Trajectories of the plant and controller models used to show the scalability of verification in the spec size. We propose 4 different sets that we require the object to visit
sequentially. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Trajectories for NNCS shown in Eq. (2.14). . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 This figure shows the trajectories of quadcopter driven by a controller which has been
trained to satisfy the STL specification in (2.16) . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.8 For a given cubical sub-region S(i, j, k), i, j, k ∈ [10], one can compute its index by
index(i, j, k) = 100(i − 1) + 10(j − 1) + k and the location of its center center(i, j, k) =
[0.0649, 0.0649, −0.0202]⊤ + [i − 1, j − 1, k − 1]⊤ × 0.00213. Therefore, computing
the index for a subregion, (a) presents the left-bound on robustness range (b) presents
the run-time to compute the reach-tube originated from the sub-region (c) presents the
run-time to compute the robustness range on STL2NN with approx-star reachability analysis. 27
2.9 Trajectories for the model in Section 2.5.1. The controller is required to drive the model
such that it visits the region P1 after 3 time-steps but no later than 6 time-steps. Once it
reaches region P1 it is required to visit P2 after 9 time steps but no later than 13 time steps. 30
2.10 Shows the verification run-time in linear time-varying plant, for every 64 partitions of the
set of initial states. We apply algorithm 2 on every partition and conclude the verification.
The red line shows the average run time on the partitions which is approximately 6 minutes.
On the other hand, the maximum run-time is 8 minutes and 40 seconds. . . . . . . . . . . 32
2.11 Shows the verification run-time in neural network controlled quadrotor System, for every
64 partitions of the set of initial states. We apply algorithm 2 on every partition and
conclude the verification. This figure shows the verification run time on the majority of
partitions is approximately 40 minutes. The red line shows the average run-time which is
approximately 125 minutes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.12 [Recursive partitioning for STL verification with local Lipschitz computation: (a) Presents
the certificates ρ1 for each partition at each step. These certificates are computed with
convex programming utilizing MOSEK and YALMIP. The results are rounded upwards. (b)
Presents the certificates ρ2 for each partition at each step. These certificates are computed
over ENFN. The results are rounded downwards. (c) shows the verification results. The
result is 0 when ϵ > ρ2/ρ1 and is 1 when ϵ < ρ2/ρ1. Obviously 1 indicates the controller
is verified over the subset. We partitioned I in three steps to receive 1 on every partition.
The diameter ϵ is √2/2, √2/4, √2/8 for the biggest, medium, and smallest partitions, respectively. . . . . . . 34
2.13 Shows the evolution of states in a control feedback system for proposed LTV model in 50
time steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.14 Shows the evolution of states for the quadrotor example. The quadrotor is controlled with a pre-trained tanh FFNN controller; it is planned to avoid E3 but is required to reach one of the destinations E1 or E2 within 20 time steps. . . . . . . . . 34
3.1 Shows an illustration of the recurrent structure for the control feedback system. . . . . . . 40
3.2 Shows a comparison between ReLU, swish and softplus activation function. This figure
demonstrates the fact that softplus activation function is a guaranteed upper bound for
ReLU, and swish activation function is a guaranteed lower bound for ReLU activation
function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 This figure shows a comparison between a non-smooth objective function and its smooth
approximation. This approximation can be very helpful to improve the efficiency of
optimization. In this thesis STL2NN is an example of the non-smooth objective function
and LB4TL is its smooth approximation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Shows the sets considered in the specification and also three different satisfying and
violating trajectories. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5 Shows the comparison of the robustness computation runtime between STL2NN and STLCG [113]. This comparison shows the noticeable improvement in computation efficiency
provided by vectorization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6 Shows the schematic of the computation graph for STL2NN that, given the trajectory σ, returns the robustness value. This graph represents an FFNN that vectorizes the robustness computation, providing a noticeable level of efficiency in a training algorithm. To maintain its vectorized nature, in each activation layer, we locate the ReLU activation functions at the bottom and the linear activation functions at the top. . . . . . . 45
3.7 Shows the schematic for LB4TL, which is a smooth guaranteed lower bound for STL2NN that, given the trajectory σ, returns the robustness value. To maintain its vectorized nature, in each layer of activations, we locate the softplus activation functions at the bottom, the swish activation functions in the middle, and the linear activation functions at the top. 45
3.8 This figure shows the symbolic trajectory generated by NN feedback controller, and the
computation graph for DT-STL robustness. The DT-STL robustness is presented as a
neuro-symbolic computation graph [77] via ReLU and linear activation functions. . . . . 48
3.9 This figure shows the simulation of trajectories when the trained controller is deployed on
the noisy deployment environment, both controllers are trained in the presence of noise.
The trajectories of NN feedback controller that satisfy (a) and violate (b) the specification
and those of the open-loop controller that satisfy (c) and violate (d) the specification are
shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.10 Simulated trajectories of the NNFCS representing control for the simple car dynamics
for the trained controller, in contrast with those for a controller initialized with random
parameter values. The trajectories are initiated from the set of sampled initial conditions,
which is θ ∈ {−3π/4, −5π/8, −π/2}. . . . . . . . . . . . . . . . . 53
3.11 Simulated trajectories for the trained controller in comparison to the trajectories for the
NN controller initialized with random parameter values for the quadrotor case study.
Trajectories are initiated from the set of sampled initial conditions consisting of the corners
of I and its center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.12 This figure shows a common challenge in using the critical predicate for control synthesis. This figure presents the robustness as a piece-wise differentiable function of the control parameter θ (with resolution 0.00001), where each differentiable segment represents a distinct critical
predicate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.13 This figure shows an example for the relation between control parameters and the resulting
robustness as a piece-wise differentiable function. Assuming a fixed initial state, every
control parameter is corresponding to a simulated trajectory, and that trajectory represents
a robustness value. This robustness value is equal to the quantitative semantics for the
critical predicate. Within each differentiable segment in this plot, the control parameters
yield trajectories associated with a unique critical predicate. . . . . . . . . . . . . . . . . . 63
3.14 This figure depicts the sampling-based gradient computation. In our approach, we freeze
the controller at some time-points, while at others we assume the controller to be a function
of its parameters that can vary in this iteration of back-propagation process. The actions
that are fixed are highlighted in red, whereas the dependent actions are denoted in black.
The red circles represent the time-steps where the controller is frozen. . . . . . . . . . . . 63
3.15 This figure shows the simulation of trained control parameters to satisfy the specified
temporal task in companion with the simulation result for initial guess for control parameters. 69
3.16 These figures show a multi-agent system of 10 connected Dubins cars. Figure (a) shows the start
(blue dots) and goal points (green squares) for agents. Figs. (b,c) show simulated system trajectories
with both the initial untrained controller and the centralized NN controller trained with Algorithm 4.
The controller coordinates all cars to reach their respective goals between 20 and 48 seconds, and
then stay in their goal location for at least 12 seconds. It also keeps the cars at a minimum distance
from each other. We remark that the agents finish their tasks (the first component of φ4) at different
times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.17 This figure shows the simulated trajectory for trained controller in comparison to the
trajectories for naive initial random guess. The frame is moving with a velocity determined
with the controller that also controls the quad-rotor. . . . . . . . . . . . . . . . . . . . . . . 75
3.19 This figure shows the simulation of the results for Dubins car in the ablation study proposed
in section (3.8.4). In this experiment, the task horizon is 1000 time-steps. . . . . . . . . . . 76
3.18 This figure shows the learning curve for training processes. Note, the figure has been
truncated and the initial robustness for all the experiments at iteration 0 is −47.8. This
figure shows that Algorithm 4 in the presence of the waypoint function concludes
successfully in 84 iterations while when the waypoint function is not included, it terminates
in 107 iterations. The algorithm also fails if the critical predicate is not considered in time
sampling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1 Shows the comparison between PSO and our convex programming. The green and blue
curves are the results of Algorithm 5 and the optimal trajectory, respectively. The red curves represent the deployment environment’s trajectory when there is no policy modification. We utilize PSO for optimization (4.4) on the same model with convex programming, µ_NN, where the resultant trajectory is demonstrated in black. We also utilize PSO over the deep model, µ*_NN, and the resultant trajectory is demonstrated in magenta. . . . . . . . 99
4.2 Shows the results of policy modification on stochastic linear environment of a car. In this
figure, the green curve presents the simulated optimal trajectory. Blue and red curves also
represent the trajectory of deployment environment in the presence and absence of policy
modification, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.3 The green curves represent the optimal trajectory for vrel and drel, while the red and blue
curves present the trajectory of deployment environment without policy adaptation and
with adaptation, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1 Figures (a) and (b) show a comparison between flowpipes and distributions of UB/(nK)
respectively for training via MSE and training via our proposed loss function (5.11). . . . . 129
5.2 This figure shows the proposed flowpipes computed for the quadcopter dynamics for each state component over the time horizon of 100 time steps with δt = 0.05, i.e., 5 seconds of quadcopter operation. The red borders show the flowpipe that contains trajectories from D^sim_{S,K} with provable coverage of δ ≥ 99.99%. The green shaded area shows the density of a collection of 300,000 of these trajectories, and a darker color means a higher density of traces. The blue borders are for a flowpipe that contains the trajectories from distribution D^sim_{S,K} with δ ≥ 95%. The dotted black line shows the border of the collected simulated trajectories. . . . . . . 131
5.3 Shows the density of trajectories starting from I3 versus their computed flowpipes. The green color-bar represents the density of traces from D^sim_{S,K} and the blue color-bar is for traces from D^real_{S,K}. The shaded areas are generated via 3 × 10^5 different trajectories, and the dotted lines represent their borders. a) Shows two different flowpipes for the TRVDP dynamics with a confidence level of 0.9999 on D^sim_{S,K}. The tighter flowpipe (blue) utilizes the linear programming (5.14) while the looser one (red) does not. b) Shows a flowpipe that covers trajectories from D^real_{S,K} with a confidence level of 77% and also covers the traces from D^sim_{S,K} with a confidence level of 99.5%. The blue shaded area is for D^real_{S,K} and the green shaded area is for D^sim_{S,K}. c) Shows the vector field of the TRVDP dynamics, which illustrates the instability of the system. . . . . . . 132
5.4 This figure shows the division of the trajectory into N different segments σ^{sim,q}_{s0}, q ∈ [N]. . . . 135
5.5 The figure shows the projection of prediction errors for two-dimensional states over a horizon of K = 2. The left figure illustrates the projection on the (R^1, R^2) axes (e.g., k = 1), and the right figure displays the projection on the (R^3, R^4) axes (e.g., k = 2). This figure provides a comparison between the inflating hypercubes for a confidence level δ ∈ (0, 1), generated by the PCA approach (red hypercubes) and the method proposed in [78] (green hypercubes). It clearly demonstrates the superior accuracy of the PCA technique compared to the other method. The principal axes for k = 1, 2 are (r^1, r^2) and (r^3, r^4), respectively. . . . . . . 137
5.6 Shows the comparison with Section 5.4. The blue and red borders are projections of our and their δ-confident flowpipes, respectively, with δ = 99.99%. The shaded regions show the density of the trajectories from T^trn. . . . . . . 142
5.7 Shows the projection of our δ-confident flowpipe on each component of the trajectory
state. The shaded areas are the simulations of trajectories from T^trn. . . . . . . 142
5.8 Shows the projection of our δ-confident flowpipe on the first 8 components of the trajectory
state. There is a shift between the distribution of deployment and training environments.
The shaded area are the trajectories sampled from the deployment environment. . . . . . . 143
5.9 Shows the comparison of angular velocity of the last rotating mass in presence and absence
of the process noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
6.1 Shows the ENFN structure. Here N is the number of layers of ENFN. [t_ℓ, z_ℓ]^⊤ presents the activation vector for the ℓ-th layer and [t_ℓ, p_ℓ]^⊤ presents its pre-activation on ENFN. The role of a linear activation function is to copy its input. . . . . . . 164
Abstract
Cyber-Physical Systems (CPS) form the backbone of essential infrastructure today, significantly impacting
our quality of life. From remote patient monitoring and robotic surgery to multi-robot systems like drone
fleets and self-driving cars, as well as smart grids and smart home technologies, CPS applications are
widespread. Ensuring the safety of these systems by verifying them against potential faults and designing
them to meet rigorous safety standards is a critical area of research. With the increased availability of
affordable, portable communication and computing devices, these systems are now more interconnected,
forming complex and interactive networks. Beyond remaining connected and synchronized, they must
also meet specific, often complex, requirements both as individual components and as part of a broader
network. For example, a drone fleet may be required to inspect designated areas within set timeframes and
adjust its formation as needed, all while upholding safety standards.
The main challenge in these interconnected systems is verifying that they can operate safely and
accurately under complex requirements. This thesis proposes planning and control algorithms to meet this
challenge, focusing on neural network-controlled systems in closed-loop configurations that must adhere
to specific requirements over time, known as signal temporal logic specifications.
The thesis is structured as follows: the first part presents deterministic formal verification algorithms for
general signal temporal logic specifications, while the second part introduces planning and feedback control
strategies to guide neural network-controlled systems in meeting individual and collective time-sensitive
goals. The third part addresses distribution shifts in planning by adapting learned policies using path-tracking techniques. Finally, the last part focuses on reachability analysis under distribution shifts
in stochastic CPS. These proposed algorithms enhance system reliability and have been tested through
simulations and experiments on highly nonlinear, high-dimensional systems with complex temporal
specifications, demonstrating their effectiveness.
Chapter 1
Introduction
This chapter presents the motivation for our research, along with a comprehensive literature review of
prior work in our area and an analysis of the current challenges in the field. We conclude the chapter with a summary of our contributions.
1.1 Motivation
A Cyber-Physical System (CPS) is a network of connected physical parts that interact through sensors and
are directed by a central computer. Advances in technology have made these systems increasingly complex,
a trend visible across many engineering fields. Just as the internet reshaped human interaction, CPS are
expected to transform how we engage with the physical world. Critical applications include transportation,
healthcare, manufacturing, agriculture, energy, defense, aerospace, and infrastructure, each presenting
unique challenges in designing, building, and verifying these systems. Meeting these demands requires
collaboration across engineering disciplines and efficient algorithms that can operate in real time. This
thesis focuses on enhancing the scalability of learning-based feedback control in neural network controlled
systems (NNCS) by integrating methods from computer science, control theory, and robotics. Specifically,
we emphasize autonomous CPS, or systems that operate without human input, where all components must
function correctly together to maintain safety. Examples include medical devices, autonomous vehicles,
and robots working alongside people. For autonomous CPS to reliably handle mission-critical and safety-sensitive tasks, formal verification is essential, as any deviation from expected behavior can have serious
implications for health, safety, and the economy.
1.2 Thesis Statement
An autonomous cyber-physical system can be viewed as a closed-loop controlled system. This thesis
addresses four key gaps in the literature concerning the design and verification of autonomous systems.
Consider a closed-loop system with an initial state s0 ∈ I ⊂ R^n, where a feedback controller is provided
to fulfill a temporal task for each trajectory σs0 originating from the set I over the horizon K.
• For general temporal tasks, no formal deterministic verification framework exists in the literature
that can ensure every σs0, for all s0 ∈ I, meets the specified requirements.
• The existing control synthesis methodologies face scalability issues when designing a controller to satisfy the temporal task as the task becomes more complex and its horizon becomes longer.
• Existing methods for policy adjustment under distributional shifts are slow and inefficient, often
hindered by issues of non-convexity and numerical instability.
• The existing data-driven reachability analysis techniques for stochastic systems lack data efficiency,
limiting their ability to provide probabilistic guarantees for higher-dimensional systems over extended horizons. These techniques become significantly more data-consuming when accounting for
distributional shifts.
To address these challenges, we have developed novel solutions to establish a formal deterministic
verification framework and to mitigate scalability issues in the outlined problems. To begin synthesis and
verification for a temporal task, it is essential to first encode the task in a formal framework. Various
frameworks exist for encoding temporal tasks, such as Metric Interval Temporal Logic (MITL)[11], propositional Spatio Temporal Logic (PSTL)[24], Linear Temporal Logic (LTL)[136], and Signal Temporal Logic
(STL)[126, 54]. Among these, Signal Temporal Logic is of particular interest, as it encompasses a broader
range of temporal specifications. However, STL is traditionally defined for continuous signals, while our
methods target discrete-time signals. To bridge this gap, we utilize a discrete version of STL known as
Discrete-Time Signal Temporal Logic (DT-STL)[54], which has been formally introduced in the literature to
encode temporal specifications for discrete signals.
1.3 Thesis Organization & List of Contributions
This thesis presents a verification and learning-based control framework for closed-loop neural network controlled systems (NNCS) under complex specifications expressed as Signal Temporal Logic (STL) formulas. It is organized into six chapters, which are outlined below. Chapters 2-5 form the core of the
thesis, providing a comprehensive framework for verification and scaling synthesis under STL specifications
while addressing the effects of distribution shift. Each chapter contains more detailed contributions, with
relevant publications available in [82, 81, 77, 79, 76, 78, 84, 83, 106, 139], all of which have been published in
peer-reviewed journals and conferences.
Chapter 1. This chapter provides the introduction and preliminaries of the research reported in this thesis.
Chapter 2. In this chapter, we propose the first deterministic framework for the verification of signal temporal logic specifications. The main idea is to reformulate the robustness function of a given STL formula [53] in the form of a ReLU neural network. We call this new representation of STL STL2NN; it is an end-to-end vectorized computation graph that resembles a ReLU neural network. This new computation graph is the key to solving the notoriously difficult problem of deterministic verification for Signal Temporal Logic. This provides us the opportunity to transform the challenging verification problem for STL into
the well-studied domain of neural network reachability analysis. In this chapter, we present the method
for computing STL2NN and demonstrate its application for STL verification. The covered material in this
chapter is based on the following publication.
• Hashemi, Navid, Bardh Hoxha, Tomoya Yamaguchi, Danil Prokhorov, Georgios Fainekos, and Jyotirmoy Deshmukh. "A neurosymbolic approach to the verification of temporal logic properties of
learning-enabled control systems." In Proceedings of the ACM/IEEE 14th International Conference
on Cyber-Physical Systems (with CPS-IoT Week 2023), pp. 98-109. 2023.
Chapter 3. In this chapter, we address the scalability challenges in control synthesis when handling
increasingly complex tasks and tasks with extended horizons. To tackle this, we leverage a key advantage
of STL2NN: its computational efficiency in robustness evaluation and gradient calculation with respect
to trajectories. We demonstrate that using STL2NN for policy training reduces runtime from hours to
minutes, enabling the consideration of more complex specifications in policy design. Additionally, to
address scalability with task horizon length, we propose a dropout-inspired sampling technique [155],
which mitigates numerical issues caused by the well-known problem of exploding gradients. The covered
material in this chapter is based on the following publications.
• Hashemi, Navid, Bardh Hoxha, Danil Prokhorov, Georgios Fainekos, and Jyotirmoy V. Deshmukh.
"Scaling Learning based Policy Optimization for Temporal Logic Tasks by Controller Network
Dropout." ACM Transactions on Cyber-Physical Systems (2024).
• Hashemi, Navid, Samuel Williams, Bardh Hoxha, Danil Prokhorov, Georgios Fainekos, and Jyotirmoy
Deshmukh. "LB4TL: A Smooth Semantics for Temporal Logic to Train Neural Feedback Controllers."
IFAC-PapersOnLine 58, no. 11 (2024): 183-188.
• Hashemi, Navid, Xin Qin, Jyotirmoy V. Deshmukh, Georgios Fainekos, Bardh Hoxha, Danil Prokhorov,
and Tomoya Yamaguchi. "Risk-awareness in learning neural controllers for temporal logic objectives."
In 2023 American Control Conference (ACC), pp. 4096-4103. IEEE, 2023.
Chapter 4. The existing transfer learning algorithms suffer from non-convexity and computational
difficulties. In this chapter, we introduce an efficient algorithm for policy modification based on a convex
optimization approach, enabling improved scalability and runtime efficiency. In some cases, like model-free reinforcement learning, training policies to satisfy a reach-avoid specification can be time-consuming, and once trained, deploying these policies in a new environment may fail if the system dynamics change. Here, we
propose that, rather than retraining a policy from scratch, obtaining a high-fidelity ReLU surrogate model
is more achievable. The contribution of this chapter lies in demonstrating that, given such a high-fidelity
ReLU surrogate model, we can apply convex optimization for online policy modification via path tracking
while the system is running under the old policy to address distribution shifts. This approach provides
a practical, deployable solution for online adjustments, as the convex optimization is efficient and can
promptly modify actions during live operation. We argue that this method is preferable to fully retraining the policy via the assumed high-fidelity ReLU surrogate model, as it is fast and thus enables immediate,
on-the-fly adaptation of the previously trained policy. The material in this chapter is based on the following
publication.
• Hashemi, Navid, Justin Ruths, and Jyotirmoy V. Deshmukh. "Convex Optimization-based Policy
Adaptation to Compensate for Distributional Shifts." In 2023 62nd IEEE Conference on Decision and
Control (CDC), pp. 5376-5383. IEEE, 2023.
Chapter 5. This chapter introduces a scalable, learning-based statistical reachability analysis for stochastic
systems. Traditional methods for statistical reachability often lack data efficiency, which limits their ability
to provide reliable probabilistic guarantees—an issue that intensifies with distribution shifts. One of the
main contributions in this chapter is to provide a data-efficient reachability analysis technique by incorporating conformal inference [174] into the reachability process, which brings scalability, accuracy, runtime efficiency, and data efficiency. Notably, our approach maintains manageable data consumption even when addressing
distribution shifts [33]. The content of this chapter is based on the following publications.
• Hashemi, Navid, Xin Qin, Lars Lindemann, and Jyotirmoy V. Deshmukh. "Data-driven reachability
analysis of stochastic dynamical systems with conformal inference." In 2023 62nd IEEE Conference
on Decision and Control (CDC), pp. 3102-3109. IEEE, 2023.
• Hashemi, Navid, Lars Lindemann, and Jyotirmoy V. Deshmukh. "Statistical Reachability Analysis of
Stochastic Cyber-Physical Systems under Distribution Shift." IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD), https://doi.org/10.1109/TCAD.2024.3438072
• Hashemi, Navid, Lars Lindemann, and Jyotirmoy V. Deshmukh. "PCA-DDReach: A Statistical Learning Based Reachability Analysis for Stochastic Dynamical Systems via Conformal Inference." In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39. (under review)
Chapter 6. Concludes by summarizing the thesis results, placing them in perspective, and suggesting
directions for future research.
Chapter 2
Deterministic Formal Verification Framework for STL
2.1 Introduction
Learning-enabled components (LECs) offer the promise of data-driven control, and hence they are becoming
popular in many cyber-physical system (CPS) applications. Among LECs, controllers trained using deep
learning are becoming popular due to the advances in techniques like deep reinforcement learning and deep
imitation learning. On one hand, the use of such LECs has the potential of achieving human level decision
making in tasks like autonomous driving, aircraft collision avoidance, and control for aerial vehicles. On
the other hand, the use of deep neural network (DNN)-based controllers raises serious concerns of safety.
Reasoning about DNNs is a challenge because DNNs are highly nonlinear [157], and due to the nature of
data-driven control, the behavior of a DNN controller at a previously unseen state can be difficult to predict
[120]. To address this challenge, there has been significant research on verification for DNNs. Broadly,
there are two categories of verification methods; the first category considers DNN controllers in isolation
and reasons about properties such as input-output robustness [49, 91, 98], range analysis [48], symbolic
constraint propagation through DNNs [115], and overapproximate reachable set computation for DNNs
[165]. The second category of methods reasons about DNN controllers in closed-loop with a dynamical
model of the environment/plant [48, 89, 87, 92].
In this chapter, we also address the closed-loop verification problem. In this problem, we are typically
provided with a set of initial states and a set of unsafe states for the system, and the goal is to prove that
starting from an arbitrary initial state, no system behavior ever reaches a state in the unsafe set. However,
we extend this problem in a significant manner. First, we assume that the desired behavior of the closed-loop
system is specified as a bounded horizon Signal Temporal Logic (STL) [126] formula. Second, in contrast
to most existing closed-loop verification methods that typically assume that an analytic representation of
the system dynamics exists, we allow the system dynamics themselves to be represented as a DNN. Such
a setting is quite common in techniques such as model-based deep reinforcement learning [36, 41]. This
crucially allows us to reason about systems where the analytic representation of the system dynamics may
not be available.
The central idea in our work is a neurosymbolic verification approach: we reformulate the robust
satisfaction (referred to as robustness) of an STL formula w.r.t. a given trajectory as a feed-forward neural
network with ReLU activation functions. We call this transformation STL2NN. We show that the output of
STL2NN is positive iff the STL formula is satisfied by the trajectory. We note that the verification problem
only requires establishing that the given closed-loop dynamical system satisfies a given STL specification.
However, by posing the verification problem as that of checking robust satisfaction, it allows us to conclude
that the given DNN controller robustly satisfies the given specification.
We then show that when the DNN-controller uses ReLU activation functions, the problem of closed-loop STL verification can be reduced to computing the reachable set for a ReLU-DNN. If the controller is
not a ReLU neural network, we propose a technique called Lip-Verify based on computing the Lipschitz
constant of the robustness of the given STL formula (as a function of the initial state).
To summarize, the main contributions in this chapter are:
1. We formulate a neuro-symbolic approach for the closed-loop verification of a DNN-controlled
dynamical system against an STL-based specification by converting the given bounded horizon
specification into a feed-forward ReLU-based DNN that we call STL2NN.
2. For data-driven plant models using ReLU activation and ReLU-activation based DNN-controllers, we
show that the verification of arbitrary bounded horizon STL properties can be reduced to computing
the reach set of the composition of the plant and controller DNNs with STL2NN.
3. For arbitrary nonlinear plant models and DNN-controllers using arbitrary activation functions, we compute the Lipschitz constant of the composition of the system dynamics with the STL robustness function, and use this to provide a sound verification result using systematic sampling (a minimal sketch of this idea appears right after this list).
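As a rough illustration of this third idea (and not the exact Lip-Verify procedure developed later in this chapter), the following sketch shows how a Lipschitz bound turns finitely many robustness evaluations into a sound claim over a whole box of initial states; the grid construction, the function names, and the toy robustness function are assumptions made only for this example.

```python
# Illustrative sketch: if the robustness at a sampled initial state exceeds L times the
# covering radius eps of the sample grid, every initial state near that sample also has
# positive robustness, so the STL property holds on that whole cell.
import itertools
import numpy as np

def verify_by_sampling(robustness, lipschitz_const, lower, upper, eps):
    """Soundly check robustness(s0) > 0 for all s0 in the box [lower, upper].

    robustness      : map s0 -> rho(phi, sigma_{s0}, 0), assumed Lipschitz in s0
    lipschitz_const : an upper bound L on its Lipschitz constant (infinity norm)
    eps             : grid spacing, so every point is within eps of some sample
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    axes = [np.arange(lo, hi + eps, eps) for lo, hi in zip(lower, upper)]
    for center in itertools.product(*axes):
        if robustness(np.array(center)) <= lipschitz_const * eps:
            return False, np.array(center)   # inconclusive cell: refine or report
    return True, None                        # every cell certified

# Toy usage with a made-up robustness function rho(s0) = 1 - ||s0||:
ok, _ = verify_by_sampling(lambda s: 1.0 - np.linalg.norm(s), 1.0,
                           lower=[-0.4, -0.4], upper=[0.4, 0.4], eps=0.05)
print(ok)  # True: the property is certified on the whole box
```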
The rest of this chapter is organized as follows. In Section 2.2, we present the background, the primary concepts of STL semantics, and the problem definition. In Section 2.3, we present the steps to characterize STL2NN. In
Section 2.4 we classify the verification problem based on the involved activation functions and propose a
verification method for each class. We also introduce a structure for formulation of verification problems
and introduce our verification toolbox. Finally, we present several case studies and experimental results for
our verification methods in Sections 2.4.4 and 2.5.1. We conclude with a discussion on related works in
Section 2.6.
2.2 Preliminaries
In this section, we first provide the mathematical notation and terminology to formulate the problem
definition. We use bold letters to indicate vectors and vector-valued functions, and calligraphic letters to
denote sets. We assume that the reader is familiar with feedforward neural networks, see [68] for a brief
review.
Neural Network Controlled Dynamical Systems (NNCS). Let s and u respectively denote the state
and input control variables that take values from compact sets S ⊆ R^n and C ⊆ R^m, respectively. We use sk (resp. uk) to denote the value of the state variable (resp. control input) at time k. We first define deep neural network controlled systems (NNCS) as a recurrent difference equation∗:
sk+1 = f(sk, uk), uk = η(sk). (2.1)
Here, f is assumed to be any computable function, and η is a (deep) neural network. We note that we can
include time as a state, which allows us to encode time-varying plant models as well (where the dynamics
corresponding to the time variable simply increment it by 1).
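For concreteness, the recurrence in Eq. (2.1) can be simulated as below; the toy double-integrator plant, the random controller weights, and the helper names are placeholders assumed only for this sketch.

```python
# Minimal sketch of rolling out the NNCS recurrence s_{k+1} = f(s_k, eta(s_k)).
import numpy as np

def plant_f(s, u):
    # Toy discrete-time double integrator (placeholder for the plant model f).
    dt = 0.1
    return np.array([s[0] + dt * s[1], s[1] + dt * u[0]])

def controller_eta(s, W1, b1, W2, b2):
    # A tiny fully connected ReLU network standing in for the DNN controller eta.
    h = np.maximum(0.0, W1 @ s + b1)
    return W2 @ h + b2

def rollout(s0, K, W1, b1, W2, b2):
    """Return the trajectory sigma_{s0} = (s_0, ..., s_K) as a (K+1, n) array."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(K):
        u = controller_eta(traj[-1], W1, b1, W2, b2)
        traj.append(plant_f(traj[-1], u))
    return np.stack(traj)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(1, 8)), np.zeros(1)
sigma = rollout(s0=[1.0, 0.0], K=50, W1=W1, b1=b1, W2=W2, b2=b2)
print(sigma.shape)  # (51, 2): one state per time step k = 0, ..., K
```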
Neural Plant Models. In the model-based development paradigm, designers typically create environment
or plant models using laws of physics. However, with increasing complexity of real world environments,
the data driven control paradigm suggests the use of machine learning models like Gaussian Process [144]
or neural networks as function approximators. Such models typically take as input the values of the state
and control input variables at time k and predict the value of the state at time k + 1. In this chapter, we
focus on environment models that use deep neural networks†
. On the other hand linear time-invariant
(LTI) models can be considered as a neural network with only linear activation functions. Finally, we note
that our technique can also handle time-varying plant models such as linear time-varying models and DNN
plant models that explicitly include time as an input.
∗We note that in some modeling scenarios, the dynamical equation describing the environment may be provided as continuous-time ODEs. In this case, we assume that we can obtain a difference equation (through numerical approximations such as a zero-order hold of the continuous dynamics). Our verification results are then applicable to the resulting discrete-time approximation. Reasoning about behavior between sampling instants can be done using standard error analysis arguments that we do not consider in this paper [18].
†As we see later, the STL verification technique that we formulate is compatible with plant models that use standard nonlinear functions, e.g., polynomials, trigonometric functions, etc. However, this requires integrating our method with closed-loop verification tools such as Polar [87], Sherlock [48], or NNV [165]. We will consider this integration in the future.
Closed-loop Model Trajectory, Task Objectives, and Safety Constraints. Given a discrete-time NNCS
as shown in (2.1), we define I ⊆ S as a set of initial states of the system. For a given initial state s0 and a given finite time horizon K ∈ Z_{>0}, a system trajectory σs0 is a function from [0, K] to S, where σs0(0) = s0, and for all k ∈ [0, K − 1], σs0(k + 1) = f(sk, η(sk)). We assume that task objectives or safety constraints of the system are specified as bounded horizon Signal Temporal Logic (STL) formulas [126]; the syntax‡ of STL is as defined in Eq. (2.2).
φ ::= µ(s) ▷◁ 0 | φ ∧ φ | φ ∨ φ | F_I φ | G_I φ | φ1 U_I φ2 | φ1 R_I φ2   (2.2)
Here, µ is a function representing a linear combination of S that maps to a number in R, ▷◁∈ {<, ≤, >, ≥}
and I is a compact interval [a, b] ⊆ [0, K]. The temporal scope or horizon of an STL formula defines the
number of time-steps required in a trajectory to evaluate the formula. The horizon H(φ) of an STL formula
φ can be defined as follows:
    H(φ) = 0                              if φ ≡ µ(s) ▷◁ 0
    H(φ) = max(H(φ1), H(φ2))              if φ ≡ φ1 ◦ φ2, where ◦ ∈ {∧, ∨}
    H(φ) = b + H(ψ)                       if φ = Q_[a,b] ψ, where Q ∈ {G, F}
    H(φ) = b + max(H(φ1), H(φ2))          if φ = φ1 Q_[a,b] φ2, where Q ∈ {U, R}
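This recursion maps directly onto the syntax tree of the formula. The snippet below computes H(φ) for formulas encoded as nested tuples; the tuple encoding and the operator tags are assumptions made for the sketch.

```python
# Sketch of computing the horizon H(phi) by structural recursion. Formulas are encoded
# as nested tuples, e.g. ('and', phi1, phi2), ('G', a, b, psi), ('U', a, b, phi1, phi2),
# and ('pred', name) for an atomic predicate mu(s) >< 0.
def horizon(phi):
    op = phi[0]
    if op == 'pred':
        return 0
    if op in ('and', 'or'):
        return max(horizon(phi[1]), horizon(phi[2]))
    if op in ('G', 'F'):                     # Q_[a,b] psi
        _a, b, psi = phi[1], phi[2], phi[3]
        return b + horizon(psi)
    if op in ('U', 'R'):                     # phi1 Q_[a,b] phi2
        _a, b, lhs, rhs = phi[1], phi[2], phi[3], phi[4]
        return b + max(horizon(lhs), horizon(rhs))
    raise ValueError(f'unknown operator {op}')

# Example: F_[0,10](reach) AND G_[0,20](safe) has horizon max(10, 20) = 20.
phi = ('and', ('F', 0, 10, ('pred', 'reach')), ('G', 0, 20, ('pred', 'safe')))
print(horizon(phi))  # 20
```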
Quantitative Semantics of STL. The Boolean semantics of STL define what it means for a trajectory
to satisfy an STL formula. A detailed description of the Boolean semantics can be found in [126]. The
quantitative semantics of STL define the signed distance of the trajectory from the set of traces satisfying
or violating the formula. This signed distance is called the robustness value. There are a number of ways to
define the quantitative semantics of STL [45, 53, 147, 5]; in this chapter, we focus on the semantics from
[45], which we reproduce in Table 2.1 below. The robustness value ρ(φ, σs0, k) of an STL formula φ over a trajectory σs0 at time k is defined recursively in Table 2.1. For brevity, we omit the trajectory from the notation as it is obvious from the context.
‡We do not include the negation operator as it is possible to rewrite any STL formula in negation normal form by pushing negations to the signal predicates [86].

φ                    ρ(φ, k)
µ(s) ≥ 0             µ(sk)
φ1 ∧ φ2              min( ρ(φ1, k), ρ(φ2, k) )
φ1 ∨ φ2              max( ρ(φ1, k), ρ(φ2, k) )
G_[a,b] ψ            min_{k' ∈ [k+a, k+b]} ρ(ψ, k')
F_[a,b] ψ            max_{k' ∈ [k+a, k+b]} ρ(ψ, k')
φ1 U_[a,b] φ2        max_{k' ∈ [k+a, k+b]} min( ρ(φ2, k'), min_{k'' ∈ [k, k')} ρ(φ1, k'') )
φ1 R_[a,b] φ2        min_{k' ∈ [k+a, k+b]} max( ρ(φ2, k'), max_{k'' ∈ [k, k')} ρ(φ1, k'') )
Table 2.1: Quantitative Semantics of STL
We note that if ρ(φ, k) > 0 the STL formula φ is satisfied at time k (from [53]).
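To make the recursion of Table 2.1 concrete, the sketch below evaluates ρ(φ, σ, k) for a discrete-time trajectory stored as an array; the tuple-based formula encoding and the linear-predicate representation are assumptions made for the example, not part of the thesis toolchain.

```python
# Sketch of the quantitative semantics of Table 2.1 over a trajectory sigma, stored as a
# (K+1) x n array. Predicates are encoded as ('pred', c, d), meaning mu(s) = c.s + d >= 0.
import numpy as np

def rho(phi, sigma, k=0):
    op = phi[0]
    if op == 'pred':
        c, d = phi[1], phi[2]
        return float(np.dot(c, sigma[k]) + d)
    if op == 'and':
        return min(rho(phi[1], sigma, k), rho(phi[2], sigma, k))
    if op == 'or':
        return max(rho(phi[1], sigma, k), rho(phi[2], sigma, k))
    if op in ('G', 'F'):
        a, b, psi = phi[1], phi[2], phi[3]
        vals = [rho(psi, sigma, kp) for kp in range(k + a, k + b + 1)]
        return min(vals) if op == 'G' else max(vals)
    if op in ('U', 'R'):
        a, b, lhs, rhs = phi[1], phi[2], phi[3], phi[4]
        outer = []
        for kp in range(k + a, k + b + 1):
            inner = [rho(lhs, sigma, kpp) for kpp in range(k, kp)]
            if op == 'U':
                outer.append(min(rho(rhs, sigma, kp),
                                 min(inner) if inner else float('inf')))
            else:  # Release, per Table 2.1
                outer.append(max(rho(rhs, sigma, kp),
                                 max(inner) if inner else float('-inf')))
        return max(outer) if op == 'U' else min(outer)
    raise ValueError(f'unknown operator {op}')

# Toy check: F_[0,5](s[0] - 0.8 >= 0) on a 6-step trajectory ending at s = (1, 1).
sigma = np.linspace([0.0, 0.0], [1.0, 1.0], num=6)
phi = ('F', 0, 5, ('pred', np.array([1.0, 0.0]), -0.8))
print(rho(phi, sigma))  # approximately 0.2 > 0, so the formula is satisfied
```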
Problem Definition. The STL verification problem can be formally stated as follows: Given an NNCS as shown in (2.1), a set of initial conditions I, and a bounded horizon STL formula φ with H(φ) = K, show that:

∀ s0 ∈ I : ρ(φ, σs0, 0) > 0,   (2.3)

where the time horizon for σs0 is K.
2.3 STL Robustness as a Neural Network
In this section, we describe how the robustness of a bounded horizon STL specification φ with horizon = K
over a trajectory of length K can be encoded using a neural network with ReLU activation functions. The
first observation is that the quantitative semantics of STL described in (2.1) can be recursively unfolded to
obtain a tree-like representation where the leaf nodes of the tree are evaluations of the linear predicates
at various time instants of the trajectory and non-leaf nodes are min or max operations. The second
observation is that min and max operations can be encoded using a ReLU function. We codify these
observations in the following lemmas.
Lemma 2.3.1. Given x, y ∈ R, min(x, y) = x − ReLU(x − y). Similarly, max(x, y) = x + ReLU(y − x).
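These identities are easy to check numerically; the following illustrative Python snippet does so for random inputs:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def min_via_relu(x, y):
    # min(x, y) = x - ReLU(x - y)
    return x - relu(x - y)

def max_via_relu(x, y):
    # max(x, y) = x + ReLU(y - x)
    return x + relu(y - x)

x, y = np.random.randn(1000), np.random.randn(1000)
assert np.allclose(min_via_relu(x, y), np.minimum(x, y))
assert np.allclose(max_via_relu(x, y), np.maximum(x, y))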
Mapping STL Robustness to the STL2NN Neural Network. We now describe how to transform the
robustness of a given STL formula and a trajectory into a multi-layer network representation. Though
we call this structure a neural network, the name is a bit of a misnomer as there is no learning involved. The name
STL2NN is thus reflective of the fact that the structure of the graphical representation that we obtain
resembles a multi-layer neural network.
The input layer of STL2NN corresponds to the set of all time points in the trajectory (thus the input layer is of width
K + 1). The second layer applies the m possible unique predicates {µ1, . . . , µm} in φ to the
(K + 1) possible time points. Thus, the output of this layer is of maximum dimension m × (K + 1). Let this
layer be called the predicate layer, and we denote each node in this layer by two integers (i, k), indicating
the value of µi(sk). For a trajectory σs0 of length K, there are at most K + 1 time points at which these m predicates can
be evaluated, and hence at most m × (K + 1) unique evaluations of the m predicates at the (K + 1) time instants.
Given the predicate nodes (i, k), Algorithm 1 constructs the next segment of STL2NN. Line 5 returns
the node corresponding to µi(sk), i.e., the node labeled (i, k) in the second layer of the network. The rest of the
network structure follows the structure of the STL formula. For example, in Line 7, we obtain the nodes
corresponding to φ1 and φ2 at time k, and these nodes are then input to the ReLU unit that outputs the min
of these two nodes (as defined in Lemma 2.3.1). The interesting case is that of temporal operators (Lines 10–15).
A temporal operator represents the min or max (or a combination thereof) of sub-formulas over different
time instants. Suppose the scope of the temporal operator requires performing a min over ℓ different time
instants; then, in the function MinMaxNode, we arrange these ℓ inputs in a balanced binary tree of depth
at most O(log ℓ) and repeatedly use the ReLU unit defined in Lemma 2.3.1.
Algorithm 1: Recursive formulation of a ReLU directed acyclic graph (DAG) for an STL formula
1  Function MinMaxNode(n1, . . . , nℓ, type, k)
2    • Construct a balanced binary tree with leaf nodes ni, i = 1, . . . , ℓ.
3    • Apply Lemma 2.3.1 to obtain a ReLU network of depth O(log ℓ) for min or max, as defined by the input type.
4  Function Node(φ, k)
5    case φ = µi(sk) ≥ 0: return (i, k)
6    case φ = φ1 ∧ φ2:
7      return MinMaxNode(Node(φ1, k), Node(φ2, k), min, k)
8    case φ = φ1 ∨ φ2:
9      return MinMaxNode(Node(φ1, k), Node(φ2, k), max, k)
10   case φ = G[a,b] ψ:
11     return MinMaxNode(Node(ψ, k + a), . . . ,
12       Node(ψ, k + b), min, k)
13   case φ = F[a,b] ψ:
14     return MinMaxNode(Node(ψ, k + a), . . . ,
15       Node(ψ, k + b), max, k)
16   case φ = φ1 U[a,b] φ2 or φ1 R[a,b] φ2:
17     similar to the previous cases, following the robustness computation defined in (2.1)
Executing Algorithm 1 leads to a directed acyclic graph (DAG) network with depth at most O(|φ| log K): there are at most |φ|
operators in φ, and each operator can require a network of depth at most O(log K).
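As an informal illustration of the MinMaxNode construction (a simplified stand-in, not the exact network encoding used by STL2NN), the following Python function reduces ℓ values with pairwise ReLU-based min or max operations arranged in a balanced fashion, so the number of reduction rounds, i.e., the induced network depth, is ⌈log2 ℓ⌉:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def minmax_tree(values, kind="min"):
    # Balanced pairwise reduction of a list of values using the ReLU identities
    # of Lemma 2.3.1; the number of rounds is ceil(log2(len(values))).
    vals = list(values)
    depth = 0
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            x, y = vals[i], vals[i + 1]
            if kind == "min":
                nxt.append(x - relu(x - y))   # min(x, y)
            else:
                nxt.append(x + relu(y - x))   # max(x, y)
        if len(vals) % 2 == 1:                # odd element carried to the next round
            nxt.append(vals[-1])
        vals = nxt
        depth += 1
    return vals[0], depth

vals = np.random.randn(13)
m, d = minmax_tree(vals, "min")
assert np.isclose(m, vals.min()) and d == int(np.ceil(np.log2(len(vals))))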
DAG to Feedforward NN. Algorithm 1 creates a DAG-like structure whose nodes can be arranged in
layers (corresponding to their distance from the leaf nodes). However, this is not strictly the structure of
a feed-forward neural network, as some connections skip layers. To make the structure
strictly adhere to layer-by-layer computation, whenever an (i, k) node is required in a deeper layer, we can
add neurons (implementing an identity function) that copy the value of the (i, k) node to the next layer.
Observe that adding these neurons does not increase the depth of the network. Thus,
each layer in our STL2NN has a mixture of ReLU-activation neurons and neurons with linear (identity)
activations. We note that the neurons with linear and ReLU activations can be separated by modifying
the weight matrices of each layer. This separation of the linear and ReLU layers is crucial in downstream
verification algorithms. We call this neural network, with redundant linear activations and reordered neurons,
STL2NN. We codify the argument for the depth of STL2NN in Lemma 2.3.2; the proof follows from our
construction of STL2NN in Algorithm 1.
Lemma 2.3.2. Given an STL formula φ, the depth of STL2NN grows logarithmically with the length of the
trajectory σs0 and linearly with the size of the formula.
Theorem 2.3.3. Given the STL formula φ, the controller uk = η(sk), and the resultant trajectory σs0,
ρ(φ, σs0, 0) ≥ 0 ⇐⇒ STL2NN(σs0) ≥ 0.
Lemma 2.3.2 shows that, for a complex STL specification, although the width of STL2NN can be high, its
depth is only logarithmic in the length of the trajectory.
2.4 STL Verification Using Reachability
2.4.1 Verification for STL Properties on DNN Models
Figure 2.1: The structure of ENFN (which encodes the computation of the trajectory σs0 starting from the initial state s0) composed with STL2NN (which encodes the computation of the robustness of the STL formula φ w.r.t. σs0).
In this section, based on the structure of the plant model and the kind of activation functions used
by the DNN controller, we consider two different methods for STL verification. First, we
present a reachability analysis-based approach that is both sound and complete, presented specifically for ReLU-DNN-based models and controllers.
We then introduce an alternative, sampling-based method that also ensures soundness and completeness and is applicable to general DNN-based models
and controllers.
2.4.2 End-to-End Neural Feedback Network (ENFN)
Recall the dynamical system from (3.1); we can rewrite it simply as sk+1 = f(sk, η(sk)). From this equation,
we construct a neural network that we call the end-to-end neural feedback network (ENFN). The name is derived
from the structure in which we arrange the neurons. The input to ENFN is the initial state s0. ENFN has K
blocks, where, for k ≥ 1, the output of the (k − 1)th block is
s0, s1, s2, · · · , sk−2, sk−1.
The kth block takes the k outputs of the previous block and “copies” them to the block output
using neuron layers that implement identity maps. The (k + 1)th output of the block is the computation of
sk+1 using the difference equation stated above. Thus, ENFN has a shape in which each subsequent block
has an additional number of neurons equal to the dimension of the state variable. The
output of the Kth block is then passed to the input of STL2NN. Recall that the output of STL2NN
is a single real number representing the robustness value of φ w.r.t. the trajectory σs0. We pictorially
represent this in Figure 2.1. We remark that this structure is important and has a non-trivial bearing on the
verification methods that we develop in this chapter, as we observe later. ENFN thus encodes a function
Rφ : R^n → R, where
Rφ(s0) = ρ(φ, σs0, 0).
Given an ENFN, we can use it to solve the problem outlined in (2.3). In the rest of this section, we show how we
can use a generic neural network reachability analyzer to perform STL verification.
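Concretely, the function Rφ encoded by ENFN can be sketched functionally as follows (an illustrative PyTorch sketch: the plant, controller, and robustness networks below are toy placeholders rather than trained models or an actual STL2NN, and the identity-copy layers of ENFN are not materialized explicitly):

import torch
import torch.nn as nn

def enfn(s0, plant, controller, robustness_net, K):
    # Functional form of R_phi(s0): unroll the closed loop for K steps and
    # feed the flattened trajectory (s_0, ..., s_K) to the robustness network.
    traj, s = [s0], s0
    for _ in range(K):
        a = controller(s)                      # a_k = eta(s_k)
        s = plant(torch.cat([s, a], dim=-1))   # s_{k+1} = f(s_k, a_k)
        traj.append(s)
    return robustness_net(torch.cat(traj, dim=-1))

# Toy 2-state / 1-input instantiation with placeholder networks and K = 10.
n, m, K = 2, 1, 10
plant = nn.Sequential(nn.Linear(n + m, 16), nn.ReLU(), nn.Linear(16, n))
controller = nn.Sequential(nn.Linear(n, 16), nn.ReLU(), nn.Linear(16, m))
robustness_net = nn.Sequential(nn.Linear(n * (K + 1), 32), nn.ReLU(), nn.Linear(32, 1))
print(enfn(torch.randn(n), plant, controller, robustness_net, K))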
2.4.3 STL Verification for DNN Models Using Reachability Analysis
The following assumption encodes the fact that neural network reachability analyzers are sound.
Assumption 2.4.1. Consider a neural network N where the space of permitted inputs is X. Then a neural
network reachability analyzer produces as output a set Y s.t. ∀x ∈ X : N(x) ∈ Y .
The following theorem establishes how we can reduce the problem of STL verification to the problem
of NN reachability.
Theorem 2.4.2. Given an NNCS as described in (3.1), a set of initial conditions I ⊆ S, and a bounded horizon
STL formula φ, we can reduce the problem of checking (2.3) to a NN reachability analysis problem.
Proof. From Section 2.4.2, we know that given an NNCS, an initial state s0, and a bounded horizon STL formula φ, the ENFN function Rφ encodes ρ(φ, σs0, 0). From Assumption 2.4.1, if we have an NN reachability
analyzer, then given the set I we can obtain a set (say Y) such that ∀s0 ∈ I : Rφ(s0) ∈ Y. We can then compute
inf Y and check if it is positive. If it is, the STL formula is satisfied by the set of all initial conditions for the
given NNCS.
While our method is broadly applicable with any NN reachability analysis tool that can compute sound
over-approximations of the set of outputs for a given input set [184, 48, 87, 89], in this chapter, we focus on
a specific type of NN reachability analysis tool that uses the notion of star sets for performing reachability
analysis [164]. The approach in [164] performs exact reachability analysis for DNNs with ReLU activations.
Thus, for the star-set-based technique to be applicable, we require that the given plant model either
uses ReLU activations or is a linear model, and that the controller uses ReLU activations. We can then apply star-set-based
reachability by propagating the set I through the ENFN to compute the range of robustness
values via exact star-based reachability analysis. This verification is sound and complete, since the
output range of Rφ is computed exactly.§
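To illustrate how Assumption 2.4.1 is used in practice, the following sketch propagates an input box through an affine/ReLU network with simple interval bound propagation, a coarser but still sound alternative to the star-set analysis adopted in this chapter, and then checks whether the lower bound on the output is positive; the network and numbers are randomly generated placeholders:

import numpy as np

def ibp_affine(lo, hi, W, b):
    # Sound interval propagation through the affine map x -> Wx + b.
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def ibp_relu_net(lo, hi, layers):
    # layers = [(W1, b1), ..., (WL, bL)]; ReLU between layers, none at the end.
    for i, (W, b) in enumerate(layers):
        lo, hi = ibp_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

# Toy check: does R_phi(s0) > 0 hold for all s0 in a 2-D box? (random toy network)
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 2)), rng.standard_normal(8)),
          (rng.standard_normal((1, 8)), rng.standard_normal(1))]
lo, hi = ibp_relu_net(np.array([0.8, 0.4]), np.array([0.9, 0.5]), layers)
print("verified" if lo[0] > 0 else "inconclusive (sound but not complete)")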
Exact star-based reachability can be time-inefficient due to the exponential accumulation of star sets through the
reachability analysis process. In such cases, we can apply the approximate star-based technique [164] to
§The ENFN that we compute is a combination of linear (purelin) and ReLU activation functions. This implies that ENFN is not a
pure ReLU neural network, but the exact star-set reachability algorithm in [164] can be updated to include purelin activations,
and the exact reachability analysis can still be performed on the ENFN structure.
Reach Tech. | Property | I | Property Horizon | Model NN structure | Controller NN structure | Depth (σs0 / STL2NN) | Robustness Range | Verified? | Run-time
E | φ1 | I1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | 900 / 15 layers | [0.0150, 0.0161] | Yes | 1167 sec
A | φ1 | I1 | 100 | [3,10,10,10,2] | [2,50,1,2,1,2,1,1] | 900 / 15 layers | [−0.0319, 0.0256] | No | 35 sec
E | φ2 | I2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | 400 / 14 layers | [0.0057630, 0.005813] | Yes | 1903 sec
A | φ2 | I2 | 50 | [4,10,10,3] | [3,100,1,2,1,2,1,1] | 400 / 14 layers | [−0.0308, 0.0136] | No | 43 sec
E | φ3 | I3 | 53 | [7,10,10,6] | [5,20,20,20,1] | 265 / 9 layers | [15.9077, 38.4651] | Yes | 259.7 sec
A | φ3 | I3 | 53 | [7,10,10,6] | [5,20,20,20,1] | 265 / 9 layers | [11.6941, 41.6572] | Yes | 23.82 sec
E | φ3 | I3 | 53 | [7,6] (LTI) | [5,20,20,20,1] | 159 / 9 layers | [17.0904, 38.9601] | Yes | 139.4 sec
A | φ3 | I3 | 53 | [7,6] (LTI) | [5,20,20,20,1] | 159 / 9 layers | [17.0904, 38.9744] | Yes | 5.5 sec
E | φ4 | I4 | 32 | [4,8,2] | [2,8,2] | 64 / 10 layers | [0.1033, 0.2000] | Yes | 77.78 sec
E | φ5 | I4 | 35 | [4,8,2] | [2,8,2] | 70 / 16 layers | [0.1033, 0.1735] | Yes | 1955 sec
E | φ6 | I4 | 35 | [4,8,2] | [2,8,2] | 70 / 18 layers | [0.1032, 0.1462] | Yes | 2368.8 sec
E | φ7 | I4 | 36 | [4,8,2] | [2,8,2] | 72 / 18 layers | [−0.3271, 0.1040] | Rejected | 1023 sec
Table 2.2: Results of verification utilizing reachability analysis on ENFN. In each case study,
both the plant model and the controller are ReLU-FFNNs. We use the abbreviation A for
approximate star-set-based reachability and E for the exact star-set-based technique. No parallel computing
is used and no set partitioning is applied.
ENFNs to perform verification. Although this verification procedure is sound, it lacks completeness as it
may not be possible to algorithmically eliminate the conservatism of the approximate reachability analysis.
Remark 2.4.3. Our method can be used to verify closed-loop NN-based models with arbitrary activation
functions, e.g., using tools such as the CROWN library that support such general activations [184]. We
provide an alternate (sound and complete) verification procedure for NNs with such arbitrary activations
in the next section.
2.4.4 Experimental Evaluation
In this section, we experimentally evaluate the efficacy of our verification method using the ENFN/STL2NN
networks that we have formulated. The case studies we present use physics-based plant models with
symbolic closed-form expressions for the dynamics. However, we only use the system dynamics to generate
data to train the ReLU-NN plant models. We train the plant ReLU-NNs by minimizing a loss function
defined as the difference between the ReLU-NN-predicted next state and the actual next state (obtained
by simulating the physics-based model).
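As an illustration of this system-identification step, the sketch below (an assumed PyTorch setup; the architecture and hyperparameters are illustrative and not those of the case studies) fits a ReLU plant model to one-step transition tuples by minimizing the mean-squared prediction error:

import torch
import torch.nn as nn

def fit_plant(states, actions, next_states, epochs=50, lr=1e-3):
    # Fit s_{k+1} ~ f([s_k, a_k]) by minimizing the mean-squared one-step error
    # over the full batch of transition tuples.
    X = torch.cat([torch.as_tensor(states, dtype=torch.float32),
                   torch.as_tensor(actions, dtype=torch.float32)], dim=1)
    Y = torch.as_tensor(next_states, dtype=torch.float32)
    plant = nn.Sequential(nn.Linear(X.shape[1], 10), nn.ReLU(),
                          nn.Linear(10, 10), nn.ReLU(),
                          nn.Linear(10, Y.shape[1]))
    opt = torch.optim.Adam(plant.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(plant(X), Y)
        loss.backward()
        opt.step()
    return plant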
Figure 2.2: Trajectories for the model NFC-2d. The NN controller is required to drive trajectories to visit region P3 within time k, where k ∈ [75, 100]. The controller should also avoid the unsafe sets P1, P2 at all times.
Figure 2.3: Trajectories for the model NFC-3d. The NN controller is required to drive the model to the region P3 within time k, where k ∈ [35, 50], while avoiding the unsafe sets P1, P2 at all times.
2D Nonlinear Feedback Control Model (NFC-2d). The dynamical system that was simulated to train a ReLU plant model is shown in Eq. (2.4). We used numerical simulation with a fixed-step solver (sample period
0.1 s) to mimic a discrete-time system. The number of tuples (sk, ak, sk+1) used in training is 10⁶. The initial
states I1 are as shown in (2.4). Figure 2.2 shows sample trajectories of this model.
ẋ1 = −x1 / (0.1 + (x1 + x2)²),   ẋ2 = (u + x1) / (0.1 + (x1 + x2)²),   I1 = { s0 | [0.8, 0.4]ᵀ ≤ s0 ≤ [0.9, 0.5]ᵀ }   (2.4)
Here, we want to verify the STL formula φ1 specified in Eq. (2.5), which requires the system to
reach region P3 within a specified time interval while avoiding regions P1 and P2.
φ1 = F[75,100] (s ∈ P3) ∧ G[1,100] (s /∈ P2) ∧ G[1,100] (s /∈ P1) (2.5)
3D Nonlinear Feedback Control Model (NFC-3d). Figure 2.3 shows the trajectories of the nonlinear dynamical model
shown in Eq. (2.6). The neural plant model is trained on 1.35 × 10⁶ transitions after discretizing the model
with a sample time of 0.1 seconds.
ẋ1 = x1³ + x2,   ẋ2 = x2³ + x3,   ẋ3 = u,   I2 = { s0 | [0.35, −0.35, 0.35]ᵀ ≤ s0 ≤ [0.4, −0.3, 0.4]ᵀ }   (2.6)
We want to verify if the controller satisfies the formula φ2:
φ2 = F[35,50] [s ∈ P3] ∧ G[0,50] [s ∉ P2] ∧ G[0,50] [s ∉ P1]   (2.7)
Adaptive Cruise Control. The third model we consider is a ReLU-NN plant model fit to a discretization of
the 6-dimensional adaptive cruise control model described in Eq. (2.8) (sample time 0.1 s). We used
1.5 × 10⁶ samples to train the plant model. In (2.8), the constant µ denotes a coefficient of friction set to
10⁻⁴.
ẋ1 = x2,   ẋ2 = x3,   ẋ3 = −2x3 − 4 − µx2²,   ẋ4 = x5,   ẋ5 = x6,   ẋ6 = −2x6 + 2u − µx4²,
I3 = { s0 | [90, 32, 0, 10, 30, 0]ᵀ ≤ s0 ≤ [110, 32.2, 0, 11, 30.2, 0]ᵀ }   (2.8)
Figure 2.4: Trajectories of the plant and controller models used to show the scalability of verification in the specification size. We propose 4 different sets that the object is required to visit sequentially.
The NN controller receives the observation O = [Vset, tgap, x5(k), x1(k) − x4(k), x2(k) − x5(k)] and
returns the optimal control to satisfy the proposed STL specification (2.9) within 50 time steps. Here,
Vset = 30 and tgap = 1.4 are fixed.
φ3 = G[0,50] ( [x1(k) − x4(k) < dsafe] ⟹ F[0,3] [x1(k) − x4(k) > d*safe] )   (2.9)
where d*safe = 12 + 1.4 x5(k) and dsafe = 10 + 1.4 x5(k).
If the friction coefficient µ = 0, then the model becomes an LTI system, and we can perform STL
verification of the NNCS (where the plant has LTI dynamics).
Testing scalability with specification size. To evaluate verification scalability as a function of the specification
size, we construct a simple 2D ReLU-NN plant model and a ReLU-based controller. The model does not
have a physical meaning; we show its trajectories in Figure 2.4. The initial set of states I4 is a box with
lower-left coordinates (1, 1) and upper-right coordinates (2, 2). We gradually increase the complexity of
the STL specification to analyze the runtime for verification using the exact-star reachability technique. The STL
formulas we use as verification targets are shown in (2.10)–(2.12). We want to show that the formula in
(2.13) is not satisfied by all initial states. The only difference between formulas φ6 and φ7 is the final time interval
([8, 9] in φ6 versus [9, 10] in φ7).
φ4 = F[5,8] ( s ∈ P1 ∧ F[20,24] s ∈ P4 )   (2.10)
φ5 = F[5,8] ( s ∈ P1 ∧ F[6,11] ( s ∈ P2 ∧ F[12,16] s ∈ P4 ) )   (2.11)
φ6 = F[5,8] ( s ∈ P1 ∧ F[6,11] ( s ∈ P2 ∧ F[6,7] ( s ∈ P3 ∧ F[8,9] s ∈ P4 ) ) )   (2.12)
φ7 = F[5,8] ( s ∈ P1 ∧ F[6,11] ( s ∈ P2 ∧ F[6,7] ( s ∈ P3 ∧ F[9,10] s ∈ P4 ) ) )   (2.13)
Practical Exponential Stability. We next consider a linear plant model (Eq. (2.14)) and a ReLU-NN controller
that tries to stabilize the system to satisfy a practical exponential stability criterion as expressed by the STL
formula φ8 in (2.15); note that in φ8, P6 ⊂ P5 ⊂ · · · ⊂ P2 ⊂ P1.
sk+1 = A sk + B u(sk),   A = [ 0.9105  −0.9718 ; 0.5177  0.3552 ],   B = [ 0.21  0.05 ; 0.15  −0.28 ].   (2.14)
φ8 = G[9,16] [s ∈ P1] ∧ G[17,24] [s ∈ P2] ∧ G[25,32] [s ∈ P3] ∧ G[33,40] [s ∈ P4] ∧ G[41,43] [s ∈ P5] ∧ G[44,60] [s ∈ P6]   (2.15)
The architecture of the NN controller is [2, 30, 30, 30, 2]. We attempt to verify whether the controller satisfies this
STL specification for the initial state set I = {(x, y) | x ∈ [−50, −40], y ∈ [85, 95]}. The regions
P5 and P6 are small, which requires us to apply the exact-star technique. However, exact-star reachability is time-consuming
on I; partitioning I into 25 sub-regions makes verification feasible within a reasonable running
time. The results are presented in Table 2.3.
Figure 2.5: Trajectories for NNCS shown in Eq. (2.14).
2.4.5 Verification for STL Properties on ODE Models
Recent notable advances in reachability analysis of NNCS with ODE models [47, 87, 165, 8] make it possible
to compute accurate reach-tubes for NNCS trajectories. We can feed such a reach-tube
to STL2NN and compute a provably guaranteed range for the robustness value. If the lower bound of this robustness
range is positive, we have verified the NN controller for the specified STL property. This verification
process is sound but not complete, even if the computed reach-tube is exact and we apply exact reachability
analysis on STL2NN; Appendix A explains the source of conservatism in detail. It is conventional
to apply set partitioning to compensate for this conservatism. Thus, the set of initial states
I is partitioned into smaller sub-regions, and the reachability analysis for the NNCS is applied to
every sub-region. The set partitioning can be applied either uniformly or adaptively. Finally, the resulting
reach-tubes are fed to STL2NN separately for the verification process.
I (sub-region) | Robustness Range | Run-time | Verified?
x ∈ [−50, −48], y ∈ [85, 87] | [0.2380, 0.3173] | 3199.4 sec | Yes
x ∈ [−50, −48], y ∈ [87, 89] | [0.2414, 0.2884] | 28.22 sec | Yes
x ∈ [−50, −48], y ∈ [89, 91] | [0.2383, 0.2638] | 13.19 sec | Yes
x ∈ [−50, −48], y ∈ [91, 93] | [0.2151, 0.2429] | 114.3 sec | Yes
x ∈ [−50, −48], y ∈ [93, 95] | [0.1927, 0.2244] | 199.8 sec | Yes
x ∈ [−48, −46], y ∈ [85, 87] | [0.2539, 0.3130] | 287.3 sec | Yes
x ∈ [−48, −46], y ∈ [87, 89] | [0.2435, 0.2881] | 2708.5 sec | Yes
x ∈ [−48, −46], y ∈ [89, 91] | [0.2376, 0.2669] | 2645.6 sec | Yes
x ∈ [−48, −46], y ∈ [91, 93] | [0.2183, 0.2468] | 45.8 sec | Yes
x ∈ [−48, −46], y ∈ [93, 95] | [0.1934, 0.2228] | 6.8 sec | Yes
x ∈ [−46, −44], y ∈ [85, 87] | [0.2824, 0.3140] | 467.9 sec | Yes
x ∈ [−46, −44], y ∈ [87, 89] | [0.2550, 0.2916] | 1230.9 sec | Yes
x ∈ [−46, −44], y ∈ [89, 91] | [0.2386, 0.2680] | 1408.4 sec | Yes
x ∈ [−46, −44], y ∈ [91, 93] | [0.2138, 0.2432] | 610.1 sec | Yes
x ∈ [−46, −44], y ∈ [93, 95] | [0.1889, 0.2183] | 16.7 sec | Yes
x ∈ [−44, −42], y ∈ [85, 87] | [0.2839, 0.3133] | 7.9 sec | Yes
x ∈ [−44, −42], y ∈ [87, 89] | [0.2590, 0.2884] | 36.4 sec | Yes
x ∈ [−44, −42], y ∈ [89, 91] | [0.2341, 0.2635] | 152.2 sec | Yes
x ∈ [−44, −42], y ∈ [91, 93] | [0.2092, 0.2386] | 796.4 sec | Yes
x ∈ [−44, −42], y ∈ [93, 95] | [0.1844, 0.2138] | 1282.8 sec | Yes
x ∈ [−42, −40], y ∈ [85, 87] | [0.2793, 0.3087] | 7.6 sec | Yes
x ∈ [−42, −40], y ∈ [87, 89] | [0.2545, 0.2839] | 5.8 sec | Yes
x ∈ [−42, −40], y ∈ [89, 91] | [0.2296, 0.2590] | 6 sec | Yes
x ∈ [−42, −40], y ∈ [91, 93] | [0.2047, 0.2341] | 45.8 sec | Yes
x ∈ [−42, −40], y ∈ [93, 95] | [0.1798, 0.2092] | 142.4 sec | Yes
Table 2.3: Verifying φ8 against the NNCS in Eq. (2.14) utilizing exact-star reachability on ENFN. The initial state
set is I = {(x, y) | x ∈ [−50, −40], y ∈ [85, 95]}. The trajectory encoding has 180 layers and STL2NN has 8
layers. No parallel computing is utilized.
2.4.6 Example
In this section, we consider a trained controller for a quadcopter. The quadcopter is modeled, together with its set of initial conditions, as
ṗx = vx,   ṗy = vy,   ṗz = vz,   v̇x = g tan(θ),   v̇y = −g tan(ϕ),   v̇z = g − τ,
I = { s0 | [0.0638, 0.0638, −0.0213, 0, 0, 0]ᵀ ≤ s0 ≤ [0.1063, 0.1063, 0.0213, 0, 0, 0]ᵀ },
and we consider trajectories simulated with time step T = 0.05 s. The controller is an NN
controller with tanh() activation functions and structure [7, 10, 3, 3] for this problem. Here, the control
inputs satisfy u1(k) ∈ [−0.1, 0.1], u2(k) ∈ [−0.1, 0.1], u3(k) ∈ [7.81, 11.81], and g = 9.81 is the
gravitational acceleration. Letting p = [x(k), y(k), z(k)]ᵀ, the STL specification for this example is
φ := ( F[0,25] (p ∈ E1) ∨ F[0,25] (p ∈ E2) ) ∧ G[0,50] (p ∉ E3).   (2.16)
Regions E1, E2 and E3 are demonstrated in Figure 2.6, where
p ∈ E1  →  (x(k) − 0.085)² + (y(k) − 0.34)² + z(k)² ≤ 0.0027,
p ∈ E2  →  (x(k) − 0.34)² + (y(k) − 0.085)² + z(k)² ≤ 0.0027,
p ∉ E3  →  (x(k) − 0.2125)² + (y(k) − 0.2125)² + z(k)² ≥ 0.0027.   (2.17)
Figure 2.6: Trajectories of the quadcopter driven by a controller that has been trained to satisfy the STL specification in (2.16).
In this section, we utilize the controller we trained and verify that deploying this NN controller on the
dynamics satisfies the STL specification in Eq. (2.16)¶. We employ the CORA toolbox [8] to compute an accurate
over-approximation of the reach-tube of this NNCS with time step ts = 0.05 s. We first partition I into 1000 equal cubical
sub-regions and compute the reach-tubes originating from every sub-region. We then apply these reach-tubes to STL2NN to compute the robustness range. One restriction of
STL2NN is that it only accepts linear predicate functions. Therefore, we accept one more level of
conservatism and replace E1, E2 with their cubical inner-approximations, which we call Ec1 and Ec2, respectively.
We also replace E3 with its cubical outer-approximation, denoted Ec3. We finally replace the STL
property φ with a new one,
φc := ( F[0,25] (p ∈ Ec1) ∨ F[0,25] (p ∈ Ec2) ) ∧ G[0,50] (p ∉ Ec3).   (2.18)
¶Note that the STL properties in this work are stated over discrete time.
We note that the events p ∈ Ec1, p ∈ Ec2 and p ∉ Ec3 imply p ∈ E1, p ∈ E2 and p ∉ E3, respectively. Thus,
satisfaction of φc implies satisfaction of φ. The linear predicates are as follows:
p ∈ Ec1 :  (px − 0.055 > 0) ∧ (0.115 − px > 0) ∧ (py − 0.31 > 0) ∧ (0.37 − py > 0) ∧ (pz + 0.03 > 0) ∧ (0.03 − pz > 0)
p ∈ Ec2 :  (px − 0.31 > 0) ∧ (0.37 − px > 0) ∧ (py − 0.055 > 0) ∧ (0.115 − py > 0) ∧ (pz + 0.03 > 0) ∧ (0.03 − pz > 0)
p ∉ Ec3 :  (px − 0.2645 > 0) ∨ (0.1605 − px > 0) ∨ (py − 0.2645 > 0) ∨ (0.1605 − py > 0) ∨ (pz − 0.03 > 0) ∨ (−0.03 − pz > 0)
The set of initial conditions is partitioned uniformly into 1000 sub-regions. Figure 2.7a plots the lower bound
of the robustness range for each sub-region versus its index number. Figures 2.7b and 2.7c show
the run-times for reach-tube computation on the trajectories and for approx-star reachability analysis on STL2NN
for all sub-regions, respectively.
2.5 STL Verification of DNN Models Using Sampling
Consider the ENFN structure described in Sec. 2.4. If we compute a Lipschitz constant of the ENFN
function Rφ w.r.t. the initial state s0 (one that is locally valid over the set of initial states), then we can use it
to obtain a certificate that all initial states satisfy the given STL formula. The basic idea is to sample the
set of initial states densely enough and check whether the value of Rφ is sufficiently positive at all sample points. This
intuition is formalized in Theorem 2.5.1.
Figure 2.8: For a given cubical sub-region S(i, j, k), i, j, k ∈ [10], its index is index(i, j, k) = 100(i − 1) + 10(j − 1) + k and the location of its center is center(i, j, k) = [0.0649, 0.0649, −0.0202]ᵀ + [i − 1, j − 1, k − 1]ᵀ × 0.00213. Indexed in this way, (a) presents the lower bound of the robustness range for each sub-region, (b) presents the run-time to compute the reach-tube originating from the sub-region, and (c) presents the run-time to compute the robustness range on STL2NN with approx-star reachability analysis.
Theorem 2.5.1. Assume Lloc is the Lipschitz constant of the function f : R^n → R on the domain [ℓ, u], where
ℓ, u ∈ R^n. Denote the set of all 2^n vertices of [ℓ, u] by V([ℓ, u]) and assume
∀x ∈ V([ℓ, u]) : f(x) > 0.
Given the certificates ρ1 > Lloc and ρ2 = min_{x ∈ V([ℓ,u])} f(x),
∥u − ℓ∥2 < ρ2/ρ1  ⟹  ∀x ∈ [ℓ, u] : f(x) > 0.
Proof. Consider x ∈ [ℓ, u] and let x* ∈ V([ℓ, u]) be a vertex with f(x*) = ρ2; clearly ∥x* − x∥2 ≤ ∥u − ℓ∥2. Since
ρ1 is an upper bound for the local Lipschitz constant Lloc,
|ρ2 − f(x)| ≤ ρ1 ∥x* − x∥2 ≤ ρ1 ∥u − ℓ∥2, and hence ∥u − ℓ∥2 ≥ |ρ2 − f(x)| / ρ1.
We prove by contradiction that f(x) > 0. Assume f(x) ≤ 0. Since ρ2 > 0, we have |ρ2 − f(x)| = ρ2 − f(x) ≥ ρ2, and therefore
∥u − ℓ∥2 ≥ ρ2/ρ1, which contradicts the assumption ∥u − ℓ∥2 < ρ2/ρ1.
The certificate ρ1 is an upper bound for the local Lipschitz constant of Rφ(s0) with respect to the
initial state s0 ∈ I. If a bounded certificate ρ1 is available, then we can utilize Theorem 2.5.1 for sound
and complete verification of controllers. Based on Theorem 2.5.1, we select an ϵ > 0 and build
an ϵ-net over the set of initial states. For every hypercube in the ϵ-net we compute ρ2 and check
whether ϵ < ρ2/ρ1. If this condition does not hold, we create a finer grid on that hypercube.
We terminate the process, return the counterexample, and reject the controller if we encounter ρ2 < 0. Otherwise,
we continue until ϵ < ρ2/ρ1 holds for every hypercube, at which point the controller is verified.
The efficiency of this technique is highly dependent on the tightness of the upper bound ρ1. For instance,
if the upper bound is large, then to obtain a verification result ϵ tends to be small, leading to a large number
Algorithm 2: Recursive algorithm for verification with local Lipschitz certificates.
1  Function Lip-Verify(φ, ρ1, I, model, controller, N, status)
2    construct a uniform ϵ-net of N hyper-cubes over I:  ϵ-net = ∪_{i=1}^{N} [ℓi, ui],  with ϵi = ∥ui − ℓi∥2
3    while true do
4      for i ← 1 to N do
5        if status ≠ Solved then
6          ρ1, status ← ENFN−Lip−SDP([ℓi, ui])
7        ρ2 ← min_{x ∈ V([ℓi, ui])} Rφ(x)
8        if ρ2 < 0 then
9          return Falsified + counterexample; terminate
10       else
11         if (status ≠ Solved) ∨ ((status = Solved) ∧ (ϵi > ρ2/ρ1)) then
12           return Lip-Verify(φ, ρ1, [ℓi, ui], model, controller, N, status)
13   return Verified
of points over which the required condition needs to be checked. Thus, the key problem here is to get
an accurate estimate of the local Lipschitz constant for a given neural network. This problem has been
addressed by a variety of techniques in the literature [140, 19, 109, 95], but these have limited time
and memory scalability and are mostly restricted to ReLU activation functions. We use the convex
programming technique presented in [61, 83], which scales to larger neural networks with
low conservatism. The technique proposed in [61, 83], in its current formulation, is not directly applicable to
our verification process, but we can apply it with small modifications; the details are discussed in Appendix
B. We refer to this specific formulation of the convex program in [83, 61] as ENFN−Lip−SDP(). We
remark that this method is applicable to plant and controller models that are neural networks with arbitrary
activation functions, or to plants that have linear models. However, in this chapter, we have restricted our
empirical validation to NN-based plant models.
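To connect the pieces, the following simplified Python rendering of the refinement scheme in Algorithm 2 may be helpful; it is only a sketch in which lipschitz_upper_bound and R_phi are placeholder callables standing in for ENFN−Lip−SDP() and the ENFN robustness function, and a worklist replaces the recursion:

import itertools
import numpy as np

def verify_by_sampling(lo, hi, R_phi, lipschitz_upper_bound, split=2, max_boxes=10_000):
    # Check R_phi > 0 on the box [lo, hi] via Theorem 2.5.1; refine boxes as needed.
    worklist = [(np.asarray(lo, float), np.asarray(hi, float))]
    while worklist:
        if len(worklist) > max_boxes:
            return "undecided (budget exhausted)"
        l, u = worklist.pop()
        rho1 = lipschitz_upper_bound(l, u)                    # certificate rho_1
        verts = itertools.product(*zip(l, u))                 # the 2^n vertices
        rho2 = min(R_phi(np.array(v)) for v in verts)         # certificate rho_2
        if rho2 < 0:
            return "falsified (counterexample at a vertex)"
        if np.linalg.norm(u - l) < rho2 / rho1:
            continue                                          # this box is verified
        # Otherwise refine: split the box into split^n sub-boxes.
        grids = [np.linspace(l[i], u[i], split + 1) for i in range(len(l))]
        for corner in itertools.product(*(range(split) for _ in l)):
            nl = np.array([grids[i][c] for i, c in enumerate(corner)])
            nu = np.array([grids[i][c + 1] for i, c in enumerate(corner)])
            worklist.append((nl, nu))
    return "verified"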
Figure 2.9: Trajectories for the model in Section 2.5.1. The controller is required to drive the model such
that it visits the region P1 after 3 time-steps but no later than 6 time-steps. Once it reaches region P1 it is
required to visit P2 after 9 time steps but no later than 13 time steps.
2.5.1 Experimental Validation
We now present results of applying our Lipschitz constant computation-based technique for verification.
Simple tanh-activation model. In this case study, we consider plant and controller NN models with one
hidden layer consisting of 5 neurons. The plant is fully observable and has 2 states and a 2-d control input.
Both models use the hyperbolic tangent activation function. In this problem we verify the STL formula
shown in (2.19). Here, the specified set of initial states is provided as I = [1, 2] × [1, 2].
φ10 = F[3,6] ( [s ∈ P1] ∧ F[9,13] [s ∈ P2] ).   (2.19)
The ENFN model contains a total of 38 hidden tanh (+ linear) layers for encoding the trajectory
and 10 hidden ReLU (+ linear) layers for STL2NN. We first partition I into 4 squares (see Figure 2.12), where
ϵ = √2/2 for each set. We employ the CROWN library [175] for the pre-activation bound computation on
each trajectory layer, and the approx-star technique [164] for the pre-activation bound computation
on STL2NN. We then use the convex programming approach (ENFN−Lip−SDP()) that we developed,
with the MOSEK [15] and YALMIP [122] solvers, to compute ρ1. We also use STL2NN on each partition
to compute the certificate ρ2. The results are shown in Figure 2.12. In the first round of partitioning, the
desired condition ϵ ≤ ρ2/ρ1 does not hold for any partition, so we must partition all 4 subsets
(see Figure 2.12). In the next round of partitioning, ϵ = √2/4 and 8 subsets out of 16 are verified, satisfying
ϵ ≤ ρ2/ρ1. For the remaining 8 non-verified subsets we apply a third round of partitioning, resulting in
ϵ = √2/8, at which point all of them become verified. Figure 2.12 presents the flow of the recursive Algorithm 2 with 3
recursive calls. The verification concludes after 90 seconds with this algorithm.
Linear Time-Varying Plant. Figure 2.13 shows the evolution of the control feedback system with the following
LTV model, where
A(τ) = [ 0   1 ; −2 − sin(τ)   −1 ],   B = [ 1 ; 0 ],   T = 2π/30,   sk = [xk ; yk],
which is the zero-order-hold discretization of ṡ = A(t)s + Bη(s) with sampling time T. The controller is
a neural network of structure [2, 7, 7, 1] with tanh() activation functions, and is expected to satisfy
φ = F[30,35] [xk ≤ −0.5] ∧ F[37,43] [xk ≥ −0.4] ∧ F[45,50] [xk ≤ 0] ∧ F[32,38] [yk ≥ 1] ∧ G[1,50] [yk − xk ≤ 5.5].
Since our parallel computing setup does not support recursive algorithms, we manually partition
I := { s0 | [−1, −1]ᵀ ≤ s0 ≤ [0, 0]ᵀ }
Figure 2.10: Verification run-time for the linear time-varying plant, for each of the 64 partitions of the set of initial states. We apply Algorithm 2 on every partition and conclude the verification. The red line shows the average run-time over the partitions, which is approximately 6 minutes; the maximum run-time is 8 minutes and 40 seconds.
into 64 equal subsets, and run Algorithm 2 on every set. The average verification time for each sub-problem
was around 6 minutes. See Figure 2.10 for more detail.
Neural Network Controlled Quadrotor System. Figure 2.14 shows the evolution of the control feedback system for
a quadrotor. The model is trained on the following dynamics with T = 0.05, and trajectories start from I:
ṗx = vx,   ṗy = vy,   ṗz = vz,   v̇x = g tan(θ),   v̇y = −g tan(ϕ),   v̇z = τ − g,
I = { s0 | [0.0638, 0.0638, −0.0213, 0, 0, 0]ᵀ ≤ s0 ≤ [0.1063, 0.1063, 0.0213, 0, 0, 0]ᵀ }.
Figure 2.11: Verification run-time for the neural-network-controlled quadrotor system, for each of the 64 partitions of the set of initial states. We apply Algorithm 2 on every partition and conclude the verification. The run-time on the majority of partitions is approximately 40 minutes; the red line shows the average run-time, which is approximately 125 minutes.
We train a tanh-activation NN on these dynamics using 3.696 × 10⁶ training samples. The NN plant model
has 9 inputs (6 states and 3 control inputs) and two hidden layers with 10 neurons each. The controller also
uses tanh activations and has two hidden layers with 10 and 3 neurons. We wish to verify the formula
φ = F[1,20] ( [s ∈ E1] ∨ [s ∈ E2] ) ∧ G[1,20] [s ∉ E3].
Here the controller is time-varying: its first bias vector varies linearly with time (b1(k) = b̄1 + k δb1).
Since our parallel computing setup does not support recursive algorithms, we manually partition I into 64 equal
cubes and run Algorithm 2 on every one of them. The approximate running time for the majority of
them was 40 minutes, but for some regions the verification was more time-consuming. See Figure 2.11 for more
detail.
Figure 2.12: Recursive partitioning for STL verification with local Lipschitz computation: (a) presents the certificates ρ1 for each partition at each step; these certificates are computed with convex programming utilizing MOSEK and YALMIP, and the results are rounded upwards. (b) presents the certificates ρ2 for each partition at each step; these certificates are computed over ENFN, and the results are rounded downwards. (c) shows the verification results: the result is 0 when ϵ > ρ2/ρ1 and 1 when ϵ < ρ2/ρ1, where 1 indicates the controller is verified over the subset. We partitioned I in three steps to obtain 1 on every partition. The diameter ϵ is √2/2, √2/4, and √2/8 for the largest, medium, and smallest partitions, respectively.
Figure 2.13: Evolution of the states of the control feedback system for the proposed LTV model over 50 time steps.
Figure 2.14: Evolution of the states for the quadrotor example. The quadrotor is controlled by a pre-trained tanh FFNN controller; it is planned to avoid E3 while being required to reach one of the destinations E1 or E2 within 20 time steps.
2.6 Related Works
Safety verification of NNCS is an emerging area that has seen considerable recent activity.
We can classify research in this area into two categories: (1) verification of open-loop controllers, and (2)
closed-loop verification. The authors in [98] present an SMT-based verification technique for open-loop controllers
called Reluplex, which extends the simplex method to handle ReLU activation functions. In [99], they extend
their technique to a more general class of activation functions. In [137], the authors propose verification
for multi-layer perceptrons by converting it to a satisfiability query over a Boolean combination of linear
arithmetic constraints. The authors in [48, 103, 123] consider verification techniques for NNs based on
Mixed Integer Linear Programming.
Closer to our work are closed-loop verification techniques for NNCS. The authors in [156] propose
sound and complete verification for discrete plants based on Satisfiability Modulo Convex (SMC) techniques.
The authors in [47] propose a fast and efficient algorithm that performs regressive polynomial inference on
ReLU activation functions. For verification of NNCS with ODE models, the authors in [89] propose a
reachability analysis technique for nonlinear plants employing Taylor series and Bernstein polynomials.
This method is not restricted to ReLU activations and can adjust the level of conservatism. In the future,
we will use such methods for STL verification when the symbolic plant dynamics are known.
Chapter 3
Learning Based Neurosymbolic Algorithm for STL Control Synthesis
3.1 Introduction
The use of Neural Networks (NN) for feedback control enables data-driven control design for highly
nonlinear environments. The literature about training NN-based controllers or neuro-controllers is plentiful,
e.g., see [28, 117, 36, 56, 85]. Techniques to synthesize neural controllers (including deep RL methods) largely
focus on optimizing cost functions that are constructed from user-defined state-based rewards or costs.
These rewards are often proxies for desirable long-range behavior of the system and can be error-prone
[129, 153, 14] and often require careful design [71, 154].
On the other hand, in most engineered safety-critical systems, the desired behavior can be described
by a set of spatio-temporal task-objectives, e.g., [52, 27]. For example, consider modeling a mobile robot
where the system must reach region R1 before reaching region R2, while avoiding an obstacle region. Such
spatio-temporal task objectives can be expressed in the mathematically precise and symbolic formalism of
Discrete-Time variant (DT-STL) [51] of Signal Temporal Logic (STL) [126] . A key advantage of DT-STL is
that for any DT-STL specification and a system trajectory, we can efficiently compute the robustness degree,
i.e., the approximate signed distance of the trajectory from the set of trajectories satisfying/violating the
specification [45, 51].
Control design with DT-STL specifications using the robustness degree as an objective function to be
optimized is an approach that brings together two separate threads: (1) smooth approximations to the
robustness degree of STL specifications [67, 130] enabling the use of STL robustness in gradient-based
learning of open-loop control policies, and (2) representation of the robustness as a computation graph
allowing its use in training neural controllers using back-propagation [182, 113, 77, 80]. While existing
methods have demonstrated some success in training open-loop NN policies [113, 112], and also closed-loop NN policies [77, 80, 182], several key limitations still remain. Consider the problem of planning the
trajectory of a UAV in a complex, GPS-denied urban environment; here, it is essential that the planned
trajectory span several minutes while avoiding obstacles and reaching several sequential goals [176, 132,
168]. However, none of the existing methods to synthesize closed-loop (or even open-loop) policies scale to
handle long-horizon tasks.
A key reason for this is the inherent computational challenge in dealing with long-horizon specifications.
Training open-loop policies treats the sequence of optimal control actions over the trajectory horizon as
decision variables to maximize the robustness of the given STL property. Typical approaches use gradient descent where, in each iteration, the new control actions (i.e., the open-loop policy) are computed using the
gradient of the DT-STL property w.r.t. the control actions. If the temporal horizon of the DT-STL property
is K, then, this in turn is computed using back-propagation of the DT-STL robustness value through a
computation graph representing the composition of the DT-STL robustness computation graph and K
copies of the system dynamics. Similarly, if we seek to train closed-loop (neural) feedback control policies
using gradient descent, then we can treat the one-step environment dynamics and the neural controller as
a recurrent unit that is repeated as many times as the temporal horizon of the DT-STL property. Gradient
updates to the neural controller parameters are then done by computing the gradient of the STL computation
graph composed with this RNN-like structure. In both cases, if the temporal horizon of φ is several hundred
steps, then gradient computation requires back-propagation through those many steps. These procedures
are quite similar to the ones used for training an RNN with many recurrent units. It is well-known that
back-propagation through RNNs with many recurrent units faces problems of vanishing and exploding
gradients [68, 20]. To address these limitations, we propose a sampling-based approximation of the gradient
of the objective function (i.e. the STL property), that is particularly effective when dealing with behaviors
over large time-horizons. Our key idea is to approximate the gradient during back-propagation by an
approximation scheme similar to the idea of dropout layers used in deep neural networks [155]. The main
idea of dropout layers is to probabilistically set the output of some neurons in the layer to zero in order to
prevent over-fitting. We do a similar trick: in each training iteration we pick some recurrent units to be
"frozen", i.e., we use older fixed values of the NN parameters for the frozen layers, effectively approximating
the gradient propagation through those layers. We show that this can improve training of NN controllers
by at least an order of magnitude. Specifically, we reduce training times from hours to minutes, and can also
train reactive planners for task objectives that have large time horizons.
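To convey the flavor of this freezing idea, the sketch below (a schematic illustration only, under assumed PyTorch plant and controller modules; it is not the precise sampling scheme developed later in this chapter) rolls out the trajectory while routing randomly chosen time steps through a frozen copy of the controller, so those steps contribute no gradient to the current controller parameters:

import copy
import torch

def rollout_with_frozen_steps(s0, plant, controller, K, freeze_prob=0.5):
    # Unroll K steps; at "frozen" steps use a detached copy of the controller
    # (older, fixed parameter values), so those steps add no gradient terms
    # for the controller parameters during back-propagation.
    frozen = copy.deepcopy(controller)
    for p in frozen.parameters():
        p.requires_grad_(False)
    traj, s = [s0], s0
    for k in range(K):
        use_frozen = torch.rand(()) < freeze_prob
        a = frozen(s) if use_frozen else controller(s)
        s = plant(torch.cat([s, a], dim=-1))
        traj.append(s)
    return torch.stack(traj)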
To summarize, we make the following contributions:
1. We propose smooth versions of computation graphs representing the robustness degree computation of a DT-STL specification over the trajectory of a dynamical system. Our computation graph
guarantees that it lower bounds the robustness value with a tunable degree of approximation, and it
is noticeably more efficient in gradient computation and robustness computation compared to the
existing state of the art in the literature.
2. We develop a sampling-based approach, inspired by dropout [155], to approximate the gradient of
DT-STL robustness w.r.t. the NN controller parameters. Emphasizing the time-steps that contribute
the most to the gradient, our method randomly samples time points over the trajectory. We utilize the
structure of the STL formula and the current system trajectory to decide which time-points represent
critical information for the gradient.
3. We develop a back-propagation method that uses a combination of the proposed sampling approach
and the smooth version of the robustness degree of a DT-STL specification to train NN controllers.
4. We demonstrate the efficacy of our approach on high dimensional nonlinear dynamical systems
involving long-horizon and complex temporal specifications.
3.1.1 Organization and Notations
The rest of the chapter is organized as follows. In Section 3.2, we introduce the notation and the problem
definition. We propose our learning-based control synthesis algorithms in Section 3.4, present experimental
evaluation in Section 3.8, and conclude in Section 3.9. We use bold letters to indicate vectors and vector-valued functions, and calligraphic letters to denote sets. A feed-forward neural network (NN) with ℓ hidden
layers is denoted by the vector [n0, n1, · · · , nℓ+1], where n0 denotes the number of inputs, nℓ+1 is the number
of outputs, and for all i ∈ {1, 2, · · · , ℓ}, ni denotes the width of the i-th hidden layer. The notation x ∼u X
means that the random variable x is sampled uniformly from the set X.
3.2 Preliminaries
NN Feedback Control Systems (NNFCS). Let s and a denote the state and action variables that take
values from compact sets S ⊆ R^n and C ⊆ R^m, respectively. We use sk (resp. ak) to denote the value of the
state (resp. action) at time k ∈ Z≥0. We define a neural network controlled system (NNFCS) as a recurrent
difference equation
sk+1 = f(sk, ak),   (3.1)
where ak = πθ(sk, k) is the control policy. We assume that the control policy is a parameterized function
πθ, where θ is a vector of parameters that takes values in Θ. Later in the chapter, we instantiate the specific
parametric form using a neural network for the controller. That is, given a fixed vector of parameters θ, the
parametric control policy πθ returns an action ak as a function of the current state sk ∈ S and time k, i.e.,
ak = πθ(sk, k)∗.
Figure 3.1: Shows an illustration of the recurrent structure for the control feedback system.
Closed-Loop Model Trajectory. For a discrete-time NNFCS as shown in Eq. (3.1) and a set of designated
initial states I ⊆ S, under a pre-defined feedback policy πθ, Eq. (3.1) represents an autonomous discrete-time dynamical
system. For a given initial state s0 ∈ I, a system trajectory σ[s0 ; θ] is a function mapping time instants k ∈ {0, 1, · · · , K} to S, where σ[s0 ; θ](k) = sk
and, for all k ∈ {0, 1, · · · , K − 1}, sk+1 = f(sk, πθ(sk, k)). When the dependence on θ is obvious from the
context, we use the notation sk to refer to σ[s0 ; θ](k). Here, K is an integer called the trajectory
horizon; the exact value of K depends on the DT-STL task objective that the closed-loop model trajectories must satisfy. The computation graph for this trajectory is a recurrent structure; Figure 3.1 shows an
illustration of this structure and its similarity to an RNN.
Task Objectives and Safety Constraints. We assume that task objectives and safety constraints are
specified using the syntax of Discrete-Time variant (DT-STL) [51] of Signal Temporal Logic (STL) [126] .
We assume that DT-STL formulas are specified in positive normal form, i.e., all negations are pushed to the
signal predicates †
φ = h(s) ▷◁ 0 | φ1 ∧ φ2 | φ1 ∨ φ2 | φ1UIφ2 | φ1RIφ2 (3.2)
where UI and RI are the timed until and release operators, ▷◁∈ {≤, <, >, ≥}, and h is a function from S to
R. In this work, since we use discrete-time semantics for STL (referred to as DT-STL), the time interval I is
∗Our proposed feedback policy explicitly uses time as an input. This approach is motivated by the need to satisfy temporal
tasks, which requires time awareness for better decision-making.
†Any formula in DT-STL can be converted to a formula in positive normal form using De Morgan's laws and the duality
between the until and release operators.
a bounded interval of integers, i.e., I = [a, b] with a ≤ b. The timed eventually (FI) and always (GI) operators
can be syntactically defined through until and release: FIφ ≡ ⊤UIφ and GIφ ≡ ⊥RIφ, where ⊤
and ⊥ represent true and false. The formal semantics of DT-STL over discrete-time trajectories have been
previously presented in [51]. We briefly recall them here.
Boolean Semantics and Formula Horizon. We denote the formula φ being true at time k in trajectory
σ[s0 ; θ] by σ[s0 ; θ], k |= φ. We say that σ[s0 ; θ], k |= h(s) ▷◁ 0 iff h(σ[s0 ; θ](k)) ▷◁ 0. The semantics of the
Boolean operations (∧, ∨) follow the standard logical semantics of conjunction and disjunction, respectively.
For temporal operators, σ[s0 ; θ], k |= φ1UIφ2 is true if there is a time k′ with k′ − k ∈ I at which φ2
is true, and φ1 is true at all times k′′ ∈ [k, k′). Similarly, σ[s0 ; θ], k |= φ1RIφ2 is true if, for all times k′
with k′ − k ∈ I, φ2 is true, or there exists some time k′′ ∈ [k, k′) at which φ1 is true. The temporal scope
or horizon of a DT-STL formula is the last time-step required to evaluate σ[s0 ; θ], 0 |= φ
(see [126]). For example, the temporal scope of the formula F[0,3](x > 0) is 3, and that of the formula
F[0,3]G[0,9](x > 0) is 3 + 9 = 12. We also set the trajectory horizon equal to the formula horizon,
as we plan to monitor satisfaction of the formula by the trajectory.
Quantitative Semantics (Robustness Value) of DT-STL. The quantitative semantics of DT-STL roughly
define a signed distance of a given trajectory from the set of trajectories satisfying or violating the given DT-STL formula. There are many alternative semantics proposed in the literature [45, 51, 147, 5]; in this chapter,
we focus on the semantics from [45], which are shown in Table 3.1. The robustness value ρ(φ, σ[s0 ; θ], k) of
a DT-STL formula φ over a trajectory σ[s0 ; θ] at time k is defined recursively as reported in Table 3.1‡.
We note that if ρ(φ, k) > 0, the DT-STL formula φ is satisfied at time k, and we say that the formula φ
is satisfied by a trajectory if ρ(φ, 0) > 0.
‡For brevity, we omit the trajectory from the notation, as it is obvious from the context.
φ | ρ(φ, k)
h(sk) ≥ 0 | h(sk)
φ1 ∧ φ2 | min(ρ(φ1, k), ρ(φ2, k))
φ1 ∨ φ2 | max(ρ(φ1, k), ρ(φ2, k))
G[a,b] ψ | min_{k′ ∈ [k+a, k+b]} ρ(ψ, k′)
F[a,b] ψ | max_{k′ ∈ [k+a, k+b]} ρ(ψ, k′)
φ1 U[a,b] φ2 | max_{k′ ∈ [k+a, k+b]} min( ρ(φ2, k′), min_{k′′ ∈ [k, k′)} ρ(φ1, k′′) )
φ1 R[a,b] φ2 | min_{k′ ∈ [k+a, k+b]} max( ρ(φ2, k′), max_{k′′ ∈ [k, k′)} ρ(φ1, k′′) )
Table 3.1: Quantitative Semantics of STL
Prior Smooth Quantitative Semantics for DT-STL. To address non-differentiability of the robust
semantics of STL, there have been a few alternate definitions of smooth approximations of the robustness
in the literature. The initial proposal for this improvement is provided by [130]. Later the authors in [67]
proposed another smooth semantics which in addition is a guaranteed lower bound for the robustness
value that can be even more advantageous computationally. We denote the smooth robustness of trajectory
σ[s0 ; θ] for temporal specification φ, with ρ˜(φ, σ[s0 ; θ], 0).
Problem Definition. In this chapter, we provide model-based algorithms to learn a policy πθ⋆ that
maximizes the degree to which certain task objectives and safety constraints are satisfied. In particular,
we wish to learn a neural network (NN) control policy πθ (or equivalently the parameter values θ), such that for
any initial state s0 ∈ I, the trajectory obtained using the control policy πθ, i.e., σ[s0 ; θ], satisfies a given
DT-STL formula φ. In other words, our ultimate goal is to solve the optimization problem shown in Eq. (3.3).
For brevity, we use F(sk, k ; θ) to denote f(sk, πθ(sk, k)).
θ∗ = arg maxθ ( min_{s0 ∈ I} [ρ(φ, σ[s0 ; θ], 0)] ),   s.t.  ∀(k ∈ Z ∧ 1 ≤ k < K) : sk+1 = F(sk, k ; θ).   (3.3)
However, ensuring that the robustness value is positive for all s0 ∈ I is computationally challenging.
Therefore, we relax the problem to maximizing the minimum value of the robustness only over a set of states Î
sampled from the initial states I, i.e., θ∗ ≈ arg maxθ ( min_{s0 ∈ Î} [ρ(φ, σ[s0 ; θ], 0)] ). We solve this problem
Figure 3.2: A comparison of the ReLU, swish, and softplus activation functions. The figure demonstrates that the softplus activation function is a guaranteed upper bound for ReLU, and the swish activation function is a guaranteed lower bound for ReLU.
Figure 3.3: A comparison between a non-smooth objective function and its smooth approximation. Such an approximation can be very helpful for improving the efficiency of optimization. In this thesis, STL2NN is an example of the non-smooth objective function and LB4TL is its smooth approximation.
using algorithms based on stochastic gradient descent, followed by statistical verification to obtain high-confidence control policies.
3.3 Contribution of STL2NN in Learning Based Synthesis
The existing smooth semantics for gradient computation [67, 130, 113] perform forward computation (trajectory → robustness)
and backward computation (robustness → trajectory) on a computation graph that is generated
directly from the STL tree [45, 53] (recursive formula in (2.1)). These computation graphs are not efficient
for forward and backward computation when the specification is highly complex: the
size of the tree becomes very large, which requires traversing every single branch to evaluate the
robustness value, and the machine must perform these computations one by one, in series. STL2NN, in contrast,
converts the STL tree into a feedforward ReLU neural network as a computation graph
whose depth grows logarithmically with the complexity of the DT-STL specification. The main contribution
here is that STL2NN vectorizes this process, replacing the requirement to traverse the branches with
the multiplication of a weight matrix by a vector, followed by ReLU activation layers that post-process the
multiplication result. This makes robustness computation noticeably more convenient for the machine,
specifically for a GPU, which excels at matrix multiplication. This makes back-propagation more feasible
Figure 3.4: The sets considered in the specification, along with three different satisfying and violating trajectories.
Figure 3.5: Comparison of robustness computation runtime between STL2NN and STLCG [113]. This comparison shows the noticeable improvement in computational efficiency provided by vectorization.
for complex specifications. Therefore, the way STL2NN formulates the robustness (as a feedforward NN) facilitates
the back-propagation process by enabling vectorized computation of the gradient. The following example
illustrates this contribution.
Example 1. In this example, we justify the contribution of vectorization provided by STL2NN through a
comparison with a well-known toolbox for this purpose [113]. We consider the following STL formula,
φ = ( ⋁_{i=1}^{K} ( F[0,i−1] [p ∈ Goal1] ∧ F[i,K] [p ∈ Goal2] ) ) ∧ G[0,K] [p ∉ Unsafe set],
which states that the position of the robot, p ∈ R², should be planned such that over the task horizon the
robot first visits Goal1 and then visits Goal2, while always avoiding the unsafe set. The sets are
shown in Figure 3.4 together with some random satisfying and violating trajectories. Here,
we compare the average runtime for robustness computation over 1000 random trajectories as the horizon
increases over the range K ∈ [5, 100]. Figure 3.5 shows the comparison. This figure shows that for all horizons
Figure 3.6: Schematic of the computation graph for STL2NN which, given the trajectory σ, returns the robustness value. This graph represents an FFNN that vectorizes the robustness computation, providing a noticeable gain in efficiency in a training algorithm. To maintain its vectorized nature, in each activation layer we place the ReLU activation functions at the bottom and the linear activation functions on top.
Figure 3.7: Schematic of LB4TL, a smooth guaranteed lower bound for STL2NN which, given the trajectory σ, returns the robustness value. To maintain its vectorized nature, in each layer of activations we place the softplus activation functions at the bottom, the swish activation functions in the middle, and the linear activation functions on top.
selected from K ∈ [5, 100], robustness computation via STL2NN is at least 10 times faster than robustness
computation via the STLCG toolbox, and in some instances STL2NN is about 70 times faster. This improvement
is the direct impact of vectorization, which computes the robustness by matrix multiplication. Figure 3.6
also presents this vectorized computation graph. This contribution is especially important for training, as a
training algorithm requires forward and backward computation thousands of times to reach convergence.
This implies that using STL2NN can reduce the training time from hours to minutes.
However, STL2NN is exactly identical to the non-smooth robustness introduced in Eq. (2.1). This implies
that it will produce poorly behaved gradients in a training process for control synthesis, yielding an inefficient
control synthesis algorithm. It has been suggested in the literature [67, 130] that using a smooth approximation
of the robustness can improve efficiency (see Figure 3.3). Therefore, we approximate STL2NN with
a smooth function. It is also preferable that this smooth approximation act as a guaranteed lower
bound for the robustness: ensuring its positivity then guarantees that the true robustness is also positive. Thus,
we approximate STL2NN with a smooth under-approximation, which we call LB4TL.
We also provide a thorough comparison between the performance of LB4TL and the previous
smooth semantics available in the literature. To generate LB4TL, we first replace the ReLU activations in
the min() operation (see Lemma 2.3.1) with the softplus activation function, defined as
softplus(a1 − a2 ; b) = (1/b) log( 1 + e^{b(a1−a2)} ),   b > 0,
and we replace the ReLU activation functions contributing to the max() operation (see Lemma 2.3.1) with
the swish activation function,
swish(a1 − a2 ; b) = (a1 − a2) / ( 1 + e^{−b(a1−a2)} ),   b > 0.
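The following short NumPy sketch (illustrative only) implements these two surrogates and numerically checks the lower-bound property formalized in Proposition 3.3.1 below:

import numpy as np

def softplus(z, b=5.0):
    return np.log1p(np.exp(b * z)) / b     # smooth upper bound of ReLU(z)

def swish(z, b=5.0):
    return z / (1.0 + np.exp(-b * z))      # smooth lower bound of ReLU(z)

def smooth_min(x, y, b=5.0):
    return x - softplus(x - y, b)           # <= min(x, y)

def smooth_max(x, y, b=5.0):
    return y + swish(x - y, b)              # <= max(x, y)

x, y = np.random.randn(1000), np.random.randn(1000)
assert np.all(smooth_min(x, y) <= np.minimum(x, y) + 1e-12)
assert np.all(smooth_max(x, y) <= np.maximum(x, y) + 1e-12)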
Figure 3.2 shows the ReLU, swish, and softplus activation functions together, and Figure 3.7 shows a
schematic of LB4TL, which is an end-to-end vectorized computation graph constructed with
swish and softplus layers. Next, we show that LB4TL is a guaranteed lower bound for STL2NN. To that
end, we start with the following proposition.
Proposition 3.3.1. For any two real numbers x, y ∈ R:
y + swish(x − y) ≤ max(x, y), x − softplus(x − y) ≤ min(x, y).
Proof. We know that for all x, y ∈ R, max(x, y) = y + ReLU(x − y) and min(x, y) = x − ReLU(x − y). We
also know that, for all z ∈ R, swish(z) ≤ ReLU(z) and softplus(z) ≥ ReLU(z) [141].
The result of Proposition 3.3.1 can be utilized to propose the following result.
Proposition 3.3.2. Assume φ1 and φ2 are two different DT-STL formulas, and assume

L1 = LB4TL(σ^θ_{s0}, φ1, 0 ; b) ≤ R1 = ρ(σ^θ_{s0}, φ1, 0) and L2 = LB4TL(σ^θ_{s0}, φ2, 0 ; b) ≤ R2 = ρ(σ^θ_{s0}, φ2, 0).

Then we can conclude,

LB4TL(σ^θ_{s0}, φ1 ∨ φ2, 0 ; b) ≤ ρ(σ^θ_{s0}, φ1 ∨ φ2, 0),
LB4TL(σ^θ_{s0}, φ1 ∧ φ2, 0 ; b) ≤ ρ(σ^θ_{s0}, φ1 ∧ φ2, 0).
Proof. Based on Proposition 3.3.1, we know LB4TL(σ^θ_{s0}, φ1 ∨ φ2, 0 ; b) = L2 + swish(L1 − L2 ; b) ≤ max(L1, L2) and ρ(σ^θ_{s0}, φ1 ∨ φ2, 0) = max(R1, R2). We also know L1 ≤ R1 and L2 ≤ R2, which implies max(L1, L2) ≤ max(R1, R2). Therefore, we can conclude LB4TL(σ^θ_{s0}, φ1 ∨ φ2, 0 ; b) ≤ ρ(σ^θ_{s0}, φ1 ∨ φ2, 0). Likewise, from Proposition 3.3.1, we know LB4TL(σ^θ_{s0}, φ1 ∧ φ2, 0 ; b) = L1 − softplus(L1 − L2 ; b) ≤ min(L1, L2) and ρ(σ^θ_{s0}, φ1 ∧ φ2, 0) = min(R1, R2). We also know L1 ≤ R1 and L2 ≤ R2, which implies min(L1, L2) ≤ min(R1, R2). Therefore, we can conclude LB4TL(σ^θ_{s0}, φ1 ∧ φ2, 0 ; b) ≤ ρ(σ^θ_{s0}, φ1 ∧ φ2, 0).
The result of Proposition 3.3.2 can also be utilized to introduce the following result.
Lemma 3.3.3. For any formula φ belonging to the DT-STL framework in positive normal form, and b > 0, for a given trajectory σ^θ_{s0} = s0, s1, . . . , sK, if LB4TL(σ^θ_{s0}, φ, 0 ; b) > 0, then σ^θ_{s0} |= φ, where LB4TL is a computation graph for DT-STL robustness, but with the softplus activation utilized in the min operation and the swish activation employed in the max operation.
Proof. Let us denote the set of predicates that contribute to the robustness computation of a DT-STL formula φ as A = {a1 > 0, a2 > 0, · · · , aN > 0}. The DT-STL formula φ can be expanded in terms of ∨ and ∧ operations applied to predicates a > 0 where (a > 0) ∈ A (see [45]). In addition, for all predicates (a > 0) ∈ A, we have LB4TL(σ^θ_{s0}, (a > 0), 0 ; b) ≤ ρ(σ^θ_{s0}, (a > 0), 0), since both are equal to a. Therefore, we can start from the predicates (a > 0) ∈ A and utilize the result of Proposition 3.3.2 to conclude that for any DT-STL formula φ we have LB4TL(σ^θ_{s0}, φ, 0 ; b) ≤ ρ(σ^θ_{s0}, φ, 0). Hence, LB4TL(σ^θ_{s0}, φ, 0 ; b) > 0 implies ρ(σ^θ_{s0}, φ, 0) > 0, i.e., σ^θ_{s0} |= φ.
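The recursion used in this proof can be sketched operationally. The following Python fragment is a minimal illustration of the idea (not the released LB4TL implementation): it lower-bounds the robustness of a formula given as a tree of predicates, ∧, and ∨ nodes, using the swish/softplus replacements above; the tree encoding and helper names are our own assumptions.

    import numpy as np

    def softplus(z, b=10.0):
        return np.log1p(np.exp(b * z)) / b

    def swish(z, b=10.0):
        return z / (1.0 + np.exp(-b * z))

    def lower_bound(node, b=10.0):
        """node is ('pred', value) | ('and', left, right) | ('or', left, right)."""
        if node[0] == 'pred':
            return node[1]                      # predicate value a; bound is exact here
        L1, L2 = lower_bound(node[1], b), lower_bound(node[2], b)
        if node[0] == 'or':                     # smooth under-approximation of max(L1, L2)
            return L2 + swish(L1 - L2, b)
        return L1 - softplus(L1 - L2, b)        # smooth under-approximation of min(L1, L2)

    # (a1 > 0) AND ((a2 > 0) OR (a3 > 0)) with a = (0.3, -0.1, 0.8)
    tree = ('and', ('pred', 0.3), ('or', ('pred', -0.1), ('pred', 0.8)))
    print(lower_bound(tree))   # a positive value certifies satisfaction (Lemma 3.3.3)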
Remark 3.3.4. The hyperparameter b in the softplus and swish activation functions controls how closely these functions approximate the ReLU activation function. A larger value of b offers a closer approximation to ReLU, though it decreases the smoothness of the objective function. Conversely, a smaller b increases smoothness but provides a weaker approximation. This introduces a trade-off in choosing b: excessively high values can hinder training due to non-smoothness, while excessively low values may impair training efficiency due to poor approximation.

Figure 3.8: This figure shows the symbolic trajectory generated by the NN feedback controller, and the computation graph for DT-STL robustness. The DT-STL robustness is presented as a neuro-symbolic computation graph [77] via ReLU and Linear activation functions.

Algorithm 3: Standard Gradient Ascent Backpropagation via smooth semantics
1  Initialize variables
2  while min_{s0 ∈ Ib} ρ(φ, σ[s0 ; θ^(j)], 0) < ρ̄ do
3      s0 ← Sample from Ib
4      σ[s0 ; θ^(j)] ← Simulate using policy π_{θ^(j)}
5      d ← ∇_θ ρ̃(σ[s0 ; θ^(j)]) using σ[s0 ; θ^(j)]
6      θ^(j+1) ← θ^(j) + Adam(d)
7      j ← j + 1
3.4 Training Neural Network Control Policies
Our solution strategy is to treat each time-step of the given dynamical equation in Eq. (3.1) as a recurrent
unit. We then sequentially compose or unroll as many units as required by the horizon of the DT-STL
specification.
Example 2. Assume a one-step dynamics with scalar state x ∈ R and scalar feedback control policy ak = πθ(xk), given by xk+1 = f(xk, πθ(xk)). If the specification is F[0,3](x > 0), then we use 3 instances of f(xk, πθ(xk)) by setting the output of the k-th unit to be the input of the (k + 1)-th unit. This unrolled structure implicitly contains the system trajectory σ[x0 ; θ] starting from some initial state x0 of the system. The unrolled structure essentially represents the symbolic trajectory, where each recurrent unit shares the NN parameters of the controller (see Figure 3.8 for more detail). By composing this structure
with the robustness semantics representing the given DT-STL specification φ; we have a computation
graph that maps the initial state of the system in Eq. (3.1) to the robustness degree of φ. Thus, training the
parameters of this resulting structure to guarantee that its output is positive (for all initial states) guarantees
that each system trajectory satisfies φ.
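The unrolling described above can be sketched in a few lines of PyTorch. The names (Controller, f, rollout) and the toy dynamics are our own illustrative assumptions, not the thesis toolbox; the key point is that every recurrent unit reuses the same controller parameters, so gradients flow through all time-steps.

    import torch, torch.nn as nn

    class Controller(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
        def forward(self, x, k):
            # controller receives the state and a (normalized) time index
            return self.net(torch.cat([x, k], dim=-1))

    def f(x, a, dt=0.1):
        # hypothetical one-step dynamics x_{k+1} = x_k + a*dt
        return x + a * dt

    def rollout(policy, x0, K):
        xs = [x0]
        for k in range(K):
            a = policy(xs[-1], torch.full_like(x0, k / K))
            xs.append(f(xs[-1], a))           # every unit shares the policy parameters
        return torch.stack(xs)                # symbolic trajectory sigma[x0 ; theta]

    policy = Controller()
    traj = rollout(policy, torch.tensor([[1.0]]), K=3)   # states x0..x3 for F[0,3](x > 0)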
Controller Synthesis as an Optimization Problem. In order to train the controller, we solve the
following problem:
θ* = arg max_θ min_{s0 ∈ Ib} [ρ(φ, σ[s0 ; θ], 0)], s.t. σ[s0 ; θ](k + 1) = F(sk, k ; θ). (3.4)
We thus wish to maximize the worst-case (minimum) robustness over trajectories starting from the sampled set of initial states. An approximate solution to this optimization problem is to train the NN controller using a vanilla back-propagation algorithm that computes the gradient of the objective function for initial states randomly drawn from the sampled set Ib ⊂ I, and updates the parameters of the neural network controller using this gradient. We utilize LB4TL for this purpose. See Algorithm 3 for more detail.
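A compact PyTorch rendering of this training loop is sketched below. It is an illustration under our own assumptions: `rollout(policy, s0)` stands for unrolling the dynamics over the task horizon as in Example 2, and `smooth_robustness` stands in for LB4TL; in Algorithm 3 the termination check uses the non-smooth robustness ρ, which we replace by the same surrogate here for brevity.

    import torch

    def train(policy, rollout, smooth_robustness, init_states,
              rho_bar=0.0, max_iters=5000, lr=1e-3):
        # Sketch of Algorithm 3: gradient ascent on a smooth robustness surrogate.
        opt = torch.optim.Adam(policy.parameters(), lr=lr)
        for j in range(max_iters):
            # terminate once every sampled initial state achieves robustness above rho_bar
            with torch.no_grad():
                worst = min(smooth_robustness(rollout(policy, s0)) for s0 in init_states)
            if worst > rho_bar:
                break
            s0 = init_states[torch.randint(len(init_states), (1,)).item()]
            loss = -smooth_robustness(rollout(policy, s0))    # ascent on robustness
            opt.zero_grad()
            loss.backward()
            opt.step()
        return policy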
Remark 3.4.1. A training-based solution to the optimization problem does not guarantee that the specification
is satisfied for all initial states s0 ∈ I. To tackle this, we can use a methodology like [77] that uses reachability
analysis to verify the synthesized controller. However, given the longer time-horizon, this method may
face computational challenges. An alternative approach is to eschew deterministic guarantees, and instead
obtain probabilistic guarantees (see Sec. 3.8.5).
3.5 Numerical Evaluation for Training with LB4TL
Managing Uncertainty in NN Feedback Controllers versus Open-loop Alternatives. In this section, we empirically demonstrate that feedback NN controllers are more robust to noise and uncertainties than open-loop controllers, even when the feedback controller is not trained in the presence of noise. We then show that if we train the feedback controller after introducing stochastic noise into the original system dynamics, it vastly outperforms an open-loop controller trained in the presence of noise. To illustrate, we use the example proposed in [113], but add stochastic noise and also include some uncertainty on the choice of initial condition. The modified system dynamics are shown in Eq. (3.5), where the sampling time is dt = 0.1.
sk+1 = sk + ukdt + c1vk, s0 = [−1, −1]⊤ + c2η. (3.5)
Here, for k ∈ {1, · · · , K}, vk and η are i.i.d. random variables with standard normal distribution, i.e., η, vk ∼ N(0_{2×1}, I_{2×2}), where I_{2×2} is the identity matrix. In this example, the desired objective for the system is:
system is:
φ8 = F[0,44]
G[0,5] (s ∈ Goal1)
^
F[0,44]
G[0,5] (s ∈ Goal2)
^
G[0,49]¬ (s ∈ Unsafe), (3.6)
where the regions Goal1,Goal2, and Unsafe are illustrated in Figure 3.9 §
.
§We also add the following updates to the original problem presented in [113]:
• We omit the requirement sK = [1, 1]⊤ from both control problems for simplicity.
• We increase the saturation bound of the controller to uk ≤ 4√2. We also apply this condition to the open-loop controller proposed in [113].
Table 3.2: Description of the case studies and the training results. Both controllers use tanh activation.

Expt. | Controller bounds | Initial states I | Sampled initial states Ib | Controller NN | LB4TL build runtime (secs) | Training runtime (secs) | Num. iterations
1 | (v, γ) ∈ [0, 5] × [−π/4, π/4] | (x, y, θ)(0) ∈ (6, 8)² × [−3π/4, −π/2] | corners, center | [4, 10, 2] | 1.31 | 72.9 | 750
2 | (u1, u2) ∈ [−0.1, 0.1]², u3 ∈ [7.81, 11.81] | (x, y, z, vx, vy, vz)(0) ∈ [0.02, 0.05] × [0, 0.05] × [0]⁴ | corners, center | [7, 10, 3] | 0.13 | 354.36 | 16950
In the first step of the experiment, we train the feedback and open-loop controllers in the absence of the
noise (c1 = c2 = 0) and deploy the controllers on the noisy environment (c1 = 0.0314, c2 = 0.0005) and
compare their success rate¶
. In the second step of the experiment, we train both the feedback and open-loop
controllers on the noisy environment (c1 = 0.0314, c2 = 0.0005), and also deploy them in the noisy
environment (c1 = 0.0314, c2 = 0.0005) to compare their success rate. If we train the open-loop and NN
feedback controllers in the absence of noise, then the controllers will respectively satisfy the specification
in 3.7% and 65.4% of trials when deployed in a noisy environment. However, we can substantially improve performance by training in the presence of noise; here, the open-loop and feedback controllers satisfy the specification in 5.4% and 94.4% of trials, respectively, showing that the NN feedback controller retains strong performance in the presence of noise, which open-loop control lacks.
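The success rates quoted above come from repeated noisy rollouts; a minimal sketch of such a Monte Carlo estimate is shown below. The helper names `simulate_noisy` (a simulator of Eq. (3.5) under a given controller) and `satisfies_spec` (a Boolean DT-STL checker, e.g., the sign of the robustness) are our own placeholders, and the 50-step horizon is an assumption matching the specification above.

    import numpy as np

    def success_rate(policy, simulate_noisy, satisfies_spec, n_trials=1000, seed=0):
        # Estimate the fraction of noisy rollouts that satisfy the specification.
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_trials):
            eta = rng.standard_normal(2)                 # initial-state perturbation
            noise = rng.standard_normal((50, 2))         # per-step disturbance v_k
            traj = simulate_noisy(policy, eta, noise)
            hits += bool(satisfies_spec(traj))
        return hits / n_trials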
We utilized the STLCG PyTorch toolbox [113] to solve for the open-loop controller. We also utilized the standard gradient ascent proposed in Algorithm 3 (with LB4TL as the smooth semantics ρ̃) for training the feedback controllers. We let the training processes in Algorithm 3 and STLCG run for 5000 iterations and then terminated them. Figure 3.9 shows the simulation of the trained controllers when they are deployed in the noisy environment. Here, we generate 100 random trajectories via the trained controllers and plot them in green and red when they satisfy or violate the specification, respectively. Notably, all trained feedback controllers in this chapter exhibit the same level of robustness to noise.
We now present an evaluation of the performance of our proposed method∥ on two case studies.
¶To report the success rate, we deploy the controllers 1000 different times and compute the percentage of the trajectories that
satisfy the specification.
∥To increase the efficiency of our training process, we check min_{s0 ∈ Ib} ρ(φ, σ[s0 ; θ^(j)], 0) once every 50 gradient steps to make a decision on terminating the training algorithm.
Figure 3.9: This figure shows the simulation of trajectories when the trained controller is deployed on the
noisy deployment environment, both controllers are trained in the presence of noise. The trajectories of NN
feedback controller that satisfy (a) and violate (b) the specification and those of the open-loop controller
that satisfy (c) and violate (d) the specification are shown.
Here we apply our technique to two examples. The problem settings and training results for the experiments are provided in Table 3.2, and simulations of their trained controllers are shown in Figures 3.10 and 3.11. In these figures, the simulations for the random initial guess of control parameters are also presented in black and clearly violate the specifications. The experiments are explained as follows, and the sampling time is δt = 0.05 sec.
Experiment 1: [Sequential Goals for a Simple Car Navigation Task].
We use a standard 3-dimensional model from [182] to represent the dynamics of a simple car as follows:
[ẋ, ẏ, θ̇]⊤ = [v cos(θ), v sin(θ), (v/L) tan(γ)]⊤, v ← 2.5 tanh(0.5 a1) + 2.5, γ ← (π/4) tanh(0.5 a2), a1, a2 ∈ R. (3.7)
Here (x, y) is the position and θ is the heading angle. The control inputs v and γ are the velocity and steering angle, respectively. Assuming the outputs of the NN controller are the (unbounded) values [a1(k), a2(k)]⊤, we enforce the controller bounds via Eq. (3.7) (see Table 3.2). The value of b in LB4TL is 10.
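For concreteness, one Euler-discretized step of Eq. (3.7), including the tanh squashing of the unbounded NN outputs, can be sketched as follows. The wheelbase L and the sample state are our own placeholder values; the sampling time δt = 0.05 is taken from the text above.

    import numpy as np

    def car_step(state, a, dt=0.05, L=1.0):
        # One Euler step of Eq. (3.7); L is a placeholder wheelbase.
        # a = (a1, a2) are the unbounded NN outputs, squashed into the bounds of Table 3.2.
        x, y, th = state
        v = 2.5 * np.tanh(0.5 * a[0]) + 2.5        # v in [0, 5]
        gamma = (np.pi / 4) * np.tanh(0.5 * a[1])  # gamma in [-pi/4, pi/4]
        return np.array([x + dt * v * np.cos(th),
                         y + dt * v * np.sin(th),
                         th + dt * (v / L) * np.tan(gamma)])

    s = np.array([7.0, 7.0, -3 * np.pi / 4])
    s_next = car_step(s, a=np.array([0.3, -0.2]))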
Figure 3.10: Simulated trajectories of the NNFCS representing control for the simple car dynamics for the trained controller, in contrast with those for a controller initialized with random parameter values. The trajectories are initiated from the set of sampled initial conditions, with θ ∈ {−3π/4, −5π/8, −π/2}.
Figure 3.11: Simulated trajectories for the trained
controller in comparison to the trajectories for the
NN controller initialized with random parameter
values for the quadrotor case study. Trajectories are
initiated from the set of sampled initial conditions
consisting of the corners of I and its center.
The task objective is for the car to first visit the goal region Goal1 and then visit the region Goal2. Further, we require this sequential task to be finished within 40 time steps. Throughout, the car should always avoid the unsafe set Unsafe (see Figure 3.10). This temporal task can be formalized in the DT-STL framework as follows:

φ7 := F[0,40] [ Goal1 ∧ F [Goal2] ] ∧ G[0,40] [¬ Unsafe set]
Experiment 2: [Reach/Avoid Tasks for a Quadrotor]. We use the 6-dimensional model for a quadrotor
from [182] presented as follows:
[ẋ, ẏ, ż, v̇x, v̇y, v̇z]⊤ = [vx, vy, vz, g tan(u1), −g tan(u2), g − u3]⊤, u1 ← 0.1 tanh(0.1 a1), u2 ← 0.1 tanh(0.1 a2), u3 ← g − 2 tanh(0.1 a3), a1, a2, a3 ∈ R, g = 9.81. (3.8)
Here, x = (x, y, z) denotes the quadrotor's position and v = (vx, vy, vz) denotes its velocities along the three coordinate axes. The control inputs u1, u2, u3 respectively represent the pitch, roll, and thrust inputs. Assuming the outputs of the NN controller are the (unbounded) values [a1(k), a2(k), a3(k)]⊤, we enforce the controller bounds (see Table 3.2) via Eq. (3.8). The value of the parameter b in LB4TL is 20.
The quadrotor launches from a position in the set I, and the task objective is to visit the goal set while
avoiding obstacles. The projections of the obstacle and goal sets into the quadrotor’s position states are
[−∞, 0.17] × [0.2, 0.35] × [0, 1.2] and [0.05, 0.1] × [0.5, 0.58] × [0.5, 0.7], respectively. This temporal task
can be formalized in DT-STL framework as, φ2 = G[0,35][¬ Obstacle] ∧ F[32,35][Goal].
Experiment 3: [Comparison with Existing Literature]. We now compare against the smooth semantics in [67, 130, 113] and empirically demonstrate that LB4TL outperforms them when used for training NN controllers. We also show that with increasing complexity of the DT-STL formula, the other smooth semantics show significant increases in runtime during gradient computation, while LB4TL scales well. Given a fixed initial guess for the control parameters and a fixed set of sampled initial states Ib, we run the training algorithm 4 times, each time with one of the following smooth semantics (b = 10).
1. The first one is the smooth semantics proposed in [130] that replaces the min()/max() operators in Eq. (2.1) with:

miñ(a1, · · · , aℓ) = −(1/b) log( Σ_{i=1}^{ℓ} e^{−b ai} ), max̃(a1, · · · , aℓ) = (1/b) log( Σ_{i=1}^{ℓ} e^{b ai} ).
2. The second one is the smooth semantics proposed in [67] that replaces the min()/max() operators in Eq. (2.1) with (a code sketch of these operators is given after this list):

miñ(a1, · · · , aℓ) = −(1/b) log( Σ_{i=1}^{ℓ} e^{−b ai} ), max̃(a1, · · · , aℓ) = Σ_{i=1}^{ℓ} ai e^{b ai} / Σ_{i=1}^{ℓ} e^{b ai}.
3. The third one is the computation graph proposed in [113]. This computation graph reformulates the
robustness semantics in an RNN like structure and utilizes this graph for back-propagation.
4. The last one is LB4TL that is introduced in this work.
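For reference, the smooth min/max operators of the first two baselines can be written compactly with standard PyTorch primitives. This is our own sketch for comparison purposes, not the authors' released code; b = 10 matches the setting used in this experiment.

    import torch

    def soft_min_lse(a, b=10.0):
        # Smooth min from [130]: -(1/b) * log(sum(exp(-b * a_i)))
        return -torch.logsumexp(-b * a, dim=-1) / b

    def soft_max_lse(a, b=10.0):
        # Smooth max from [130]: (1/b) * log(sum(exp(b * a_i)))
        return torch.logsumexp(b * a, dim=-1) / b

    def soft_max_weighted(a, b=10.0):
        # Smooth max from [67]: sum(a_i * exp(b*a_i)) / sum(exp(b*a_i))
        return (a * torch.softmax(b * a, dim=-1)).sum(dim=-1)

    a = torch.tensor([0.3, -0.1, 0.8])
    print(soft_min_lse(a), soft_max_lse(a), soft_max_weighted(a))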
We utilize PyTorch's automatic differentiation toolbox for all the examples, and we use the vehicle navigation case study with its specification as the benchmark. The mentioned DT-STL formula can be rephrased as:

⋁_{i=1}^{40} ( F[0,i−1] [Goal1] ∧ F[i,40] [Goal2] ) ∧ G[0,40] [¬ Unsafe set],

which is a combination of 78 different future formulas. This results in a quite large size for LB4TL and makes it a good candidate to showcase the advantage of LB4TL compared to the existing smooth semantics in terms of training runtime. We also assume a ReLU neural network with a structure similar to the one proposed in that example.
Since the runtime of the training algorithm is also highly related to the choice of initial guess for the
controller, we repeat this experiment 5 times, and we assign a unique initial guess for the controller on
Table 3.3: Comparison between policy training with LB4TL and the other smooth robustness semantics.

θ^0 | [67] Runtime (sec) | [130] Runtime (sec) | STLCG Runtime (sec) | LB4TL Runtime (sec)
1st guess | 1465 | 626 | NF[−1.3057] | 104
2nd guess | 1028 | 1017 | NF[−1.3912] | 580
3rd guess | 2444 | NF[−0.0821] | 2352 | 419
4th guess | NF[−0.1517] | NF[−0.9681] | NF[−0.4281] | 429
5th guess | 617 | NF[−1.2562] | 1946 | 124
Average | 2159 | 2488 | 3019 | 331
all 4 smooth semantics in a given experiment. Table 3.3 reports the training runtimes for all the experiments. In case a given smooth semantics in an experiment is unable to solve for a valid controller within 1 hour, we report it as NF[ρ^end], which indicates that the training did not finish; here ρ^end = min_{s0 ∈ Ib} ρ(φ, σ[s0 ; θ^(j)], 0), where j is the last iteration before termination. Assuming the runtime for NF[.] to be 3600 sec, the averages of the training runtime for the first, second, and third objective functions are 2159, 2488, and 3019 sec (via a Core i9 CPU), respectively, while the average training runtime for our objective function (LB4TL) is 331 sec. This shows LB4TL is a more convenient choice for the training process when the specification becomes more complex.∗∗
3.6 The Challenge of Exploding Gradient
In this methodology, we face a challenge in training the neural network controller.
Challenge: Since our computation graph resembles a recurrent structure with repeated units proportional
to the formula’s horizon, naïve gradient-based training algorithms struggle with gradient computation
when using back-propagation through the unrolled system dynamics.
In other words, the gradient computation faces exploding gradients when dealing with longer trajectories or higher-dimensional systems. The primary issue lies in the repeated application of
∗∗Our experimental results show that, for simple specifications, the performance of LB4TL and the previous smooth semantics is similar.
the neural network controller along the trajectory. Neural networks can generate large derivatives during training, which becomes problematic when neural network controllers are applied repeatedly: this repetition causes the large derivatives to multiply, exploding the gradients used to update the control parameters to meet temporal specifications. For longer trajectories, the number of multiplications increases, and in higher-dimensional systems, the magnitude of the derivatives being multiplied is larger. Both conditions can lead to exploding gradients, causing numerical issues that prevent convergence and keep the training algorithm from completing successfully. In what follows, we detail our strategy to address this issue during the training process.
3.7 Extension to Longer Horizon Temporal Tasks & Higher Dimensional
Systems
In this section, we introduce an approach to alleviate the problem of exploding gradients resulting from the repeated application of the neural network controller. Our solution approach is inspired by the idea of using dropout layers [155] in training deep neural networks. In our approach, we propose a sampling-based technique, where we only select certain time-points in the trajectory for gradient computation, while using a fixed older control policy at the non-selected points. Our approach to gradient sampling can also be viewed through the lens of stochastic depth, as suggested by [90], which involves sampling layers followed by identity transformations, as provided in ResNet. However, our methodology differs, as we employ a distinct approach that is better suited for control synthesis within the Signal Temporal Logic (STL) framework.
Before starting our main discussion on this topic, we first provide an overview of this section,
• In Section 3.7.1, we introduce the notion of gradient approximation through sampling the trace, and justify why it is a suitable replacement for the original gradient in case the original gradient is not accessible (e.g., longer-horizon tasks).
• In Section 3.7.2, we put forward the notion of critical time, which states that the robustness of DT-STL depends only on a specific predicate at a specific time-step. We then propose the idea of including this time-step in our gradient approximation technique.
• In Section 3.7.3, we bring up the point that gradient approximation using the critical time may, in some cases, result in failure for training. In these cases, we suggest approximating the DT-STL robustness as a function of the whole trace, that is, via the smooth version of the robustness semantics.
• In Section 3.7.4, we explain how to approximate the gradient for both scenarios proposed above (i.e., critical time & smooth semantics). We also introduce Algorithm 4, which concludes Section 3.7.
3.7.1 Sampling-Based Gradient Approximation Technique
We propose to sample random time-steps in the recurrent structure shown in Figure 3.1, and at each time-step selected for dropout we perform an operation that is similar to dropping the entire neural controller unit. However, approximating the gradient by dropping out the controller at several time-steps may result in an inaccurate approximation. We compensate for this by repeating our modified dropout process and accumulating the resulting gradients. Restricting the controller to a subset of time-steps results in fewer repeated multiplications of the controller weights and therefore alleviates the problem of exploding gradients. To ensure that the trajectory is well-defined, when we drop out the controller unit at a selected time-step, we replace it with a constant function that returns the evaluation of the controller unit (at that specific time-step) in the forward pass. We formalize this using the notion of a sampled trajectory in Definition 3.7.1.
Definition 3.7.1 (Sub-trajectory & Sampled trajectory). Consider the set of N different sampled time-steps T = {t0 = 0, t1, t2, · · · , tN} sampled from the horizon K = {0, 1, 2, · · · , K}, and also the initial state s0 and the control parameters θ^(j) in gradient step j. The sub-trajectory sub(σ[s0 ; θ^(j)], T) = s0, s_{t1}, s_{t2}, · · · , s_{tN} is simply a selection of N states from σ[s0 ; θ^(j)] with time-steps ti ∈ T. In other words, for all i ∈ {0, 1, · · · , N}: sub(σ[s0 ; θ^(j)], T)(i) = σ[s0 ; θ^(j)](ti). Now, consider the sub-trajectory sub(σ[s0 ; θ^(j)], T), and a sequence of actions a0, a1, · · · , a_{K−1} resulting from s0 and θ^(j). For any ti ∈ T, we drop out the NN controllers on time-steps ti + 1, ti + 2, · · · , t_{i+1} − 1 and replace them with the actions a_{1+ti}, a_{2+ti}, · · · , a_{t_{i+1}−1}. This provides a variant of the sub-trajectory called the sampled trajectory, and we denote it by smpl(σ[s0 ; θ^(j)], T). In other words, for any time-step ti ∈ T, assuming the function f_{i+1} : S × Θ → S (for brevity, henceforth we denote f_{i+1}(s ; θ^(j)) by f^(j)_{i+1}(s)):

f^(j)_{i+1}(s) = f( f( · · · f( F(s, ti ; θ^(j)), a_{1+ti}), a_{2+ti}), · · · , a_{t_{i+1}−2}), a_{t_{i+1}−1}),

we have smpl(σ[s0 ; θ^(j)], T)(0) = s0, and for all i ∈ {0, 1, · · · , N − 1}, we have

smpl(σ[s0 ; θ^(j)], T)(i + 1) = f^(j)_{i+1}( smpl(σ[s0 ; θ^(j)], T)(i) ).
Remark 3.7.2. The sub-trajectory sub(σ[s0 ; θ^(j)], T) with parameters θ^(j) can also be recursively defined as:

sub(σ[s0 ; θ^(j)], T)(i + 1) = F( · · · F( F( sub(σ[s0 ; θ^(j)], T)(i), ti ; θ^(j)), ti + 1 ; θ^(j)) · · · , t_{i+1} − 1 ; θ^(j)).

Notice that the parameters θ^(j) are referenced multiple times here, while in smpl(σ[s0 ; θ^(j)], T) they are referenced only once per segment.
Figure 3.14 presents Definition 3.7.1 visually. This definition replaces the selected controller units, namely those at the time-steps sampled for dropout††, with their pre-computed evaluations. Excluding the time-steps with fixed actions, we then call the set of states at the remaining time-steps the sampled trajectory, and we denote it by smpl(σ[s0 ; θ^(j)], T).
Example 3. Let the state and action at time k be xk ∈ R and ak ∈ R, respectively. The feedback controller is ak = πθ(xk, k), θ ∈ R³, and the dynamics is xk+1 = f(xk, ak), x0 = 1.15. Let us also assume a trajectory of horizon 9 over the time domain (i.e., K = {i | 0 ≤ i ≤ 9}) with trajectory σ[x0 ; θ] = x0, x1, x2, x3, x4, x5, x6, x7, x8, x9. Suppose we are in gradient step j = 42, and in this iteration we want to generate a sampled trajectory with N = 3 time-steps, where T = {0, t1 = 1, t2 = 3, t3 = 6}. The control parameters at this gradient step are θ^(42) = [1.2, 2.31, −0.92], which results in the control sequence a = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8. Given this information, we define the sampled trajectory as the sequence smpl(σ[x0 ; θ^(42)], T) = x0, x̃1, x̃3, x̃6, where

x̃1 = f^(42)_1(x0) = F(x0, 0 ; θ^(42)),
x̃3 = f^(42)_2(x̃1) = f( F(x̃1, 1 ; θ^(42)), 0.2),
x̃6 = f^(42)_3(x̃3) = f( f( F(x̃3, 3 ; θ^(42)), 0.4), 0.5),

where the constants 0.2, 0.4, 0.5 are the 3rd, 5th, and 6th elements of the pre-evaluated control sequence a, respectively.

††The set of sampled time-steps for dropout is in fact the set-difference between K and T, where T is the set of sampled time-steps that is generated to define the sampled trajectory.

Figure 3.12: This figure shows a common challenge in using the critical predicate for control synthesis. It presents the robustness as a piece-wise differentiable function of the control parameter θ (with resolution 0.00001), where each differentiable segment represents a distinct critical predicate.
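A minimal PyTorch-style sketch of this construction is given below, under our own assumptions: `policy(s, k)` is the differentiable controller, `f(s, a)` is one dynamics step, and `actions` are the pre-evaluated actions of Example 3. The controller is applied (and kept differentiable) at the time-steps in T; elsewhere the frozen actions are detached, so gradients flow through only a few controller applications.

    import torch

    def sampled_rollout(policy, f, s0, actions, T):
        # Sketch of Definition 3.7.1 with hypothetical helper names.
        T = sorted(T)                       # e.g. {0, 1, 3, 6} from Example 3
        smpl = [s0]
        s = s0
        for k in range(T[-1]):
            # differentiable controller at sampled time-steps, frozen action otherwise
            a = policy(s, k) if k in T else actions[k].detach()
            s = f(s, a)
            if k + 1 in T:
                smpl.append(s)              # x_tilde at the next sampled time-step
        return smpl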
3.7.2 Including the Critical Predicate in Time Sampling
While it is possible to select random time-points to use in the gradient computation, in our preliminary
results, exploiting the structure of the given DT-STL formula – specifically identifying and using critical
predicates [3] – gives better results. Proposition 3.1 in [3] introduces the notion of critical predicate. Here,
we also provide this definition as follows:
Definition 3.7.3 (Critical Predicate). As the robustness degree of DT-STL is an expression consisting of min and max of robustness values of predicates at different times, the robustness degree is always equal to the robustness of one of the predicates h(·) at a specific time. This specific predicate h* > 0 is called the critical predicate, and this specific time k* is called the critical time.
Example 4. We again consider Example 2 to clarify the notion of critical predicate. In this example, we have 4 predicates of a unique type, i.e., h(xk) = xk > 0. Thus, the robustness values of the predicate h(x) > 0 at time points 0, 1, 2, 3 are respectively x0, x1, x2, x3. Assume the trajectory is σ[x0 ; θ] = [x0 = 1, x1 = 2, x2 = 3, x3 = 1.5]. Since the robustness function is defined as ρ(φ, 0) = max(h(x0), h(x1), h(x2), h(x3)), the robustness value is equal to h(x2). Thus, we can conclude that the critical predicate is h* = h(x2) > 0 and the critical time is k* = 2.
The critical predicate and critical time of a DT-STL formula can be computed using the same algorithm
used to compute the robustness value for a given DT-STL formula. This algorithm is implemented in the
S-Taliro tool [17].
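For the simple "eventually" case of Example 4, computing the critical time reduces to an argmax; the short Python sketch below (our own illustration, not the S-Taliro implementation) makes this explicit.

    import numpy as np

    def critical_time_eventually(h_values):
        # For phi = F_[0,K](h(x) > 0), the robustness is max_k h(x_k); the critical
        # time is the argmax and the critical predicate is h at that time (Example 4).
        k_star = int(np.argmax(h_values))
        return k_star, h_values[k_star]

    h = np.array([1.0, 2.0, 3.0, 1.5])         # h(x_k) = x_k from Example 4
    k_star, rho = critical_time_eventually(h)  # -> k_star = 2, rho = 3.0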
3.7.3 Safe Re-Smoothing
A difficulty in using critical predicates is that a change in controller parameter values may change the
system trajectory, which may in turn change the predicate that is critical in the robustness computation.
Specifically, if the critical predicate in one gradient step is different from the critical predicate in the
subsequent gradient step, our gradient ascent strategy may fail to improve the robustness value, as the
generated gradient in this gradient step is local.
Example 5. To clarify this with an example, we present a specific scenario in Figure 3.12. This figure shows the robustness value as a non-differentiable function of the control parameter, i.e., a piece-wise differentiable relation where every differentiable segment represents a specific critical predicate. The system dynamics is x_{k+1} = 0.8 x_k^{1.2} − e^{−4 u_k} sin(u_k)², where the system starts from x0 = 1.15 and the controller is u_k = tanh(θ x_k). The robustness is plotted as a function of the control parameter −1 ≤ θ ≤ 1 and corresponds to the formula Φ = F[0,45]( G[0,5] (x > 0) ) ∧ G[0,50] (1 − 10x > 0). Assume the training process is at the 15th gradient step of back-propagation with θ = θ^(15) = 0.49698, where the critical predicate for this control parameter is denoted by p1 := (x1 > 0). The gradient generated from the critical predicate p1 suggests increasing the value of θ, which would result in θ = θ^(16) = 0.50672. However, applying this gradient would move the parameter value to a region of the parameter space where the critical predicate is p2 := (1 − 10 x45 > 0). In this case, the gradient generated from the critical predicate p1 is local to this gradient step, as the critical predicate shifts from p1 to p2. Our approach in this scenario is to first reduce the learning rate. If this does not lead to an increase in the robustness value, we then transition to the smooth semantics, which takes all predicates into account. The scenario in this figure shows that this local gradient may result in a drastic drop in the robustness value, from 8.09 to −6.15. Therefore, the gradient of the critical predicate is useful only if the gradient step preserves the critical predicate.
Given a predefined specification φ, a fixed initial state, a differentiable controller with parameters θ, and a differentiable model, the robustness value is a piece-wise differentiable function of the control parameters, where each differentiable segment represents a unique critical predicate (see Figure 3.13). However, the Adam algorithm‡‡ assumes a differentiable objective function. Therefore, we utilize the critical predicate as the objective function when we are in the differentiable segments, and we replace it with the smooth
‡‡In this chapter, we utilize MATLAB's adamupdate() library, https://www.mathworks.com/help/deeplearning/ref/adamupdate.html
Figure 3.13: This figure shows an example of the relation between the control parameters and the resulting robustness as a piece-wise differentiable function. Assuming a fixed initial state, every control parameter corresponds to a simulated trajectory, and that trajectory yields a robustness value. This robustness value is equal to the quantitative semantics of the critical predicate. Within each differentiable segment in this plot, the control parameters yield trajectories associated with a unique critical predicate.
Figure 3.14: This figure depicts the sampling-based gradient computation. In our approach, we freeze the controller at some time-points, while at others we treat the controller as a function of its parameters that can vary in this iteration of the back-propagation process. The actions that are fixed are highlighted in red, whereas the dependent actions are denoted in black. The red circles represent the time-steps where the controller is frozen.
semantics of the DT-STL robustness, ρ̃, at the non-differentiable local maxima where the critical predicate changes. We refer to this shift between the critical predicate and the smooth semantics as safe re-smoothing. However, it is practically impossible to accurately detect the non-differentiable local maxima; thus we take a more conservative approach and instead utilize ρ̃ at any gradient step where the critical-predicate technique is unable to improve the robustness.
3.7.4 Computing the Sampled Gradient
We now explain how we compute an approximation of the gradient of the original trajectory (which we call the original gradient). We call the approximate gradient from our sampling technique the sampled gradient. In the back-propagation algorithm, at a given gradient step j with control parameters θ^(j), we wish to compute the sampled gradient [∂J/∂θ^(j)]_sampled. The objective function J in our training algorithm can be either the robustness of the critical predicate or the smooth semantics of the robustness of the trajectory, ρ̃. The former is defined over a single trajectory state (i.e., at the critical time), while the latter is defined over
the entire trajectory. In response, we propose two different approaches for trajectory sampling for each
objective function.
1. In case the objective function J is the robustness of the critical predicate, it is only a function of the trajectory state s_{k*}. Thus, we sample the time-steps as T = {0, t1, t2, · · · , tN}, tN = k*, to generate a sampled trajectory smpl(σ[s0 ; θ^(j)], T) that ends at the critical time. We utilize this sampled trajectory to compute the sampled gradient. The original gradient with respect to the critical predicate can be formulated as ∂J/∂θ = (∂J/∂s_{k*}) (∂s_{k*}/∂θ), where s_{k*} = sub(σ[s0 ; θ^(j)], T)(N). However, we define J on our sampled trajectory and propose the sampled gradient as,

[∂J/∂θ]_sampled = ( ∂J / ∂smpl(σ[s0 ; θ^(j)], T)(N) ) ( ∂smpl(σ[s0 ; θ^(j)], T)(N) / ∂θ ).
2. In case the objective function is the smooth semantics of the robustness ρ̃, it is a function of all the trajectory states. In this case, we segment the trajectory into M subsets by random time sampling as T^q = {0, t^q_1, t^q_2, · · · , t^q_N} ⊆ K, q ∈ {1, · · · , M} (see Example 6), where,

(∀ q, q′ ∈ {1, · · · , M} : T^q ∩ T^{q′} = {0}) ∧ (K = ∪_{q ∈ {1,··· ,M}} T^q). (3.9)
Let us assume the sub-trajectories sub(σ[s0 ; θ^(j)], T^q) = s0, s_{t^q_1}, · · · , s_{t^q_N} and their corresponding sampled trajectories smpl(σ[s0 ; θ^(j)], T^q). As the sampled time-step sets T^q, q ∈ {1, · · · , M}, have no time-step in common other than 0 and their union covers the horizon K, we can reformulate the original gradient (∂J/∂θ = Σ_{k=1}^{K} (∂J/∂s_k)(∂s_k/∂θ)) as:

∂J/∂θ = Σ_{q=1}^{M} ( ∂J / ∂sub(σ[s0 ; θ^(j)], T^q) ) ( ∂sub(σ[s0 ; θ^(j)], T^q) / ∂θ ).

However, in our training process, to compute the sampled gradient we relax the sub-trajectories sub(σ[s0 ; θ^(j)], T^q), q ∈ {1, · · · , M}, with their corresponding sampled trajectories smpl(σ[s0 ; θ^(j)], T^q). In other words,

[∂J/∂θ]_sampled = Σ_{q=1}^{M} ( ∂J / ∂smpl(σ[s0 ; θ^(j)], T^q) ) ( ∂smpl(σ[s0 ; θ^(j)], T^q) / ∂θ ).
Remark 3.7.4. Unlike ∂s_{k*}/∂θ and ∂sub(σ[s0 ; θ^(j)], T^q)/∂θ, q ∈ {1, · · · , M}, which are prone to exploding, the alternatives ∂smpl(σ[s0 ; θ], T)(N)/∂θ and ∂smpl(σ[s0 ; θ], T^q)/∂θ, q ∈ {1, · · · , M}, can be computed efficiently§§.
§§The efficiency results from the control parameters θ being repeated at fewer time-steps over the trajectory, as most of the actions are fixed.
Example 6. Here, we propose an example to show our methodology for generating sampled trajectories when J = ρ̃. We again consider Example 3, but we sample the trajectory with M = 3 sets of sampled time-steps T^1 = {0, 2, 4, 9}, T^2 = {0, 5, 7, 8}, and T^3 = {0, 1, 3, 6}. Here, the time-steps are sampled such that their pairwise intersection is {0} and their union is K. The resulting sampled trajectory for T^1 is smpl(σ[x0 ; θ^(42)], T^1) = x0, x̃2, x̃4, x̃9, where

x̃2 = f^(42)_1(x0) = f( F(x0, 0 ; θ^(42)), 0.1),
x̃4 = f^(42)_2(x̃2) = f( F(x̃2, 2 ; θ^(42)), 0.3),
x̃9 = f^(42)_3(x̃4) = f( f( f( f( F(x̃4, 4 ; θ^(42)), 0.5), 0.6), 0.7), 0.8),

and the resulting sampled trajectory for T^2 is smpl(σ[x0 ; θ^(42)], T^2) = x0, x̃5, x̃7, x̃8, where

x̃5 = f^(42)_1(x0) = f( f( f( f( F(x0, 0 ; θ^(42)), 0.1), 0.2), 0.3), 0.4),
x̃7 = f^(42)_2(x̃5) = f( F(x̃5, 5 ; θ^(42)), 0.6),
x̃8 = f^(42)_3(x̃7) = F(x̃7, 7 ; θ^(42)),

and finally, the resulting sampled trajectory for T^3 is smpl(σ[x0 ; θ^(42)], T^3) = x0, x̃1, x̃3, x̃6, which has been previously explained in Example 3. We emphasize that the introduced sampled trajectories are exclusively generated for gradient step j = 42, and we perform a new random sampling for the next iteration.
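The partitioning of Eq. (3.9) and the accumulation of the sampled gradient over the M partitions can be sketched as follows. This is our own PyTorch-flavored illustration: `sampled_rollout_fn(policy, s0, T_q)` is assumed to return the full state sequence with the controller kept differentiable only at the time-steps in T_q (frozen actions elsewhere), and `smooth_robustness` stands in for ρ̃.

    import random, torch

    def partition_horizon(K, M, seed=0):
        # Randomly split {1,...,K} into M disjoint sets and prepend 0 to each (Eq. 3.9).
        rng = random.Random(seed)
        times = list(range(1, K + 1))
        rng.shuffle(times)
        return [sorted([0] + times[q::M]) for q in range(M)]

    def sampled_gradient(policy, sampled_rollout_fn, smooth_robustness, s0, K, M):
        # Accumulate [dJ/dtheta]_sampled over the M sampled trajectories.
        for p in policy.parameters():
            p.grad = None
        for T_q in partition_horizon(K, M):
            rho = smooth_robustness(sampled_rollout_fn(policy, s0, T_q))
            rho.backward()                  # gradients add up across the M partitions
        return [p.grad for p in policy.parameters()]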
3.7.5 A Detailed Discussion on Training Algorithm
Remark 3.7.5. At the start of the training process, we can envision a desired path for the model to track. Tracking this path may not be sufficient to satisfy the temporal specification, but its availability is still valuable information whose inclusion in the training process can expedite it. Therefore, we also utilize a desired path to generate a convex and efficient waypoint function (denoted by J^wp(σ[s0 ; θ])) for our training process. However, Algorithm 4 performs effectively even without the waypoint function. Section 3.8.3.1 explores this aspect using a numerical example. Nonetheless, integrating a waypoint function enhances the efficiency of the training process.
We finally present our overall training procedure in Algorithm 4. Here, we use ρ^φ(σ[s0 ; θ]) as shorthand for the non-smooth robustness degree of σ[s0 ; θ] w.r.t. φ at time 0, i.e., ρ(φ, σ[s0 ; θ], 0). We terminate the algorithm in line 2 if the robustness is greater than a pre-specified threshold ρ̄ > 0. We also evaluate the performance of the algorithm through challenging case studies. During each iteration of this algorithm, we compute the robustness value for an initial state s0 selected from the pre-sampled set of initial states Ib in line 3. This selection can be either random, or the initial state with the lowest robustness value in the set Ib. The Boolean parameter use_smooth is provided to toggle the objective between the robustness of the critical predicate and the smooth robustness of the DT-STL formula. We initialize use_smooth in line 3 to False and update it to True in line 21 in case the gradient from the critical predicate is unable to increase the robustness. Lines 18, 19, and 21 aim to improve the detection of non-differentiable local maxima by employing a more accurate approach: we maintain the direction of the gradient generated with the critical predicate and exponentially reduce the learning rate until a small threshold ϵ is reached.
Algorithm 4: Gradient sampling and training the controller for longer horizon tasks.
1   Input: ϵ, M, N, N1, N2, θ^(0), φ, ρ̄, Ib, j = 0
2   while ρ^φ(σ[s0 ; θ^(j)]) ≤ ρ̄ do
3       s0 ← Sample from Ib;  use_smooth ← False;  j ← j + 1
4       if use_smooth = False then
5           θ1, θ2 ← θ^(j)   // θ1 & θ2 are candidates for parameter update via critical predicate and waypoint.
            // The following loop updates θ1 and θ2 via accumulation of N1 sampled gradients.
6           for i ← 1, · · · , N1 do
7               σ[s0 ; θ1], σ[s0 ; θ2] ← Simulate the trajectory via θ1, θ2, and s0
8               k*, h*(s_{k*}) ← obtain the critical time and the critical predicate
9               T^1, smpl(σ[s0 ; θ1], T^1) ← sample set of time-steps T^1 = {0, t1, .., tN = k*} and its sampled trajectory
10              T^2, smpl(σ[s0 ; θ2], T^2) ← sample set of time-steps T^2 = {0, t1, .., tN} and its sampled trajectory
11              J ← h*( smpl(σ[s0 ; θ1], T^1)(N) );  d1 ← [∂J/∂θ]_sampled;  θ1 ← θ1 + Adam(d1/N1)
12              J ← J^wp( smpl(σ[s0 ; θ2], T^2) );  d2 ← [∂J/∂θ]_sampled;  θ2 ← θ2 + Adam(d2/N1)
            // Update the control parameter with θ2 if it increases the robustness value.
            // Otherwise update the control parameter with θ1 if it increases the robustness value.
            // Otherwise, check for a non-differentiable local maximum.
13          if ρ^φ(σ[s0 ; θ2]) ≥ ρ^φ(σ[s0 ; θ^(j)]) then θ^(j+1) ← θ2
14          else if ρ^φ(σ[s0 ; θ1]) ≥ ρ^φ(σ[s0 ; θ^(j)]) then θ^(j+1) ← θ1
15          else
16              ℓ ← 1;  update ← True
17              while update & (use_smooth = False) do
                    // Keep the gradient direction & reduce the learning rate
18                  ℓ ← ℓ/2;  θ̂ ← θ^(j) + ℓ(θ1 − θ^(j))
                    // Update the control parameter with θ̂ if it increases the robustness value.
19                  if ρ(φ, σ[s0 ; θ̂], 0) ≥ ρ^φ(σ[s0 ; θ^(j)]) then [ θ^(j+1) ← θ̂;  update ← False ]
20                  else if ℓ < ϵ then
21                      use_smooth ← True   // swap the objective with ρ̃ if ℓ < ϵ
22      if use_smooth = True then
23          θ3 ← θ^(j)   // θ3 is the candidate for parameter update via the smooth semantics ρ̃
            // The following loop updates θ3 via accumulation of N2 sampled gradients.
24          for i ← 1, · · · , N2 do
25              T^q, smpl(σ[s0 ; θ3], T^q), q ∈ {1, · · · , M} ← make M sets of sampled time-steps from Eq. (3.9) & their sampled trajectories
26              J ← ρ̃;  d3 ← [∂J/∂θ]_sampled;  θ3 ← θ3 + Adam(d3/N2)
27          θ^(j+1) ← θ3
If, even with an infinitesimal learning rate, this gradient fails to increase the robustness, it suggests a high likelihood of being at a non-differentiable local maximum.
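The backtracking-and-fallback logic of lines 16-21 of Algorithm 4 can be summarized in a short Python sketch. This is an illustration with placeholder names: `robustness(theta)` stands for evaluating ρ^φ(σ[s0 ; θ]), and the returned Boolean plays the role of use_smooth.

    def backtrack_or_resmooth(theta, theta1, robustness, eps=1e-5):
        # Keep the critical-predicate gradient direction but halve the step until the
        # robustness improves; if the step drops below eps, fall back to the smooth
        # semantics (safe re-smoothing).
        ell, base = 1.0, robustness(theta)
        while True:
            ell /= 2.0
            theta_hat = theta + ell * (theta1 - theta)
            if robustness(theta_hat) >= base:
                return theta_hat, False      # accept step, stay with critical predicate
            if ell < eps:
                return theta, True           # switch the objective to smooth semantics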
3.8 Experimental Evaluation
In this section, we evaluate the performance of our proposed methodology. We executed all experiments for training with Algorithm 4 using our MATLAB toolbox¶¶. These experiments were carried out on a laptop PC equipped with a Core i9 CPU. In all experiments performed using Algorithm 4, we utilize LB4TL as the smooth semantics. We also presented an experiment in Section 3.5 to empirically demonstrate that NN feedback controllers are more robust to noise than open-loop alternatives. Finally, we conclude this section with statistical verification of the controllers∗∗∗.
First, we provide a brief summary of results on evaluation of Algorithm 4. Following this, we elaborate
on the specifics of our experimental configuration later in this section.
Evaluation Metric. We evaluate the effectiveness of our methodology outlined in Algorithm 4 through
four case studies, each presenting unique challenges. First, we present two case studies involving tasks
with longer time horizons:
• 6-dimensional quad-rotor combined with a moving platform with task horizon K = 1500 time-steps.
• 2-dimensional Dubins car with task horizon K = 1000 time-steps.
Subsequently, we present two additional case studies characterized by higher-dimensional state spaces:
• 20-dimensional Multi-agent system of 10 connected Dubins cars with task horizon K = 60 time-steps.
• 12-dimensional quad-rotor with task horizon K = 45 time-steps.
¶¶The source code for the experiments is publicly available from https://github.com/Navidhashemicodes/STL_dropout
∗∗∗Our results show that integrating a waypoint function in Algorithm 4 enhances the efficiency of the training process to a
small extent.
Table 3.4: Results on different case studies. Here, b is the hyper-parameter we utilized to generate LB4TL in [76].

Case Study | Temporal Task | System Dimension | Time Horizon | NN Controller Structure | Number of Iterations | Runtime (seconds) | Optimization Setting [M, N, N1, N2, ϵ, b]
12-D Quad-rotor | φ3 | 12 | 45 steps | [13, 20, 20, 10, 4] | 1120 | 6413.3 | [9, 5, 30, 40, 10⁻⁵, 5]
Multi-agent | φ4 | 20 | 60 steps | [21, 40, 20] | 2532 | 6298.2 | [12, 5, 30, 1, 10⁻⁵, 15]
6-D Quad-rotor & Frame | φ5 | 7 | 1500 steps | [8, 20, 20, 10, 4] | 84 | 443.45 | [100, 15, 30, 3, 10⁻⁵, 15]
Dubins car | φ6 | 2 | 1000 steps | [3, 20, 2] | 829 | 3728 | [200, 5, 60, 3, 10⁻⁵, 15]
Table 3.4 highlights the versatility of Algorithm 4 in handling the above case studies. We use a diverse set of temporal tasks, which include nested temporal operators and two independently moving objects (quad-rotor & moving platform case study). The details of the experiments are discussed as follows.
3.8.1 12-Dimensional Quad-rotor (Nested 3-Future Formula)
Figure 3.15: This figure shows the simulation of the trained control parameters satisfying the specified temporal task, together with the simulation result for the initial guess of the control parameters.
We assume a 12-dimensional model for a quad-rotor of mass m = 1.4 kg. The distance of the rotors from the quad-rotor's center is ℓ = 0.3273 meter, and the inertia of the vehicle is Jx = Jy = 0.054 and Jz = 0.104 (see [25] for the details of the quad-rotor's dynamics). The controller sends bounded signals δr, δl, δb, δf ∈ [0, 1] to the right, left, back, and front rotors, respectively, to drive the vehicle. Each rotor is designed such that, given the control signal δ, it generates the propeller force k1δ and also exerts the yawing torque k2δ on the body of the quad-rotor. We set k1 = 0.75mg such that the net force from all the rotors cannot exceed 3 times the vehicle's weight (g = 9.81). We also set k2 = 1.5ℓk1 to ensure that the maximum angular velocity in the yaw axis is approximately equivalent to the maximum angular velocity in the pitch and roll axes. We use the sampling time δt = 0.1 seconds in our control process. The dynamics for this vehicle is given in Eq. (3.10), where F, τϕ, τθ, τψ are the net propeller
ẋ1 = cos(x8) cos(x9) x4 + (sin(x7) sin(x8) cos(x9) − cos(x7) sin(x9)) x5 + (cos(x7) sin(x8) cos(x9) + sin(x7) sin(x9)) x6
ẋ2 = cos(x8) sin(x9) x4 + (sin(x7) sin(x8) sin(x9) + cos(x7) cos(x9)) x5 + (cos(x7) sin(x8) sin(x9) − sin(x7) cos(x9)) x6
ẋ3 = sin(x8) x4 − sin(x7) cos(x8) x5 − cos(x7) cos(x8) x6
ẋ4 = x12 x5 − x11 x6 − 9.81 sin(x8)
ẋ5 = x10 x6 − x12 x4 + 9.81 cos(x8) sin(x7)
ẋ6 = x11 x4 − x10 x5 + 9.81 cos(x8) cos(x7) − F/m
ẋ7 = x10 + (sin(x7)(sin(x8)/cos(x8))) x11 + (cos(x7)(sin(x8)/cos(x8))) x12
ẋ8 = cos(x7) x11 − sin(x7) x12
ẋ9 = (sin(x7)/cos(x8)) x11 + (cos(x7)/cos(x8)) x12
ẋ10 = −((Jy − Jz)/Jx) x11 x12 + (1/Jx) τϕ
ẋ11 = ((Jz − Jx)/Jy) x10 x12 + (1/Jy) τθ
ẋ12 = (1/Jz) τψ

[F, τϕ, τθ, τψ]⊤ = [k1 k1 k1 k1 ; 0 −ℓk1 0 ℓk1 ; ℓk1 0 −ℓk1 0 ; −k2 k2 −k2 k2] [δf, δr, δb, δl]⊤,

δf = 0.5(tanh(0.5 a1) + 1), δr = 0.5(tanh(0.5 a2) + 1), δb = 0.5(tanh(0.5 a3) + 1), δl = 0.5(tanh(0.5 a4) + 1), a1, a2, a3, a4 ∈ R. (3.10)
force, pitch torque, roll torque, and yaw torque, respectively. We plan to train a NN controller with tanh() activation function and structure [13, 20, 20, 10, 4] for this problem, which maps the vector [s_k⊤, k]⊤ to the unbounded control inputs [a1,k, a2,k, a3,k, a4,k]⊤. In addition, the trained controller should be valid for all initial states
I = { s0 | [−0.1, −0.1, −0.1, 0⃗_{9×1}]⊤ ≤ s0 ≤ [0.1, 0.1, 0.1, 0⃗_{9×1}]⊤ }.
Figure 3.15 shows the simulation of the quad-rotor's trajectories with our trained controller parameters. The quad-rotor is planned to pass through the green hoop between the 10th and 15th time-step. Once it has passed the green hoop, it should pass the blue hoop within the next 10 to 15 time-steps, and once it has passed the blue hoop, it should pass the red hoop within the next 10 to 15 time-steps. This is called a nested future formula, and we design the controller such that the quad-rotor satisfies this specification. Assuming p is the position of the quad-rotor, this temporal task can be formalized in the DT-STL framework as follows:
φ3 = F[10,15]( p ∈ green_hoop ∧ F[10,15]( p ∈ blue_hoop ∧ F[10,15] ( p ∈ red_hoop ) ) ) (3.11)
Figure 3.15 shows the simulation of trajectories generated by the trained controller. The black trajectories are the simulations for the initial guess of the controller, which is generated completely at random and violates the specification. We sampled I with 9 points: the corners of I and its center. The setting for gradient sampling is M = 9, N = 5. We trained the controller with ρ̄ = 0 in Algorithm 4 with optimization setting (N1 = 30, N2 = 40, ϵ = 10⁻⁵) over 1120 gradient steps (runtime of 6413.3 seconds). The runtime to generate LB4TL is 0.495 seconds, and we set b = 5 for it. Algorithm 4 utilizes gradients from the waypoint function, the critical predicate, and LB4TL 515, 544, and 61 times, respectively.
3.8.2 Multi-Agent: Network of Dubins Cars (Nested Formula)
In this example, we assume a network of 10 different Dubins cars that are all under the control of a neural
network controller. The dynamics of this multi-agent system is,
[ẋ^i, ẏ^i]⊤ = [v^i cos(θ^i), v^i sin(θ^i)]⊤, v^i ← tanh(0.5 a^i_1) + 1, a^i_1 ∈ R, θ^i ← a^i_2 ∈ R, i ∈ {1, · · · , 10}, (3.12)
that is, a 20-dimensional multi-agent system with 20 control inputs, 0 ≤ v^i ≤ 1, θ^i ∈ R, i ∈ {1, · · · , 10}. Figure 3.16a shows the initial position of each Dubins car in R², together with their corresponding goal sets.
in companion with their corresponding
goal sets. The cars should be driven to their goal sets, and they should also keep a minimum distance of
d = 0.5 meters from each other while they are moving toward their goal sets. We assume a sampling time
of δt = 0.26 seconds for this model, and we plan to train a NN controller with tanh() activation function
and structure [21, 40, 20] via Algorithm 4. For this problem, the controller maps the vector, [s
⊤
k
, k]
⊤ to the
unbounded control inputs n
a
i
1,k, ai
2,ko10
i=1
. Note that s
i
k = (x
i
k
, yi
k
). This temporal task can be formalized
in DT-STL framework as follows:
φ4 := ^
10
i=1
F[20,48]G[0,12]
s
i ∈ Goali
^ ^
i̸=j
i,j∈{1,··· ,10}
G[0,60]
∥s
i − s
j
∥∞ > d
Figure 3.16c shows the simulation of the trajectories for the trained controller, and Figure 3.16b presents the simulation of trajectories for the initial guess of the control parameters. We observe that our controller leads the agents to finish the task at different times. Thus, we present the time-stamps with asterisk markers to enhance the clarity of the presentation regarding satisfaction of the specification in Figure 3.16c. Although the task is not a long-horizon task, due to the high dimension and complexity of the task we were unable to solve this problem without time sampling. However, we successfully solved this problem with Algorithm 4 within 6298 seconds and 2532 gradient steps. We set the optimization setting as M = 12, N = 5, N1 = 30, N2 = 1, ϵ = 10⁻⁵. The runtime to generate LB4TL is 6.2 seconds, and we set b = 15 for it. Over the course of the training process, we utilized 187, 1647, and 698 gradients from the waypoint function, the critical predicate, and LB4TL, respectively.
3.8.3 6-Dimensional Quadrotor & Moving Platform: Landing a Quadrotor
We use a 6-dimensional model for quad-rotor dynamics as follows.
[ẋ, ẏ, ż, v̇x, v̇y, v̇z]⊤ = [vx, vy, vz, g tan(u1), −g tan(u2), g − u3]⊤, where u1 ← 0.1 tanh(0.1 a1), u2 ← 0.1 tanh(0.1 a2), u3 ← g − 2 tanh(0.1 a3), a1, a2, a3 ∈ R. (3.13)
(a) agents vs goal sets    (b) initial guess for θ^(0)    (c) Simulation of trajectories for trained control parameters.
Figure 3.16: These figures show a multi-agent system of 10 connected Dubins cars. Figure (a) shows the start (blue
dots) and goal points (green squares) for agents. Figs. (b,c) show simulated system trajectories with both the initial
untrained controller and the centralized NN controller trained with Algorithm 4. The controller coordinates all cars
to reach their respective goals between 20 and 48 seconds, and then stay in their goal location for at least 12 seconds.
It also keeps the cars at a minimum distance from each other. We remark that the agents finish their tasks (the first
component of φ4) at different times.
Let x = (x, y, z) denote the quad-rotor’s position and v = (vx, vy, vz) denote its velocity along the three
coordinate axes. The control inputs u1, u2, u3 represent the pitch, roll, and thrust inputs respectively. We
assume that the inputs are bounded as follows: −0.1 ≤ u1, u2 ≤ 0.1, 7.81 ≤ u3 ≤ 11.81.
The horizon of the temporal task is 1500 time-steps with δt = 0.05 s. The quad-rotor launches from a helipad located at (x0, y0, z0) = (−40, 0, 0). We accept a deviation of 0.1 for x0 and y0 and train the controller to be valid for all states sampled from this region. The helipad is 40 m away from a building located at (0, 0, 0). The building is 30 m high, and its footprint is 10 m × 10 m. We also have a moving platform with dimensions 2 m × 2 m × 0.1 m that starts to move from (10, 0, 0) with a variable velocity, modeled as ẋ^f = u4. We accept a deviation of 0.1 for x^f_0, and our trained controller is robust with respect to this deviation. We define Ib with 9 samples located at the corners of I and the center of I. The frame is required to keep a minimum distance of 4.5 meters from the building. We train the NN controller to control both the quad-rotor and the platform to ensure that the quad-rotor lands on the platform with a relative velocity of at most 1 m/s in the x, y, and z directions, a relative distance of at most 1 m in the x and y directions, and at most 0.4 m in the z direction. Letting p = (x, y, z) be the position of the quad-rotor, this temporal task can be formulated as a reach-avoid formula in the DT-STL framework as follows:
φ5 = G[0,1500] (p ∉ Obstacle) ∧ F[1100,1500] (p ∈ Goal) ∧ G[0,1500] (x^f_k > 9.5) (3.14)
where the goal set is introduced as follows,

Goal = { (xk, yk, zk, vx,k, vy,k, vz,k, x^f_k) | [−1, −1, 0.11, 0, −1, −1]⊤ ≤ [xk − x^f_k, yk, zk, vx,k, vy,k, vz,k]⊤ ≤ [1, 1, 0.6, 2, 1, 1]⊤ } (3.15)
We plot the simulated trajectory for the center of the set of initial states I in Figure 3.17. The NN controller's structure is [8, 20, 20, 10, 4] with tanh() activation functions, and we initialize it with a random guess for its parameters. The simulated trajectory for the initial guess of parameters is also depicted in black. The setting for gradient sampling is M = 100, N = 15. We trained the controller with ρ̄ = 0 over 84 gradient steps (runtime of 443 seconds). The runtime to generate LB4TL is 7.74 seconds, and we set
Figure 3.17: This figure shows the simulated trajectory for trained controller in comparison to the trajectories
for naive initial random guess. The frame is moving with a velocity determined with the controller that
also controls the quad-rotor.
b = 15 for it. In total, Algorithm 4 utilizes gradients from the waypoint function, the critical predicate, and LB4TL 5, 71, and 8 times, respectively.
3.8.3.1 Influence of Waypoint Function, Critical Predicate and Time Sampling on Algorithm 4
Here, we consider the case study of landing a quad-rotor and perform an ablation study on the impact of including 1) the critical predicate, 2) the waypoint function, and 3) time sampling in the training process via Algorithm 4. To that end, we compare the results once these modules are excluded from the algorithm. In the first step, we remove the waypoint function and show the performance of the algorithm. In the next step, we also disregard the presence of the critical time in time sampling and train the controller with completely random time sampling, and finally we examine the impact of time sampling on the mentioned results. Table 3.5 shows the efficiency of the training process in each case, and Figure 3.18 compares the learning curves. Our experimental results show that the control synthesis for the quad-rotor (landing mission) faces a small reduction in efficiency when the waypoint function is disregarded and fails when the critical predicate is also removed from time sampling. They also show that control synthesis fails when time sampling is removed.
3.8.4 Dubins Car: Growing Task Horizon for Dubins Car (Ablation Study on Time
Sampling)
In this experiment, we consider Dubins car with dynamics,
[ẋ, ẏ]⊤ = [v cos(θ), v sin(θ)]⊤, v ← tanh(0.5 a1) + 1, a1 ∈ R, θ ← a2 ∈ R,
Figure 3.19: This figure shows the simulation of the results for the Dubins car in the ablation study proposed in Section 3.8.4. In this experiment, the task horizon is 1000 time-steps.
and present an ablation study on the influence of gradient sampling on control synthesis. Given a scale factor a > 0, a time horizon K, and a pre-defined initial guess for the control parameters θ^(0), we plan to train a tanh() neural network controller with structure [3, 20, 2] to drive a Dubins car to satisfy the temporal task φ6 := F[0.9K,K] (p ∈ Goal) ∧ G[0,K] (p ∉ Obstacle), where p = (x, y) is the position of the Dubins car. The Dubins car starts from (x0, y0) = (0, 0). The obstacle is a square centered at (a/2, a/2) with side length 2a/5. The goal region is a square centered at (9a/10, 9a/10) with side length a/20. We solve this problem for K = 10, 50, 100, 500, 1000, and we utilize a = K/10 for each case study. We apply standard gradient ascent (see Algorithm 3) to solve each case study, both with and without gradient sampling. Furthermore, in addition to standard gradient ascent, we also utilize Algorithm 4 to solve them. Note that we set the initial guess and the controller's structure to be the same for all the training processes, and we also manually stop
Figure 3.18: This figure shows the learning curve for training processes. Note, the figure has been truncated
and the initial robustness for all the experiments at iteration 0 is −47.8. This figure shows that Algorithm 4
in the presence of the waypoint function concludes successfully in 84 iterations while when the waypoint
function is not included, it terminates in 107 iterations. The algorithm also fails if the critical predicate is
not considered in time sampling.
Table 3.5: Ablation studies for picking different options for the optimization process. This table shows the results of the training algorithm in case study 3.8.3.1. We indicate that the training does not result in positive robustness within 300 gradient steps by DNF (did not finish), with the value of the robustness at iteration 300 in brackets. The table represents an ablation study where we disable the various heuristic optimizations in Algorithm 4 in different combinations and report the extent of the reduction in efficiency. We use ✓, × to respectively indicate a heuristic being included or excluded. The time-sampling technique is utilized in all the experiments. Each row's learning-curve color is shown in Figure 3.18.

Waypoint function | Critical predicate | Time-sampling | Number of Iterations | Runtime
✓ | ✓ | ✓ | 84 | 443 sec
× | ✓ | ✓ | 107 | 607 sec
✓ | × | ✓ | DNF[−0.74] | 6971 sec
× | × | ✓ | DNF[−1.32] | 4822 sec
✓ | ✓ | × | DNF[−4.52] | 1505 sec
× | ✓ | × | DNF[−11.89] | 1308 sec
Horizon | Standard gradient ascent (no time sampling): Iterations, Runtime (s) | Standard gradient ascent (with time sampling): Iterations, Runtime (s) | Algorithm 4, no waypoint (with time sampling): Iterations, Runtime (s) | Algorithm 4, with waypoint (with time sampling): Iterations, Runtime (s)
10 | 34, 2.39 | 11, 1.39 | 6, 0.9152 | 4, 5.61
50 | 73, 2.46 | 53, 14.01 | 20, 2.7063 | 25, 6.09
100 | 152, 8.65 | 105, 112.6 | 204, 79.33 | 157, 90.55
500 | DNF[−1.59], 4986 | 3237, 8566 | 2569, 2674 | 624, 890.24
1000 | DNF[−11.49], 8008 | DNF[−88.42], 28825 | 812, 1804 | 829, 3728

Table 3.6: Ablation study. We mark an experiment with DNF[·] if it is unable to reach positive robustness
within 8000 iterations; the value in brackets is the maximum robustness it finds. We magnify the environment
proportionally to the horizon. All experiments for K = 10, 50, 100 use one common initial guess for the
parameter values, and all experiments for K = 500, 1000 use another common initial guess. The critical-predicate
module is used in both Algorithm 4 variants (columns 3 and 4).
We also assume a singleton set of initial states {(0, 0)} to present a clearer comparison. The runtime and the
number of iterations for each training process are presented in Table 3.6. Figure 3.19 displays simulated
trajectories of the controller trained with Algorithm 4 for K = 1000 time-steps (via gradient sampling),
alongside simulations for the initial guess of the controller parameters.
Table 3.6 shows that our approximation technique outperforms the original gradient when the computation
of the original gradient faces numerical issues (such as the longer time horizons K = 500, 1000). However,
when the computation of the original gradient does not face numerical issues, the original gradient
outperforms the sampled gradient, as expected. The table also shows that standard gradient ascent (with
time sampling) is still unable to solve the case K = 1000, while Algorithm 4 solves this case efficiently. This
implies that the combination of time-sampling, critical predicate, and safe re-smoothing provides a significant
improvement in scalability. The experiment with K = 500 also shows that the inclusion of the waypoint
function in Algorithm 4 is sometimes noticeably helpful.
3.8.5 Statistical Verification of Synthesized Controllers
In [77], we showed that if the trained neural network controller, the plant dynamics, and the neural network
representing the STL quantitative semantics all use ReLU activation functions, then we can use tools
such as NNV [165], which compute the forward image of a polyhedral input set through a neural network, to
verify whether a given DT-STL property holds for all initial states of the system. However, there are a few
challenges in applying such deterministic methods here: we use more general activation functions, the
depth of the overall neural network can be significant for longer-horizon tasks, and the dimensionality
of the state-space can also become a bottleneck. In this chapter, we thus eschew the use of deterministic
techniques, instead reasoning about the correctness of our neural network feedback control scheme using a
statistical verification approach. In other words, given a coverage level δ₁ ∈ (0, 1) and a confidence level
δ₂ ∈ (0, 1), we are interested in a probabilistic guarantee of the form Pr[ Pr[σ[s₀; θ] |= φ] ≥ δ₁ ] ≥ δ₂.
The main inspiration for our verification is drawn from the theoretical developments in conformal
prediction [174]. Of particular significance to us is the following lemma:
Lemma 3.8.1 (From [40]). Consider m independent and identically distributed (i.i.d.), real-valued data
points drawn from some distribution D. After they are drawn, suppose we sort them in ascending order and
denote the i-th smallest number by R_i (i.e., we have R₁ < R₂ < . . . < R_m). Let Beta(α, β) denote the Beta
distribution†††. Then, for an arbitrary R_{m+1} drawn from the same distribution D, the following holds:

Pr[R_{m+1} < R_ℓ] ∼ Beta(ℓ, m + 1 − ℓ),   1 ≤ ℓ ≤ m.   (3.16)
The original m i.i.d. data-points are called a calibration set. The above lemma says that the probability
of a previously unseen data-point R_{m+1} drawn from the same distribution D being less than the ℓ-th
†††The Beta distribution is a family of continuous probability distributions defined on the interval 0 ≤ x ≤ 1 with shape
parameters α and β, and with probability density function f(x; α, β) = x^{α−1}(1 − x)^{β−1} / B(α, β), where the constant
B(α, β) = Γ(α)Γ(β)/Γ(α + β) and Γ(z) = ∫₀^∞ t^{z−1} e^{−t} dt is the Gamma function.
smallest number in the calibration set is itself a random variable that has a specific Beta distribution. We
next show how we can exploit this lemma to obtain probabilistic correctness guarantees for our trained
controllers.
We assume that there is some user-specified distribution over the set of initial states in I, and that we
can sample m initial states s₀,₁, . . . , s₀,ₘ from this distribution. For a sampled initial state s₀,ᵢ, i ∈ 1, · · · , m,
we obtain the corresponding negative robust satisfaction value and set R_i = −ρ(φ, σ[s₀,ᵢ; θ], 0), i ∈ 1, · · · , m.
From Lemma 3.8.1, we know that for a previously unseen initial state s0,m+1, the corresponding
(negative value of) robustness Rm+1 satisfies the relation in (3.16). Now, almost all sampled trajectories
generated by a trained controller are expected to have positive robustness value, so we expect the quantities
R1, . . . , Rm to be all negative. In the pessimistic case, we expect at least the first ℓ of these quantities to
be negative. If so, the guarantee in Eq. (3.16) essentially quantifies the probability of the robustness of a
trajectory for a previously unseen initial state to be positive. Note that:
(Rm+1 < Rℓ) ∧ (Rℓ < 0) =⇒ (Rm+1 < 0) =⇒ (σ[s0,m+1 ; θ] |= φ) (3.17)
∴ Pr(σ[s0,m+1 ; θ] |= φ) ≥ Pr(Rm+1 < Rℓ) ∼ Beta(ℓ, m + 1 − ℓ) (3.18)
In addition, from [40] we know that the mean and variance of the Beta distribution are given as follows:

E[ Pr[R_{m+1} < R_ℓ] ] = ℓ / (m + 1),   Var[ Pr[R_{m+1} < R_ℓ] ] = ℓ(m + 1 − ℓ) / ((m + 1)²(m + 2)).   (3.19)
As the Beta distribution has small variance and is noticeably sharp, the desired coverage level for a
probabilistic guarantee can be obtained in the vicinity of its mean value. From the closed-form formula
in (3.19), we observe that if we wish to have a coverage level close to (1 − 10⁻⁴), or 99.99%, then
we can set ℓ = ⌈(m + 1)(1 − 10⁻⁴)⌉. Here we also set m to 10⁵, giving the value ℓ = 99991. Let us
denote Pr[R_{m+1} < R_ℓ] as δ. Since δ is a random variable sampled from Beta(ℓ, m + 1 − ℓ) with
(ℓ = 99991, m = 10⁵)‡‡‡, we can utilize the cumulative distribution function of the Beta distribution (i.e., the
regularized incomplete Beta function) and, for a given δ₁ ∈ (0, 1), propose the following guarantee:

Pr[δ ≥ δ₁] = 1 − I_{δ₁}(ℓ, m + 1 − ℓ), where I_x(·, ·) is the regularized incomplete Beta function at point x.

Here δ₁ is the desired coverage level that we consider for the probabilistic guarantee. However, if we set
δ₁ = 0.9999 then Pr[δ ≥ 0.9999] = 0.54, which indicates that the confidence in the 99.99% guarantee is
low. If we instead set δ₁ = 0.9998, this results in Pr[δ ≥ 0.9998] = 0.995, which indicates a much higher
level of confidence. Finally, based on (3.18), we can carry this guarantee over to the trajectories and conclude

Pr[ Pr[σ[s₀; θ] |= φ] ≥ 99.98% ] ≥ 99.5%.   (3.20)

‡‡‡Using (3.19), its mean and variance are µ = E[δ] = 0.9999 and Var[δ] = 9.9987 × 10⁻¹⁰.
To summarize, in each of our case studies, we sample m = 10⁵ i.i.d. trajectories, compute their sorted
negative robustness values R₁, . . . , Rₘ, and check that R_ℓ for ℓ = 99991 is indeed negative. This gives us
the probabilistic guarantee in (3.20) that, from unseen initial conditions, the system will not violate
the DT-STL specification.
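Computationally, this check reduces to sorting the calibration scores and evaluating a Beta tail probability. The following sketch (Python with SciPy; the toolchain is an assumption, and the robustness values below are placeholders rather than data from our case studies) illustrates the procedure.

```python
import numpy as np
from scipy.stats import beta

# Placeholder calibration data: m negative robustness values R_i = -rho(phi, sigma_i, 0).
# In practice these come from simulating the closed-loop system from sampled initial states.
m = 100_000
rng = np.random.default_rng(0)
robustness_values = -np.abs(rng.normal(1.0, 0.2, size=m))   # stand-in values only

R = np.sort(robustness_values)                 # R[0] <= R[1] <= ... <= R[m-1]
ell = int(np.ceil((m + 1) * (1 - 1e-4)))       # ell = 99991 for m = 10^5

# Empirical check from the text: the ell-th smallest value must be negative.
assert R[ell - 1] < 0, "calibration set does not support the desired coverage"

# Confidence that the coverage Pr[R_{m+1} < R_ell] exceeds delta_1, using Lemma 3.8.1:
delta_1 = 0.9998
confidence = 1.0 - beta.cdf(delta_1, ell, m + 1 - ell)
print(f"Pr[coverage >= {delta_1}] is about {confidence:.3f}")   # about 0.995, matching (3.20)
```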
3.9 Related Work
In the broad area of formal methods, robotics, and cyber-physical systems, there has been
substantial research in synthesizing controllers from temporal logic specifications. This research involves
different considerations. First, the plant dynamics may be specified as a differential/difference
equation-based model [67, 131, 142, 57, 119, 143, 70], as a Markov decision process [148, 72, 96] that
models stochastic uncertainty, or may not be explicitly provided (but is implicitly available through a
simulator that samples model behaviors). The second consideration is the expressivity of the specification
language, i.e., if the specifications are directly on the real-valued system behaviors or on Boolean-valued
propositions over system states, and if the behaviors are over a discrete set of time-steps or over dense time.
Specification languages such as LTL (Linear Temporal Logic) [136], Metric Temporal Logic (MTL) [104]
and Metric Interval Temporal Logic (MITL) [10] are over Boolean signals, while Signal Temporal Logic
(STL) [126] and its discrete-time variant DT-STL considered in this chapter are over real-valued behaviors.
MTL, MITL and STL are typically defined over dense time signals while LTL and DT-STL are over discrete
time-steps. The third consideration is the kind of controller being synthesized. Given the plant dynamics,
some techniques find the entire sequence of control actions from an initial state to generate a desired
optimal trajectory (open loop control) [181, 142, 130, 119], while some focus on obtaining a feedback
controller that guarantees satisfaction of temporal logic objectives in the presence of uncertainty [182] (in
the initial states or during system execution). We now describe some important sub-groups of techniques
in this space that may span the categories outlined above.
Reactive Synthesis. A reactive synthesis approach models the system interaction with its environment as a
turn-based game played by the system and the environment over a directed graph [30]. The main idea is to
convert temporal logic specifications (such as LTL) into winning conditions and identify system policies
that deterministically guarantee satisfaction of the given specification [105]. As reactive synthesis is a
computationally challenging problem, there are many sub-classes and heuristics that have been explored
for efficiency; for instance, in [178] a receding horizon framework is used; in [166], the authors focus on
piece-wise affine nondeterministic systems, while [143] investigates reactive synthesis for STL.
Reinforcement and Deep Reinforcement Learning. Reinforcement learning (RL) algorithms learn control
policies that maximize cumulative rewards over long-term horizons. Recently, RL has been used to infer
reward functions that can guarantee satisfaction of an LTL specification [74, 149, 32, 64].
The works in [170, 97, 73, 138, 22] generate reward functions from STL specifications. While the ultimate
objective of these methods is similar to our problem setting, we adopt a model-based approach to control
synthesis, where we assume access to a differentiable model of the system and use gradient ascent to train
the controller, in contrast to RL algorithms that may rely on adequate exploration of the state space to
obtain near-optimal policies (that may guarantee satisfaction of specifications).
MPC and MILP. A clever encoding of LTL as mixed integer linear (MIL) constraints was presented in [177]
for the purpose of reactive synthesis. This idea was then extended in [142] to show that model predictive
control of linear/piecewise affine systems w.r.t. STL objectives (with linear predicates) can be solved using
mixed integer linear programming (MILP) solvers. MILP is an NP-hard problem, and various optimization
improvements to the original problem [107, 67, 158, 130] and extensions to stochastic systems [57, 150] have
been proposed. In contrast to a model-predictive controller, we obtain a NN feedback controller that does
not require the online optimization required in MPC.
Barrier Function-based Approaches. A control barrier function (CBF) can be thought of as a safety envelope
for a controlled dynamical system. As long as the CBF satisfies validity conditions (typically over its Lie
derivative), the CBF guarantees the existence of a control policy that keeps the overall system safe [180].
CBFs can be used to enforce safety constraints and also to enforce temporal specifications such as STL [119,
13, 12, 42]. The design of barrier functions is generally a hard problem, though recent studies
compute CBFs through learning [170, 146] and using the quantitative semantics of STL [79].
Gradient-based Optimization methods. This class of methods investigates learning neural network controllers
by computing the gradient of the robustness function of STL through back-propagation. For instance,
training feedback neural network controllers is studied in [80, 182, 181, 121, 85], and open-loop controllers
are investigated in [114]. The main contribution in this chapter over previous work is to scale gradient descent
to long time-horizons using the novel idea of dropout, and a more efficient (and smooth) computation graph
for STL quantitative semantics.
Prior work on NN controllers for STL. The overall approach of this chapter is closest to the work in [182,
113, 112, 77, 80], where STL robustness is used in conjunction with back-propagation to train controllers.
The work in this chapter makes significant strides in extending previous approaches to handle very long
horizon temporal tasks, crucially enabled by a novel sampling-based gradient approximation. Due to the
structure of our NN-controlled system, we can seamlessly handle time-varying dynamics and complex
temporal dependencies. We also note that while some previous approaches focus on obtaining open-loop
control policies, we focus on synthesizing closed-loop, feedback NN-controllers which can be robust to
minor perturbations in the system dynamics. In addition, we cover general DT-STL formulas for synthesis,
and we utilize LB4TL [76] for backward computation, which has shown significant improvement in training
efficiency over complex DT-STL formulas.
Limitations. Some of the key limitations of our approach include: (1) we do not address infinite time horizon
specifications. (2) We only consider a discrete-time variant of STL. (3) Our approach would fail if the chosen
neural network architecture for the controller has too few parameters (making it difficult to control highly
nonlinear environment dynamics) or if it has too many parameters (making it a difficult optimization
problem). (4) We assume full system observability and do not consider stochastic dynamics.
Chapter 4
Online Convex Optimization-based Policy Modification
4.1 Introduction
Systems operating in highly uncertain environments are often modeled using controlled Markovian stochastic difference equations or Markov decision processes (MDPs). Given a state st
(i.e., the state at time t), a
discrete-time MDP defines a distribution on st+1 conditioned on the state st and the control action at
. We
call this distribution the transition dynamics. For such systems, a number of model-based and data-driven
control design methods have been explored to learn an optimal policy (i.e. a function from the set of states
to the set of actions) that minimizes some trajectory-based cost function [101]. Model-based methods
explicitly, and data-driven methods implicitly, assume a specific distribution for the transition dynamics.
However, when the system is deployed in the real-world, this distribution may not be the same; this change
in distribution is called a distribution shift.
The fundamental problem addressed by this chapter is adapting a pre-learned control policy to compensate for distribution shifts. While it is possible to retrain the control policy on the new environment,
it is typically expensive to learn the precise dynamics of the new environment and then synthesize the
optimal control policy on the learned dynamics. However, a crucial observation that we make is that while
learning a precise high-fidelity model and the optimal policy is expensive, learning a reasonable fidelity
model of the transition dynamics may be feasible. We call such a learned model a surrogate model. In
this chapter, we show that under certain kinds of distribution shifts, the problem of adapting an existing
optimal policy to the new deployment environment can be framed as a nonlinear optimization problem
over the optimal trained trajectory and the surrogate model. Furthermore, we show that if the surrogate is
a neural network (with rectified linear unit or ReLU based activation), then there is a convex relaxation of
the original optimization problem that permits an efficient procedure to find a modified action to minimize
the error between the desired optimal trajectory and the actual trajectory in the deployment environment.
Finally, we empirically show that if the trained trajectory meets desired objectives of safety, then such
policy adaptation can provide safety during deployment.
The main technical idea in our work is inspired by recent work in [59], where the authors proposed
an efficient method to provide probabilistic bounds on the output of a neural network, given a Gaussian
distribution on its inputs. We show how we can use this result to propagate the effects of a distribution shift.
However, the result in [59] does not consider the problem of finding optimal actions (which is a non-convex
problem). In this research, we propose a methodology to convexify this result for finding optimal actions.
We demonstrate our technique on a tracking problem using a Dubin’s car model and a collision avoidance
problem that uses adaptive cruise control.
Related Work. The work in this chapter is related to transfer learning in robot learning, where the objective
is to train control policies using a simulator and then transfer them to the physical robot. There are several
approaches for transfer learning; in [37], the authors use a modular approach to separate the sensing,
planning, and low-level actuator control components and learn the planning policy in the simulator. This
eases transferring the policy to the real robot. In [35], the authors use a learned deep inverse dynamics model
to decide which real-world action is most suitable to achieve the same state as in the simulator. Mutual
alignment transfer learning approaches employ auxiliary rewards for transfer learning under discrepancies
in system dynamics for simulation to robot transfer ([179]). Approaches such as [50] compensate for
the difference in dynamics by modifying the reward function such that the modified reward function
the target domain. In [135], the authors study robust adversarial reinforcement learning. Inspired by
H∞ control, they assume destabilizing adversaries, such as the gap between simulation and the real
environment, as uncertainties and devise a learning algorithm that is robust to the worst-case adversary.
environment as uncertainties and devise a learning algorithm that is robust to the worst case adversary.
Transfer learning has also been investigated in the multi-agent setting [116], where the problem of training
agents with continuous actions is studied to ensure that the trained agents can still generalize when their
opponent’s policies alter. In [134], the authors propose to use Bayesian optimization (BO) to actively select
the distribution of the environment variable that maximizes the improvement generated by each iteration
of the policy gradient method. Unlike the authors of [50] who propose a reward modification technique, in
this work we propose a policy modification technique to tackle the problem when the environment model
in training is different from what is expected.
The rest of the chapter is organized as follows. In Section 4.2 we discuss the preliminaries, terminology,
and technical notation. In Section 4.3, we describe our policy adaptation approach, and we provide experimental
results in Section 4.4.
4.2 Preliminaries
Notation. A multi-variate Gaussian distribution is denoted as N(µ, Σ), where µ and Σ represent the mean
vector and covariance, respectively. For a Gaussian-distributed random vector r ∈ ℝⁿ, we denote its mean
value by µ_r and its covariance by Σ_r. Let c ∈ ℝⁿ; an ellipsoid centered at c with shape matrix Ω is
denoted E(c, Ω), i.e., E(c, Ω) = { x | (x − c)ᵀ Ω⁻¹ (x − c) ≤ 1 }. Given a non-convex set Y, we use the
notation H(Y) to denote the set of ellipsoids that cover Y, i.e., {E(c, Ω) | Y ⊆ E(c, Ω)}.
Markov Decision Process, Optimal Policy. We now formalize the notion of the type of stochastic dynamical
systems that we address in this chapter as Markov decision processes.
Definition 4.2.1 (Markov Decision Process (MDP)). A Markov decision process is a tuple M = (S, A, T, ι),
where S and A denote the set of states and actions respectively, T(s′ | s, a) is the probability distribution
on the next state conditioned on the current state and action, and ι is a distribution on S that is sampled to
identify an initial state of the MDP∗.
In our approach, we are interested in finite-horizon trajectories sampled from the MDP's transition
dynamics. A policy π(a | s) of the MDP is a distribution over the set of actions conditioned on the current state.
Given a fixed policy of the MDP, a T-length trajectory (denoted τ) or behavior of the MDP is a sequence
of states s₀, . . . , s_T such that s₀ ∼ ι, and for all t ∈ [0, T − 1], s_{t+1} ∼ T(s′ | s_t, a_t ∼ π(a | s = s_t)). In
control-design problems, we assume that there is a cost function J on the space of trajectories that maps
each trajectory to a real value. An optimal policy π∗ is defined as the one that minimizes the expected value
of the cost function over trajectories starting from a state s₀ sampled according to the initial distribution ι.
Distribution Shifts. Obtaining an optimal control policy is often a computationally expensive procedure
for MDPs where the underlying transition dynamics are highly nonlinear. Several design methods, both
model-based methods such as model-predictive control [66], stochastic optimal control [29], and model-free
methods such as data predictive control [93] and deep reinforcement learning [128] have been proposed
to solve the optimal control problem for such systems. Regardless of whether the method is model-based
or model-free, these methods explicitly or implicitly assume a model or the distribution encoded by the
transition dynamics of the environment. A key issue is that this distribution may change once the system
is deployed in the real-world. To differentiate between the training environment and the deployment
environment, we use Ttrn and Tdpl to respectively denote the transition distributions.
Problem Definition. Suppose we have a system where we have trained an optimal policy π∗ under the
transition dynamics T_trn, and for a given initial state s₀ sampled from ι, we sample an optimal trajectory
∗Technically, this definition pertains to the transition structure of a stochastic dynamical system. Typically, dynamical systems
are defined in terms of difference or differential equations describing the temporal evolution of a state variable. We assume that
T(s′ | s, a) is thus the infinite set of transitions consistent with any given system dynamics.
τ_opt for the system using the policy π∗. We denote this as τ ∼ (ι, π∗). Let τ = (s₀, s₁, . . . , s_T), and let τ(t)
be shorthand for s_t. For a trajectory that starts from the same initial state (but in the deployment
environment), we want to find the adapted policy π̂ such that the error between τ_opt and the trajectory
under T_dpl dynamics at each time instant t ∈ [1, T] is small. Formally,

π̂(t) = argmin_π  E_{a_t ∼ π(a | s_t),  s_{t+1} ∼ T_dpl(s′ | s_t, a_t)} [ ∥τ_opt(t + 1) − s_{t+1}∥₂ ]   (4.1)
A key challenge in solving the optimization problem in (4.1) is that T_dpl is not known. In this chapter,
we propose to learn a surrogate model for the deployment transition dynamics. Essentially, a surrogate
model is a data-driven model that approximates the actual system dynamics reasonably accurately. There
are several choices for surrogate models, including Gaussian Processes [7], probabilistic ensembles [36], and
deep neural networks (NNs). In this chapter, we focus on NN surrogates, as they allow us to consider convex
relaxations of the policy adaptation problem.
Surrogate-Based Policy Adaptation. We now show that surrogate-based policy adaptation can be phrased as
a nonlinear optimization problem. First, we specify the problem of finding good surrogates. In this work,
T_dpl(s_{t+1} | s_t, a_t) is assumed to be a time-invariant Gaussian distribution with mean µ(s, a) and covariance
Σ(s, a). A surrogate model for the transition dynamics is a tuple (µ_NN(s, a; θ_µ), Σ_NN(s, a; θ_Σ)), where
µ_NN and Σ_NN are deep neural networks with parameters θ_µ and θ_Σ, respectively. We can train such NNs
by minimizing the following loss functions:
by minimizing the following loss functions:
Lµ(θµ) = E
s∼S,s′∼Tdpl(s
′
|s,a)
µNN (s, a; θµ) − s
′
2
(4.2)
LΣ(θΣ) = E
s∼S
∥ΣNN (s, a; θΣ) − Σs(s
′
)∥2 (4.3)
In the above equations, the expectation is computed by standard Monte Carlo based sampling. In the second
equation, Σs represents the sample covariance of s
′ w.r.t. the sample mean.
Assuming that we have learned surrogate models to a desired level of accuracy, the next step is to
frame policy adaptation as a nonlinear optimization problem. We state the problem w.r.t. a specific optimal
trajectory τ_opt sampled from the optimal policy (though the problem generalizes to any optimal trajectory
sampled from an arbitrary initial state). Note that τ_opt(0) = s₀.

∀t ∈ [0, T − 1] :  a_t = argmin_{a ∈ A} ∥τ_opt(t + 1) − µ_NN(s_t, a; θ_µ)∥₂   (4.4)

We observe that, since the expression above contains a neural network, this is a highly nonlinear optimization
problem. In the next section, we show how we can convexify this problem.
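For contrast with the convexification developed next, a naive way to attack (4.4) is to run projected gradient descent over the action directly through the trained surrogate. The sketch below is only such a baseline, not the method of this chapter; `mu_nn`, the actuator bounds, and the step counts are assumptions carried over from the previous sketch.

```python
import torch

def adapt_action_naive(mu_nn, s_t, target_next, a_dim, lo=-3.0, hi=3.0, steps=100):
    """Gradient-descent baseline for (4.4): minimize ||target_next - mu_NN(s_t, a)||_2
    over the action a, projecting onto the actuator box [lo, hi] after each step.
    s_t and target_next are 1-D tensors of length n; a_dim is the action dimension."""
    a = torch.zeros(a_dim, requires_grad=True)
    opt = torch.optim.Adam([a], lr=0.05)
    for _ in range(steps):
        pred = mu_nn(torch.cat([s_t, a]).unsqueeze(0)).squeeze(0)
        loss = (target_next - pred).norm()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            a.clamp_(lo, hi)        # keep the candidate action within actuator bounds
    return a.detach()
```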
4.3 Policy Adaptation
Solution Overview. The quantity minimized in Eq. (4.4) is, at each time t, the residual error between
the optimal trajectory and the mean state predicted by the deployment environment, conditioned on its
state and action. Let r_{t+1} = τ_opt(t + 1) − µ_NN(s_t, a_t; θ_µ). Our main idea is:
1. At any given time t, assume that the state s_t lies in a confidence set described by an ellipsoid E(µ_{s_t}, Ω_{s_t}).
2. Assume that the action a_t lies in a confidence set also described by an ellipsoid E(µ_{a_t}, Ω_{a_t}).
3. Show that the residual error r_t can be bounded by an ellipsoid whose center and shape matrix depend on the action a_t.
4. Find the action a_t that minimizes the residual error by convex optimization.
We now explain each of these steps in sequence. First, we motivate why we need to consider confidence
sets. Suppose the system starts in state s₀; then the state s₁ is distributed according to the transition
dynamics of the deployment environment. In reality, we are only interested in next states that are
likely with at least probability threshold p. For a multi-variate Gaussian distribution, this corresponds to
a sublevel set of the inverse CDF of this distribution, which, according to the following lemma, can be
described by an ellipsoid:
Lemma 4.3.1. [133] A random vector r ∈ ℝⁿ with Gaussian distribution r ∼ N(µ, Σ) satisfies

Pr[ (1/ρ_n)(r − µ)ᵀ Σ⁻¹ (r − µ) ≤ 1 ] = p,   (4.5)

where ρ_n = Γ⁻¹(n/2, p/2) and Γ⁻¹(·, ·) indicates the n-dimensional lower incomplete Gamma function.
The above lemma allows us to define ellipsoidal confidence sets using truncated Gaussian distributions.
An ellipsoidal confidence region with center µ and shape matrix ρ_n Σ (where ρ_n is as defined in Lemma 4.3.1)
defines a set with probability measure p.
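Numerically, ρ_n is a quantile of the squared Mahalanobis distance, which for a Gaussian vector follows a chi-squared distribution with n degrees of freedom; I read the lemma's incomplete-Gamma expression as this quantile up to notational convention, so treat that identification as an assumption. A minimal sketch (Python/SciPy, illustrative only):

```python
import numpy as np
from scipy.stats import chi2

def confidence_ellipsoid(mu, Sigma, p=0.95):
    """Return (center, shape matrix) of the ellipsoid containing a fraction p of the
    mass of N(mu, Sigma): {x : (x - mu)^T (rho_n Sigma)^{-1} (x - mu) <= 1}."""
    n = mu.shape[0]
    rho_n = chi2.ppf(p, df=n)      # quantile of the squared Mahalanobis distance
    return mu, rho_n * Sigma

# Usage with hypothetical numbers.
center, shape = confidence_ellipsoid(np.array([0.0, 1.0]), np.diag([0.04, 0.01]), p=0.95)
```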
Now, as the policy we are considering is stochastic (which we also model as a Gaussian distribution),
an action that can be taken is described by a conditional Gaussian distribution. Let µ_{a_t} be the mean of the
distribution of the action at time t; then all actions with probability greater than p can be described by an
ellipsoidal confidence set E(µ_{a_t}, Ω_{a_t}).
Because the distribution of the transition dynamics may have shifted, applying an action sampled from
π∗(a | s_t) may result in a residual error r_{t+1} that is unacceptable. So, we want to find a new action a_t which
reduces the residual error. We assume that a_t is in an ellipsoidal uncertainty set by picking actions that have
probability greater than a fixed threshold p. We note that the center and the shape matrix of the ellipsoidal
set for the action are not known, but are decision variables for the optimization problem. We also note that
the relation between a_t and r_{t+1} is highly nonlinear; however, we show how we can convexify this problem.
Before we present the convexification of the optimization problem, we need to introduce the notion
of the reachable set of residual values, which we call the residual reachset. Formally, given the ellipsoidal
confidence region E(µ_{s_t}, Ω_{s_t}) for the state s_t and the ellipsoidal confidence region E(µ_{a_t}, Ω_{a_t}) for the
action a_t, the residual reachset R_{t+1} is defined as follows:

R_{t+1}(µ_{a_t}, Ω_{a_t}) = { µ_NN(s_t, a_t; θ_µ) − τ_opt(t + 1)  |  s_t ∈ E(µ_{s_t}, Ω_{s_t}),  a_t ∈ E(µ_{a_t}, Ω_{a_t}) }   (4.6)
In the above equation, we note that the residual reachset is parameterized by µ_{a_t} and Ω_{a_t}, and we wish
to find the values of µ_{a_t} and Ω_{a_t} that minimize the size of the residual reachset. However, the residual
reachset is a non-convex set. To make the optimization problem convex, we approximate the
residual reachset by an ellipsoidal upper bound in the set H(R_{t+1}(µ_{a_t}, Ω_{a_t})) (the set of all ellipsoidal upper
bounds).
We can now express the problem of finding the best adapted action distribution as the following
optimization problem:
(µ̂_{a_t}, Ω̂_{a_t}, Ω̂_{R_{t+1}}) = argmin_{µ_{a_t}, Ω_{a_t}, Ω_{R_{t+1}}}   Logdet(Ω_{R_{t+1}})
s.t.   R_{t+1}(µ_{a_t}, Ω_{a_t}) ⊂ E(0, Ω_{R_{t+1}})   (4.7)
We set the center of the ellipsoidal bound of the residual reachset to 0 so as to minimize the size of the
residuals. Equation (4.7) selects the best action â_t ∈ E(µ̂_{a_t}, Ω̂_{a_t}) such that the ellipsoid E(0, Ω̂_{R_{t+1}}) is the
smallest ellipsoid that bounds the residual reachset. In this optimization, the volume of this ellipsoid is
represented by the Logdet term.
The construction of an ellipsoidal bound over the reach-set of a neural network, given a single ellipsoidal
confidence region, is derived in [58]. This technique was later extended to multiple ellipsoidal
confidence regions in Theorem 1 of [75]. We rephrase the key results from these papers in our context in
Lemma 4.3.2.
Lemma 4.3.2. In what follows, (b_ℓ, W_ℓ) ∈ θ_µ represent the bias vector and the weights of the last layer in µ_NN.
Suppose s_t ∈ E(µ_{s_t}, Ω_{s_t}) and a_t ∈ E(µ_{a_t}, Ω_{a_t}). Then the residual reachset R_{t+1}(µ_{a_t}, Ω_{a_t}) is upper-bounded
by E(0, Ω_{R_{t+1}}) (as defined in (4.7)) if the following constraint holds:

τ₁ M_{s_t} + τ₂ M_{a_t} + M_ϕ − M_out ≤ 0,   where τ₁, τ₂ ≥ 0.   (4.8)

Here, M_ϕ is the quadratic constraint proposed in [58] that represents the ReLU hidden layers of the neural network, and†

M_{s_t} = (1/ρ_n) E₁ᵀ [ −Σ_{s_t}⁻¹ ,  Σ_{s_t}⁻¹ µ_{s_t} ;  µ_{s_t}ᵀ Σ_{s_t}⁻¹ ,  −µ_{s_t}ᵀ Σ_{s_t}⁻¹ µ_{s_t} + ρ_n ] E₁,

M_{a_t} = E₂ᵀ [ −Ω_{a_t}⁻¹ ,  Ω_{a_t}⁻¹ µ_{a_t} ;  µ_{a_t}ᵀ Ω_{a_t}⁻¹ ,  −µ_{a_t}ᵀ Ω_{a_t}⁻¹ µ_{a_t} + 1 ] E₂,

E₁ = [ I_n ,  0_{n×m} ,  0_{n×(N₂+···+N_{ℓ+1})} ,  0_{n×1} ;  0_{1×(n+m)} ,  0_{1×(N₂+···+N_{ℓ+1})} ,  1 ],

E₂ = [ 0_{m×n} ,  I_m ,  0_{m×(N₂+···+N_{ℓ+1})} ,  0_{m×1} ;  0_{1×(n+m)} ,  0_{1×(N₂+···+N_{ℓ+1})} ,  1 ],

M_out = [ C ,  b ;  0 ,  1 ]ᵀ [ −Ω_{R_{t+1}}⁻¹ ,  0 ;  0 ,  1 ] [ C ,  b ;  0 ,  1 ],   C = [ 0  0  · · ·  W_ℓ ],   b = b_ℓ − τ_opt(t + 1).

In the above lemma, since the adapted action and the shape matrix representing its covariance are assumed
to be known, M_{a_t} is a fixed matrix; however, in the optimization problem that we wish to solve, µ_{a_t} and Ω_{a_t}
appear as variables in the corresponding matrix M_{a_t}, which causes the problem to become nonlinear.

†The parameter N_i in the transformation matrices E₁, E₂ is the number of ReLU activations in layer i of µ_NN.
We can address this by performing two transformations. The first transformation, through a change
of variables, concentrates the nonlinearity in a single scalar entry of M_{a_t}. We set U_{a_t} = τ₂ Ω_{a_t}⁻¹ and
V_{a_t} = τ₂ Ω_{a_t}⁻¹ µ_{a_t}, and the resulting matrix is shown below:

M∗_{a_t} = E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  −τ₂ µ_{a_t}ᵀ Ω_{a_t}⁻¹ (Ω_{a_t}/τ₂) τ₂ Ω_{a_t}⁻¹ µ_{a_t} + τ₂ ] E₂
        = E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  −V_{a_t}ᵀ U_{a_t}⁻¹ V_{a_t} + τ₂ ] E₂   (4.9)
The proposed matrix M∗_{a_t} is nonlinear, and the nonlinearity shows up only in the scalar term
V_{a_t}ᵀ U_{a_t}⁻¹ V_{a_t}. We also note that the adapted actions should satisfy the actuator bounds [ℓ, u]; we include
this as the convex constraint

U_{a_t} ℓ ≤ V_{a_t} ≤ U_{a_t} u,   (4.10)

since the proposed action should lie inside the hyper-rectangle ℓ ≤ a_t ≤ u.
In Appendix C we prove that the solution of the optimization (4.11) is a singleton A_t = {a_t}; therefore
we neglect the shape matrix τ₂ U_{a_t}⁻¹ and only bound the mean value within the mentioned hyper-rectangle,
ℓ ≤ U_{a_t}⁻¹ V_{a_t} ≤ u, which implies U_{a_t} ℓ ≤ V_{a_t} ≤ U_{a_t} u.
Before stating the final theorem, we make an observation about (4.7). Without additional constraints,
the optimal solution to (4.7) always returns Ω̂_{a_t} such that tr(Ω̂_{a_t}) = 0 (proof in Appendix A). This
causes numerical errors, as Lemma 4.3.2 requires computing the inverse of this matrix. To avoid this
problem, we impose a tiny lower bound on the trace of this matrix.
Finally, given that all the constraints for the optimization of E(µ_{a_t}, Ω_{a_t}) are in place, we can collect
them into a convex optimization problem that yields the modified action set. The following theorem characterizes
the correctness of the modified action and its conservatism.
Theorem 4.3.3. Given the regularization factor δ > 0, and defining Ω = Ω_{R_{t+1}}⁻¹, assume decision variables
τ₁, τ₂ ≥ 0. Then the following convex optimization,

min_{M_ϕ, V_{a_t}, U_{a_t}, τ₁, τ₂}   −Logdet(Ω)
s.t.   −τ₁ M_{s_t} − E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  τ₂ ] E₂ − M_ϕ + M_out ≥ 0,
        U_{a_t} ℓ ≤ V_{a_t} ≤ U_{a_t} u,    tr(U_{a_t}) δ ≤ τ₂,   (4.11)

results in values (V_{a_t}, U_{a_t}) such that the modified deterministic decision â_t^c can be approximated as
â_t^c = µ̂_{a_t} = U_{a_t}⁻¹ V_{a_t}.
Proof. Based on [58], we know that a sufficient condition for the ellipsoid E(0, Ω) to bound the reachset of the
residual is

τ₁ M_{s_t} + E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  −V_{a_t}ᵀ U_{a_t}⁻¹ V_{a_t} + τ₂ ] E₂ + M_ϕ − M_out ≤ 0.   (4.12)

We move the linear terms to the right and keep the nonlinear term on the left:

E₂ᵀ [ 0 ,  0 ;  0 ,  −V_{a_t}ᵀ U_{a_t}⁻¹ V_{a_t} ] E₂ ≤ −τ₁ M_{s_t} − E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  τ₂ ] E₂ − M_ϕ + M_out.   (4.13)

The matrix on the left of the inequality is negative semi-definite; therefore, if we introduce the new constraint

−τ₁ M_{s_t} − E₂ᵀ [ −U_{a_t} ,  V_{a_t} ;  V_{a_t}ᵀ ,  τ₂ ] E₂ − M_ϕ + M_out ≥ 0,   (4.14)
we have satisfied the required constraint in (4.12). Based on our observations, this new linear constraint
does not introduce conservatism because the value V_{a_t}ᵀ U_{a_t}⁻¹ V_{a_t} is always close to zero; thus we are only
neglecting the negative value of the nonlinear term. As discussed before, we know that Ω_{a_t} = τ₂ U_{a_t}⁻¹
converges to zero, which implies that U_{a_t} may become unbounded, and this is why an infinitesimal Ω_{a_t} is
problematic for our convex optimization. In order to avoid an unbounded solution for U_{a_t}, we provide a small
lower bound on tr(Ω_{a_t}). Since Ω_{a_t} is a positive definite matrix, if tr(Ω_{a_t}⁻¹) is smaller than a large number σ,
that suffices for tr(Ω_{a_t}) to be greater than a small number (a lower bound on the size of A_t).
This can be rephrased as the convex constraint tr(U_{a_t}) ≤ τ₂ σ, or in other words, tr(U_{a_t}) δ ≤ τ₂, where
δ = 1/σ is preferably a small number. Thus, to justify the presence of the convex constraint tr(U_{a_t}) δ ≤ τ₂, we
note that it is just a precautionary measure (δ is very small) to avoid unbounded solutions.
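To make the structure of the convex program in (4.11) concrete, the following sketch sets it up in CVXPY (our experiments use YALMIP with MOSEK; CVXPY is used here only for illustration). The matrices M_{s_t}, M_ϕ, and the last-layer data (C, b) are random stand-ins for what Lemma 4.3.2 would produce, and M_ϕ's own multiplier variables are omitted, so with this placeholder data the problem may well be infeasible; the point is only the shape of the objective and the constraints.

```python
import cvxpy as cp
import numpy as np

n, m, N_relu = 3, 1, 8              # hypothetical state/action dims and ReLU count
d = n + m + N_relu + 1              # size of the quadratic-form matrices
rng = np.random.default_rng(0)

# Stand-ins for the matrices of Lemma 4.3.2 (symmetrized random data).
M_s = rng.standard_normal((d, d));   M_s = (M_s + M_s.T) / 2
M_phi = rng.standard_normal((d, d)); M_phi = (M_phi + M_phi.T) / 2
C = np.hstack([np.zeros((n, d - 1 - n)), np.eye(n)])        # stand-in for [0 ... W_l]
b = rng.standard_normal(n)
G = np.block([[C, b.reshape(-1, 1)], [np.zeros((1, d - 1)), np.ones((1, 1))]])
E2 = np.zeros((m + 1, d)); E2[:m, n:n + m] = np.eye(m); E2[m, -1] = 1.0
ell_a, u_a, delta = -3.0 * np.ones((m, 1)), 3.0 * np.ones((m, 1)), 1e-6

# Decision variables of (4.11); Omega is the inverse shape matrix of the residual bound.
Omega = cp.Variable((n, n), PSD=True)
U = cp.Variable((m, m), PSD=True)
V = cp.Variable((m, 1))
tau1 = cp.Variable(nonneg=True)
tau2 = cp.Variable(nonneg=True)

M_out = G.T @ cp.bmat([[-Omega, np.zeros((n, 1))],
                       [np.zeros((1, n)), np.ones((1, 1))]]) @ G
M_a_lin = E2.T @ cp.bmat([[-U, V],
                          [V.T, cp.reshape(tau2, (1, 1))]]) @ E2

M_total = -tau1 * M_s - M_a_lin - M_phi + M_out
constraints = [
    (M_total + M_total.T) / 2 >> 0,          # LMI from (4.14), symmetrized for the solver
    U @ ell_a <= V, V <= U @ u_a,            # actuator box constraint (4.10)
    cp.trace(U) * delta <= tau2,             # precautionary trace bound
]
prob = cp.Problem(cp.Maximize(cp.log_det(Omega)), constraints)
prob.solve(solver=cp.SCS)
if prob.status == cp.OPTIMAL:
    a_hat = np.linalg.solve(U.value, V.value)   # modified action  U^{-1} V
```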
Regarding possible robustness issues with the models (adversarial examples), we compare π∗ and â_t^c
as a precautionary measure and select the best choice for the modified action â_t via

â_t = argmin_{a ∈ {π∗, â_t^c}}  ∥τ_opt(t + 1) − µ∗_NN(s_t, a; θ_µ)∥₂.   (4.15)
This is because, due to the presence of adversarial examples, we need to certify that the modified action
performs better than the autonomous agent on the deployment environment. Since we cannot utilize the
environment directly for this purpose, we must employ a deep surrogate model of the deployment environment. On
the other hand, as [60] clarifies, although a deep network is accurate, it results in noticeable
conservatism for the convex programming in Theorem 4.3.3. Therefore, in a highly nonlinear environment we
train two surrogate networks. The former is obtained from the embedding technique in Section 4.3.1
and is used for the convex programming. The latter is a very deep network that provides the reliability
needed for an accurate comparison between π∗ and â_t^c. We call this deep neural network µ∗_NN.
We summarize the main steps of our proposed method in Algorithm 5.
4.3.1 Scalability
The conservatism of the tight ellipsoidal bound approximation introduced in [58] increases with the complexity
of the neural network's structure and results in inaccurate solutions for Theorem 4.3.3. However, for a highly
nonlinear deployment environment, it is necessary to train a deep neural network for the surrogate. In
response to this problem (similar to [125]), we utilize an embedder network M_p, which maps the state s_t
to another space s′_t ∈ ℝ^{n′} (s′_t = M_p(s_t; θ_p)), such that s′_t is more tractable than s_t for training purposes.
We next define the surrogate model and its parameters θ_µ based on s′_t as

µ_{s_{t+1}} = µ_NN(s′_t, â_t; θ_µ).

In this setting, even for a highly nonlinear deployment environment, the neural network µ_NN is not
necessarily a deep neural network. Thus, given the pair (s_t, a_t) as the input vector and s_{t+1} as the output, we
arrange a training procedure for the function

µ_{s_{t+1}} = µ_NN(M_p(s_t; θ_p), a_t; θ_µ)

to learn the parameters θ_µ, θ_p together, and utilize θ_µ in the convex programming. In other words, given
the distribution of s_t and the parameters θ_p, we can approximate the distribution of s′_t with Gaussian
mixture model techniques [145] to introduce its confidence region into the convex optimization (through
µ_NN) for policy modification.
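A minimal sketch of this joint training, under assumed PyTorch tooling and hypothetical dimensions: a deep embedder M_p composed with a shallow ReLU surrogate µ_NN, trained end-to-end so that only the shallow part needs to enter the convex program of Theorem 4.3.3.

```python
import torch
import torch.nn as nn

n, m, n_emb = 3, 1, 5    # hypothetical state, action, and embedding dimensions

# Deep embedder M_p(s; theta_p) and shallow ReLU surrogate mu_NN(s', a; theta_mu).
embedder = nn.Sequential(nn.Linear(n, 128), nn.Tanh(), nn.Linear(128, 128),
                         nn.Tanh(), nn.Linear(128, n_emb))
mu_nn = nn.Sequential(nn.Linear(n_emb + m, 8), nn.ReLU(), nn.Linear(8, n))
opt = torch.optim.Adam(list(embedder.parameters()) + list(mu_nn.parameters()), lr=1e-3)

# Placeholder transition data (s_t, a_t, s_{t+1}) from the deployment environment.
s, a, s_next = torch.randn(4096, n), torch.randn(4096, m), torch.randn(4096, n)

for epoch in range(200):
    pred = mu_nn(torch.cat([embedder(s), a], dim=1))       # mu_NN(M_p(s), a)
    loss = ((pred - s_next) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
# After training, only mu_nn (the shallow ReLU part) is handed to the convex program.
```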
Algorithm 5: Compensation Process for Distribution Shifts
Input: s₁ and the trained autonomous agent π∗
Result: Small reachset for the residual, with its center close to the origin.
1 Sample the optimal trajectory τ_opt. foreach time step t do
•   if (t = 1): â₁ ← π∗
    else (t ≥ 2):
        1- Apply Theorem 4.3.3 and compute â_t^c.
        2- Select the best action between â_t^c and π∗ with (4.15) and return â_t.
•   Use the observation s_t and â_t to characterize the confidence region S_{t+1} using the surrogate model.
•   Record the observation s_{t+1} generated by applying â_t to the environment.
2 end
4.4 Experimental Results
Comparison with PSO. We assume the simple car environment and compare the performance of our convex
programming technique against Particle Swarm Optimization (PSO) [100]‡ for solving the optimization (4.4).
PSO has shown acceptable performance in low-dimensional environments; thus, we aim to show that our
convex programming technique can outperform PSO even when scalability is not an important issue. The
environment represents the following simple car model:

ẋ = u cos(θ),   ẏ = u sin(θ),   d/dt[sin(θ)] = (u/ℓ) tan(ϕ) cos(θ),   d/dt[cos(θ)] = −(u/ℓ) tan(ϕ) sin(θ)   (4.16)
The system represents a car of length ℓ moving with constant velocity u and driven with control action ϕ.
The training environment is characterized by ℓ = 2.5 and u = 4.9, while the deployment environment is
slightly different, with ℓ = 2.1 and u = 5.1. We collect a training data set from the deployment environment
with a time step of 0.01 seconds.
‡While it is well-known that nonlinear optimization techniques lack guarantees and can suffer from local minima, techniques
like particle swarm work well in practice, especially in low-dimensional systems. Hence, we perform this comparison to show that
convexification outperforms state-of-the-art global optimization approaches to residual minimization.
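Purely for illustration, a minimal sketch of how transition data for the deployment car (ℓ = 2.1, u = 5.1) could be collected with an Euler step of 0.01 s. The state parameterization (x, y, sin θ, cos θ) follows (4.16); the random steering excitation and horizon are assumptions, not details from our experiments.

```python
import numpy as np

def car_step(state, phi, length=2.1, u=5.1, dt=0.01):
    """One Euler step of the deployment-environment car in (4.16).
    state = (x, y, sin_theta, cos_theta); phi is the steering action."""
    x, y, s, c = state
    dx, dy = u * c, u * s
    ds = (u / length) * np.tan(phi) * c
    dc = -(u / length) * np.tan(phi) * s
    return np.array([x + dt * dx, y + dt * dy, s + dt * ds, c + dt * dc])

# Collect (state, action, next_state) tuples with random steering excitation.
rng = np.random.default_rng(0)
state, data = np.array([0.0, 0.0, 0.0, 1.0]), []
for _ in range(10_000):
    phi = rng.uniform(-0.5, 0.5)
    nxt = car_step(state, phi)
    data.append((state, phi, nxt))
    state = nxt
```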
Figure 4.1: Comparison between PSO and our convex programming. The green and blue curves are the
optimal trajectory and the result of Algorithm 5, respectively. The red curves represent the deployment
environment's trajectory when there is no policy modification. We use PSO for optimization (4.4) on the
same model as the convex programming, µ_NN; the resulting trajectory is shown in black. We also apply PSO
to the deep model µ∗_NN; the resulting trajectory is shown in magenta.
We train two surrogates µ_NN, µ∗_NN for the deployment environment from deployment data.
The former is used in optimization (4.11) and is a ReLU neural network with dimensions [5, 8, 4]; this
ReLU neural network is obtained from the procedure proposed in Section 4.3.1. The latter is used
for the comparison discussed in Equation (4.15) and Appendix D, and is a deep tanh() neural network
of dimensions [5, 200, 200, 200, 200, 200, 200, 200, 4]. The results of policy modification are presented in
Figure 4.1. This figure presents the optimal trajectory τ_opt in green. This trajectory is simulated with
an optimal controller and a high-fidelity surrogate for the training environment computed from a model-based
algorithm. The blue curves are the results of Algorithm 5 (for 500 steps), which closely track the optimal
trajectory in all three states x, y, θ. The red curves represent the deployment environment's trajectory
when there is no policy modification. The runtime of the convex programming is between [0.005, 0.027] seconds
on a personal laptop with YALMIP and the MOSEK solver. Thus, we restrict the runtime of PSO to 0.027 seconds
for a fair comparison and employ it for policy modification. In one attempt, we use PSO for optimization (4.4)
on the same model as the convex programming, µ_NN; the resulting trajectory is shown in black.
In another attempt, we use PSO on the deep model µ∗_NN; the resulting trajectory is shown
in magenta. The results clearly show that our convex programming outperforms PSO in both cases.
Linear Environment of a Car. The training environment is a stochastic linear dynamics as follows:

x_{t+1} = [ 1 , 0.1 , 0.0047 ;  0 , 1 , 0.0906 ;  0 , 0 , 0.8187 ] x_t + [ 0.003 ;  0.0094 ;  0.1813 ] u_t + ν_t,   ν_t ∼ N( [0, 0, 0.2]ᵀ, exp(−8) I₃ )

The deployment environment is also a stochastic linear dynamics as follows:

x_{t+1} = [ 1 , 0.1 , 0.0046 ;  0 , 1 , 0.0885 ;  0 , 0 , 0.7788 ] x_t + [ 0.004 ;  0.0115 ;  0.2212 ] u_t + η_t,   η_t ∼ N( 0⃗, exp(−8) I₃ )

where the sampling time is t_s = 0.1 s. The state x_t ∈ ℝ³ is defined as x_t = [x_t, v_t, a_t]ᵀ, i.e., the position,
velocity, and acceleration of the car, respectively. The scalar action u_t is bounded within u_t ∈ [−3, 3].
Since the environment is linear, the embedder network is not required. Thus, we train only one
surrogate µ_NN for the deployment environment, a two-hidden-layer ReLU neural network of dimensions
[4, 10, 5, 3]. We also have access to the model of the training environment and a trained optimal feedback policy.
We therefore perform policy modification through Algorithm 5 for the deployment environment, and the results
are presented in Figure 4.2. In this figure, the green curve presents the simulated optimal trajectory. The blue
and red curves represent the trajectory of the deployment environment in the presence and absence of
policy modification, respectively. This figure shows that Algorithm 5 forces the deployment environment to
track the planner τ_opt and that the policy modification process is successful.
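The two linear environments above are straightforward to reproduce; the sketch below simulates one rollout of each in NumPy under a zero input (the zero policy and horizon are placeholders, not the trained controller used in the experiment).

```python
import numpy as np

rng = np.random.default_rng(0)
A_trn = np.array([[1, 0.1, 0.0047], [0, 1, 0.0906], [0, 0, 0.8187]])
B_trn = np.array([0.003, 0.0094, 0.1813])
A_dpl = np.array([[1, 0.1, 0.0046], [0, 1, 0.0885], [0, 0, 0.7788]])
B_dpl = np.array([0.004, 0.0115, 0.2212])
mean_trn = np.array([0.0, 0.0, 0.2])      # training noise has a biased acceleration term

def rollout(A, B, noise_mean, T=200, u_fn=lambda t, x: 0.0):
    xs = [np.zeros(3)]
    for t in range(T):
        u = np.clip(u_fn(t, xs[-1]), -3.0, 3.0)             # actuator bound u_t in [-3, 3]
        w = rng.multivariate_normal(noise_mean, np.exp(-8) * np.eye(3))
        xs.append(A @ xs[-1] + B * u + w)
    return np.array(xs)

traj_trn = rollout(A_trn, B_trn, mean_trn)
traj_dpl = rollout(A_dpl, B_dpl, np.zeros(3))
```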
Adaptive Cruise Control. Consider the Simulink environment for adaptive cruise control from the MATLAB
documentation§. We consider its trained feedback controller and assume we have access to the model
of the training environment. We then simulate the optimal trajectory τ_opt with the model and controller. This
§
https://www.mathworks.com/help/reinforcement-learning/ug/train-ddpg-agent-for-adaptive-cruise-control.html
Figure 4.2: Results of policy modification on the stochastic linear environment of a car. The green curve
presents the simulated optimal trajectory. The blue and red curves represent the trajectory of the deployment
environment in the presence and absence of policy modification, respectively.
Figure 4.3: The green curves represent the optimal trajectory for v_rel and d_rel, while the red and blue curves
present the trajectory of the deployment environment without and with policy adaptation, respectively.
controller is trained over 14 hours, which clearly shows how learning a new controller can be expensive
and justifies the contribution of our technique. The input of the trained controller is the vector
x = [∫v_err, v_err, v_ego]ᵀ. Thus, we take this vector as the state of the environment¶. This environment is
highly nonlinear due to the presence of logic-based relations in a signal processing block in the model. The
model uses a velocity set-point as a parameter (v_set), which is set to 30 m/s in the training environment
and to 34.5 m/s in the deployment environment. This difference characterizes the distribution shift.
Figure 4.3 shows the evolution of the relative velocity and the relative position, v_rel and d_rel, between the
lead and ego cars. The green line shows the simulation of v_rel, d_rel when the optimal policy π∗ is applied to
the training environment. The blue and red lines show the evolution of v_rel, d_rel in the presence and
absence of policy modification, respectively. The policy modification process aims to force the states of the
deployment environment to track the optimal trajectory τ_opt. Consider the parameter d_rel < 0 on the red
¶Here verr is a logic based function of xego, xlead, vego and vlead. See the MATLAB documentation for more detail. Here,
(vego, vlead) and (xego, xlead) are the velocity and position for ego and lead car, respectively.
line at time t = 183 s. Thus, the distribution shift can cause a collision in the absence of policy modification.
Figure 4.3 shows that our policy modification keeps the system safe from the collision.
Chapter 5
Learning-based Statistical Reachability Analysis
5.1 Introduction
Safety-critical cyber-physical systems operate in highly dynamic and uncertain environments. It is common
to model such systems as stochastic dynamical systems where given an initial configuration (or state) of
the system, system parameter values, and a sequence of exogenous inputs to the system, a simulator can
provide a system trajectory. Several executions of the simulator can generate a sample distribution of
the system trajectories, and such a distribution can then be studied with the goal of analyzing safety and
performance specifications of the system. In safety verification analysis, we are interested in checking
if any system trajectory can reach an unsafe state. A popular approach for safety verification considers
only bounded-time safety properties using (bounded-time) reachability analysis [2, 1, 89, 46, 23]. Here, the
typical assumption is that the symbolic dynamics of the simulator (i.e. the equations it uses to provide the
updated state from a previous state and stimuli) are known. Most reachability analysis methods rely on a
deterministic description of the symbolic dynamics and use set-propagation methods to compute a flowpipe
or an overapproximation of the set of states reachable over a specified time horizon. Other methods allow
the system dynamics to be stochastic, but rely on linearity of the dynamics to propagate distributions over
initial states/parameters to compute probabilistic reach sets [171, 172, 4, 65].
However, for complex cyber-physical systems, dynamical models may be highly nonlinear or hybrid,
with artifacts such as look-up tables, learning-enabled components, and proprietary black-box functions,
making the symbolic dynamics either unavailable or difficult for existing (symbolic) reachability analysis
tools to analyze. To address this issue, we pursue the idea of model-free analysis, where the goal is
to compute reachable sets for the system from only sampled system trajectories [81, 160]. The main idea
to compute reachable sets for the system from only sampled system trajectories [81, 160]. The main idea
of data-driven reachability analysis in [81] consists of the following main steps: Step 1. Sample system
trajectories based on a user-specified distribution on a parametric set of system uncertainties (such as the
set of initial states). Step 2. Train a data-driven surrogate model to predict the next K states from a given
state (for example, a neural network-based model). Step 3. Perform set-propagation-based reachability
analysis using the surrogate dynamics. Step 4. Inflate the computed flowpipe with a surrogate error term
that guarantees that any actually reached state is within the inflated reach set with probability not smaller
than a user-provided threshold.
There are three main challenges in this overall scheme: (1) In [81], a simple training loss based on
minimizing the mean square error between the surrogate model and the actual system is used. This may
lead to the error distribution to have a heavy tail, which in turn leads to conservatism in the inflated
reach set. (2) The approach in [81] uses the uncertainty quantification technique of conformal inference
to construct the inflated flowpipes, but quantifies surrogate error per trajectory component (i.e, per state
dimension and per trajectory time-step). These per-component-wise probabilistic guarantees are then
combined using union bounding, i.e., using that P(A ∪ B) ≤ P(A) + P(B), leading to conservatism. This
is because requiring a 1 − ϵ probability threshold on the inflated reach set requires stricter probability
thresholds in the conformal inference step per component, i.e., thresholds 1 − ϵ′ with ϵ′ = ϵ/(nK), where n
is the number of dimensions and K is the number of time-steps in the trajectory. A stricter probability
threshold induces a larger uncertainty set, which implies greater conservatism. (3) The most significant
real-world challenge is that the surrogate model is usually learned based on the trajectories sampled from
the simulator, and thus distributed according to the assumptions on stochasticity made by the simulator.
However, the actual trajectory distribution in the deployed system may change. Typically, such distribution
shifts can be quantified using divergence measures such as an f-divergence or the Wasserstein distance
[169].
To address these challenges, we propose a robust and efficient approach to computing probabilistic
reach sets for stochastic systems, with the following main contributions: (1) We propose novel training
algorithms to obtain surrogate models to forecast trajectories from sampled initial states (or other model
parameters). Instead of minimizing the mean square loss between predicted trajectories and the training
trajectories, we allow minimizing an arbitrary quantile of the loss function. This provides our models with
better overall predictive performance over the entire trajectory space (i.e., over different state dimensions
and time steps). (2) Similar to [81], we utilize conformal inference (CI) to quantify prediction uncertainty.
However, inspired by work in [38], we compute the maximum of the weighted residual errors to compute
the nonconformity score to use with CI which has the effect of normalizing component-wise residuals.
In contrast to [38], which solves a linear complementarity problem to compute these weights, we obtain
these weights when training the surrogate model using gradient descent and backpropagation. (3) Finally,
to address distribution shifts, we use techniques from robust conformal inference [34]. Our analysis is
motivated by [185] and valid for all trajectory distributions corresponding to real-world environments that
are close to the original trajectory distribution used for training the surrogate model; here, the proximity is
measured by a certain f-divergence metric [39].
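To make the max-based nonconformity score of contribution (2) concrete, the following sketch (NumPy, with assumed shapes and with the weights α taken as given rather than trained) computes the score on a calibration set and the corresponding conformal quantile with the usual finite-sample correction.

```python
import numpy as np

def nonconformity_scores(pred, true, alpha):
    """Max over time steps and state dimensions of the weighted residual errors.
    pred, true: arrays of shape (N, K, n); alpha: positive weights of shape (K, n)."""
    return np.max(alpha[None, :, :] * np.abs(pred - true), axis=(1, 2))

def conformal_threshold(scores, eps):
    """(1 - eps) conformal quantile with the finite-sample correction."""
    N = len(scores)
    k = int(np.ceil((N + 1) * (1 - eps)))     # rank of the conformal quantile
    return np.sort(scores)[k - 1]

# Hypothetical calibration data: N surrogate-predicted vs. actual trajectories.
rng = np.random.default_rng(0)
N, K, n = 2000, 20, 3
true = rng.normal(size=(N, K, n))
pred = true + rng.normal(0.0, 0.1, size=(N, K, n))
alpha = np.ones((K, n))                        # assumed (e.g., learned) weights
tau = conformal_threshold(nonconformity_scores(pred, true, alpha), eps=0.05)
# The flowpipe at step k, dimension j would then be inflated by tau / alpha[k, j].
```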
We show that our training procedure and the use of the max-based nonconformity score noticeably
enhance data efficiency and significantly reduce the conservatism in reachability analysis. This improvement
in data efficiency is the key factor that enables us to efficiently incorporate robust conformal
inference in our reachability analysis. We empirically validate our algorithms on challenging benchmark
problems from the cyber-physical systems community [88], and demonstrate considerable improvement
over prior work.
Related Work.
Reachability Analysis for Stochastic Systems with known Dynamics. Reachability analysis is a widely studied
topic and typically assumes access to the system’s underlying dynamics, and the proposed guarantees are
valid only on the given model dynamics. In [118], the authors propose DeepReach, a method using neural
PDE solvers for Hamilton-Jacobi method-based reachability analysis in high-dimensional systems. While it
incorporates neural methods for reachability analysis, it still requires access to the system dynamics. In
[6], the authors identify Markovian stochastic dynamics from data through specific parametric models,
such as linear or polynomial, followed by reachability analysis on the identified models. In contrast, our
method employs neural networks, which are not confined to Markovian dynamics. The approach in
[183] is an algorithm that sequentially linearizes the dynamics and uses constrained zonotopes for set
representation and computation. In [31], the authors develop a method utilizing Gaussian Processes and
statistical techniques to compute reachable sets of dynamical systems with uncertain initial conditions
or parameters, providing confidence bounds for the reconstruction and bounding the reachable set with
probabilistic confidence, extending to uncertain stochastic models.
In [91], the authors introduce a scalable method utilizing Fourier transforms to compute forward
stochastic reach probability measures and sets for uncontrolled linear systems with affine disturbances.
Similar approaches are explored in [171, 173] for stochastic reachability analysis of linear, potentially
time-varying, discrete-time systems. A constructive method utilizing convex optimization to determine
and compute probabilistic reachable and invariant sets for linear discrete-time systems under stochastic
disturbances is introduced in [62]. We note that most existing techniques are for systems with linear
dynamics, while we permit arbitrary stochastic dynamics. In Thorpe et al. [161], a method utilizing
conditional distribution embeddings and random Fourier features is presented to efficiently compute
stochastic reachability safety probabilities for high-dimensional stochastic dynamical systems without prior
knowledge of the system structure. We note that this work does not provide finite-data probability guarantees
as we do, but asymptotically converges to the exact reachset.
Probabilistic Guarantees and Reachability Analysis for unknown Stochastic Systems. Recent work has studied
computation of reachable sets with probabilistic guarantees directly from data. In [44], the authors employ
level sets of Christoffel functions [108, 127] to achieve probabilistic reach sets for general nonlinear systems.
Specifically, let vd(x) denote the vector of monomials up to degree d, and let M denote the empirical
moment matrix obtained by computing the expected value of v_d(x)ᵀ v_d(x) by sampling over the set of
reachable states. An empirical inverse Christoffel function Λ⁻¹(x) is then defined as v_d(x)ᵀ M⁻¹ v_d(x).
The main idea in [43, 159] is to empirically determine Λ⁻¹(x) and give probabilistic bounds using the
volume of the actual reachset contained in the sublevel sets of Λ⁻¹(x). In [159], the authors extend the
method proposed in [44] by including conformal inference. A key challenge of this approach is estimating
the moment matrix M from data, which may not scale with increasing state dimension n and user-selected
degree d, as the dimension of M is (n+d choose d), and the approach requires inverting M.
In [43], the authors use a Gaussian process-based classifier to distinguish reachable from unreachable states and approximate the reachset. However, the approach requires adaptive sampling of initial
states, which may require solving high-dimensional optimization problems. They also propose an interval abstraction of the reachset, which, though it provides sample complexity bounds, can be overly
conservative and computationally costly in high-dimensional systems. The method in [63] assumes partial
knowledge of the model and leverages data to handle Lipschitz-continuous state-dependent uncertainty;
their reachability analysis combines probabilistic and worst-case analysis. Finally, the work presented in
[55] combines simulation-guided reachability analysis with data-driven techniques, utilizing a discrepancy
function estimated from system trajectories, which can be challenging to obtain.
Reachability analysis for Neural Networks. Recent approaches have tackled the challenge of determining
the output range of a neural network. These methods aim to compute an interval or a box (a vector of
intervals) that encompasses the outputs of a given neural network. Katz et al. [98] introduced Reluplex, an
SMT-based approach that extends the simplex algorithm to handle ReLU constraints. Huang et al. [91]
employed a refinement-by-layer technique to verify the presence or absence of adversarial examples in
the vicinity of a specific input. Dutta et al. [46] proposed an efficient method using mixed-integer linear
programming to compute the range of a neural network featuring only ReLU activation functions. Tran et
al. [165] propose star sets, which offer expressiveness similar to hybrid zonotopes and are used to provide
approximate and exact reachability of feed-forward ReLU neural networks.
Definition 5.1.1 (Star set [21]). A star set Y ⊂ R^d is a tuple ⟨c, V, P⟩ where c ∈ R^d is the center,
V = {v_1, v_2, ..., v_m} is a set of m vectors in R^d called basis vectors, and P : R^m → {⊤, ⊥} is a predicate.
The basis vectors are arranged to form the star's d × m basis matrix. The set of states represented by the
star is given as:

    Y = { y | y = c + Σ_{ℓ=1}^{m} μ_ℓ v_ℓ  s.t.  P(μ_1, ..., μ_m) = ⊤ }.
In our setting, this method was the most applicable technique.
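To make Definition 5.1.1 concrete, the following is a minimal sketch (not taken from the NNV toolbox) of a star set whose predicate is a set of linear constraints P(μ) ≡ Cμ ≤ g, together with a membership test that solves a linear-programming feasibility problem; the class and variable names are illustrative assumptions:

import numpy as np
from scipy.optimize import linprog

class StarSet:
    # Star set <c, V, P> with predicate P(mu) = (C mu <= g).
    def __init__(self, c, V, C, g):
        self.c = np.asarray(c, dtype=float)   # center, shape (d,)
        self.V = np.asarray(V, dtype=float)   # basis matrix, shape (d, m)
        self.C = np.asarray(C, dtype=float)   # predicate matrix, shape (p, m)
        self.g = np.asarray(g, dtype=float)   # predicate bounds, shape (p,)

    def contains(self, y):
        # Check y in Y by searching for mu with V mu = y - c and C mu <= g.
        m = self.V.shape[1]
        res = linprog(c=np.zeros(m),
                      A_ub=self.C, b_ub=self.g,
                      A_eq=self.V, b_eq=np.asarray(y, dtype=float) - self.c,
                      bounds=[(None, None)] * m, method="highs")
        return res.status == 0  # status 0: a feasible mu exists, so y is in the star set

# Example: the unit box in R^2 as a star set with |mu_i| <= 1.
box = StarSet(c=[0.0, 0.0], V=np.eye(2),
              C=np.vstack([np.eye(2), -np.eye(2)]), g=np.ones(4))
print(box.contains([0.5, -0.3]))  # True
print(box.contains([1.5, 0.0]))   # False

This linear-programming feasibility check is essentially the same test used later in the experiments to decide whether a simulated trace is contained in a computed star-set flowpipe.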
5.2 Problem Statement and Preliminaries
Notation. We use bold letters to represent vectors and vector-valued functions, while calligraphic letters
denote sets and distributions. The set {1, 2, · · · , n} is denoted as [n]. The Minkowski sum is indicated by
⊕. We use x ∼ X to denote that the random variable x is drawn from the distribution X .
Stochastic Dynamical Systems. We consider discrete-time stochastic dynamical systems. While it is
typical to describe such systems using symbolic equations that describe how the system evolves over time,
we instead simply model the system as a stochastic process. In other words, let S_0, ..., S_K be a set of K + 1
random vectors indexed by times 0, ..., K. We assume that for all times k, each S_k takes values from the
set of states S ⊆ R^n. A realization of the stochastic process, or the system trajectory, is a sequence of values
s_0, ..., s_K, denoted as σ^real_{s0}. The joint distribution over S_0, ..., S_K is called the trajectory distribution D^real_{S,K}
of the system, and the marginal distribution of S_0 is called the initial state distribution W. We assume that
the initial state distribution W has support over a compact set of initial states I, i.e., we assume that W is
such that Pr[s_0 ∉ I] = 0. For example, such a stochastic dynamical system could describe a Markovian
process, where for any k ≥ 1, the distribution of S_k only depends on the realization of S_{k−1} and not the
values taken at any past time. However, it is worth noting that the techniques presented in this chapter
can be applied to systems with non-Markovian dynamics. In the rest of the chapter, we largely focus on
just the system trajectories, so we abuse notation to denote s_0 ∼_W I to signify that s_0 is a value sampled
from I using the initial state distribution W (in practice, W is assumed to be uniform or a truncated
Gaussian distribution). Similarly, σ^real_{s0} ∼ D^real_{S,K} is used to denote the sampling of a trajectory from the
trajectory distribution.
Quantification of Distribution Shift. In practice, we usually do not have knowledge of the distribution
D^real_{S,K}. However, one may have access to trajectories sampled from a distribution D^sim_{S,K} that is "close" to
D^real_{S,K}, e.g., a simulator. Given a distribution D, we use the notation P(D) to denote a set of distributions
close to D, where the notion of proximity is defined using a suitable divergence measure or metric quantifying
the distance between distributions. Common examples include f-divergence measures (such as the KL-divergence
and the total variation distance) and metrics such as the Wasserstein distance [169, 151]. In this chapter, we
assume that D^real_{S,K} comes from the ambiguity set P(D^sim_{S,K}) that is centered at D^sim_{S,K} and defined using
f-divergence balls around D^sim_{S,K} [151] (examples of f include f(z) = z log(z), which induces the KL-divergence,
and f(z) = (1/2)|z − 1|, which induces the total variation distance). Given a convex function f : R → R
satisfying f(1) = 0 and f(z) = +∞ for z < 0, the f-divergence [39] between the probability distributions
D^real_{S,K} and D^sim_{S,K} that both have support Z is

    D_f(D^real_{S,K} ∥ D^sim_{S,K}) = ∫_Z f( dD^real_{S,K} / dD^sim_{S,K} ) dD^sim_{S,K}.

Here, the argument of f is the Radon-Nikodym derivative of D^real_{S,K} with respect to D^sim_{S,K}. We define the set
P_{f,τ}(D^sim_{S,K}) as an f-divergence ball of radius τ ≥ 0 around D^sim_{S,K} as

    P_{f,τ}(D^sim_{S,K}) = { D^real_{S,K} | D_f(D^real_{S,K} ∥ D^sim_{S,K}) ≤ τ }.

The radius τ and the function f are both user-specified parameters that quantify the distribution shift
between D^real_{S,K} and D^sim_{S,K} that we have to account for in our reachability analysis. Specifically, we have to
perform reachability analysis for random trajectories σ^real_{s0} ∼ D^real_{S,K} for all D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}).
Conformal Inference. Conformal inference [174, 111, 110] is a data-efficient statistical tool proposed for
quantifying uncertainty, particularly valuable for assessing the uncertainty in predictions made by machine
learning models [16, 124].
Consider a set of random variables z_1, z_2, ..., z_{m+1} where z_i = (x_i, y_i) ∈ R^n × R for i ∈ [m + 1].
Assume that z_1, z_2, ..., z_{m+1} are independent and identically distributed (i.i.d.). Let μ(x_i) be a predictor
that estimates outputs y_i from inputs x_i. With a pre-defined miscoverage level ε ∈ (0, 1), conformal
inference enables computation of a threshold d > 0 and a probabilistic prediction interval C(x_{m+1}) =
[μ(x_{m+1}) − d, μ(x_{m+1}) + d] ⊆ R for y_{m+1} that guarantees that Pr[y_{m+1} ∈ C(x_{m+1})] ≥ 1 − ε. To compute
the threshold d, we reason over the empirical distribution of the residual errors between the predictor and
the ground-truth data. Let R_i := |y_i − μ(x_i)| be the residual error between y_i and μ(x_i) for i ∈ [m + 1].
Since the random variables z_1, z_2, ..., z_{m+1} are i.i.d., the residuals R_1, ..., R_{m+1} are also i.i.d. If m satisfies
ℓ := ⌈(m + 1)(1 − ε)⌉ ≤ m, then we take the ℓ-th smallest error among these m values, which is equivalent to

    R*_{1−ε} = Quantile_{1−ε}( {R_1, ..., R_m, ∞} ),    (5.1)

i.e., the (1 − ε)-quantile over R_1, ..., R_m, ∞, see [162]. Conformal inference uses this quantile to obtain
the probability guarantee Pr[R_{m+1} ≤ R*_{1−ε}] ≥ 1 − ε, see [162, 174]. For the choice of R_i := |y_i − μ(x_i)|,
this can be rewritten as

    Pr[ y_{m+1} ∈ [μ(x_{m+1}) − R*_{1−ε}, μ(x_{m+1}) + R*_{1−ε}] ] ≥ 1 − ε.    (5.2)

The guarantees in (5.2) are marginal‡, i.e., over the randomness in R_{m+1}, R_1, R_2, ..., R_m. Note that
R*_{1−ε} is a provable upper bound for the (1 − ε)-quantile§ of the error distribution.

‡The guarantees from conformal inference are marginal over all potentially sampled calibration sets. The guarantees over
some fixed calibration set can be shown to be a random variable that has distribution Beta(ℓ, m + 1 − ℓ) [16]. For example,
if m = 10^4, we get tight probabilistic guarantees for any ε ∈ (0, 1) as the variance of the Beta distribution is bounded by 2.5 × 10^{−5}.
§For any ε ∈ (0, 1), the (1 − ε)-quantile of a random variable R is defined as inf{z ∈ R | Pr[R ≤ z] ≥ 1 − ε}.
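As a concrete illustration of the quantile computation in (5.1) and the interval in (5.2), the following is a minimal sketch of split conformal prediction for a scalar predictor; the data, the predictor mu, and the function name are illustrative assumptions, not part of any particular library:

import numpy as np

def conformal_threshold(residuals, eps):
    # R*_{1-eps}: the ceil((m+1)(1-eps))-th smallest value of {R_1, ..., R_m, inf}.
    m = len(residuals)
    rank = int(np.ceil((m + 1) * (1 - eps)))
    if rank > m:
        return np.inf  # calibration set too small for this miscoverage level
    return np.sort(residuals)[rank - 1]

# Calibrate a fixed predictor mu on m = 2000 points and form a 95% prediction interval.
rng = np.random.default_rng(0)
mu = lambda x: 2.0 * x
x_cal = rng.uniform(-1.0, 1.0, size=2000)
y_cal = 2.0 * x_cal + 0.1 * rng.standard_normal(2000)
R = np.abs(y_cal - mu(x_cal))                  # residuals R_i = |y_i - mu(x_i)|
d = conformal_threshold(R, eps=0.05)
x_new = 0.3
print("95% interval:", (mu(x_new) - d, mu(x_new) + d))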
Robust Conformal Inference. Unlike conformal inference, which assumes the data point z_{m+1} is sampled
from the same distribution as the calibration samples z_i, i ∈ [m], robust conformal inference relaxes this
assumption and allows z_{m+1} to be sampled from a different distribution. Let us denote the distribution of z_i
for i ∈ [m] as U and the distribution of z_{m+1} as V. As illustrated before, the residual R_i is a random variable
defined as a function of z_i. Let us denote the distribution of R_i for i ∈ [m] with P and the distribution of
R_{m+1} with Q. Further, assume Q is in P_{f,τ}(P). Utilizing the results from [34], which assume the distribution
of the residual R_{m+1} is within an f-divergence ball of radius τ ≥ 0 around the distribution of R_1, ..., R_m, for
the miscoverage level ε ∈ (0, 1), we obtain:

    Pr[R_{m+1} ≤ R*_{1−ε,τ}] ≥ 1 − ε,

where R*_{1−ε,τ} = Quantile_{1−ε̄}( {R_1, ..., R_m, ∞} ) is a robust (1 − ε)-quantile that is equivalent to the
(1 − ε̄)-quantile. We refer to ε̄ as the adjusted miscoverage level, which is computed as ε̄ = 1 − g^{−1}_{f,τ}(1 − ε_m),
where ε_m is obtained as the solution of a series of convex optimization problems as¶:

    ε_m = 1 − g_{f,τ}( (1 + 1/m) g^{−1}_{f,τ}(1 − ε) ),
    g_{f,τ}(β) = inf{ z ∈ [0, 1] | β f(z/β) + (1 − β) f((1 − z)/(1 − β)) ≤ τ },
    g^{−1}_{f,τ}(γ) = sup{ β ∈ (0, 1) | g_{f,τ}(β) ≤ γ }.    (5.3)

Computation of g_{f,τ} and g^{−1}_{f,τ} is efficient since they are both solutions to one-dimensional convex
optimization problems and therefore admit efficient binary search procedures. In some cases, we also have
access to a closed-form solution [34].
Example 7. For the total variation distance, f(z) = (1/2)|z − 1|, we have g_{f,τ}(β) = max(0, β − τ) and
g^{−1}_{f,τ}(γ) = γ + τ for γ ∈ (0, 1 − τ). This implies that, given a radius τ ∈ [0, 1], an adjusted miscoverage
level ε̄ is infeasible if ε ≤ τ, and otherwise ε̄ is computed as:

    ε̄ = 1 − (1 + 1/m)(1 − ε + τ),   ε ∈ (τ, 1], τ ∈ [0, 1].    (5.4)
¶Following [34], Lemma A.2, we note that g_{f,τ} is related to the worst-case CDF of any distribution with at most τ distribution
shift, and g^{−1}_{f,τ} is related to the inverse worst-case CDF.
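To illustrate how (5.4) and the robust quantile are computed in practice for the total variation case, here is a minimal sketch; the function names and the synthetic residuals are illustrative assumptions:

import numpy as np

def adjusted_miscoverage_tv(eps, tau, m):
    # Adjusted miscoverage level for a total variation shift of radius tau (Eq. 5.4).
    if eps <= tau:
        raise ValueError("infeasible: need eps > tau under total variation shift")
    eps_bar = 1.0 - (1.0 + 1.0 / m) * (1.0 - eps + tau)
    if eps_bar <= 0.0:
        raise ValueError("calibration set too small for this (eps, tau)")
    return eps_bar

def robust_quantile_tv(residuals, eps, tau):
    # Robust (1 - eps)-quantile R*_{1-eps,tau}, i.e., the (1 - eps_bar)-quantile of {R_1,...,R_m, inf}.
    m = len(residuals)
    eps_bar = adjusted_miscoverage_tv(eps, tau, m)
    rank = int(np.ceil((m + 1) * (1 - eps_bar)))
    return np.inf if rank > m else np.sort(residuals)[rank - 1]

# Example: 10,000 calibration residuals, target coverage 0.80 despite a shift of tau = 0.15.
rng = np.random.default_rng(1)
R = np.abs(rng.standard_normal(10_000))
print(robust_quantile_tv(R, eps=0.20, tau=0.15))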
Proposition 5.2.1. Assuming D_f(D^real_{S,K} ∥ D^sim_{S,K}) = τ^real < τ, the coverage level of the robust δ-quantile
R*_{δ,τ} on D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}) is δ^real > δ̄ − τ − 1/m, where m is the number of recorded data points, δ̄ = 1 − ε̄,
and ε̄ is the adjusted miscoverage level.

Proof. Since D_f(D^real_{S,K} ∥ D^sim_{S,K}) = τ^real < τ, we can also claim D^real_{S,K} ∈ P_{f,τ^real}(D^sim_{S,K}). In this case, we
set δ^real = δ + τ − τ^real and compute its robust δ^real-quantile. As a direct result of this choice of
δ^real and equation (5.4), this selection results in R*_{δ^real,τ^real} = R*_{δ,τ}, since the adjusted miscoverage level ε̄
will be the same. This implies that the coverage level of R*_{δ,τ} on D^real_{S,K} is δ^real, which is clearly larger than δ.
In addition, based on equation (5.4), we have δ = δ̄ − τ − (δ + τ)/m. Since δ + τ < 1, we can claim
δ > δ̄ − τ − 1/m, or in other words, δ^real > δ̄ − τ − 1/m.
Problem Definition. We are given a black-box stochastic dynamical system as the training environment
with the trajectory distribution D^sim_{S,K}. We assume that when this system is deployed in the real world, the
trajectories satisfy σ^real_{s0} ∼ D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}). Given a user-specified failure probability ε ∈ (0, 1) and
an i.i.d. dataset of trajectories sampled from D^sim_{S,K}, the problem is to obtain a probabilistically guaranteed
flowpipe X that contains σ^real_{s0} ∼ D^real_{S,K} for all D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}) with a confidence of 1 − ε. Formally,

    s_0 ∼_W I,  σ^real_{s0} ∼ D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K})   ⟹   Pr[ σ^real_{s0} ∈ X ] ≥ 1 − ε.    (5.5)

In other words, we are interested in computing a probabilistically guaranteed flowpipe X from a set of
trajectories collected from D^sim_{S,K} so that X is valid for all trajectory distributions D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}), i.e.,
despite a potential distribution shift.
5.3 Learning A Surrogate Model Suitable for Probabilistic Reachability
Analysis
As we do not have access to the system dynamics in symbolic form, our approach to characterize the
trajectory distribution is to use a predictor, called the surrogate model.
Definition 5.3.1. A surrogate model F : X × Θ → Y is a function that approximates a given function
f : X → Y. Let dY be some metric on Y, then the surrogate model guarantees that for some value of
θ ∈ Θ, and for any x sampled from a distribution over X , the induced distribution over the random variable
dY (F(x; θ), f(x)) has good approximation properties, such as bounds on the moments of the distribution
(e.g. mean value) or bounds on the quantile of the distribution.
In our setting, the set X is the set of states S with the distribution over X being D^sim_{S,K}, and Y is the
set of K-step trajectories S^K, i.e., F maps a given initial state (or an uncertain model parameter) to the
predicted K-step trajectory of the system. The metric d_Y can be any metric on the trajectory space. One
example surrogate model is a feedforward neural network (NN) with n inputs and Kn outputs, represented
as σ̄_{s0} = F(s_0; θ), where θ is the set of trainable parameters. To train the surrogate model, we need to
define a specific residual error between a set of sampled trajectories and those predicted by the model.
While most surrogate models are trained using the cumulative squared loss across a training dataset [94],
we consider a loss function that helps us reduce conservatism in computing the probabilistic reach set of
the system.
Training a Lipschitz-Bounded NN Based Surrogate Model. Training is a procedure to identify the
parameter value θ which makes the surrogate model a good approximation; we use backpropagation to train
the surrogate by sampling K-step trajectories from the simulator of the original model. We call this dataset
T^trn. The surrogate model predicts the trajectory σ^sim_{s0} starting from an initial state sampled as s_0 ∼_W I.
We denote the predicted trajectory σ̄_{s0} corresponding to σ^sim_{s0} as:

    σ̄_{s0} = [s_0^⊤, F(s_0 ; θ)],  where  F(s_0 ; θ) = [ F^1(s_0), ..., F^n(s_0), ..., F^{(K−1)n+1}(s_0), ..., F^{nK}(s_0) ]^⊤.

Here, F^{(i−1)n+r}(s_0) is the r-th state component at the i-th time-step in the trajectory. In other words, we
stack the dimension and time in the trajectory into a single vector∥. We remark that a trained surrogate
model with an unrestricted Lipschitz constant is problematic for reachability analysis, as approximation
errors can get uncontrollably magnified, resulting in trivial bounds. As a result, we use techniques from
[69] to penalize the Lipschitz constant of the trained NN over the course of the training process.

∥The main advantage of training the trajectory as a long vector in one shot is that this approach eliminates the problem of
compounding errors in time-series prediction; however, this comes with higher training runtimes.
Residual Error. For training neural network surrogate models, a common practice is to minimize a loss
function, representing the difference between the trajectory predicted by the surrogate model and the
actual trajectory. To formulate this difference, we formally define the notion of the residual error as follows.
Definition 5.3.2 (Residual Error). Let e_i ∈ R^n denote the i-th basis vector of R^n. For a trajectory (s_0, σ^sim_{s0})
with σ^sim_{s0} sampled from D^sim_{S,K} and s_0 ∼_W I, we define:

    R^j = | e_{j+n}^⊤ σ^sim_{s0} − F^j(s_0) |,   j ∈ [nK].    (5.6)

Note that R^j is a non-negative prediction error between the (j + n)-th component∗∗ of σ^sim_{s0} and its prediction
F^j(s_0), j ∈ [nK]. The trajectory residual R is then defined as the largest among all scaled, component-wise
prediction errors with scaling factors α_j > 0, j ∈ [nK], i.e., R is defined as

    R = max( α_1 R^1, α_2 R^2, ..., α_{nK} R^{nK} ).    (5.7)

Note that this definition is inspired by [38]††. Compared to [81], utilizing the maximum of weighted
errors obviates the need to union bound component-wise probability guarantees to obtain a trajectory-level
guarantee. Let R_i = max(α_1 R^1_i, α_2 R^2_i, ..., α_{nK} R^{nK}_i) for i ∈ [|T^trn|] denote the trajectory residual as in
(5.7) for the training dataset T^trn.

∗∗There is an offset of n as the first n components of σ^sim_{s0} are the initial state.
††In this definition, we consider a component-wise residual for R^j instead of a state-wise residual, as the components e_{j+n}^⊤ σ^sim_{s0}
of σ^sim_{s0} may represent different quantities like velocity or position. State-wise residuals may lead to a higher level of conservatism in
robust conformal inference, as the magnitude of error in different components of a state may be noticeably different.
Training Using a δ̄-Quantile Loss. Let δ̄ = 1 − ε̄, where ε̄ is the adjusted miscoverage level as defined
previously. The ultimate goal of training a surrogate model is to achieve a higher level of accuracy
in our reachability analysis. The mean squared error (MSE) loss function is a popular choice to train
surrogate models; however, we later show that our proposed flowpipe is generated based on the quantile of
the trajectory residual error. Although the MSE loss function is popular and efficient, it may result in a
heavy-tailed distribution for the residual error, which can imply a noticeably larger quantile and result in
conservative flowpipes. Thus, to improve the overall statistical guarantees, we are interested in minimizing the
δ̄-quantile of the trajectory-wise residuals, for an appropriate δ̄ ∈ [0, 1); towards that end, we add a new
trainable parameter q. We can also set up the training process such that the scaling factors α_1, ..., α_{nK}
become decision variables for the optimization problem. Thus, the set of trainable parameters includes the
NN parameters θ, the scaling factors α_1, ..., α_{nK}, and the parameter q that approximates the δ̄-quantile of
the residual loss. We define two loss functions:

1. The first loss function L1 is used to set the trainable parameter q to the δ̄-quantile of the trajectory-wise residuals.
This loss function is inspired by the literature on quantile regression [102], and it is a well-known result that
minimizing this function yields q as the δ̄-quantile of R_1, ..., R_{|T^trn|}. Thus, given a batch of training
data points of size M < |T^trn|, let

    L1 = Σ_{i=1}^{M} [ δ̄ ReLU(R_i − q) + (1 − δ̄) ReLU(q − R_i) ].    (5.8)
2. Assuming q is the δ̄-quantile of the i.i.d. residuals R_i, we let the second loss function L2 minimize

    L2 = q ( 1/α_1 + 1/α_2 + · · · + 1/α_{nK} ).    (5.9)

This is motivated by the fact that, for all j ∈ [nK], R^j_i ≤ R_i/α_j by the definition of R_i. Thus, the sum of
errors over the trajectory components is upper bounded by:

    UB_i = R_i ( 1/α_1 + 1/α_2 + · · · + 1/α_{nK} ),    (5.10)

and the δ̄-quantile of UB_i, i ∈ [|T^trn|], is nothing but L2‡‡.
Therefore, we define the loss function as

    L = c L1 + L2,    (5.11)

where c is a large number that penalizes L1 to make sure that q serves as a good approximation for the
δ̄-quantile. The training itself uses standard back-propagation methods for computing the gradient of the
loss function, and uses stochastic gradient descent to train the surrogate model.

‡‡In case we replace L2 with q, the trivial solution for the scaling factors is α_j = 0, j ∈ [nK]. Therefore, the proposed secondary
loss function L2 also results in avoiding the trivial solution for the scaling factors.
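For concreteness, the following is a minimal PyTorch-style sketch of the combined loss (5.8)-(5.11) with q and log-parameterized scaling factors as trainable parameters; the architecture, batch size, and variable names are illustrative assumptions rather than the exact implementation used in the experiments:

import torch
import torch.nn as nn

n, K, delta_bar, c = 2, 50, 0.95, 100.0

surrogate = nn.Sequential(                    # maps s0 in R^n to a flattened nK-dimensional trajectory
    nn.Linear(n, 64), nn.ReLU(),
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, n * K))
q = torch.nn.Parameter(torch.tensor(1.0))             # trainable delta_bar-quantile estimate
log_alpha = torch.nn.Parameter(torch.zeros(n * K))    # alpha_j > 0 enforced via exp

opt = torch.optim.Adam(list(surrogate.parameters()) + [q, log_alpha], lr=1e-3)

def loss_fn(s0_batch, traj_batch):
    # traj_batch: simulator trajectories flattened to shape (M, nK), excluding the initial state.
    alpha = torch.exp(log_alpha)
    Rj = torch.abs(traj_batch - surrogate(s0_batch))      # component-wise errors, Eq. (5.6)
    R = torch.max(alpha * Rj, dim=1).values               # trajectory residuals,  Eq. (5.7)
    L1 = (delta_bar * torch.relu(R - q)
          + (1 - delta_bar) * torch.relu(q - R)).sum()    # pinball loss,          Eq. (5.8)
    L2 = q * (1.0 / alpha).sum()                          # surface-area proxy,    Eq. (5.9)
    return c * L1 + L2                                    # combined loss,         Eq. (5.11)

# One gradient step on a random placeholder batch (for illustration only).
s0 = torch.rand(32, n)
traj = torch.rand(32, n * K)
opt.zero_grad(); loss_fn(s0, traj).backward(); opt.step()

Note that this sketch omits the Lipschitz-constant penalty from [69] that is added to the training objective in practice.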
Properties of Surrogate Model. We pick neural networks (NN) as surrogate models due to their computational advantages and the ability to fit arbitrary nonlinear functions with low effort in tuning hyperparameters. We note that the input layer of the NN is always of size n (the state dimension), and the output
layer is of size nK (the dimension of the predicted trajectory over K time-steps). In our experiments, we
choose NNs with 2-3 hidden layers for which we observed good results; picking more hidden layers will
give better training accuracy, but may cause overfitting. In each hidden layer we pick an increasing number
of neurons between n and nK.
5.4 Scalable Data-Driven Reachability Analysis
In this section, we show how we can compute a robust probabilistically guaranteed reach set or flowpipe
X ⊂ R^{n(K+1)} for a stochastic dynamical system. Given a miscoverage level ϵ, we wish to be at least
(1 − ϵ)-confident about the reach-set that we compute. For brevity, we introduce δ = (1 − ϵ). In the
procedure that we describe, we compute a probabilistically guaranteed δ-confident flowpipe, defined as
follows:
Definition 5.4.1 (δ-Confident Flowpipe). For a given confidence probability δ ∈ (0, 1), a distribution
D^sim_{S,K}, the radius τ, and an f-divergence ball P_{f,τ}(D^sim_{S,K}), we say that X ⊆ R^{n(K+1)} is a δ-confident flowpipe
if we have Pr[σ^real_{s0} ∈ X] ≥ δ for any random trajectory σ^real_{s0} ∼ D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}) with s_0 ∼_W I.
Our objective is to compute X while being limited to sampling trajectories from the training environment
D^sim_{S,K}. We will demonstrate that we can compute X with formal probabilistic guarantees by combining
reachability analysis on the surrogate model trained from T^trn with error analysis on this model via robust
conformal inference.
Deterministic Reachsets for the Surrogate Models. Using the surrogate model from Section 5.3, we
show how to perform deterministic reachability analysis to get surrogate flowpipes.
Definition 5.4.2 (Surrogate flowpipe). The surrogate flowpipe X̄ ⊂ R^{n(K+1)} is defined as a superset of
the image of F(I ; θ). Formally, for all s_0 ∈ I, we need that [s_0^⊤, F(s_0 ; θ)] ∈ X̄.

Thus, to compute the surrogate flowpipe, we essentially need to compute the image of I w.r.t. F.
This can be accomplished by performing reachability analysis for neural networks, e.g., using tools such as
[164, 165, 184, 89].
Robust δ-Confident Flowpipes. In spite of training the surrogate model to maximize prediction accuracy,
it is still possible that a predicted trajectory is not accurate, especially when predicting the system trajectory
from a previously unseen initial state. Note also that we trained the surrogate model on trajectory data
from D^sim_{S,K}. We thus cannot expect the predictor to always perform well on trajectories drawn from D^real_{S,K}.
We now show how to quantify this prediction uncertainty using robust conformal inference. To do so, we
first sample an i.i.d. set of trajectories from the training environment D^sim_{S,K}, which we again denote as the
calibration dataset.
Definition 5.4.3 (Calibration Dataset). The calibration dataset R^calib is defined as:

    R^calib = { (s_{0,i}, R_i) | s_{0,i} ∼_W I, σ^sim_{s_{0,i}} ∼ D^sim_{S,K}, R_i = max( α_1 R^1_i, ..., α_{nK} R^{nK}_i ) }.

Here, σ^sim_{s_{0,i}} refers to the trajectory starting at the i-th initial state sampled from W and the resulting trajectory
from D^sim_{S,K}, and R^j_i is as defined in equation (5.6).
Remark 5.4.4. It is worth noting that although the data points within a single trajectory may not be i.i.d.,
the trajectory σ^sim_{s0} can be treated as an i.i.d. random vector in the R^{n(K+1)}-space, and subsequently the
residuals are also i.i.d. This is crucial to apply robust conformal inference, which requires that the calibration
set is exchangeable (a weaker form of i.i.d.).
Let J^sim_{S,K} be the distribution over trajectory-wise residuals for trajectories σ^sim_{s0} ∼ D^sim_{S,K}. However,
we wish to get information about the trajectory-wise residual R for a trajectory sampled from D^real_{S,K} ∈
P_{f,τ}(D^sim_{S,K}). Let the distribution of R induced by D^real_{S,K} be denoted by J^real_{S,K}. As a direct result of the data
processing inequality [26], the distribution shift between D^real_{S,K} and D^sim_{S,K} is larger than the distribution
shift between J^real_{S,K} and J^sim_{S,K}, so that we have J^real_{S,K} ∈ P_{f,τ}(J^sim_{S,K}).

Knowing that J^real_{S,K} ∈ P_{f,τ}(J^sim_{S,K}), we can utilize robust conformal inference [34] to find a guaranteed
upper bound for the δ-quantile of R. We call this guaranteed upper bound the robust conformalized δ-quantile,
and we denote it by R*_{δ,τ}, where Pr[R ≤ R*_{δ,τ}] ≥ δ. Specifically, we utilize equation (5.3) to
compute R*_{δ,τ} from the calibration dataset R^calib.
Next, we show that our definition of the residual error introduced in (5.7) allows us to use a single trajectory-wise nonconformity score for applying robust conformal inference (instead of the component-wise conformal inference as in [81]).
Lemma 5.4.5. Assume R*_{δ,τ} is the δ̄-quantile computed over the residuals R_i from the calibration dataset R^calib.
For the residual R = max( α_1 R^1, α_2 R^2, ..., α_{nK} R^{nK} ) sampled from the distribution J^real_{S,K} ∈ P_{f,τ}(J^sim_{S,K}),
it holds that

    Pr[ ⋀_{j=1}^{nK} R^j ≤ R*_{δ,τ}/α_j ] ≥ δ,

where R^j is again the component-wise residual for j ∈ [nK].

Proof. The proof follows as the residual R is the maximum of the scaled version of the component-wise residuals,
so that

    R = max( α_1 R^1, α_2 R^2, ..., α_{nK} R^{nK} )   ⟹   ⋀_{j=1}^{nK} R^j ≤ R/α_j.

Now, since Pr[R ≤ R*_{δ,τ}] ≥ δ, as well as R ≤ R*_{δ,τ} ⟺ R^j ≤ R*_{δ,τ}/α_j for all j ∈ [nK], we can claim
that

    Pr[ ⋀_{j=1}^{nK} R^j ≤ R*_{δ,τ}/α_j ] ≥ δ.
Next, we introduce the notion of an inflating zonotope to define the inflated flowpipe from the surrogate
flowpipe.
Definition 5.4.6 (Inflating Zonotope). A zonotope Zonotope(b, A) is defined as a centrally symmetric
polytope with b ∈ R^k as its center and A = {g_1, ..., g_p} a set of generators, where g_i ∈ R^k, that
represents the set { b + Σ_{i=1}^{p} μ_i g_i | μ_i ∈ [−1, 1] }. Here, we introduce the inflating zonotope whose
generators are the columns of

    A = diag( [ 0_{1×n}, R*_{δ,τ}/α_1, ..., R*_{δ,τ}/α_{nK} ] ),

and whose center b is the zero vector of length n(K + 1); the notation diag(v) represents a diagonal matrix with
the elements of v along its diagonal and off-diagonal elements being zero.
Including this inflating zonotope in our probabilistic reachability analysis leads to the following result.
Theorem 5.4.7. Let X̄ be a surrogate flowpipe of the surrogate model F for the set of initial conditions I. Let
R*_{δ,τ} be computed from the calibration dataset R^calib, as shown before. If we use R*_{δ,τ} to construct the inflated
surrogate flowpipe

    X = X̄ ⊕ Zonotope( 0, diag([0_{1×n}, e]) ),   e = [ R*_{δ,τ}/α_1, ..., R*_{δ,τ}/α_{nK} ],

then it holds that X is a δ-confident flowpipe for any σ^real_{s0} ∼ D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}) with s_0 ∼_W I.
Proof. Assume again that σ^real_{s0} ∼ D^real_{S,K} ∈ P_{f,τ}(D^sim_{S,K}) with s_0 ∼_W I, and recall that

    R = max[ α_1 R^1, ..., α_{nK} R^{nK} ],

where R^j = | e_{j+n}^⊤ σ^real_{s0} − F^j(s_0) |. Applying Lemma 5.4.5 results in Pr[ ⋀_{j=1}^{nK} R^j ≤ R*_{δ,τ}/α_j ] ≥ δ. We
rephrase this as

    Pr[ ⋀_{j=1}^{nK} | e_{j+n}^⊤ σ^real_{s0} − F^j(s_0) | ≤ R*_{δ,τ}/α_j ] ≥ δ.

Next, we define the interval C_j(s_0) as

    C_j(s_0) := [ F^j(s_0) − R*_{δ,τ}/α_j , F^j(s_0) + R*_{δ,τ}/α_j ]

and accordingly obtain the guarantee that

    Pr[ ⋀_{j=1}^{nK} e_{j+n}^⊤ σ^real_{s0} ∈ C_j(s_0) ] ≥ δ.

Based on this representation, we can now see that

    Pr[ σ^real_{s0} ∈ Zonotope( [s_0^⊤, F(s_0 ; θ)], diag([0_{1×n}, e]) ) ] ≥ δ.    (5.12)

Finally, since Pr[s_0 ∉ I] = 0 and X̄ is a surrogate flowpipe for the surrogate model F on I, i.e., s_0 ∈ I
implies [s_0^⊤, F(s_0 ; θ)] ∈ X̄, we can conclude

    Zonotope( [s_0^⊤, F(s_0 ; θ)], diag([0_{1×n}, e]) ) ⊂ X̄ ⊕ Zonotope( 0, diag([0_{1×n}, e]) ) = X.    (5.13)

Consequently, we know that Pr[σ^real_{s0} ∈ X] ≥ δ holds.
We note that the surrogate reachability analysis, as well as the use of the Minkowski sum in the reachability
analysis, introduces some level of conservatism.
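When the surrogate flowpipe is over-approximated by per-component interval bounds (as can be extracted from the star sets returned by a tool such as NNV), the Minkowski sum with the axis-aligned inflating zonotope simply widens each interval by R*_{δ,τ}/α_j; a minimal numpy sketch with illustrative variable names is:

import numpy as np

def inflate_flowpipe(lower, upper, R_star, alpha, n):
    # Minkowski sum of an interval over-approximation of X_bar with the inflating zonotope.
    # lower, upper: bounds of shape (n*(K+1),); the first n components (the initial state) are not inflated.
    e = np.concatenate([np.zeros(n), R_star / np.asarray(alpha, dtype=float)])
    return lower - e, upper + e

# Example with n = 2 and K = 2, so the flowpipe lives in R^6.
lower = np.array([-0.5, -0.5, 0.1, 0.2, 0.3, 0.4])
upper = np.array([ 0.5,  0.5, 0.6, 0.7, 0.8, 0.9])
lo, up = inflate_flowpipe(lower, upper, R_star=0.05, alpha=np.ones(4), n=2)
print(lo, up)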
Remark 5.4.8. We note that we can even compute the minimum size of the calibration dataset required
to achieve a desired confidence probability δ ∈ (0, 1). Robust conformal inference [34] imposes two
constraints in this regard. The first constraint specifies a relation between the adjusted miscoverage level ε̄
and the size of the calibration dataset as ⌈(L + 1)(1 − ε̄)⌉ ≤ L. The second constraint is that the ranges
of g_{f,τ} and g^{−1}_{f,τ} have to be within [0, 1]. Thus, we can impose (1 + 1/L) g^{−1}_{f,τ}(δ) < 1, or in other words,
L > ⌈ g^{−1}_{f,τ}(δ) / (1 − g^{−1}_{f,τ}(δ)) ⌉.
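For the total variation case of Example 7, where g^{−1}_{f,τ}(δ) = δ + τ, the bound in Remark 5.4.8 can be evaluated directly; a small illustrative sketch:

import math

def min_calibration_size_tv(delta, tau):
    # Smallest L satisfying L > ceil(g_inv / (1 - g_inv)) with g_inv = delta + tau (total variation).
    g_inv = delta + tau
    if g_inv >= 1.0:
        raise ValueError("need delta + tau < 1 for a feasible calibration size")
    return math.ceil(g_inv / (1.0 - g_inv)) + 1

print(min_calibration_size_tv(delta=0.95, tau=0.0))     # 20 calibration trajectories suffice
print(min_calibration_size_tv(delta=0.9999, tau=0.0))   # the bound grows quickly with delta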
Tightening the Surface Area of the Flowpipe. The scaling factors α_j are trained to minimize the sum
of errors over the trajectory components, see equation (5.9). The expression R*_{δ,τ} Σ_{j=1}^{nK} 1/α_j arising from
(5.9) can also be interpreted as the surface area of the inflating zonotope, see Definition 5.4.6. We now
show how we can update the scaling factors after training to reduce the surface area and tighten the δ-confident
flowpipe further. Let us sample a new trajectory dataset T^LP and compute the prediction errors R^j_i and
residuals R_i for i ∈ [|T^LP|], and also their conformalized robust δ̄-quantile R*_{δ,τ}, using the trained scaling
factors α_j and the surrogate model.
The main idea for an efficient update of the trained scaling factors is as follows. Assume α'_j is the updated
version of α_j. If this update is such that the updated trajectory residuals max(α'_1 R^1_i, ..., α'_{nK} R^{nK}_i), i ∈
[|T^LP|], are the same as the trajectory residuals R_i under α_j, then R*_{δ,τ} under the updated α'_j remains
the same. By defining ω'_j = 1/α'_j, we see that the surface area R*_{δ,τ} Σ_{j=1}^{nK} ω'_j of the inflating zonotope
depends linearly on ω'_j. On the other hand, the constraint R_i = max( R^1_i/ω'_1, ..., R^{nK}_i/ω'_{nK} ) is a linear
constraint. This constraint can be equivalently represented as

    ∀i ∈ [|T^LP|], j ∈ [nK] :  R_i ω'_j ≥ R^j_i,

under the additional assumption that the updated scaling factors ω'_j are minimized. This means an efficient
update of the scaling factors to reduce the surface area can be done via linear programming with decision
variables ω'_j, j ∈ [nK], i.e.,

    minimize  Σ_{j=1}^{nK} ω'_j   s.t.  ∀i ∈ [|T^LP|], j ∈ [nK] :  ω'_j ≥ R^j_i / R_i,    (5.14)

which has the analytical solution ω'_j = max_i [ R^j_i / R_i ].
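Since (5.14) decouples across the components j, its solution is just a column-wise maximum of the normalized prediction errors; a minimal numpy sketch with illustrative variable names:

import numpy as np

def update_scaling_factors(Rj, Ri):
    # Analytical solution of the LP (5.14): omega'_j = max_i Rj[i, j] / Ri[i].
    # Rj: prediction errors of shape (|T_LP|, nK); Ri: trajectory residuals of shape (|T_LP|,).
    omega = (Rj / Ri[:, None]).max(axis=0)
    return omega, 1.0 / omega  # (omega'_j, updated alpha'_j)

# Tiny example with 3 trajectories and nK = 4 components (alpha_j = 1, so R_i is the row-wise max).
Rj = np.array([[0.2, 0.1, 0.4, 0.3],
               [0.5, 0.2, 0.1, 0.4],
               [0.1, 0.3, 0.2, 0.2]])
Ri = Rj.max(axis=1)
print(update_scaling_factors(Rj, Ri))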
5.5 Experimental Results
To mimic real-world systems that can produce actual trajectory data, we use stochastic difference equation-based models derived from dynamical system models. In these difference equations, we assume additive
Gaussian noise that models uncertainty in observation, dynamics, or even modeling errors.
Our theoretical guarantees depend on knowledge of the distribution shift τ. In practice, however, τ is
usually not known a priori but can be estimated from the data. For the purpose of providing an empirical
examination of our results, we fix τ a priori to compute the δ-confident flowpipe and construct a system
D^real_{S,K} from D^sim_{S,K} by varying system parameters such that J^real_{S,K} ∈ P_{f,τ}(J^sim_{S,K}). We ensure that this holds
by estimating the distribution shift, denoted by τ̃, as the f-divergence between J^sim_{S,K} and J^real_{S,K} and by
making sure that τ̃ ≤ τ. In our experiments, we used the total variation distance for f, and used 3 × 10^5
trajectories to estimate τ̃.
We use ReLU activation functions in our surrogate NN-based models motivated by recent advances in NN
verification with ReLU activations. We specifically use the NNV toolbox from [165] for reachability analysis
of the surrogate model. While other activation functions could be used, we expect more conservative
results in case we utilize non-ReLU activation functions. The approach in [165] uses star-sets (an extension
of zonotopes) to represent the reachable set and employs two main methods: (1) the exact-star method
that performs exact but slow computations, and (2) the approx-star method that is conservative but faster. To
mitigate the runtime of the exact-star technique and the conservatism of the approx-star technique, set
partitioning can be utilized [163], where initial states are partitioned into sub-regions and reachability is
done on each sub-region in parallel.
As per Theorem 5.6.2, our results are guaranteed to be valid with a confidence of δ. To determine
how tight this bound is, we will empirically examine the computed probabilistic flowpipes. We do so by
sampling i.i.d. trajectories from D^real_{S,K}§§ and computing the ratio of the trajectories that are included in the
probabilistic flowpipes, which we denote by ∆̃. Additionally, to check the coverage guarantee δ for R*_{δ,τ}
directly, we also report the ratio of the trajectories that yield a residual less than R*_{δ,τ}, which we denote
by δ̃. We emphasize that ∆̃ and δ̃ are both expected to be greater than δ.

In the remainder, we first present a case study comparing reachability with surrogate models trained
using the mean squared error (MSE) loss and our proposed quantile loss function in (5.11). We show that the
quantile loss function results in tighter probabilistic flowpipes. After that, we present several case studies on
a 12-dimensional quadcopter and the time-reversed van der Pol dynamics. The results are also summarized
in Table 5.1. We visualize our flowpipes by their two-dimensional projections. Therefore, in case a trajectory
is included in all the visualized bounds, it does not necessarily mean the trajectory is covered. We instead
determine the inclusion of traces in our star sets using the NNV toolbox, which determines set inclusion by
solving a linear programming feasibility problem.

§§We use trajectories close to the worst case, where τ̃ is close to τ.
Comparison between MSE and Quantile Minimization. Experiment 1. Our first experiment will show
the advantage of training a surrogate model with the quantile loss function compared to training a surrogate
model using the MSE loss function. Therefore, we model D^sim_{S,K} as the non-linear system

    x_{k+1} = 0.985 y_k + sin(0.5 x_k) − 0.6 sin(x_k + y_k) − 0.07 + 0.01 v_1
    y_{k+1} = 0.985 x_k + cos(0.5 y_k) − 0.6 cos(x_k + y_k) − 0.07 + 0.01 v_2

that generates a periodic motion. Here, v_1 and v_2 denote random variables sampled from a normal
distribution. In this experiment, we do not consider a shifted stochastic system D^real_{S,K}, and instead sample
trajectories from D^sim_{S,K} for the comparison of our two surrogate models. The first surrogate model is trained
as proposed in Section 5.3 using quantile minimization, while the other surrogate model is trained with
the MSE loss function. Our results are shown upfront in Figure 5.1a where we compare the probabilistic
reachable sets of these two models.
In more detail, recall that the scaling factors α_1, ..., α_{nK} of our proposed method in Section 5.3 are
jointly trained with the surrogate model. However, since we do not train these scaling factors jointly when
we use the MSE loss function, we instead compute them beforehand following [185]. In other words, we
normalize the component-wise residuals as

    α_j = 1/ω_j  where  ω_j = max( R^j_1, R^j_2, ..., R^j_{|T^trn|} )

for each j ∈ [nK]. We utilized |T^trn| = 10^5 random trajectories with K = 50 for training the surrogate
model. The initial states were uniformly sampled from the set of initial states I_1 = [−0.5, 0.5] × [−0.5, 0.5].
In both cases, we trained a ReLU surrogate model with structure [2, 20, 50, 90, 100], and we applied
approx-star from the NNV toolbox [165] for the reachability analysis. To lower the conservatism of approx-star,
we partition the set of initial states into 400 partitions and perform the surrogate reachability analysis
for every partition separately. The flowpipe is also computed for the confidence level of δ ≥ 95%. The
details of the experiment via quantile minimization are also provided in Table 5.1.
We additionally compare the surface area R*_{δ,τ} Σ_{j=1}^{nK} 1/α_j of the inflating zonotopes, see Definition
5.4.6, for both surrogate models. Note that this surface area is the L2 loss in equation (5.9) when q = R*_{δ,τ},
which we enforce during training. The δ̄-quantile of UB_i as defined in (5.10) is the L2 loss, and hence
approximates the surface area of the inflating zonotope. To compare the distributions of UB_i, we simulate
3 × 10^5 trajectories and compute UB_i/(nK) for both the MSE and the quantile-loss-based NNs. We present
the histograms of UB_i/(nK) for both loss functions in Figure 5.1b, where we see that the quantile of UB_i
for MSE is larger. This emphasizes the advantage of training via the quantile loss function.
12-Dimensional Quadcopter. Next, we consider a 12-dimensional quadcopter model from the benchmarks
in [88] that is designed to hover around a pre-specified elevation. The ODE model for this system is
provided in Figure (3.10), where the state consists of the position and velocity of the quadrotor x1, x2, x3
and x4, x5, x6, respectively, as well as the Euler angles x7, x8, x9, i.e., roll, pitch, and yaw, and the angular
velocities x10, x11, x12.
The dynamics of the system are provided in Eq. (5.15) and the set of initial states is I_2 = {s_0 | i ∈
[1, 6] : −0.2 ≤ s0(i) ≤ 0.2, i ≥ 7 : s0(i) = 0}. We have also added additive noise to the system that
is detailed in Table 5.2, and we generate data with time step δt = 0.05 seconds over 100 time steps
(i.e. 5 seconds). The controller is a neural network controller that was presented in [88] to perform the
task of hovering on a specific altitude. We present 3 experiments on this model. Learning a surrogate
model to map the 12-dimensional initial state to a 1200-dimensional trajectory is impractical. We thus
use an interpolation technique to resolve this issue. To that end, we select only certain time-steps of the
1200-dimensional trajectory in order to map the initial state to state values at the selected time steps,
while we take care of the remaining time steps via interpolation. If the trajectories are smooth, as is the
case in this case study, this is expected to work well. We here select every second time-step to extract a
Specification and reachability analysis with robust CI:
Expt. (dynamics) | confidence of flowpipe δ | max. distribution shift radius | # star sets (NNV) | CI runtime (s) | reachability technique | overall reachability runtime (s) | |R^calib|
1 (Periodic)   | 95%    | 0     | 400 | 0.0892 | approx-star | 22.5233  | 10,000
2 (Quadcopter) | 99.99% | 0     | 64  | 1.4299 | approx-star | 160.0486 | 20,000
3 (Quadcopter) | 80%    | 0.15  | 64  | 0.4971 | approx-star | 148.9815 | 10,000
4 (Quadcopter) | 70%    | 0.25  | 64  | 0.4971 | approx-star | 148.9815 | 10,000
5 (TRVDP)      | 99.99% | 0     | 1   | 4.8218 | exact-star  | 1.5761   | 30,000
6 (TRVDP)      | 77%    | 0.225 | 1   | 0.2876 | exact-star  | 0.1404   | 10,000

Training:
Expt. | training runtime | |T^trn| | linear programming runtime | |T^LP|
1     | 41 minutes  | 100,000 | 1.7422 seconds  | 10,000
2     | 124 minutes | 40,000  | 29.3255 seconds | 2,000
3, 4  | 112 minutes | 40,000  | 21.1406 seconds | 2,000
5     | 25 minutes  | 40,000  | 6.3559 seconds  | 50,000
6     | 27 minutes  | 40,000  | 2.8236 seconds  | 10,000

Examination:
Expt. | induced shift radius τ̃ | flowpipe coverage ∆̃ (robust CI / vanilla CI) | coverage δ̃ for R*_{δ,τ} (robust CI / vanilla CI)
1     | 0      | 96.31% / 96.31% | 95.05% / 95.05%
2     | 0      | 100% / 100%     | 99.99% / 99.99%
3     | 0.1445 | 100% / 100%     | 88.58% / 70.86%
4     | 0.2395 | 100% / 100%     | 80.50% / 49.64%
5     | 0      | 99.99% / 99.99% | 99.99% / 99.99%
6     | 0.2085 | 95.91% / 56.52% | 95.87% / 55.73%

Table 5.1: Details of our computation process for providing probabilistically guaranteed flowpipes.
The time horizon for Experiments 1, 5, and 6 is K = 50 time-steps, and for Experiments 2, 3, and 4 it is K = 100 time-steps.
The sampling times for the quadcopter and TRVDP are 0.05 and 0.02 seconds, respectively. We examine
the results with a valid distribution shift (explained in detail in Table 5.2) that is less than the maximum
specified distribution shift in terms of total variation. This shift is estimated through the comparison of
300,000 trajectories from D^real_{S,K} and D^sim_{S,K}. We also utilize 10,000 trajectories (number of trials)
from this specific distribution D^real_{S,K} to examine the coverage of the flowpipes and 300,000 trajectories for the
examination of the coverage level for R*_{δ,τ} (i.e., ∆̃, δ̃). To evaluate the contribution of robust conformal
inference, we also solve for the flowpipes again neglecting the distribution shift, i.e., ε̄ = ε, and show that
the coverage guarantee for R*_{δ,τ} and the flowpipes may be violated (δ̃ < δ or ∆̃ < δ) in case the shifted
distribution (deployment distribution) is considered. The reported reachability runtimes assume no
parallel computing.
(a) Flowpipes for x_k and y_k over time steps. The red borders are for flowpipes generated by the MSE loss function and the
blue ones are for the quantile-based loss function. The shaded region shows an approximation of the flowpipe obtained by recording
trajectories, and the darkness of the green color shows the density of the trajectories. The black lines are the borders
of the shaded region. The shaded area is generated via 300,000 trajectories.
(b) Distribution of UB/(nK) for the MSE and the quantile-based NNs for 3 × 10^5 samples. The 95%-quantile of
the variable UB/(nK) represents the surface area of the obtained inflating zonotope. The figure is cropped for better
visibility.
Figure 5.1: Figures (a) and (b) show a comparison between flowpipes and distributions of UB/(nK),
respectively, for training via MSE and training via our proposed loss function (5.11).
600-dimensional trajectory (δt = 0.1, K = 50) to train a surrogate model of structure [12, 200, 400, 600].
Finally, we interpolate the sampled 600-dimensional trajectory to approximate the original 1200-dimensional
trajectory (δt = 0.05, K = 100). This interpolation process is integrated into the model in an analytical way,
and is done by multiplying a weight matrix W ∈ R^{1200×600} with the last layer. This converts the model's
structure to [12, 200, 400, 1200], which is then utilized for the surrogate reachability. The scaling factors
ω_j, j ∈ [nK], are also interpolated for the un-sampled time-steps after the training and before the linear
programming.
Expt. | initial state distribution | Σ for added Gaussian noise N(0, Σ)
D^sim_{S,K}:
1    | uni(I_1) | diag([0.01, 0.01])^2
2    | uni(I_2) | diag([0.05 · 1_{1×6}, 0.01 · 1_{1×6}])^2
3    | uni(I_2) | diag([0.05 · 1_{1×6}, 0.01 · 1_{1×6}])^2
4    | uni(I_2) | diag([0.05 · 1_{1×6}, 0.01 · 1_{1×6}])^2
5, 6 | uni(I_3) | diag([0.1, 0.1])^2
D^real_{S,K} ∈ P_{τ,f}(D^sim_{S,K}):
3    | uni(I_2) | Σ × 1.8
4    | uni(I_2) | Σ × 2.2
6    | uni(I_3) | diag([0.1378, 0.1378])^2

Table 5.2: Initial state distribution and added Gaussian noise (mean 0, covariance Σ) for the training and
the shifted environments; uni(I) denotes the uniform distribution over I.
Experiment 2. In comparison with [81], we provide a higher level of data efficiency. Consider a confidence
level of 99.99% and no distribution shift. We assume a calibration dataset of size |R^calib| = 2 × 10^4 to
compute R*_{δ,τ} and the δ-confident flowpipe, and a ReLU neural network of structure [12, 20, 400, 1200]
to train the surrogate model. The methodology proposed in [81] requires a calibration dataset of at least
24 × 10^6 data points¶¶ to provide the mentioned level of confidence. On the other hand, we only require 10^4
trajectories. Figure 5.2 shows the proposed reach set and Table 5.1 presents the details of the computation
process. Our estimation shows that we achieve δ̃ = 0.9999 via 3 × 10^5 trials and ∆̃ = 1 via 10^4 trials,
which aligns with our expectations.

¶¶The minimum data size in [81] is |R^calib| > ⌈(1 + γ)/(1 − γ)⌉, where γ = 1 − (1 − δ)/(nK).
Experiments 3, 4. In this case study, we generate a 95%-confident flowpipe for the trajectories from D^sim_{S,K}
and we utilize it to study the distribution shift for two different deployment environments D^real_{S,K}. This
flowpipe is plotted in Figure 5.2, and the details of the computation process are included in Tables 5.1 and 5.2.
For this generated flowpipe, given a maximum distribution shift radius τ ∈ [0, 1], the flowpipe's confidence
level δ for trajectories from D^real_{S,K} has to satisfy δ ≥ 0.95 − τ. The bound δ ≥ δ̄ − τ can be derived from
equation (5.4). Therefore, we consider two different scenarios. In Experiment 3, we examine our flowpipe
for the case τ = 0.15. In this case, for a deployment environment with distribution shift τ̃ < 0.15, we
numerically show that ∆̃, δ̃ > 0.95 − 0.15 = 0.8. In addition, in Experiment 4, we assume τ = 0.25, and
for a deployment environment with τ̃ < 0.25 we show that ∆̃, δ̃ > 0.95 − 0.25 = 0.7. Tables 5.1 and 5.2
show the details of the experiments and the distribution shift, respectively.

Figure 5.2: This figure shows the proposed flowpipes computed for the quadcopter dynamics for each state
component over the time horizon of 100 time steps with δt = 0.05, i.e., 5 seconds of quadcopter operation.
The red borders show the flowpipe that contains trajectories from D^sim_{S,K} with provable coverage
of δ ≥ 99.99%. The green shaded area shows the density of a collection of 300,000 of these trajectories,
and a darker color means a higher density of traces. The blue borders are for a flowpipe that
contains the trajectories from the distribution D^sim_{S,K} with δ ≥ 95%. The dotted black line shows the border
of the collected simulated trajectories.
Time-Reversed Van der Pol Oscillator Dynamics. The time-reversed van der Pol (TRVDP) dynamics is
known for its inherent instability, which makes it a pernicious challenge for computing reach sets. The
SDE model for TRVDP is:

    [ ẋ_1  ẋ_2 ]^⊤ = [ x_2   μ x_2 (1 − x_1^2) − x_1 ]^⊤ + v,   μ = −1,
Figure 5.3: Shows the density of trajectories starting from I_3 versus their computed flowpipes. The green
color bar represents the density of traces from D^sim_{S,K} and the blue color bar is for traces from D^real_{S,K}. The
shaded areas are generated via 3 × 10^5 different trajectories, and the dotted lines represent their borders.
(a) (τ, δ) = (0, 0.9999): shows two different flowpipes for the TRVDP dynamics with a confidence level of 0.9999 on D^sim_{S,K};
the tighter flowpipe (blue) utilizes the linear programming (5.14) while the looser one (red) does not.
(b) (τ, δ) = (0.225, 0.77): shows a flowpipe that covers trajectories from D^real_{S,K} with a confidence level of 77% and also covers
the traces from D^sim_{S,K} with a confidence level of 99.5%; the blue shaded area is for D^real_{S,K} and the green shaded area is for D^sim_{S,K}.
(c) Shows the vector field of the TRVDP dynamics, which illustrates the instability of the system.
x˙ 1 = cos(x8) cos(x9)x4 + (sin(x7) sin(x8) cos(x9) − cos(x7) sin(x9))x5
+(cos(x7) sin(x8) cos(x9) + sin(x7) sin(x9))x6 + v1
x˙ 2 = cos(x8) sin(x9)x4 + (sin(x7) sin(x8) sin(x9) + cos(x7) cos(x9))x5
+(cos(x7) sin(x8) sin(x9) − sin(x7) cos(x9))x6 + v2
x˙ 3 = sin(x8)x4 − sin(x7) cos(x8)x5 − cos(x7) cos(x8)x6 + v3
x˙ 4 = x12x5 − x11x6 − 9.81 sin(x8) + v4
x˙ 5 = x10x6 − x12x4 + 9.81 cos(x8) sin(x7) + v5
x˙ 6 = x11x4 − x10x5 + 9.81 cos(x8) cos(x7) − 9.81 − u1/1.4 + v6
x˙ 7 = x10 + (sin(x7)(sin(x8)/ cos(x8)))x11 + (cos(x7)(sin(x8)/ cos(x8)))x12 + v7
x˙ 8 = cos(x7)x11 − sin(x7)x12 + v8
x˙ 9 = (sin(x7)/ cos(x8))x11 + (cos(x7)/ cos(x8))x12 + v9
x˙ 10 = −0.9259x11x12 + 18.5185u2 + v10
x˙ 11 = 0.9259x10x12 + 18.5185u3 + v11
x˙ 12 = v12
(5.15)
Here, v is an additive Gaussian noise, detailed in Table 5.2. We generate data from these dynamics with
sampling time δt = 0.02 seconds, and we target reachability for K = 50 time steps. We use a limited set of
initial states I_3 = {s_0 | [−1.2, −1.2] ≤ s_0 ≤ [−1.195, −1.195]} to investigate the instability of the system
dynamics. Our analysis centers on discerning how this instability manifests as a divergence in trajectories
originating from this restricted set of initial states. We also use a model with structure [2, 50, 90, 100]
to train the surrogate model. We perform two experiments on this system, explained below.
Experiment 5. In this experiment, we target the flowpipe computation for the TRVDP dynamics for the
confidence probability of δ ≥ 99.99% and no distribution shift. Figure 5.3a shows the resulting flowpipe and
Table 5.1 shows the details of the process. In this experiment, we also generate another 0.9999-confident
flowpipe excluding the linear programming step (proposed in equation (5.14)) from the process. Figure 5.3a also
compares these flowpipes and shows that removing the linear programming increases the level of conservatism.
Experiment 6. We target an arbitrary confidence level of δ ≥ 0.77 for the flowpipe, despite distribution
shifts within radius τ < 0.225 measured in total variation. As suggested by robust conformal inference, we
should target a flowpipe with a confidence level of 99.5% = 77% + 22.5% on D^sim_{S,K} to ensure the confidence
level of 77% on D^real_{S,K}. Figure 5.3b shows our probabilistically guaranteed flowpipe, and Tables 5.1 and 5.2
present the details of the experiment. These tables also show that, in case we set ε̄ = ε in the reachability
analysis (vanilla CI), our flowpipe violates the guarantee (i.e., δ ≥ 0.77). This emphasizes the
contribution of robust conformal inference.
5.6 Extension to Longer Horizon Trajectories & Improving Accuracy
The probabilistic guarantee on the inflating hypercube provided in Section 5.4 is based on conformal
inference (CI). However, the methods used to integrate CI are overly conservative. The first contribution
in this section is to address this issue by combining CI with Principal Component Analysis (PCA),
resulting in tighter inflating hypercubes. This approach has proven effective in reducing the conservatism
of reachability analysis.
The approach outlined in Section 5.4 also encounters scalability challenges when dealing with longer horizon
trajectories. This problem stems from the training strategy for the surrogate model. For example, Section
5.4 proposes a single model to map the initial state to the entire trajectory∗∗∗. However, this results in a
model with large width, limiting scalability for extended horizons due to several factors:
• Training such a model is impractical for longer time horizons or higher system dimensions.
• The large width makes exact methods for surrogate reachability infeasible, requiring us to use
conservative techniques.
In response, our second contribution is to address these challenges by presenting a new training
strategy that effectively resolves all of the listed scalability issues.

∗∗∗The rationale for this choice is that it is preferable to training a single one-step model that maps the current state to the next
state and then unrolling it over the trajectory, as the one-step model accumulates prediction errors over time.
Remark 5.6.1. Unlike Section 5.4, which defines R^j, j ∈ [nK], as the component-wise residual, i.e., the
absolute value of the prediction error, henceforth we take it to be the prediction error itself:

    R^j = e_{j+n}^⊤ σ^sim_{s0} − F^j(s_0),   j ∈ [nK],    (5.16)

and we call it the prediction error. In other words, we allow this value to be negative as well.
The methodology proposed in Section 5.4 can be outlined as follows:

Proposition 5.6.2. Let X̄ be a surrogate flowpipe of the surrogate model F for the set of initial conditions I.
Let PE := [ R^1, R^2, ..., R^{nK} ] be the sequence of prediction errors for σ^real_{s0} ∼ D^real_{S,K}, where s_0 ∼ W, and let
δX be the inflating hypercube for PE such that Pr[PE ∈ δX] > δ. Then the inflated reachset X = X̄ ⊕ δX
is a δ-confident flowpipe for σ^real_{s0} ∼ D^real_{S,K} where s_0 ∼ W.
Figure 5.4: This figure shows the division of the trajectory into N different segments σ^{sim,q}_{s0}, q ∈ [N].
As we discussed before, the primary sources of conservatism and inaccuracy in this methodology stem
from the training process for the surrogate model F(s0 ; θ) and the method used to compute the inflating
hypercube δX. In the following sections, we address both issues and propose solutions to improve the
accuracy and scalability of this approach for the reachability analysis.
5.6.1 Scaling Training Strategy for Reachability
In this section, we introduce a new training strategy for the model F(s_0 ; θ) that avoids the existing
scalability issues. Figure 5.4 illustrates a realization of a trajectory σ^sim_{s0} := s_1, ..., s_K over the horizon K.
In this figure, we divide the time horizon into N segments, each with length T_q, where q ∈ [N]. We denote
each trajectory segment as σ^{sim,q}_{s0}, q ∈ [N], defined as:

    σ^{sim,q}_{s0} := s_{t_q+1}, s_{t_q+2}, ..., s_{t_q+T_q},   t_q = Σ_{ℓ=1}^{q−1} T_ℓ,   t_1 = 0.
The key idea is to directly link each trajectory segment σ^{sim,q}_{s0}, q ∈ [N], to its initial state s_0 ∈ I. Thus,
we can train a separate and independent model F_q(s_0 ; θ_q), q ∈ [N], for each segment, which predicts
σ^{sim,q}_{s0} directly based on the initial state s_0. This model is also used to compute surrogate flowpipes for the
trajectory segments X̄^q, q ∈ [N], representing the image of the set I through the model F_q(I ; θ_q).
Here are the reasons why this new training strategy for the trajectory σ^sim_{s0} resolves all of the scalability
issues listed above:

1. Since all surrogate models F_q(s_0 ; θ_q), q ∈ [N], are directly connected to the initial state, we do not
need to unroll them over the time horizon to predict the states in σ^{sim,q}_{s0}, q ∈ [N], thus eliminating
the problem of cumulative errors over the time horizon.

2. In this setting, the size of the models F_q(s_0 ; θ_q), q ∈ [N], can be small. The small size of the models
allows for efficient computation of the surrogate flowpipes X̄^q := F_q(I ; θ_q) for each segment via
exact-star reachability analysis†††.

3. The smaller models F_q(s_0 ; θ_q), q ∈ [N], enable efficient training of accurate models for each trajectory
segment.

4. It eliminates the constraints identified in Section 5.4 for performing reachability analysis over long
horizons, making it a better option for real-world applications.

Finally, we can concatenate all the surrogate flowpipes X̄^q, q ∈ [N], to provide a single star set as the
surrogate flowpipe for the entire trajectory (a sketch of this segment-wise training strategy is given below).
We later inflate this set with the inflating hypercube to obtain the δ-confident flowpipe.

†††However, if the set of initial states I is large and the partitioning of I is not scalable (high-dimensional states), we remain
limited to using approx-star. Nevertheless, even in this case, the small size of the model significantly reduces the conservatism of
approx-star.
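A minimal PyTorch-style sketch of this segment-wise strategy, with illustrative segment lengths and network sizes (and a plain MSE fit for brevity), is:

import torch
import torch.nn as nn

n = 2                        # state dimension
T = [10, 10, 10, 10, 10]     # segment lengths T_q, so K = 50

# One small, independent surrogate per segment, each mapping s0 directly to an n*T_q vector.
segment_models = [nn.Sequential(nn.Linear(n, 32), nn.ReLU(),
                                nn.Linear(32, n * Tq)) for Tq in T]

def train_segment(model, s0_data, segment_data, epochs=50):
    # Fit one segment model F_q(s0; theta_q) to its trajectory segment.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((model(s0_data) - segment_data) ** 2)
        loss.backward()
        opt.step()

def predict_full_trajectory(s0):
    # Concatenate the segment predictions into one flattened nK-dimensional trajectory.
    with torch.no_grad():
        return torch.cat([m(s0) for m in segment_models], dim=-1)

Because each F_q is connected to s_0 directly, no model is ever unrolled over time, and each small network can be passed to exact-star reachability individually.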
5.6.2 PCA Based Inflating Hypercube
As we provide a new definition for the residual, henceforth we denote the residual by ρ ∈ R_{>0}. In Section 5.4
we proposed the definition of the residual as

    ρ = max( α_1 |R^1|, α_2 |R^2|, ..., α_{nK} |R^{nK}| ),    (5.17)

and used it to compute the corresponding inflating hypercube δX. However, this definition imposes two
conservative constraints on the inflating hypercube:

1. The center of the hypercube is always located at the origin.

2. The edges of the hypercube are restricted to be aligned with the directions of the trajectory state
components.

Figure 5.5: The figure shows the projection of prediction errors for two-dimensional states over a horizon
of K = 2. The left figure illustrates the projection on the (R^1, R^2) axes (e.g., k = 1), and the right figure
displays the projection on the (R^3, R^4) axes (e.g., k = 2). This figure provides a comparison between the
inflating hypercubes for a confidence level δ ∈ (0, 1), generated by the PCA approach (red hypercubes)
and the method proposed in [78] (green hypercubes). It clearly demonstrates the superior accuracy of the
PCA technique compared to the other method. The principal axes for k = 1, 2 are (r^1, r^2) and (r^3, r^4),
respectively.
To address these limitations, we propose a new definition for the residual ρ that enables us to overcome
these issues. We integrate the concepts of Conformal Inference (CI) and Principal Component Analysis
(PCA) in our new definition of the residual. This approach provides the principal axes as the orientation of
the inflating hypercube and noticeably reduces its size. In other words, our approach enhances the accuracy of
conformal inference by manipulating the coordinate system, inspired by PCA. In the context of CI, altering
the coordinate system has also been addressed in other works, such as [167, 152].
To obtain the principal axes, given the simulated trajectories from the training dataset σ^sim_{s_{0,i}} ∈ T^trn, i ∈
[|T^trn|], for each segment q ∈ [N], we use the trajectory segment σ^{sim,q}_{s_{0,i}} and its corresponding surrogate
model F_q(s_{0,i}; θ_q) to compute the corresponding set of prediction errors. Specifically, for each segment
q ∈ [N] and data index i ∈ [|T^trn|], we collect PE^q_i = [ R^{t_q n+1}_i, R^{t_q n+2}_i, ..., R^{(t_q+T_q)n}_i ], and approximate
the average and covariance as follows:

    PE̅^q = ( Σ_{i=1}^{|T^trn|} PE^q_i ) / |T^trn| ,   Σ^q = ( Σ_{i=1}^{|T^trn|} (PE^q_i − PE̅^q)^⊤ (PE^q_i − PE̅^q) ) / |T^trn| .
We then apply spectral decomposition on the covariance matrix Σ^q to obtain the array of eigenvectors
V^q ∈ R^{T_q n × T_q n}. Here, the principal axes for the trajectory segment q ∈ [N] are centered at PE̅^q, and are
aligned with the eigenvectors V^q_ℓ, ℓ ∈ [T_q n], which are the ℓ-th columns of the matrix V^q.
Given the initial state, s0 ∼ W, assume a trajectory s1, . . . , sK that is not necessarily sampled from
Dsim
S,K and also is not necessarily a member of the training dataset. For any segment q ∈ [N] of this
trajectory, we map its vector of prediction errors PEq =
Rtqn+1, Rtqn+2, . . . , R(tq+Tq)n
to the principal
axes. We do this with a linear map, as,
h
r
tqn+1, rtqn+2, . . . , r(tq+Tq)n
i
= V
q⊤(PEq − PE¯
q
), (5.18)
and utilize the parameters $r^j$, $t_q n + 1 \leq j \leq (t_q + T_q)n$, to define the residual. Collecting the mapped prediction errors for all segments $q \in [N]$, we propose our definition of the residual as follows:
$$\rho := \max\left( \frac{|r^1|}{\omega_1}, \frac{|r^2|}{\omega_2}, \ldots, \frac{|r^{nK}|}{\omega_{nK}} \right), \qquad (5.19)$$
where the scaling factors $\omega_j$, $j \in [nK]$, are the maximum magnitudes of the parameters $r^j_i$, $i \in [|\mathcal{T}^{trn}|]$, obtained from the training dataset. In other words,
$$\omega_j = \max\left( |r^j_1|, |r^j_2|, \ldots, |r^j_{|\mathcal{T}^{trn}|}| \right), \qquad j \in [nK]. \qquad (5.20)$$
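The mapped errors, scaling factors, and residual of Eqs. (5.18)-(5.20) can then be evaluated as in the sketch below (again with illustrative names, assuming the outputs of the previous listing):

```python
import numpy as np

def map_to_principal_axes(pe, pe_bar, V):
    """Eq. (5.18): map prediction errors onto the principal axes of one segment.

    pe may be a single error vector or a batch of rows; with row vectors,
    (pe - pe_bar) @ V equals V^T (PE - PE_bar) applied to each sample.
    """
    return (pe - pe_bar) @ V

def scaling_factors(r_train):
    """Eq. (5.20): omega_j is the max magnitude of r^j over the training dataset."""
    return np.abs(r_train).max(axis=0)

def residual(r, omega):
    """Eq. (5.19): the residual is the max of the normalized mapped errors."""
    return np.max(np.abs(r) / omega)
```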
Remark 5.6.3. In contrast to Section 5.4, which also trains the scaling factors, we do not train them here and instead use the fixed values provided by Eq. (5.20). This decision is motivated by the need to avoid incorporating the aforementioned spectral decomposition in the training process, as it can introduce inefficiencies. In other words, we do not use the quantile-based loss function, as it would require including the spectral decomposition in the gradient computation.
Although we use the training dataset $\mathcal{T}^{trn}$ to determine the hyperparameters $V^q$, $\overline{PE}^q$, and $\omega_j$ for defining the residual, reusing $\mathcal{T}^{trn}$ to generate the inflating hypercube with robust conformal inference violates CI rules. Thus, in order to generate the inflating hypercube, we first sample a new i.i.d. set of trajectories from the training environment $\mathcal{D}^{sim}_{S,K}$, which we denote as the calibration dataset.
Definition 5.6.4 (Calibration Dataset). The calibration dataset $\mathcal{R}^{calib}$ is defined as:
$$\mathcal{R}^{calib} = \left\{ (s_{0,i}, \rho_i) \;\middle|\; s_{0,i} \sim \mathcal{W},\ \sigma^{sim}_{s_{0,i}} \sim \mathcal{D}^{sim}_{S,K},\ \rho_i = \max\left( \frac{|r^1_i|}{\omega_1}, \ldots, \frac{|r^{nK}_i|}{\omega_{nK}} \right) \right\}.$$
Here, $\sigma^{sim}_{s_{0,i}}$, $i \in [|\mathcal{R}^{calib}|]$, refers to the trajectory starting at the $i$-th initial state sampled from $\mathcal{W}$, generated from $\mathcal{D}^{sim}_{S,K}$. The parameters $r^j_i$ are as defined in equation (5.18).
Consider sorting the i.i.d. residuals $\rho_i \sim \mathcal{J}^{sim}_{S,K}$ collected in the calibration dataset $\mathcal{R}^{calib}$ by their magnitude: $\rho_1 < \rho_2 < \ldots < \rho_{|\mathcal{R}^{calib}|}$. Our goal is to provide a provable upper bound for the δ-quantile of a residual $\rho \sim \mathcal{J}^{real}_{S,K}$, given knowledge of a radius τ > 0 such that the total variation satisfies $TV(\mathcal{J}^{real}_{S,K}, \mathcal{J}^{sim}_{S,K}) < \tau$. In this case, robust conformal inference [34] suggests using a new rank $\ell^*$ and selecting $\rho^*_{\delta,\tau} := \rho_{\ell^*}$ as an upper bound for the residual's δ-quantile. In other words, for a residual $\rho \sim \mathcal{J}^{real}_{S,K}$, we have $\Pr[\rho < \rho^*_{\delta,\tau}] > \delta$.
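The following minimal Python sketch illustrates the overall recipe of sorting the calibration residuals and picking an elevated rank. The simple level adjustment δ + τ used here is an illustrative placeholder only, not the exact rank $\ell^*$ prescribed by the robust conformal inference construction of [34]; all names are ours.

```python
import math
import numpy as np

def robust_quantile(residuals, delta, tau):
    """Return an upper bound rho*_{delta,tau} on the delta-quantile of the residual.

    residuals : calibration residuals rho_i drawn from the simulation distribution.
    delta     : desired confidence level in (0, 1).
    tau       : assumed bound on TV(J^real, J^sim).

    NOTE: the level adjustment (delta + tau) is a simplified placeholder for
    illustration; the exact rank l* follows the robust CI construction of [34].
    """
    rhos = np.sort(np.asarray(residuals))
    n = len(rhos)
    delta_adj = min(delta + tau, 1.0)              # inflate the level to absorb the shift
    rank = min(math.ceil((n + 1) * delta_adj), n)  # conformal rank with finite-sample correction
    return rhos[rank - 1]
```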
Proposition 5.6.5. Assume $\rho^*_{\delta,\tau}$ is the δ-quantile of $\rho \sim \mathcal{J}^{real}_{S,K}$, computed over the residuals $\rho_i \sim \mathcal{J}^{sim}_{S,K}$ from the calibration dataset $\mathcal{R}^{calib}$, where $TV(\mathcal{J}^{real}_{S,K}, \mathcal{J}^{sim}_{S,K}) < \tau$. For the residual $\rho = \max\left( \frac{|r^1|}{\omega_1}, \frac{|r^2|}{\omega_2}, \ldots, \frac{|r^{nK}|}{\omega_{nK}} \right)$ sampled from the distribution $\mathcal{J}^{real}_{S,K}$, and the trajectory division setting $T_q$, $q \in [N]$, it holds that
$$\Pr\left[ P(r^1, \cdots, r^{nK}) = \top \right] > \delta,$$
where the predicate $P(r^1, \cdots, r^{nK})$ is
$$P(r^1, \cdots, r^{nK}) = \bigwedge_{q=1}^{N} P_q\left( r^{t_q n+1}, \cdots, r^{(t_q+T_q)n} \right), \qquad P_q\left( r^{t_q n+1}, \cdots, r^{(t_q+T_q)n} \right) := \bigwedge_{j=t_q n+1}^{(t_q+T_q)n} \left( -\omega_j \rho^*_{\delta,\tau} \leq r^j \leq \omega_j \rho^*_{\delta,\tau} \right), \qquad (5.21)$$
and $r^j$ is the mapped version of the prediction error $R^j$, $j \in [nK]$, on the principal axes.
Proof. The proof follows as the residual ρ is the maximum of the normalized version of the parameters $r^j$, $j \in [nK]$, so that
$$\rho = \max\left( \frac{|r^1|}{\omega_1}, \frac{|r^2|}{\omega_2}, \ldots, \frac{|r^{nK}|}{\omega_{nK}} \right) \iff \bigwedge_{j=1}^{nK} |r^j| \leq \rho\,\omega_j.$$
Now, since we have $\Pr[\rho \leq \rho^*_{\delta,\tau}] \geq \delta$ and also $\rho < \rho^*_{\delta,\tau} \iff |r^j| < \rho^*_{\delta,\tau}\,\omega_j$ for all $j \in [nK]$, we can claim:
$$\Pr\left[ \bigwedge_{j=1}^{nK} |r^j| \leq \rho^*_{\delta,\tau}\,\omega_j \right] \geq \delta.$$
The guarantee proposed in (5.21) is the reformulation of this result in terms of the division setting $T_q$, $q \in [N]$.
Referring to Def. 5.1.1, and using the predicates $P_q$, $q \in [N]$ from Proposition 5.6.5, we can introduce the inflating hypercubes $\delta X_q$, $q \in [N]$ as star sets. In other words, from equation (5.18), for any $q \in [N]$ we can compute the prediction errors as
$$PE^q = \overline{PE}^q + V^q \left[ r^{t_q n+1}, r^{t_q n+2}, \ldots, r^{(t_q+T_q)n} \right],$$
and, based on the definition of a star set (see Def. 5.1.1), this implies that the concatenation of the star sets $\delta X_q = \langle \overline{PE}^q, V^q, P_q \rangle$ serves as an inflating hypercube for $PE$.
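A minimal sketch of this star-set representation is given below, assuming numpy arrays pe_bar, V, and omega for one segment and the bound rho_star computed above; the dictionary layout and names are illustrative, not the data structures of our implementation.

```python
import numpy as np

def inflating_star(pe_bar, V, omega, rho_star):
    """Star set <PE_bar^q, V^q, P_q>: center, basis, and box bounds on the coefficients.

    The represented set is { pe_bar + V @ r : -omega_j * rho_star <= r_j <= omega_j * rho_star },
    i.e., the hypercube of Proposition 5.6.5 expressed in principal-axis coordinates.
    """
    bounds = omega * rho_star                 # half-widths of the box predicate P_q
    return {"center": pe_bar, "basis": V, "lower": -bounds, "upper": bounds}

def star_vertex(star, signs):
    """Map one vertex of the box predicate back to the original error coordinates."""
    r = np.asarray(signs) * star["upper"]     # signs is a vector in {-1, +1}^(T_q * n)
    return star["center"] + star["basis"] @ r
```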
Figure 5.5 shows the necessity of the PCA approach by illustrating prediction errors of a 2-dimensional state over 2 consecutive time steps‡‡‡. This figure also provides a schematic of the inflating hypercubes generated by our residual definition and those generated by the definition proposed in (5.17). Here, we can also interpret the residual parameters $\overline{PE}^q$ and $V^q$, $q \in [N]$, as follows. The primary function of the vectors $\overline{PE}^q$, $q \in [N]$, is to reposition the surrogate reachsets $\bar{X}^q$ to locations that require minimal inflation, and the main role of $V^q$ is to further reduce the necessary level of inflation.
5.7 Numerical Evaluation
To simulate real-world systems capable of producing actual trajectory data, we employ stochastic difference equation-based models with additive Gaussian noise to account for uncertainties in observations, dynamics, and potential modeling errors. Our theoretical guarantees apply to any real-world distribution $\sigma^{real}_{s_0} \in \mathcal{D}^{real}_{S,K}$, provided that the residual distribution shift $TV(\mathcal{J}^{sim}_{S,K}, \mathcal{J}^{real}_{S,K})$ is below a given threshold τ. Here, we evaluate our results on three different case studies.
‡‡‡Division setting: K = 2, n = 2, N = 2, and T1 = T2 = 1.
Figure 5.6: Shows the comparison with Section 5.4. The blue and red borders are the projections of our and their δ-confident flowpipes, respectively, with δ = 99.99%. The shaded regions show the density of the trajectories from $\mathcal{T}^{trn}$.
Figure 5.7: Shows the projection of our δ-confident flowpipe on each component of the trajectory state. The shaded area shows simulated trajectories from $\mathcal{T}^{trn}$.
Figure 5.8: Shows the projection of our δ-confident flowpipe on the first 8 components of the trajectory state. There is a shift between the distributions of the deployment and training environments. The shaded area shows the trajectories sampled from the deployment environment.
Figure 5.9: Shows the comparison of the angular velocity of the last rotating mass in the presence and absence of the process noise.
Exp # | δ | τ | Training: # models | Training: avg runtime | size of T^trn | Reachability: # | Reachability: avg runtime (method) | Inflating Hypercube: runtime | size of R^calib
1 | 99.99% | 0 | 100 | 39.6 sec | 42,000 | 100 | 1.43 sec (E) | 2.08 sec | 20,000
2 | 99.99% | 0 | 451 | 33.65 sec | 20,000 | 4501 | 0.030 sec (E) | 116.58 sec | 20,000
3 | 95% | 4% | 400 | 40.6 sec | 10,000 | 4000 | 0.064 sec (A) | 142.02 sec | 10,000

Table 5.3: Shows details of the experiments. The columns are grouped as Specification (δ, τ), Training Surrogate (# models, avg runtime, size of T^trn), Reachability (#, avg runtime with method), and Inflating Hypercube (runtime, size of R^calib). The models are trained in parallel with 18 CPU workers; thus, the average training runtime may vary with a different number of workers. The letters E and A denote exact-star and approx-star, respectively.
5.7.1 12-Dimensional Quadcopter
We consider the 12-dimensional quadcopter system under stochastic conditions for two different case studies. Trajectories are simulated using two ODE models from Eq. 5.15 and Eq. 3.10 as our simulators, with their introduced sets of initial states. The state variables include the quadcopter's position $(x_1, x_2, x_3)$, velocity $(x_4, x_5, x_6)$, Euler angles $(x_7, x_8, x_9)$ representing roll, pitch, and yaw, and angular velocities $(x_{10}, x_{11}, x_{12})$. We also add zero-mean Gaussian process noise $v \sim \mathcal{N}(0_{12\times 1}, \Sigma_v)$ to the simulators, with covariance $\Sigma_v = \operatorname{diag}\left([0.05 \times \vec{1}_{1\times 6},\ 0.01 \times \vec{1}_{1\times 6}]\right)^2$, to provide a stochastic simulation environment. In both examples, the initial states follow a uniform distribution $s_0 \sim \mathcal{W}$.
5.7.1.1 Experiment 1: [Comparison with Section 5.4]
Here we revisit Experiment 2 from Section 5.5 to compare the results. In this experiment, a quadcopter hovers at a specific elevation, and its trajectories are simulated over a horizon of K = 100 time steps, with a sampling time of δt = 0.05. The δ-confident flowpipe has a confidence level of δ = 99.99%. Compared to Section 5.5, our approach achieves a higher level of accuracy. This improvement is due to our training strategy, which allows us to use exact-star for surrogate reachability, and to our PCA-based technique, which results in smaller inflating hypercubes. In this experiment, we use a trajectory division setting of N = 100, Tq = 1, for q ∈ [N], with ReLU neural network surrogate models structured as [12, 24, 12]. Figure 5.6 shows the projection of the flowpipe on each state in comparison with the results of Section 5.5, and Table 5.3 shows the details of the experiment.
5.7.1.2 Experiment 2: [Sequential Goal Reaching Task]
In this example, we consider the quadcopter scenario described in section 3.8.1, where a controller is
designed to ensure that the quadcopter accomplishes a sequential goal-reaching task. We also add the previously mentioned process noise to the simulator to introduce stochasticity. Given the quadcopter's tendency for unpredictable behavior, we significantly reduce the sampling time in this instance. The
trajectories are sampled at a frequency of 1 KHz over a 5-second horizon, resulting in 5000 time steps. Our
objective is to perform reachability analysis for time steps 500 through 5000 with the level of confidence
δ = 99.99%. We propose a trajectory division setting of N = 5000 with Tq = 1 for q ∈ [N]. To reduce the
runtime for model training, we employ analytical interpolation. Specifically, we select every tenth time step
for model training, and for i ∈ {50, 51, . . . , 500} and j ∈ [10], we regenerate all the ReLU neural network surrogate models using the following formula:
$$F_{10i+j} = (1 - 0.1j)\,F_{10i} + 0.1j\,F_{10(i+1)}, \qquad (5.22)$$
where the models $F_{10i}$ have a structure of [12, 24, 12]. We then utilize these regenerated ReLU neural network surrogate models for surrogate reachability through exact-star reachability analysis, as well as error analysis using PCA-based conformal inference. Figure 5.7 shows the resulting flowpipe and Table 5.3 shows the details of the experiment.
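One plausible realization of Eq. (5.22) is layer-wise interpolation of the trained parameters of the two neighboring surrogates, sketched below under the assumption that both networks share the same [12, 24, 12] architecture and are stored as lists of (weight, bias) numpy pairs; the helper name and storage format are illustrative, and whether the thesis blends parameters or composes the two networks differently is not spelled out here.

```python
def interpolate_models(params_a, params_b, alpha):
    """Sketch of Eq. (5.22): blend two surrogate models of identical architecture.

    params_a, params_b : lists of (W, b) numpy pairs for F_{10i} and F_{10(i+1)}.
    alpha              : interpolation weight 0.1 * j for j in 1..10.
    Returns the parameters of the interpolated model F_{10i+j}.
    """
    return [((1.0 - alpha) * Wa + alpha * Wb, (1.0 - alpha) * ba + alpha * bb)
            for (Wa, ba), (Wb, bb) in zip(params_a, params_b)]
```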
5.7.2 27-Dimensional Powertrain
We use the powertrain system proposed by [9] as our simulator, which is a hybrid system with three modes. To introduce stochastic conditions, we add zero-mean Gaussian process noise, $v \sim \mathcal{N}(\vec{0}_{27\times 1}, \Sigma_v)$, where $\Sigma_v = \operatorname{diag}\left(10^{-5} \times \vec{1}_{1\times 27}\right)$, to their simulator, defining the distribution $\sigma^{sim}_{s_0} \sim \mathcal{D}^{sim}_{S,K}$. This system is highly sensitive to noise, which is a key reason we address it in this chapter. For example, Figure 5.9 shows the angular velocity of the last rotating mass, $x_{27}$, both with and without noise. Following [9], we simulate trajectories with a sampling time of δt = 0.0005 over a horizon of 2 seconds (K = 4000), and consider their set of initial states I§§§. We also define the trajectory division setting as N = 4000, Tq = 1, q ∈ [N]. The ReLU NN models have the structure [27, 54, 27]. To reduce the training runtime, we again follow the analytical interpolation strategy we introduced for Experiment 2.
5.7.2.1 Experiment 3: [Reachability with Distribution Shift]
Let’s assume the real world trajectories σ
real
s0 ∼ Dreal
S,K are such that its covariance of process noise is 20%
larger than Σv. In this case, the threshold τ = 0.04 is a valid upper-bound for TV(J
sim
S,K,J
real
S,K). In this
experiment, given the threshold, τ we generate a δ-confident flowpipe for σ
real
s0 with δ = 95%. Figure 5.8
shows the projection of our computed flowpipe on the first 8 components of states, and Table 5.3 shows the
detail of the experiment.
§§§In this case the set I proposed in [9] is a large and high dimensional set, thus the exact star does not scale, and we are
restricted to utilized approx star for surrogate reachability.
Chapter 6
Conclusions
The main contribution of this thesis, as the title suggests, is to provide scalable verification and synthesis
techniques in autonomy, focusing on satisfying temporal tasks. To that end, we first introduce the first
formal deterministic verification framework for Signal Temporal Logic (STL) and then enhance control
synthesis scalability to handle increasingly complex temporal tasks. STL specifications capture complex task
requirements with spatial and temporal constraints on the system. We subsequently develop a probabilistic
reachability analysis method, which remains data-efficient and scales well to real-world applications. Below
is a summary of each chapter in this thesis.
Chapter 2: We introduce STL2NN, an end-to-end vectorized computation graph, provided as a feedforward neural network, that represents the quantitative semantics of STL properties. This encoding reduces STL verification in discrete-time neural dynamical systems to a forward image computation (i.e., reachability), which opens the door to leveraging advanced tools. For neural network models with general activation functions, we propose sound and complete verification methods that utilize reachability for verification, and that also estimate the Lipschitz constant and apply sampling to verify the temporal specification. This approach is the first deterministic verification framework for temporal logic specifications and demonstrates promising scalability with respect to both model size and specification complexity.
Chapter 3: We present LB4TL, a smooth computation graph designed to lower-bound the robustness degree of a discrete-time STL specification. The scalability that STL2NN provides for a training algorithm reduces the training time from hours to minutes, and LB4TL is a smooth under-approximation of STL2NN that retains this scalability in training. Our neurosymbolic algorithm employs informative gradients from LB4TL to design neural network controllers that meet discrete-time STL requirements. The proposed training algorithm demonstrates its effectiveness across multiple case studies, showing notable improvements over existing methods. Neural network feedback controllers offer robustness against noise and uncertainties, making them advantageous over open-loop controllers. On the other hand, the repetition of these NN structures over the trajectory imposes challenges such as exploding gradients, particularly over long time horizons or in high-dimensional systems. To address this, we developed a gradient sampling method inspired by dropout [155] and stochastic depth [90] and validated it on various challenging control synthesis problems. We also showed that incorporating critical times in our sampling technique boosts training efficiency and significantly speeds up convergence.
Chapter 4: In this chapter, we formalize the problem of adapting the policy of a stochastic dynamical
system to distribution shifts in its dynamics across its training and deployment environments. We propose
an approach based on learning a neural network-based surrogate model for the deployment environment
and finding modified actions that guarantee that the system tracks the desired optimal trajectory (obtained
during training) with minimal error. The problem of finding modified actions can be formulated as a general
nonlinear optimization problem that can be solved using heuristic techniques. However, we show that we
can convexify the problem and combine it with propagating ellipsoidal uncertainty sets through neural
networks to scalably obtain adapted policies that perform better.
Chapter 5: This chapter addresses challenges in data-driven reachability analysis for stochastic dynamical systems, specifically focusing on distribution shifts between training and test environments. By leveraging a dataset of K-step trajectories, the approach constructs a probabilistic flowpipe, ensuring that the probability of trajectory violation remains below a user-defined threshold even in the presence of distribution shifts. We provide reliable guarantees with higher data efficiency than existing techniques, assuming knowledge of an upper bound on the distribution shift. The methodology relies on three key principles: surrogate model learning, reachability analysis using the surrogate model, and robust conformal inference for probabilistic guarantees. We illustrated the efficacy of our approach via reachability analysis on high-dimensional systems like a 12-dimensional quadcopter and unstable systems like the time-reversed van der Pol oscillator. We also introduced a scalable technique for reachability in real-world settings. Our results demonstrate that integrating PCA with conformal inference significantly enhances the accuracy of error analysis. We validated the effectiveness of our approach across three distinct high-dimensional environments.
Bibliography
[1] Alessandro Abate, Saurabh Amin, Maria Prandini, John Lygeros, and Shankar Sastry.
“Computational approaches to reachability analysis of stochastic hybrid systems”. In: Proc. of HSCC.
2007, pp. 4–17.
[2] Alessandro Abate, Maria Prandini, John Lygeros, and Shankar Sastry. “Probabilistic reachability
and safety for controlled discrete time stochastic hybrid systems”. In: Automatica 44.11 (2008),
pp. 2724–2734.
[3] Houssam Abbas and Georgios Fainekos. “Computing descent direction of MTL robustness for
non-linear systems”. In: 2013 American Control Conference. IEEE. 2013, pp. 4405–4410.
[4] Dieky Adzkiya, Bart De Schutter, and Alessandro Abate. “Computational techniques for
reachability analysis of max-plus-linear systems”. In: Automatica 53 (2015), pp. 293–302.
[5] Takumi Akazaki and Ichiro Hasuo. “Time robustness in MTL and expressivity in hybrid system
falsification”. In: International Conference on Computer Aided Verification. Springer. 2015,
pp. 356–374.
[6] Amr Alanwar, Anne Koch, Frank Allgöwer, and Karl Henrik Johansson. “Data-driven reachability
analysis from noisy data”. In: IEEE Transactions on Automatic Control (2023).
[7] Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
[8] Matthias Althoff. “An Introduction to CORA 2015.” In: ARCH@ CPSWeek 34 (2015), pp. 120–151.
[9] Matthias Althoff and Bruce H Krogh. “Avoiding geometric intersection operations in reachability
analysis of hybrid systems”. In: Proceedings of the 15th ACM international conference on Hybrid
Systems: Computation and Control. 2012, pp. 45–54.
[10] Rajeev Alur. Techniques for automatic verification of real-time systems. stanford university, 1991.
[11] Rajeev Alur, Tomás Feder, and Thomas A Henzinger. “The benefits of relaxing punctuality”. In:
Journal of the ACM (JACM) 43.1 (1996), pp. 116–146.
[12] Aaron D Ames, Jessy W Grizzle, and Paulo Tabuada. “Control barrier function based quadratic
programs with application to adaptive cruise control”. In: 53rd IEEE Conference on Decision and
Control. IEEE. 2014, pp. 6271–6278.
[13] Aaron D Ames, Xiangru Xu, Jessy W Grizzle, and Paulo Tabuada. “Control barrier function based
quadratic programs for safety critical systems”. In: IEEE Transactions on Automatic Control 62.8
(2016), pp. 3861–3876.
[14] Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané.
“Concrete problems in AI safety”. In: arXiv preprint arXiv:1606.06565 (2016).
[15] Erling D Andersen and Knud D Andersen. “The MOSEK interior point optimizer for linear
programming: an implementation of the homogeneous algorithm”. In: High performance
optimization. Springer, 2000, pp. 197–232.
[16] Anastasios N Angelopoulos and Stephen Bates. “A gentle introduction to conformal prediction and
distribution-free uncertainty quantification”. In: arXiv preprint arXiv:2107.07511 (2021).
[17] Yashwanth Annpureddy, Che Liu, Georgios Fainekos, and Sriram Sankaranarayanan. “S-taliro: A
tool for temporal logic falsification for hybrid systems”. In: International Conference on Tools and
Algorithms for the Construction and Analysis of Systems. Springer. 2011, pp. 254–257.
[18] Kendall Atkinson, Weimin Han, and David E Stewart. Numerical solution of ordinary differential
equations. John Wiley & Sons, 2011.
[19] Trevor Avant and Kristi A Morgansen. “Analytical bounds on the local lipschitz constants of
affine-relu functions”. In: arXiv preprint arXiv:2008.06141 (2020).
[20] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. “Layer normalization”. In: arXiv preprint
arXiv:1607.06450 (2016).
[21] Stanley Bak and Parasara Sridhar Duggirala. “Simulation-equivalent reachability of large linear
systems with inputs”. In: International Conference on Computer Aided Verification. Springer. 2017,
pp. 401–420.
[22] Anand Balakrishnan and Jyotirmoy V Deshmukh. “Structured reward shaping using signal
temporal logic specifications”. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). IEEE. 2019, pp. 3481–3486.
[23] Somil Bansal, Mo Chen, Sylvia Herbert, and Claire J Tomlin. “Hamilton-jacobi reachability: A brief
overview and recent advances”. In: Proc. of CDC. 2017, pp. 2242–2253.
[24] Ezio Bartocci, Jyotirmoy V. Deshmukh, Alexandre Donzé, Georgios E. Fainekos, Oded Maler,
Dejan Nickovic, and Sriram Sankaranarayanan. “Specification-based Monitoring of Cyber-Physical
Systems: A Survey on Theory, Tools and Applications”. In: Springer, 2017.
[25] Randal Beard. “Quadrotor dynamics and control rev 0.1”. In: (2008).
[26] Normand J Beaudry and Renato Renner. “An intuitive proof of the data processing inequality”. In:
arXiv preprint arXiv:1107.0740 (2011).
[27] Calin Belta, Boyan Yordanov, and Ebru Aydin Gol. Formal methods for discrete-time dynamical
systems. Vol. 89. Springer, 2017.
[28] Luigi Berducci, Edgar A Aguilar, Dejan Ničković, and Radu Grosu. “Hierarchical Potential-based
Reward Shaping from Task Specifications”. In: arXiv e-prints (2021), arXiv–2110.
[29] Dimitri Bertsekas and Steven E Shreve. Stochastic optimal control: the discrete-time case. Vol. 5.
Athena Scientific, 1996.
[30] Roderick Bloem, Krishnendu Chatterjee, and Barbara Jobstmann. “Graph games and reactive
synthesis”. In: Handbook of model checking (2018), pp. 921–962.
[31] Luca Bortolussi and Guido Sanguinetti. “A statistical approach for computing reachability of
non-linear and stochastic dynamical systems”. In: International Conference on Quantitative
Evaluation of Systems. Springer. 2014, pp. 41–56.
[32] Alper Kamil Bozkurt, Yu Wang, Michael M Zavlanos, and Miroslav Pajic. “Control synthesis from
linear temporal logic specifications using model-free reinforcement learning”. In: 2020 IEEE
International Conference on Robotics and Automation (ICRA). IEEE. 2020, pp. 10349–10355.
[33] Maxime Cauchois, Suyash Gupta, Alnur Ali, and John C Duchi. “Robust validation: Confident
predictions even when distributions shift”. In: Journal of the American Statistical Association (2024),
pp. 1–66.
[34] Maxime Cauchois, Suyash Gupta, Alnur Ali, and John C Duchi. “Robust validation: Confident
predictions even when distributions shift”. In: Journal of the American Statistical Association (2024),
pp. 1–66.
[35] Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin,
Pieter Abbeel, and Wojciech Zaremba. “Transfer from simulation to real world through learning
deep inverse dynamics model”. In: arXiv preprint arXiv:1610.03518 (2016).
[36] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. “Deep reinforcement
learning in a handful of trials using probabilistic dynamics models”. In: Advances in neural
information processing systems 31 (2018).
[37] Ignasi Clavera, David Held, and Pieter Abbeel. “Policy transfer via modularity and reward guiding”.
In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. 2017,
pp. 1537–1544.
[38] Matthew Cleaveland, Insup Lee, George J Pappas, and Lars Lindemann. “Conformal prediction
regions for time series using linear complementarity programming”. In: Proceedings of the AAAI
Conference on Artificial Intelligence. Vol. 38. 19. 2024, pp. 20984–20992.
[39] Imre Csiszár. “A class of measures of informativity of observation channels”. In: Periodica
Mathematica Hungarica 2.1-4 (1972), pp. 191–213.
[40] Herbert A David and Haikady N Nagaraja. Order statistics. John Wiley & Sons, 2004.
[41] Marc Peter Deisenroth, Dieter Fox, and Carl Edward Rasmussen. “Gaussian processes for
data-efficient learning in robotics and control”. In: IEEE transactions on pattern analysis and
machine intelligence 37.2 (2013), pp. 408–423.
[42] Jyotirmoy V Deshmukh, James Kapinski, Tomoya Yamaguchi, and Danil V Prokhorov. “Learning
Deep Neural Network Controllers for Dynamical Systems with Safety Guarantees”. In: ICCAD.
2019, pp. 1–7.
[43] Alex Devonport and Murat Arcak. “Data-driven reachable set computation using adaptive Gaussian
process classification and Monte Carlo methods”. In: Proc. of ACC. 2020, pp. 2629–2634.
[44] Alex Devonport, Forest Yang, Laurent El Ghaoui, and Murat Arcak. “Data-driven reachability
analysis with Christoffel functions”. In: Proc. of CDC. 2021, pp. 5067–5072.
[45] Alexandre Donzé and Oded Maler. “Robust satisfaction of temporal logic over real-valued signals”.
In: International Conference on Formal Modeling and Analysis of Timed Systems. Springer. 2010,
pp. 92–106.
[46] Souradeep Dutta, Xin Chen, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. “Sherlock-a
tool for verification of neural network feedback systems: demo abstract”. In: Proc. of HSCC. 2019,
pp. 262–263.
[47] Souradeep Dutta, Xin Chen, and Sriram Sankaranarayanan. “Reachability analysis for neural
feedback systems using regressive polynomial rule inference”. In: Proceedings of the 22nd ACM
International Conference on Hybrid Systems: Computation and Control. 2019, pp. 157–168.
[48] Souradeep Dutta, Susmit Jha, Sriram Sanakaranarayanan, and Ashish Tiwari. “Output range
analysis for deep neural networks”. In: arXiv preprint arXiv:1709.09130 (2017).
[49] Ruediger Ehlers. “Formal verification of piece-wise linear feed-forward neural networks”. In:
International Symposium on Automated Technology for Verification and Analysis. Springer. 2017,
pp. 269–286.
[50] Benjamin Eysenbach, Swapnil Asawa, Shreyas Chaudhari, Ruslan Salakhutdinov, and
Sergey Levine. “Off-Dynamics Reinforcement Learning: Training for Transfer with Domain
Classifiers”. In: arXiv preprint arXiv:2006.13916 (2020).
[51] Georgios Fainekos and George J. Pappas. “Robustness of Temporal Logic Specifications”. In: Formal
Approaches to Testing and Runtime Verification. Vol. 4262. LNCS. Springer, 2006, pp. 178–192.
[52] Georgios E Fainekos, Antoine Girard, Hadas Kress-Gazit, and George J Pappas. “Temporal logic
motion planning for dynamic robots”. In: Automatica 45.2 (2009), pp. 343–352.
[53] Georgios E Fainekos and George J Pappas. “Robustness of temporal logic specifications”. In: Formal
approaches to software testing and runtime verification. Springer, 2006, pp. 178–192.
[54] Georgios E Fainekos and George J Pappas. “Robustness of temporal logic specifications for
continuous-time signals”. In: Theoretical Computer Science 410.42 (2009), pp. 4262–4291.
[55] Chuchu Fan, Bolun Qi, Sayan Mitra, and Mahesh Viswanathan. “DryVR: Data-driven verification
and compositional reasoning for automotive systems”. In: International Conference on Computer
Aided Verification. Springer. 2017, pp. 441–461.
[56] Bin Fang, Shidong Jia, Di Guo, Muhua Xu, Shuhuan Wen, and Fuchun Sun. “Survey of imitation
learning for robotic manipulation”. In: International Journal of Intelligent Robotics and Applications 3
(2019), pp. 362–369.
[57] Samira S Farahani, Vasumathi Raman, and Richard M Murray. “Robust model predictive control for
signal temporal logic synthesis”. In: IFAC-PapersOnLine 48.27 (2015), pp. 323–328.
[58] Mahyar Fazlyab, Manfred Morari, and George J Pappas. “Probabilistic Verification and Reachability
Analysis of Neural Networks via Semidefinite Programming”. In: arXiv preprint arXiv:1910.04249
(2019).
[59] Mahyar Fazlyab, Manfred Morari, and George J Pappas. “Safety verification and robustness analysis
of neural networks via quadratic constraints and semidefinite programming”. In: CoRR
abs:1903.01287 (2019).
[60] Mahyar Fazlyab, Manfred Morari, and George J Pappas. “Safety verification and robustness analysis
of neural networks via quadratic constraints and semidefinite programming”. In: IEEE Transactions
on Automatic Control (2020).
[61] Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, and George Pappas. “Efficient
and accurate estimation of lipschitz constants for deep neural networks”. In: Advances in Neural
Information Processing Systems 32 (2019).
[62] Mirko Fiacchini and Teodoro Alamo. “Probabilistic reachable and invariant sets for linear systems
with correlated disturbance”. In: Automatica 132 (2021), p. 109808.
[63] Jaime F Fisac, Anayo K Akametalu, Melanie N Zeilinger, Shahab Kaynama, Jeremy Gillula, and
Claire J Tomlin. “A general safety framework for learning-based control in uncertain robotic
systems”. In: IEEE Transactions on Automatic Control 64.7 (2018), pp. 2737–2752.
[64] Jie Fu and Ufuk Topcu. “Probably approximately correct MDP learning and control with temporal
logic constraints”. In: arXiv preprint arXiv:1404.7073 (2014).
[65] Ting Gan, Mingshuai Chen, Yangjia Li, Bican Xia, and Naijun Zhan. “Reachability analysis for
solvable dynamical systems”. In: IEEE Transactions on Automatic Control 63.7 (2017), pp. 2003–2018.
[66] Carlos E Garcia, David M Prett, and Manfred Morari. “Model predictive control: Theory and
practice—A survey”. In: Automatica 25.3 (1989), pp. 335–348.
[67] Yann Gilpin, Vince Kurtz, and Hai Lin. “A smooth robustness measure of signal temporal logic for
symbolic control”. In: IEEE Control Systems Letters 5.1 (2020), pp. 241–246.
[68] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
[69] Henry Gouk, Eibe Frank, Bernhard Pfahringer, and Michael J Cree. “Regularisation of neural
networks by enforcing lipschitz continuity”. In: Machine Learning 110 (2021), pp. 393–416.
[70] Meng Guo and Michael M Zavlanos. “Probabilistic motion planning under temporal tasks and soft
constraints”. In: IEEE Transactions on Automatic Control 63.12 (2018), pp. 4051–4066.
[71] Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart J Russell, and Anca Dragan. “Inverse
reward design”. In: Advances in neural information processing systems 30 (2017).
[72] Sofie Haesaert, Sadegh Soudjani, and Alessandro Abate. “Temporal logic control of general Markov
decision processes by approximate policy refinement”. In: IFAC-PapersOnLine 51.16 (2018),
pp. 73–78.
[73] Nathaniel Hamilton, Preston K Robinette, and Taylor T Johnson. “Training agents to satisfy timed
and untimed signal temporal logic specifications with reinforcement learning”. In: International
Conference on Software Engineering and Formal Methods. Springer. 2022, pp. 190–206.
[74] Mohammadhosein Hasanbeig, Alessandro Abate, and Daniel Kroening. “Logically-constrained
reinforcement learning”. In: arXiv preprint arXiv:1801.08099 (2018).
[75] Navid Hashemi, Mahyar Fazlyab, and Justin Ruths. “Performance Bounds for Neural Network
Estimators: Applications in Fault Detection”. In: 2021 American Control Conference (ACC). 2021,
pp. 3260–3266. doi: 10.23919/ACC50511.2021.9482752.
[76] Navid Hashemi, Bardh Hoxha, Danil Prokhorov, Georgios Fainekos, and Jyotirmoy Deshmukh.
“Scaling Learning based Policy Optimization for Temporal Tasks via Dropout”. In: arXiv preprint
arXiv:2403.15826 (2024).
[77] Navid Hashemi, Bardh Hoxha, Tomoya Yamaguchi, Danil Prokhorov, Georgios Fainekos, and
Jyotirmoy Deshmukh. “A Neurosymbolic Approach to the Verification of Temporal Logic
Properties of Learning-enabled Control Systems”. In: ICCPS. 2023, pp. 98–109.
[78] Navid Hashemi, Lars Lindemann, and Jyotirmoy V Deshmukh. “Statistical reachability analysis of
stochastic cyber-physical systems under distribution shift”. In: arXiv preprint arXiv:2407.11609
(2024).
[79] Navid Hashemi, Xin Qin, Jyotirmoy V Deshmukh, Georgios Fainekos, Bardh Hoxha,
Danil Prokhorov, and Tomoya Yamaguchi. “Risk-awareness in learning neural controllers for
temporal logic objectives”. In: 2023 American Control Conference (ACC). IEEE. 2023, pp. 4096–4103.
[80] Navid Hashemi, Xin Qin, Jyotirmoy V. Deshmukh, Georgios Fainekos, Bardh Hoxha,
Danil Prokhorov, and Tomoya Yamaguchi. “Risk-Awareness in Learning Neural Controllers for
Temporal Logic Objectives”. In: (ACC). 2023, pp. 4096–4103.
[81] Navid Hashemi, Xin Qin, Lars Lindemann, and Jyotirmoy V Deshmukh. “Data-driven reachability
analysis of stochastic dynamical systems with conformal inference”. In: 2023 62nd IEEE Conference
on Decision and Control (CDC). IEEE. 2023, pp. 3102–3109.
[82] Navid Hashemi, Justin Ruths, and Jyotirmoy V Deshmukh. “Convex Optimization-based Policy
Adaptation to Compensate for Distributional Shifts”. In: 2023 62nd IEEE Conference on Decision and
Control (CDC). IEEE. 2023, pp. 5376–5383.
[83] Navid Hashemi, Justin Ruths, and Mahyar Fazlyab. “Certifying incremental quadratic constraints
for neural networks via convex optimization”. In: Learning for Dynamics and Control. PMLR. 2021,
pp. 842–853.
[84] Navid Hashemi, Samuel Williams, Bardh Hoxha, Danil Prokhorov, Georgios Fainekos, and
Jyotirmoy Deshmukh. “LB4TL: A Smooth Semantics for Temporal Logic to Train Neural Feedback
Controllers”. In: IFAC-PapersOnLine 58.11 (2024), pp. 183–188.
[85] Wataru Hashimoto, Kazumune Hashimoto, and Shigemasa Takai. “Stl2vec: Signal temporal logic
embeddings for control synthesis with recurrent neural networks”. In: IEEE Robotics and
Automation Letters 7.2 (2022), pp. 5246–5253.
[86] Hsi-Ming Ho, Joël Ouaknine, and James Worrell. “Online monitoring of metric temporal logic”. In:
International Conference on Runtime Verification. Springer. 2014, pp. 178–192.
[87] Chao Huang, Jiameng Fan, Xin Chen, Wenchao Li, and Qi Zhu. “POLAR: A Polynomial Arithmetic
Framework for Verifying Neural-Network Controlled Systems”. In: arXiv preprint arXiv:2106.13867
(2021).
[88] Chao Huang, Jiameng Fan, Xin Chen, Wenchao Li, and Qi Zhu. “POLAR: A polynomial arithmetic
framework for verifying neural-network controlled systems”. In: International Symposium on
Automated Technology for Verification and Analysis. Springer. 2022, pp. 414–430.
[89] Chao Huang, Jiameng Fan, Wenchao Li, Xin Chen, and Qi Zhu. “Reachnn: Reachability analysis of
neural-network controlled systems”. In: ACM Transactions on Embedded Computing Systems (TECS)
18.5s (2019), pp. 1–22.
[90] Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q Weinberger. “Deep networks with
stochastic depth”. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The
Netherlands, October 11–14, 2016, Proceedings, Part IV 14. Springer. 2016, pp. 646–661.
[91] Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. “Safety verification of deep neural
networks”. In: International conference on computer aided verification. Springer. 2017, pp. 3–29.
[92] Radoslav Ivanov, James Weimer, Rajeev Alur, George J Pappas, and Insup Lee. “Verisig: verifying
safety properties of hybrid systems with neural network controllers”. In: Proceedings of the 22nd
ACM International Conference on Hybrid Systems: Computation and Control. 2019, pp. 169–178.
[93] Achin Jain, Francesco Smarra, and Rahul Mangharam. “Data predictive control using regression
trees and ensemble learning”. In: 2017 IEEE 56th annual conference on decision and control (CDC).
IEEE. 2017, pp. 4446–4451.
[94] William James and Charles Stein. “Estimation with quadratic loss”. In: Breakthroughs in statistics:
Foundations and basic theory. Springer, 1992, pp. 443–460.
[95] Matt Jordan and Alexandros G Dimakis. “Exactly computing the local lipschitz constant of relu
networks”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 7344–7353.
[96] Krishna C Kalagarla, Rahul Jain, and Pierluigi Nuzzo. “Synthesis of discounted-reward optimal
policies for Markov decision processes under linear temporal logic specifications”. In: arXiv
preprint arXiv:2011.00632 (2020).
[97] Parv Kapoor, Anand Balakrishnan, and Jyotirmoy V Deshmukh. “Model-based reinforcement
learning from signal temporal logic specifications”. In: arXiv preprint arXiv:2011.04950 (2020).
[98] Guy Katz, Clark Barrett, David L Dill, Kyle Julian, and Mykel J Kochenderfer. “Reluplex: An
efficient SMT solver for verifying deep neural networks”. In: International conference on computer
aided verification. Springer. 2017, pp. 97–117.
[99] Guy Katz, Derek A Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim,
Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, et al. “The marabou framework for
verification and analysis of deep neural networks”. In: International Conference on Computer Aided
Verification. Springer. 2019, pp. 443–452.
[100] James Kennedy and Russell Eberhart. “Particle swarm optimization”. In: Proceedings of
ICNN’95-international conference on neural networks. Vol. 4. IEEE. 1995, pp. 1942–1948.
[101] IS Khalil, JC Doyle, and K Glover. Robust and optimal control. Prentice hall, 1996.
[102] Roger Koenker. Quantile regression. Vol. 38. Cambridge Univ. Press, 2005.
[103] Panagiotis Kouvaros and Alessio Lomuscio. “Formal verification of cnn-based perception systems”.
In: arXiv preprint arXiv:1811.11373 (2018).
[104] Ron Koymans. “Specifying real-time properties with metric temporal logic”. In: Real-time systems
2.4 (1990), pp. 255–299.
[105] Hadas Kress-Gazit, Georgios E Fainekos, and George J Pappas. “Temporal-logic-based reactive
mission and motion planning”. In: IEEE transactions on robotics 25.6 (2009), pp. 1370–1381.
[106] Vidisha Kudalkar, Navid Hashemi, Shilpa Mukhopadhyay, Swapnil Mallick, Christof Budnik,
Parinitha Nagaraja, and Jyotirmoy V Deshmukh. “Sampling-Based and Gradient-Based Efficient
Scenario Generation”. In: International Conference on Runtime Verification. Springer. 2024, pp. 70–88.
[107] Vincent Kurtz and Hai Lin. “Mixed-integer programming for signal temporal logic with fewer
binary variables”. In: IEEE Control Systems Letters 6 (2022), pp. 2635–2640.
[108] Jean B Lasserre and Edouard Pauwels. “The empirical Christoffel function with applications in data
analysis”. In: Advances in Computational Mathematics 45 (2019), pp. 1439–1468.
[109] Fabian Latorre, Paul Rolland, and Volkan Cevher. “Lipschitz constant estimation of neural networks
via sparse polynomial optimization”. In: arXiv preprint arXiv:2004.08688 (2020).
[110] Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J Tibshirani, and Larry Wasserman.
“Distribution-free predictive inference for regression”. In: Journal of the American Statistical
Association 113.523 (2018), pp. 1094–1111.
[111] Jing Lei and Larry Wasserman. “Distribution-free prediction bands for non-parametric regression”.
In: Journal of the Royal Statistical Society: Series B: Statistical Methodology (2014), pp. 71–96.
[112] Karen Leung, Nikos Arechiga, and Marco Pavone. “Back-Propagation Through Signal Temporal
Logic Specifications: Infusing Logical Structure into Gradient-Based Methods”. In: Algorithmic
Foundations of Robotics XIV. Ed. by Steven M. LaValle, Ming Lin, Timo Ojala, Dylan Shell, and
Jingjin Yu. Springer, 2021, pp. 432–449.
[113] Karen Leung, Nikos Aréchiga, and Marco Pavone. “Backpropagation for parametric STL”. In: 2019
IEEE Intelligent Vehicles Symposium (IV). IEEE. 2019, pp. 185–192.
[114] Karen Leung, Nikos Aréchiga, and Marco Pavone. “Backpropagation through signal temporal logic
specifications: Infusing logical structure into gradient-based methods”. In: The International Journal
of Robotics Research 42.6 (2023), pp. 356–370.
[115] Jianlin Li, Jiangchao Liu, Pengfei Yang, Liqian Chen, Xiaowei Huang, and Lijun Zhang. “Analyzing
deep neural networks with symbolic propagation: Towards higher precision and faster verification”.
In: International static analysis symposium. Springer. 2019, pp. 296–319.
[116] Shihui Li, Yi Wu, Xinyue Cui, Honghua Dong, Fei Fang, and Stuart Russell. “Robust multi-agent
reinforcement learning via minimax deep deterministic policy gradient”. In: Proceedings of the
AAAI Conference on Artificial Intelligence. Vol. 33. 2019, pp. 4213–4220.
[117] Xiao Li, Cristian-Ioan Vasile, and Calin Belta. “Reinforcement learning with temporal logic
rewards”. In: Proc. of IROS. IEEE. 2017, pp. 3834–3839.
[118] Albert Lin and Somil Bansal. “Generating formal safety assurances for high-dimensional
reachability”. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE. 2023,
pp. 10525–10531.
[119] Lars Lindemann and Dimos V Dimarogonas. “Control barrier functions for signal temporal logic
tasks”. In: IEEE control systems letters 3.1 (2018), pp. 96–101.
[120] Changliu Liu, Tomer Arnon, Christopher Lazarus, Christopher Strong, Clark Barrett,
Mykel J Kochenderfer, et al. “Algorithms for verifying deep neural networks”. In: Foundations and
Trends® in Optimization 4.3-4 (2021), pp. 244–404.
[121] Wenliang Liu, Noushin Mehdipour, and Calin Belta. “Recurrent neural network controllers for
signal temporal logic specifications subject to safety constraints”. In: IEEE Control Systems Letters 6
(2021), pp. 91–96.
[122] Johan Lofberg. “YALMIP: A toolbox for modeling and optimization in MATLAB”. In: 2004 IEEE
international conference on robotics and automation (IEEE Cat. No. 04CH37508). IEEE. 2004,
pp. 284–289.
[123] Alessio Lomuscio and Lalit Maganti. “An approach to reachability analysis for feed-forward relu
neural networks”. In: arXiv preprint arXiv:1706.07351 (2017).
[124] Rachel Luo, Shengjia Zhao, Jonathan Kuck, Boris Ivanovic, Silvio Savarese, Edward Schmerling,
and Marco Pavone. “Sample-efficient safety assurances using conformal prediction”. In: Algorithmic
Foundations of Robotics XV: Proceedings of the Fifteenth Workshop on the Algorithmic Foundations of
Robotics. Springer. 2022, pp. 149–169.
[125] Bethany Lusch, J Nathan Kutz, and Steven L Brunton. “Deep learning for universal linear
embeddings of nonlinear dynamics”. In: Nature communications 9.1 (2018), pp. 1–10.
[126] Oded Maler and Dejan Nickovic. “Monitoring temporal properties of continuous signals”. In: Formal
Techniques, Modelling and Analysis of Timed and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.
[127] Swann Marx, Edouard Pauwels, Tillmann Weisser, Didier Henrion, and Jean Bernard Lasserre.
“Semi-algebraic approximation using Christoffel–Darboux kernel”. In: Constructive Approximation
(2021), pp. 1–39.
[128] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare,
Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. “Human-level control
through deep reinforcement learning”. In: nature 518.7540 (2015), pp. 529–533.
[129] Alexander Pan, Kush Bhatia, and Jacob Steinhardt. “The Effects of Reward Misspecification:
Mapping and Mitigating Misaligned Models”. In: International Conference on Learning
Representations. 2022.
[130] Yash Vardhan Pant, Houssam Abbas, and Rahul Mangharam. “Smooth operator: Control using the
smooth robustness of temporal logic”. In: 2017 IEEE Conference on Control Technology and
Applications (CCTA). IEEE. 2017, pp. 1235–1240.
[131] Yash Vardhan Pant, Houssam Abbas, Rhudii A. Quaye, and Rahul Mangharam. “Fly-by-logic:
control of multi-drone fleets with temporal logic objectives”. In: Proc. of ICCPS. 2018, pp. 186–197.
[132] Yash Vardhan Pant, He Yin, Murat Arcak, and Sanjit A Seshia. “Co-design of control and planning
for multi-rotor uavs with signal temporal logic specifications”. In: 2021 American Control Conference
(ACC). IEEE. 2021, pp. 4209–4216.
[133] Athanasios Papoulis and S Unnikrishna Pillai. Probability, random variables and stochastic processes.
2002.
[134] Supratik Paul, Michael A Osborne, and Shimon Whiteson. “Fingerprint policy optimisation for
robust reinforcement learning”. In: International Conference on Machine Learning. 2019,
pp. 5082–5091.
[135] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. “Robust Adversarial
Reinforcement Learning”. In: International Conference on Machine Learning. 2017, pp. 2817–2826.
[136] Amir Pnueli. “The temporal logic of programs”. In: 18th Annual Symposium on Foundations of
Computer Science (sfcs 1977). ieee. 1977, pp. 46–57.
[137] Luca Pulina and Armando Tacchella. “An abstraction-refinement approach to verification of
artificial neural networks”. In: International Conference on Computer Aided Verification. Springer.
2010, pp. 243–257.
[138] Aniruddh G Puranic, Jyotirmoy V Deshmukh, and Stefanos Nikolaidis. “Learning performance
graphs from demonstrations via task-based evaluations”. In: IEEE Robotics and Automation Letters
8.1 (2022), pp. 336–343.
[139] Xin Qin, Navid Hashemi, Lars Lindemann, and Jyotirmoy V Deshmukh. “Conformance Testing for
Stochastic Cyber-Physical Systems.” In: FMCAD. 2023, pp. 294–305.
[140] Aditi Raghunathan, Jacob Steinhardt, and Percy S Liang. “Semidefinite relaxations for certifying
robustness to adversarial examples”. In: Advances in Neural Information Processing Systems 31
(2018).
[141] Prajit Ramachandran, Barret Zoph, and Quoc V Le. “Searching for activation functions”. In: arXiv
preprint arXiv:1710.05941 (2017).
[142] Vasumathi Raman, Alexandre Donzé, Mehdi Maasoumy, Richard M Murray,
Alberto Sangiovanni-Vincentelli, and Sanjit A Seshia. “Model predictive control with signal
temporal logic specifications”. In: 53rd IEEE Conference on Decision and Control. IEEE. 2014,
pp. 81–87.
[143] Vasumathi Raman, Alexandre Donzé, Dorsa Sadigh, Richard M Murray, and Sanjit A Seshia.
“Reactive synthesis from signal temporal logic specifications”. In: Proceedings of the 18th
international conference on hybrid systems: Computation and control. 2015, pp. 239–248.
[144] Carl Edward Rasmussen. “Gaussian processes in machine learning”. In: Summer school on machine
learning. Springer. 2003, pp. 63–71.
[145] Douglas A Reynolds et al. “Gaussian mixture models.” In: Encyclopedia of biometrics 741.659-663
(2009).
[146] Alexander Robey, Lars Lindemann, Stephen Tu, and Nikolai Matni. “Learning robust hybrid control
barrier functions for uncertain systems”. In: IFAC-PapersOnLine 54.5 (2021), pp. 1–6.
[147] Alëna Rodionova, Lars Lindemann, Manfred Morari, and George J Pappas. “Combined left and right
temporal robustness for control under stl specifications”. In: IEEE Control Systems Letters (2022).
[148] Dorsa Sadigh and Ashish Kapoor. “Safe control under uncertainty with probabilistic signal
temporal logic”. In: Proceedings of Robotics: Science and Systems XII. 2016.
[149] Dorsa Sadigh, Eric S Kim, Samuel Coogan, S Shankar Sastry, and Sanjit A Seshia. “A learning based
approach to control synthesis of markov decision processes for linear temporal logic specifications”.
In: 53rd IEEE Conference on Decision and Control. IEEE. 2014, pp. 1091–1096.
[150] Sadra Sadraddini and Calin Belta. “Robust temporal logic model predictive control”. In: 2015 53rd
Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE. 2015,
pp. 772–779.
[151] Soroosh Shafieezadeh Abadeh, Peyman M Mohajerin Esfahani, and Daniel Kuhn. “Distributionally
robust logistic regression”. In: Advances in Neural Information Processing Systems 28 (2015).
[152] Apoorva Sharma, Sushant Veer, Asher Hancock, Heng Yang, Marco Pavone, and
Anirudha Majumdar. “PAC-Bayes generalization certificates for learned inductive conformal
prediction”. In: Advances in Neural Information Processing Systems 36 (2024).
[153] Joar Skalse, Nikolaus Howe, Dmitrii Krasheninnikov, and David Krueger. “Defining and
characterizing reward gaming”. In: Advances in Neural Information Processing Systems 35 (2022),
pp. 9460–9471.
[154] Jonathan Sorg, Richard L Lewis, and Satinder Singh. “Reward design via online gradient ascent”. In:
Advances in Neural Information Processing Systems 23 (2010).
[155] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov.
“Dropout: a simple way to prevent neural networks from overfitting”. In: The journal of machine
learning research 15.1 (2014), pp. 1929–1958.
[156] Xiaowu Sun, Haitham Khedr, and Yasser Shoukry. “Formal verification of neural network
controlled autonomous systems”. In: Proceedings of the 22nd ACM International Conference on
Hybrid Systems: Computation and Control. 2019, pp. 147–156.
[157] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow,
and Rob Fergus. “Intriguing properties of neural networks”. In: arXiv preprint arXiv:1312.6199
(2013).
[158] Yoshinari Takayama, Kazumune Hashimoto, and Toshiyuki Ohtsuka. “Signal temporal logic meets
convex-concave programming: A structure-exploiting sqp algorithm for stl specifications”. In: 2023
62nd IEEE Conference on Decision and Control (CDC). IEEE. 2023, pp. 6855–6862.
[159] Abdelmouaiz Tebjou, Goran Frehse, et al. “Data-driven Reachability using Christoffel Functions and
Conformal Prediction”. In: Conformal and Probabilistic Prediction with Applications. PMLR. 2023,
pp. 194–213.
[160] Adam J Thorpe and Meeko MK Oishi. “Model-free stochastic reachability using kernel distribution
embeddings”. In: IEEE Control Systems Letters 4.2 (2019), pp. 512–517.
[161] Adam J Thorpe, Vignesh Sivaramakrishnan, and Meeko MK Oishi. “Approximate stochastic
reachability for high dimensional systems”. In: 2021 American Control Conference (ACC). IEEE. 2021,
pp. 1287–1293.
[162] Ryan J Tibshirani, Rina Foygel Barber, Emmanuel Candes, and Aaditya Ramdas. “Conformal
prediction under covariate shift”. In: Advances in neural information processing systems 32 (2019).
[163] Hoang-Dung Tran, Feiyang Cai, Manzanas Lopez Diego, Patrick Musau, Taylor T Johnson, and
Xenofon Koutsoukos. “Safety verification of cyber-physical systems with reinforcement learning
control”. In: ACM Transactions on Embedded Computing Systems (TECS) 18.5s (2019), pp. 1–22.
[164] Hoang-Dung Tran, Diago Manzanas Lopez, Patrick Musau, Xiaodong Yang, Luan Viet Nguyen,
Weiming Xiang, and Taylor T Johnson. “Star-based reachability analysis of deep neural networks”.
In: International symposium on formal methods. Springer. 2019, pp. 670–686.
[165] Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen,
Weiming Xiang, Stanley Bak, and Taylor T Johnson. “NNV: the neural network verification tool for
deep neural networks and learning-enabled cyber-physical systems”. In: International Conference on
Computer Aided Verification. Springer. 2020, pp. 3–17.
[166] Jana Tumová, Boyan Yordanov, Calin Belta, Ivana Černá, and Jiří Barnat. "A symbolic approach to controlling piecewise affine systems". In: 49th IEEE Conference on Decision and Control (CDC). IEEE. 2010, pp. 4230–4235.
[167] Renukanandan Tumu, Matthew Cleaveland, Rahul Mangharam, George Pappas, and
Lars Lindemann. “Multi-modal conformal prediction regions by optimizing convex shape
templates”. In: 6th Annual Learning for Dynamics & Control Conference. PMLR. 2024, pp. 1343–1356.
[168] George Vachtsevanos, Liang Tang, Graham Drozeski, and Luis Gutierrez. “From mission planning
to flight control of unmanned aerial vehicles: Strategies and implementation tools”. In: Annual
Reviews in Control 29.1 (2005), pp. 101–115.
[169] SS Vallender. “Calculation of the Wasserstein distance between probability distributions on the
line”. In: Theory of Probability & Its Applications 18.4 (1974), pp. 784–786.
[170] Harish Venkataraman, Derya Aksaray, and Peter Seiler. “Tractable reinforcement learning of signal
temporal logic objectives”. In: Learning for Dynamics and Control. PMLR. 2020, pp. 308–317.
[171] Abraham P Vinod, Joseph D Gleason, and Meeko MK Oishi. “SReachTools: a MATLAB stochastic
reachability toolbox”. In: Proc. of HSCC. 2019, pp. 33–38.
[172] Abraham P Vinod, Baisravan HomChaudhuri, and Meeko MK Oishi. “Forward stochastic
reachability analysis for uncontrolled linear systems using fourier transforms”. In: Proc. of HSCC.
2017, pp. 35–44.
[173] Abraham P Vinod and Meeko MK Oishi. “Stochastic reachability of a target tube: Theory and
computation”. In: Automatica 125 (2021), p. 109458.
[174] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world.
Vol. 29. Springer, 2005.
[175] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. “Efficient formal safety
analysis of neural networks”. In: Advances in Neural Information Processing Systems 31 (2018).
[176] Robert D Windhorst, Todd A Lauderdale, Alexander V Sadovsky, James Phillips, and
Yung-Cheng Chu. “Strategic and tactical functions in an autonomous air traffic management
system”. In: AIAA AVIATION 2021 FORUM. 2021, p. 2355.
[177] Eric M Wolff, Ufuk Topcu, and Richard M Murray. “Optimization-based control of nonlinear
systems with linear temporal logic specifications”. In: Proc. of Int. Conf. on Robotics and Automation.
2014, pp. 5319–5325.
[178] Tichakorn Wongpiromsarn, Ufuk Topcu, and Richard M Murray. “Receding horizon temporal logic
planning”. In: IEEE Transactions on Automatic Control 57.11 (2012), pp. 2817–2830.
[179] Markus Wulfmeier, Ingmar Posner, and Pieter Abbeel. “Mutual Alignment Transfer Learning”. In:
Conference on Robot Learning. 2017, pp. 281–290.
[180] Xiangru Xu, Paulo Tabuada, Jessy W Grizzle, and Aaron D Ames. “Robustness of control barrier
functions for safety critical control”. In: IFAC-PapersOnLine 48.27 (2015), pp. 54–61.
[181] Shakiba Yaghoubi and Georgios Fainekos. “Gray-box adversarial testing for control systems with
machine learning components”. In: Proceedings of the 22nd ACM International Conference on Hybrid
Systems: Computation and Control. 2019, pp. 179–184.
[182] Shakiba Yaghoubi and Georgios Fainekos. “Worst-case Satisfaction of STL Specifications Using
Feedforward Neural Network Controllers: A Lagrange Multipliers Approach”. In: ACM Transactions
on Embedded Computing Systems 18.5S (2019).
[183] Liren Yang, Hang Zhang, Jean-Baptiste Jeannin, and Necmiye Ozay. “Efficient backward
reachability using the minkowski difference of constrained zonotopes”. In: IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 41.11 (2022), pp. 3969–3980.
[184] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. “Efficient neural
network robustness certification with general activation functions”. In: Advances in neural
information processing systems 31 (2018).
[185] Yiqi Zhao, Bardh Hoxha, Georgios Fainekos, Jyotirmoy V Deshmukh, and Lars Lindemann. “Robust
Conformal Prediction for STL Runtime Verification under Distribution Shift”. In: arXiv preprint
arXiv:2311.09482 (2023).
Appendices
Appendix A: Conservatism for Exact Reachability
In this section, we discuss an example that illustrates that our proposed verification framework for ODE models lacks completeness, even if we compute the exact reach-tube and apply exact reachability analysis on STL2NN. Assume that, for a given set of initial conditions s0 ∈ I, an exact reach-tube is computed for a trajectory $\sigma_{s_0}$ with a horizon of K = 15 time steps. Let us also assume there exists a region E that 50% of the trajectories visit at time steps k = 7, 8, 9 and avoid at k < 7 and k > 9. In addition, the rest of the trajectories visit E within time steps k = 10, 11, 12 and avoid it at k < 10 and k > 12. This implies the trajectories satisfy the following STL specification,
$$\varphi = F_{[7,12]} (s \in E).$$
Given the exact boundary of the reach-tube, we can define imaginary, nonexistent trajectories that lie in the second group at time steps k = 7, 8, 9 and in the first group at k = 10, 11, 12. Such an imaginary trajectory lies in the exact reach-tube but violates the STL specification. This implies that, if we apply exact reachability analysis on STL2NN, the left bound of the robustness range would be negative, even though no real trajectory has negative robustness. This example clearly demonstrates that our verification framework for ODE models lacks completeness. However, we will show that our verification framework can provide sound and complete verification for DNN models.
Figure 6.1: Shows the ENFN structure. Here N is the number of layers of ENFN. $[t_\ell, z_\ell]^\top$ denotes the activation vector for the $\ell$-th layer and $[t_\ell, p_\ell]^\top$ denotes its pre-activation on ENFN. The role of a linear activation function is to copy its input.
Appendix B: Lipschitz Constant Analysis for ENFN
An upper bound for the local Lipschitz constant of a FFNN is derived in [83, 61]. The presence of linear activation functions in STL2NN should not impose additional computational complexity, but if we include them in the procedure proposed in [83, 61], the optimization process faces memory problems, as the size of the LMI increases unnecessarily. Thus, we slightly modify the proposed solution and call this modified version ENFN-Lip-SDP(). Here we summarize the convex programming approach from [83, 61], including the slight changes we apply to it.
Let’s define the SDP variable ρ = ρ
2
1
. We can reformulate the Lipschitz inequality ∥f(x1) − f(x2)∥2 ≤
√ρ∥x1 − x2∥2 in the form of linear quadratic constraint as follows:
x1 − x2
f(x1) − f(x2)
⊤
ρIn 0n×1
01×n −1
x1 − x2
f(x1) − f(x2)
≥ 0
and we can conclude if,
ρIn 0n×1
01×n −1
≥ 0 (positive semi-definite)
164
then ρ1 =
√ρ is certainly the desired certificate. Unfortunately due to presence of the negative scalar −1,
this constraint is infeasible and we attempt to provide feasibility with provision of new linear information
about function f. Thus the basic idea of convex programming technique is to provide the best symmetric
linear matrix Qinfo and transformation matrix T that bring feasibility for,
Qinfo − T
⊤
ρIn 0n×1
01×n −1
T ≤ 0, (6.1)
where Qinfo is a linear combination of quadratic constraints (QC), where every single QC represents a
linear information about function f. In this constraint ρ = ρ
2
1
, where ρ1 is the certificate introduced in
Theorem 2.5.1 for f : [ℓ, u] → [a, b], ℓ, u ∈ R
n
and a, b ∈ R. We add new information utilizing s-procedure
technique proposed in [61, 83]. A thorough introduction for computation of Qinfo is provided in [61, 83].
Provision of high quality information results in feasibility and tightness, but the presence of insufficient
information results in infeasibility.
QC for Non-linearities in ENFN:
Figure 6.1 shows the layers of the ENFN, indexed by ℓ = 1, · · · , N. Each layer is partitioned into a nonlinear portion and a linear portion. The pre-activation of the nonlinear portion, pℓ, is fed into the nonlinear activation and results in the post-activation zℓ.
Assume s_0^1 and s_0^2 are two initial states. They produce the pre-activations p_ℓ^1, p_ℓ^2 ∈ R^{n_ℓ} on the ENFN, with corresponding post-activations z_ℓ^1, z_ℓ^2 ∈ R^{n_ℓ}, respectively. We denote δp_ℓ := p_ℓ^1 − p_ℓ^2 and δz_ℓ := z_ℓ^1 − z_ℓ^2. We inform the convex program about the nonlinearities in Rφ(s0) through the following quadratic constraints:
- The nonlinearity is a vector of differentiable activation functions.
Lemma 6.0.1. [83]: Let ϕ(w) = (σ(w_1), · · · , σ(w_m)), w ∈ X ⊆ R^m, where σ is differentiable. Define e_i^⊤ \vec{\alpha} = inf_{w∈X} σ′(w_i) and e_i^⊤ \vec{\beta} = sup_{w∈X} σ′(w_i). Then ϕ satisfies the δQC defined by (X, Q), where

Q = \left\{ Q \;\middle|\; Q = \begin{bmatrix} -2\,\mathrm{diag}(\vec{\alpha} \circ \vec{\beta} \circ \lambda) & \mathrm{diag}((\vec{\alpha} + \vec{\beta}) \circ \lambda) \\ \mathrm{diag}((\vec{\alpha} + \vec{\beta}) \circ \lambda) & -2\,\mathrm{diag}(\lambda) \end{bmatrix}, \; \lambda \in \mathbb{R}^m_+ \right\}.   (6.2)
Thus, we first compute the vector of slope bounds through the pre-activation bound computation; for example, given the slope bounds \vec{\alpha}_ℓ and \vec{\beta}_ℓ on the ℓ-th layer of the ENFN, we claim

\begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix}^\top
\underbrace{\begin{bmatrix} -2\,\mathrm{diag}(\vec{\alpha}_\ell \circ \vec{\beta}_\ell \circ \lambda_\ell) & \mathrm{diag}((\vec{\alpha}_\ell + \vec{\beta}_\ell) \circ \lambda_\ell) \\ \mathrm{diag}((\vec{\alpha}_\ell + \vec{\beta}_\ell) \circ \lambda_\ell) & -2\,\mathrm{diag}(\lambda_\ell) \end{bmatrix}}_{Q_\ell}
\begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix} \ge 0, \qquad \lambda_\ell \in \mathbb{R}^{n_\ell}_+.
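As a minimal sketch of this step (assuming, hypothetically, that the nonlinear portion of the layer uses tanh activations and that elementwise pre-activation bounds [lo, hi] have already been obtained from an interval bound pass), the slope bounds and the corresponding Q_ℓ can be assembled as follows:

import numpy as np

def tanh_slope_bounds(lo, hi):
    # d/dw tanh(w) = 1 - tanh(w)^2 is largest at the point of [lo, hi] closest
    # to 0 and smallest at the endpoint farthest from 0.
    closest = np.where((lo <= 0) & (hi >= 0), 0.0,
                       np.where(np.abs(lo) < np.abs(hi), lo, hi))
    farthest = np.where(np.abs(lo) > np.abs(hi), lo, hi)
    beta = 1.0 - np.tanh(closest) ** 2    # elementwise sup of the slope
    alpha = 1.0 - np.tanh(farthest) ** 2  # elementwise inf of the slope
    return alpha, beta

def qc_matrix(alpha, beta, lam):
    # Q_ell from Lemma 6.0.1 for slope bounds (alpha, beta) and multipliers lam >= 0.
    D = np.diag
    top = np.hstack([-2 * D(alpha * beta * lam), D((alpha + beta) * lam)])
    bot = np.hstack([D((alpha + beta) * lam), -2 * D(lam)])
    return np.vstack([top, bot])

In the SDP itself, lam is a decision variable; the helper above is only meant to show how the matrix in (6.2) is laid out for fixed multipliers.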
- The nonlinearity is a vector of non-differentiable activation functions.
Lemma 6.0.2. [83]: Let ϕ(w) = max(αw, βw), w ∈ X ⊆ R^m, 0 ≤ α ≤ β < ∞, and define I^+, I^−, and I^± as the sets of activations that are known to be always active, always inactive, or unknown on X, i.e., I^+ = {i | w_i ≥ 0, ∀w ∈ X}, I^− = {i | w_i < 0, ∀w ∈ X}, and I^± = {1, · · · , m} \ (I^+ ∪ I^−). Define the vectors

\vec{\alpha} = [\, \alpha + (\beta - \alpha)\mathbf{1}_{I^+}(1), \; \cdots, \; \alpha + (\beta - \alpha)\mathbf{1}_{I^+}(m) \,], \qquad
\vec{\beta} = [\, \beta - (\beta - \alpha)\mathbf{1}_{I^-}(1), \; \cdots, \; \beta - (\beta - \alpha)\mathbf{1}_{I^-}(m) \,].

Then ϕ satisfies the δQC defined by (X, Q), where

Q = \left\{ Q \;\middle|\; Q = \begin{bmatrix} -2\,\mathrm{diag}(\vec{\alpha} \circ \vec{\beta} \circ \lambda) & \mathrm{diag}((\vec{\alpha} + \vec{\beta}) \circ \lambda) \\ \mathrm{diag}((\vec{\alpha} + \vec{\beta}) \circ \lambda) & -2\,\mathrm{diag}(\lambda) \end{bmatrix}, \; e_i^\top \lambda \in \mathbb{R}_+ \text{ for } i \in I^\pm \right\}.   (6.3)
Therefore, to capture the slope bounds, we first determine I^+ and I^− through the pre-activation bound computation; for example, given the slope bounds \vec{\alpha}_ℓ and \vec{\beta}_ℓ on the ℓ-th layer of the ENFN, we claim, for e_j^⊤ λ_ℓ ∈ R_+ and j ∈ I^±:

\begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix}^\top
\underbrace{\begin{bmatrix} -2\,\mathrm{diag}(\vec{\alpha}_\ell \circ \vec{\beta}_\ell \circ \lambda_\ell) & \mathrm{diag}((\vec{\alpha}_\ell + \vec{\beta}_\ell) \circ \lambda_\ell) \\ \mathrm{diag}((\vec{\alpha}_\ell + \vec{\beta}_\ell) \circ \lambda_\ell) & -2\,\mathrm{diag}(\lambda_\ell) \end{bmatrix}}_{Q_\ell}
\begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix} \ge 0.
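For the ReLU layers (α = 0, β = 1), a small sketch of the index-set computation (assuming elementwise pre-activation bounds lo, hi are available) could look like:

import numpy as np

def relu_slope_bounds(lo, hi):
    # Adjusted elementwise slope bounds per Lemma 6.0.2 for ReLU, given lo <= w <= hi.
    always_on  = lo >= 0            # I^+
    always_off = hi < 0             # I^-
    alpha_vec = np.where(always_on, 1.0, 0.0)    # alpha + (beta - alpha) 1_{I^+}
    beta_vec  = np.where(always_off, 0.0, 1.0)   # beta  - (beta - alpha) 1_{I^-}
    unknown = ~(always_on | always_off)          # I^±: only these multipliers must be >= 0
    return alpha_vec, beta_vec, unknown

The resulting vectors can be passed to the same Q_ℓ layout shown above; per Lemma 6.0.2, only the multipliers λ_j with j ∈ I^± are required to be nonnegative, while the remaining entries may be left unconstrained.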
S-Procedure for Qinfo:
Consider the weight matrices of the ENFN in Figure 6.1. These weight matrices are built from 4 sub-blocks (the first superscript letter indicates the destination portion and the second the source portion, matching the block structure below):
• W^{ll}_ℓ: this sub-block connects the linear portion of layer ℓ − 1 to the linear portion of layer ℓ.
• W^{ln}_ℓ: this sub-block connects the non-linear portion of layer ℓ − 1 to the linear portion of layer ℓ.
• W^{nl}_ℓ: this sub-block connects the linear portion of layer ℓ − 1 to the non-linear portion of layer ℓ.
• W^{nn}_ℓ: this sub-block connects the non-linear portion of layer ℓ − 1 to the non-linear portion of layer ℓ.
Therefore, the pre-activation pℓ is computed as a linear combination of the previous activations z_i, i = 0, · · · , ℓ − 1, through the following iterative formula:

\begin{bmatrix} t_1 \\ p_1 \end{bmatrix} =
\underbrace{\begin{bmatrix} W^{ln}_1 \\ W^{nn}_1 \end{bmatrix}}_{W_1}
\underbrace{z_0}_{s_0} + b_1,
\qquad
\begin{bmatrix} t_\ell \\ p_\ell \end{bmatrix} =
\underbrace{\begin{bmatrix} W^{ll}_\ell & W^{ln}_\ell \\ W^{nl}_\ell & W^{nn}_\ell \end{bmatrix}}_{W_\ell,\ \ell = 2, \cdots, N}
\begin{bmatrix} t_{\ell-1} \\ z_{\ell-1} \end{bmatrix} + b_\ell,
then the differences of pre-activations δpℓ are

\delta p_1 = W^{nn}_1\, \delta z_0, \qquad
\delta p_2 = W^{nl}_2 W^{ln}_1\, \delta z_0 + W^{nn}_2\, \delta z_1,

\delta p_\ell = W^{nl}_\ell \sum_{k=0}^{\ell-3} \left( \prod_{j=1}^{\ell-k-2} W^{ll}_{\ell-j} \right) W^{ln}_{k+1}\, \delta z_k
\;+\; W^{nl}_\ell W^{ln}_{\ell-1}\, \delta z_{\ell-2} \;+\; W^{nn}_\ell\, \delta z_{\ell-1}, \qquad \ell \ge 3.
We also follow the process of [83, 61] and concatenate the non-linear activation vectors into the base vector δZ := [δz_0^⊤, · · · , δz_N^⊤]^⊤. Given the relation between δpℓ and δZ above, we define the transformation matrices E_ℓ and T via

\begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix} = E_\ell\, \delta Z, \quad \ell = 1, \cdots, N,
\qquad
\begin{bmatrix} \delta z_0 \\ W_{N+1}\, \delta z_N \end{bmatrix} = T\, \delta Z.
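A minimal numpy sketch of this construction (with the hypothetical argument names W_ll, W_ln, W_nl, W_nn for the four sub-block lists and W_out for W_{N+1}) can build E_1, · · · , E_N and T by propagating, layer by layer, the linear map from δZ to the linear-portion difference δt_ℓ alongside the selector for δz_ℓ:

import numpy as np

def build_E_T(W_ll, W_ln, W_nl, W_nn, W_out, dims_z):
    # dims_z = [n_0, ..., n_N] are the sizes of z_0 (= s_0), ..., z_N.
    # W_ll[0] and W_nl[0] are unused because layer 1 has no linear-portion input.
    N = len(W_nn)
    nz = sum(dims_z)
    offs = np.concatenate([[0], np.cumsum(dims_z)])

    def selector(k):                 # S_k : dZ -> dz_k
        S = np.zeros((dims_z[k], nz))
        S[:, offs[k]:offs[k + 1]] = np.eye(dims_z[k])
        return S

    E = []
    A = W_ln[0] @ selector(0)        # maps dZ -> dt_1
    P = W_nn[0] @ selector(0)        # maps dZ -> dp_1
    E.append(np.vstack([P, selector(1)]))
    for l in range(2, N + 1):        # layers 2..N
        S_prev = selector(l - 1)
        P = W_nl[l - 1] @ A + W_nn[l - 1] @ S_prev   # dp_l = W^nl dt_{l-1} + W^nn dz_{l-1}
        A = W_ll[l - 1] @ A + W_ln[l - 1] @ S_prev   # dt_l = W^ll dt_{l-1} + W^ln dz_{l-1}
        E.append(np.vstack([P, selector(l)]))
    T = np.vstack([selector(0), W_out @ selector(N)])
    return E, T

This iterative form avoids materializing the nested products in the closed-form expression for δpℓ above, while producing the same matrices.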
Finally, based on the idea in [83, 61], we claim that if

\sum_{\ell=1}^{N} \begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix}^\top Q_\ell \begin{bmatrix} \delta p_\ell \\ \delta z_\ell \end{bmatrix}
\;-\;
\begin{bmatrix} \delta z_0 \\ W_{N+1}\, \delta z_N \end{bmatrix}^\top
\begin{bmatrix} \rho I_n & 0_{n\times 1} \\ 0_{1\times n} & -1 \end{bmatrix}
\begin{bmatrix} \delta z_0 \\ W_{N+1}\, \delta z_N \end{bmatrix} \le 0,   (6.4)
then

\left\| R_\varphi(s_0^1) - R_\varphi(s_0^2) \right\|_2 \le \sqrt{\rho}\, \left\| s_0^1 - s_0^2 \right\|_2.

This is because equation (6.4) implies

\begin{bmatrix} \delta z_0 \\ W_{N+1}\, \delta z_N \end{bmatrix}^\top
\begin{bmatrix} \rho I_n & 0_{n\times 1} \\ 0_{1\times n} & -1 \end{bmatrix}
\begin{bmatrix} \delta z_0 \\ W_{N+1}\, \delta z_N \end{bmatrix} \ge 0.

This result certifies ρ1 = √ρ to be a true certificate, i.e., an upper bound on the Lipschitz constant. On the other hand, equation (6.4) can be rephrased in terms of the base vector δZ as
\delta Z^\top \left( \sum_{\ell=1}^{N} E_\ell^\top Q_\ell E_\ell - T^\top \begin{bmatrix} \rho I_n & 0_{n\times 1} \\ 0_{1\times n} & -1 \end{bmatrix} T \right) \delta Z \le 0,   (6.5)
and, setting Q_{\text{info}} = \sum_{\ell=1}^{N} E_\ell^\top Q_\ell E_\ell, a sufficient condition for (6.5) to hold is

Q_{\text{info}} - T^\top \begin{bmatrix} \rho I_n & 0_{n\times 1} \\ 0_{1\times n} & -1 \end{bmatrix} T \preceq 0.
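Putting the pieces together, a minimal cvxpy sketch of ENFN-Lip-SDP (a sketch under the assumption that the matrices E_ℓ, T and the per-layer slope-bound vectors have already been computed as above; for simplicity all multipliers are constrained nonnegative, which is slightly more conservative than Lemma 6.0.2 requires for the ReLU layers) could look like:

import cvxpy as cp
import numpy as np

def enfn_lip_sdp(E, T, alphas, betas, n_in):
    # Minimize rho subject to Q_info - T' M(rho) T <= 0 (as a matrix inequality);
    # the certificate is rho_1 = sqrt(rho).
    rho = cp.Variable(nonneg=True)
    Q_info = 0
    for E_l, a_l, b_l in zip(E, alphas, betas):
        lam = cp.Variable(len(a_l), nonneg=True)   # multipliers lambda_l
        Q_l = cp.bmat([
            [cp.diag(cp.multiply(-2 * a_l * b_l, lam)), cp.diag(cp.multiply(a_l + b_l, lam))],
            [cp.diag(cp.multiply(a_l + b_l, lam)),      cp.diag(-2 * lam)],
        ])
        Q_info = Q_info + E_l.T @ Q_l @ E_l
    M = cp.bmat([[rho * np.eye(n_in), np.zeros((n_in, 1))],
                 [np.zeros((1, n_in)), -np.ones((1, 1))]])
    lhs = Q_info - T.T @ M @ T
    prob = cp.Problem(cp.Minimize(rho), [(lhs + lhs.T) / 2 << 0])
    prob.solve(solver=cp.SCS)
    return np.sqrt(rho.value)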
Appendix C: Generic Computational Error
We present a brief summary of [58] and [75] in Appendix D; here we focus directly on the steps needed for the proof. We denote by M_k : (S × A) → R_{k+1} the neural network that represents the residual. This network is equivalent to µ_NN except that its last bias vector is shifted by τ_opt(k + 1). Assume R_{k+1} and R^c_{k+1} are the residual reachsets obtained when (S_k × A_k) and (S_k × A^c_k) are fed into M_k, respectively (A^c_k ⊂ A_k). This implies that if a_k ∈ A^c_k, then a_k ∈ A_k, and therefore, for A_k = E(µ_{a_k}, Ω_{a_k}),

\begin{bmatrix} a_k \\ 1 \end{bmatrix}^\top
\begin{bmatrix} -\Omega_{a_k}^{-1} & \Omega_{a_k}^{-1} \mu_{a_k} \\ \mu_{a_k}^\top \Omega_{a_k}^{-1} & -\mu_{a_k}^\top \Omega_{a_k}^{-1} \mu_{a_k} + 1 \end{bmatrix}
\begin{bmatrix} a_k \\ 1 \end{bmatrix} \ge 0,
which suffices to say that, with respect to the optimization (6.11), the optimal upper bound for R_{k+1} produced by (S_k × A_k) is in fact a feasible solution for the upper bound of R^c_{k+1} obtained from (S_k × A^c_k) (see the summary of [58, 75] in Appendix D). This implies that the optimal objective of optimization (6.11) in the second problem (taking (S_k × A^c_k) as input) is smaller than the optimal objective of optimization (6.11) in the first problem (taking (S_k × A_k) as input). Therefore, we conclude Logdet(Ω_{R^c_{k+1}}) < Logdet(Ω_{R_{k+1}}); in other words, replacing A_k with A^c_k results in a smaller upper bound on the residual reachset.
Now assume the optimal confidence region for the modified action, A_k, in optimization (4.7) contains a nonempty proper subset A^c_k ⊂ A_k. Then, as argued above, the ellipsoidal bound on the residual reachset is still reducible by replacing A_k with A^c_k. This is a contradiction, because we have already concluded that A_k yields the smallest upper bound on the residual reachset. Thus, the optimal confidence region A_k admits no such proper subset and is a singleton; in other words, tr(Ω_{a_k}) = 0.
Appendix D: Brief Summary of [58]
We present the solution summary with the parameters of our specific problem for clarity. Assume the confidence regions for the state and action, s_k ∈ S_k and a_k ∈ A_k, are fixed and known. The tool provided in [58, 75] then proposes a convex optimization for the tightest ellipsoidal upper bound on the residual's reachset R_{k+1}. In this research, we add one more constraint and fix the center of this upper bound at the origin, to make certain that the residual decreases in Euclidean norm. The tool of [58, 75] is therefore used to obtain the tightest upper bound such that R_{k+1} ⊂ E(⃗0, Ω_{R_{k+1}}^{-1}). We know a_k ∈ A_k := E(µ_{a_k}, Ω_{a_k}), and therefore:
\begin{bmatrix} a_k \\ 1 \end{bmatrix}^\top
\underbrace{\begin{bmatrix} -\Omega_{a_k}^{-1} & \Omega_{a_k}^{-1} \mu_{a_k} \\ \mu_{a_k}^\top \Omega_{a_k}^{-1} & -\mu_{a_k}^\top \Omega_{a_k}^{-1} \mu_{a_k} + 1 \end{bmatrix}}_{Q_1}
\begin{bmatrix} a_k \\ 1 \end{bmatrix} \ge 0.   (6.6)
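As a small numerical sanity check (a sketch with a hypothetical helper name, not part of the method itself), the quadratic form in (6.6) can be evaluated directly to test whether a point a lies in E(µ, Ω), i.e., whether (a − µ)^⊤ Ω^{−1} (a − µ) ≤ 1:

import numpy as np

def membership_qc(a, mu, Omega):
    # Value of the quadratic form in (6.6); nonnegative iff a is in E(mu, Omega).
    Oinv = np.linalg.inv(Omega)
    Q1 = np.block([[-Oinv, Oinv @ mu[:, None]],
                   [mu[None, :] @ Oinv, np.array([[1.0 - mu @ Oinv @ mu]])]])
    v = np.concatenate([a, [1.0]])
    return v @ Q1 @ v

# Example: the unit ball around the origin contains (0.5, 0):
# membership_qc(np.array([0.5, 0.0]), np.zeros(2), np.eye(2)) >= 0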
We also know s_k ∈ S_k := E(µ_{s_k}, ρ_n Σ_{s_k}), and therefore:

\frac{1}{\rho_n}
\begin{bmatrix} s_k \\ 1 \end{bmatrix}^\top
\underbrace{\begin{bmatrix} -\Sigma_{s_k}^{-1} & \Sigma_{s_k}^{-1} \mu_{s_k} \\ \mu_{s_k}^\top \Sigma_{s_k}^{-1} & -\mu_{s_k}^\top \Sigma_{s_k}^{-1} \mu_{s_k} + \rho_n \end{bmatrix}}_{Q_2}
\begin{bmatrix} s_k \\ 1 \end{bmatrix} \ge 0.   (6.7)
Next, [58] suggests concatenating all the post-activations of the residual's model M_k into a vector x = [z^{1\top}, z^{2\top}, \cdots, z^{(L-1)\top}]^\top. They then propose a symmetric matrix Q_ϕ which satisfies the quadratic constraint

\begin{bmatrix} x \\ 1 \end{bmatrix}^\top Q_\phi \begin{bmatrix} x \\ 1 \end{bmatrix} \ge 0.   (6.8)
The ultimate goal is to prove that the residual satisfies ∆_{k+1} ∈ E(⃗0, Ω_{R_{k+1}}). Therefore, defining Ω = Ω_{R_{k+1}}^{-1}, we should propose a constraint that implies

\begin{bmatrix} r_{k+1} \\ 1 \end{bmatrix}^\top
\begin{bmatrix} -\Omega & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{k+1} \\ 1 \end{bmatrix} \ge 0.   (6.9)
To propose such a constraint, the authors in [58] suggest defining the base vector z = [s_k^⊤, a_k^⊤, x^⊤, 1]^⊤ and the linear transformation matrices E_1, E_2, E_3 and matrix C such that

\begin{bmatrix} s_k \\ 1 \end{bmatrix} = E_1 z, \quad
\begin{bmatrix} a_k \\ 1 \end{bmatrix} = E_2 z, \quad
\begin{bmatrix} x \\ 1 \end{bmatrix} = E_3 z, \quad
\begin{bmatrix} r_{k+1} \\ 1 \end{bmatrix} = \begin{bmatrix} C & b \\ 0 & 1 \end{bmatrix} z,

with C = [\, 0 \;\; 0 \;\; \cdots \;\; W_L \,] and b = b_L − τ_opt(k + 1), where (b_L, W_L) ∈ θ_µ are the bias vector and the weights of the last layer of µ_NN. Adding the left-hand sides of the quadratic constraints (6.6), (6.7), and (6.8) yields the inequality

z^\top \Big( \tau_1 \underbrace{E_1^\top Q_2 E_1}_{M_{s_k}} + \tau_2 \underbrace{E_2^\top Q_1 E_2}_{M_{a_k}} + \underbrace{E_3^\top Q_\phi E_3}_{M_\phi} \Big) z \ge 0,   (6.10)
for some τ1, τ2 ≥ 0. Thus, if the inequality

z^\top \left( \tau_1 M_{s_k} + \tau_2 M_{a_k} + M_\phi \right) z -
\begin{bmatrix} r_{k+1} \\ 1 \end{bmatrix}^\top
\begin{bmatrix} -\Omega & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{k+1} \\ 1 \end{bmatrix} \le 0

holds, then the constraint (6.9) is satisfied. This constraint can be reformulated as

z^\top \Big( \tau_1 M_{s_k} + \tau_2 M_{a_k} + M_\phi -
\underbrace{\begin{bmatrix} C & b \\ 0 & 1 \end{bmatrix}^\top
\begin{bmatrix} -\Omega & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} C & b \\ 0 & 1 \end{bmatrix}}_{M_{\text{out}}} \Big) z \le 0;

thus imposing the matrix inequality τ_1 M_{s_k} + τ_2 M_{a_k} + M_ϕ − M_out ⪯ 0 is sufficient, but not necessary, for (6.9) to be satisfied. Therefore, the convex optimization

\min_{M_\phi,\, \tau_1,\, \tau_2,\, \Omega} \; -\mathrm{Logdet}(\Omega) \quad \text{s.t.} \quad \tau_1 M_{s_k} + \tau_2 M_{a_k} + M_\phi - M_{\text{out}} \preceq 0   (6.11)
yields the tightest (though possibly suboptimal, due to the relaxation) ellipsoidal upper bound, centered at the origin, on the residual reachset R_{k+1}.
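A minimal cvxpy sketch of (6.11) (a sketch under the assumption that the constant matrices M_{s_k} = E_1^⊤ Q_2 E_1 and M_{a_k} = E_2^⊤ Q_1 E_2 are precomputed, that M_ϕ = E_3^⊤ Q_ϕ E_3 is supplied as a cvxpy expression in its own multiplier variables with their sign constraints passed in phi_constraints, and with the hypothetical function name residual_reachset_sdp) could look like:

import cvxpy as cp
import numpy as np

def residual_reachset_sdp(M_sk, M_ak, M_phi, phi_constraints, C, b, n_r):
    # Solves (6.11): tightest origin-centered ellipsoid {r : r' Omega r <= 1}
    # containing the residual reachset, where Omega = Omega_{R_{k+1}}^{-1}.
    Omega = cp.Variable((n_r, n_r), PSD=True)
    tau1 = cp.Variable(nonneg=True)
    tau2 = cp.Variable(nonneg=True)
    # Affine map z -> [r_{k+1}; 1]
    G = np.block([[C, b.reshape(-1, 1)],
                  [np.zeros((1, C.shape[1])), np.ones((1, 1))]])
    S = cp.bmat([[-Omega, np.zeros((n_r, 1))],
                 [np.zeros((1, n_r)), np.ones((1, 1))]])
    M_out = G.T @ S @ G
    lhs = tau1 * M_sk + tau2 * M_ak + M_phi - M_out
    prob = cp.Problem(cp.Minimize(-cp.log_det(Omega)),
                      [(lhs + lhs.T) / 2 << 0] + list(phi_constraints))
    prob.solve(solver=cp.SCS)
    return Omega.value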
Abstract
Cyber-Physical Systems (CPS) form the backbone of essential infrastructure today, significantly impacting our quality of life. From remote patient monitoring and robotic surgery to multi-robot systems like drone fleets and self-driving cars, as well as smart grids and smart home technologies, CPS applications are widespread. Ensuring the safety of these systems by verifying them against potential faults and designing them to meet rigorous safety standards is a critical area of research. With the increased availability of affordable, portable communication and computing devices, these systems are now more interconnected, forming complex and interactive networks. Beyond remaining connected and synchronized, they must also meet specific, often complex, requirements both as individual components and as part of a broader network. For example, a drone fleet may be required to inspect designated areas within set timeframes and adjust its formation as needed, all while upholding safety standards.
The main challenge in these interconnected systems is verifying that they can operate safely and accurately under complex requirements. This dissertation proposes planning and control algorithms to meet this challenge, focusing on neural network-controlled systems in closed-loop configurations that must adhere to specific requirements over time, known as signal temporal logic specifications.
The dissertation is structured as follows: the first part presents deterministic formal verification algorithms for general signal temporal logic specifications, while the second part introduces planning and feedback control strategies to guide neural network-controlled systems in meeting individual and collective time-sensitive goals. The third part addresses distribution shifts in planning by adapting learned policies using path-tracking techniques. Finally, the last part focuses on reachability analysis under distribution shifts in stochastic CPS. The proposed algorithms enhance system reliability and have been tested through simulations and experiments on highly nonlinear, high-dimensional systems with complex temporal specifications, demonstrating their effectiveness.