DATA-DRIVEN LEARNING FOR DYNAMICAL SYSTEMS IN BIOLOGY
by
Xiaojun Wu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTATIONAL BIOLOGY AND BIOINFORMATICS)
May 2024
Copyright 2024 Xiaojun Wu
Acknowledgements
I would like to express my sincere gratitude to my advisor, Dr. Adam L. MacLean, for his invaluable advice
and support during my doctoral studies. I would like to thank Dr. Roy Wollman for his collaboration which
produced Chapter 2 of this work. I would like to extend my appreciation to Dr. Andrew J. Viterbi for his
support through the Andrew J. Viterbi Fellowship in Computational Biology and Bioinformatics. Special
thanks to my parents, my wife, and my wife’s family, for their support and encouragement during my long
academic journey.
Table of Contents
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Ordinary differential equation models for dynamical systems in biology . . . . . . . . . . . 2
1.2 Time series methods for data-driven learning of dynamical systems . . . . . . . . . . . . . 2
1.3 Parameter inference for data-driven learning of known ordinary differential equation models 5
1.4 Data-driven model discovery for unknown or partially known ordinary differential equations 6
1.5 Multimodal learning for biological data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 Dissertation outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Chapter 2: Efficient data-driven parameter inference for Ca2+ signaling . . . . . . . . . . . . . . . . 9
2.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Materials and methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 A model of Ca2+ dynamics in response to ATP . . . . . . . . . . . . . . . . . . . . . 13
2.3.2 Data collection and preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.3 Generating cell chains via cell-cell similarity . . . . . . . . . . . . . . . . . . . . . . 16
2.3.4 Bayesian parameter inference with posterior-informed prior distributions . . . . . 17
2.3.5 Constructing and constraining prior distributions . . . . . . . . . . . . . . . . . . . 19
2.3.6 Dimensionality reduction and sensitivity analyses . . . . . . . . . . . . . . . . . . 19
2.3.7 Correlation analysis and cell clustering of MERFISH data . . . . . . . . . . . . . . . 20
2.3.8 Clustering of cell posterior parameter distributions . . . . . . . . . . . . . . . . . . 21
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.1 Single-cell priors informed by cell predecessors enable computationally efficient
parameter inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.2 Analysis of single-cell posteriors reveals divergent intracellular and intercellular
sources of variability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.3 Quantifying the sensitivity of Ca2+ responses in a population of heterogeneous
single cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.4 Variability in gene expression is associated with variability in Ca2+ dynamics . . . 30
2.4.5 Similarity-based posterior cell clustering reveals distinct transcriptional states
underlying Ca2+ dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Chapter 3: Robust data-driven model discovery methods for biological systems . . . . . . . . . . . 39
3.1 Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Formulation of hybrid dynamical model . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Data generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Data preprocessing for learning hybrid dynamical models . . . . . . . . . . . . . . 44
3.3.4 Gradient-based learning of hybrid dynamical models . . . . . . . . . . . . . . . . . 44
3.3.5 Model selection for hybrid dynamical models . . . . . . . . . . . . . . . . . . . . . 45
3.3.6 Sparse regression for model recovery from hybrid dynamics . . . . . . . . . . . . . 45
3.3.7 Model selection for sparse regression . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Robust model selection scheme recovers dynamical models using a hybrid
formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.2 Prior knowledge incorporated into hybrid formulation improves SINDy learning . 54
3.4.3 Model selection enables discovery of dynamical models from complex dynamics
for biological networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Chapter 4: Conclusions and future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1 Future works for Bayesian parameter inference for large ordinary differential equation
systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Future works for data-driven model discovery for complex dynamical systems . . . . . . . 69
4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Appendix A: Supplementary material for Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.1 Supplementary results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.1.1 Analysis of scaling and clipping of prior distributions along cell chains . 83
A.1.2 Parameter inference for a single cell with multiple epochs . . . . . . . . 84
A.1.3 Parameter inference for a cell chain induced by Ca2+ signaling similarity 84
A.2 Supplementary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
A.3 Supplementary figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
A.4 Data availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Appendix B: Supplementary material for Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . 112
B.1 Supplementary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
List of Tables
2.1 Definition and description of parameters in ODE model for intracellular Ca2+ signaling
pathway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1 Model selection configuration for Lotka-Volterra data . . . . . . . . . . . . . . . . . . . . . 49
3.2 Best (according to AICc) ODE models recovered from Lotka-Volterra datasets using hybrid
formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Lowest AICcs of correct models recovered from Lotka-Volterra datasets using different
methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.4 Model selection configuration for repressilator data . . . . . . . . . . . . . . . . . . . . . . 59
3.5 Best (according to AICc) ODE models recovered from repressilator datasets using hybrid
formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
A.1 Summary of inference runs referred to in Chapter 2 . . . . . . . . . . . . . . . . . . . . . . 86
A.2 Mean log posterior for ten cells from the similarity-based chain (Similar-r1) with the same
cells sampled individually using uninformative prior (Indiv. cells run 1 and Indiv. cells run 2) 87
A.3 Sampling performance of the similarity-based chain (Similar-r1) and two runs in which
the same cells were sampled individually using uninformative prior (Indiv. cell run 1 and
Indiv. cell run 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
A.4 Comparison between cell chains sampled using similarity-informed priors (Reduced-3) or
randomly ordered (Random-1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
A.5 Comparison between cell chains run with different hyperparameters, all based on
similarity-informed cell chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
A.6 Comparison between cell chains run on the full model (Similar-r1) and models reduced by
setting one or two parameters to a constant . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
A.7 Statistics of posterior distributions for cells fit to the similarity-based chain and mean
parameter sensitivities, reduced model (Reduced-3) . . . . . . . . . . . . . . . . . . . . . . . 92
A.8 Gene-parameter pairs ranked by correlation coefficient in a similarity-based chain
(Reduced-3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A.9 Gene-parameter pairs ranked by correlation coefficient in a randomly ordered chain
(Random-2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
B.1 Hyperparameters used to attain lowest mean squared losses for hybrid dynamical models
from Lotka-Volterra datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
B.2 Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets
using hybrid formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
B.3 Hyperparameters used to attain lowest AICcs for SINDy regression on Lotka-Volterra
datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
B.4 Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets
using base SINDy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
B.5 Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets
using pure neural network formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
B.6 Top 10 (according to AICc) models recovered from repressilator data with 5% additive noise 117
B.7 Top 10 (according to AICc) models recovered from repressilator data with 10% additive noise 119
B.8 Top 10 (according to AICc) models recovered from repressilator data with 10% multiplicative noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
B.9 Top 10 (according to AICc) models recovered from repressilator data with 20% additive noise 121
B.10 Top 10 (according to AICc) models recovered from repressilator data with 20% multiplicative noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
List of Figures
2.1 Cell chains improve performance of single-cell parameter inference . . . . . . . . . . . . . 22
2.2 Parameter dependencies revealed by the analysis of marginal posterior distributions along
cell chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Sensitivity of Ca2+ responses to parameter perturbations . . . . . . . . . . . . . . . . . . . 28
2.4 Variability in Ca2+ model dynamics is associated with variability in gene expression . . . . 32
2.5 Clustering of cell posterior distributions reveals marker genes for Ca2+ states . . . . . . . . 34
3.1 Pipeline of data-driven model discovery for dynamical systems with model selection . . . 43
3.2 Data generated to study the Lotka-Volterra model and hybrid dynamical model fits . . . . 48
3.3 ODE models recovered from Lotka-Volterra datasets . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Data generated to study the repressilator model and base SINDy fits . . . . . . . . . . . . . 57
3.5 ODE models recovered from repressilator datasets . . . . . . . . . . . . . . . . . . . . . . . 62
A.1 Ca2+ dynamic response trajectories to stimulus by ATP . . . . . . . . . . . . . . . . . . . . 95
A.2 Analysis of scaling and clipping of prior distributions along cell chains . . . . . . . . . . . 96
A.3 Comparison of stiff and non-stiff solvers for Ca2+ response simulation . . . . . . . . . . . . 97
A.4 Comparison of parameter inference of cells in a chain vs. multiple inference runs on the
same cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A.5 Comparison of replicate runs of random cell chains . . . . . . . . . . . . . . . . . . . . . . 99
A.6 Comparison of model reduction for insensitive parameters . . . . . . . . . . . . . . . . . . 100
A.7 Distribution of Pearson correlations between genes and parameters from the chain
Reduced-3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.8 Comparison of global similarity via PCA using min-max normalization for chain Reduced-3 102
A.9 Comparison of global similarity via PCA using min-max normalization for chain Random-2 103
A.10 Comparison of cell-cell similarity along cell chains . . . . . . . . . . . . . . . . . . . . . . . 104
A.11 Comparison of Ca2+ dynamics for various cell clusterings of chain Reduced-3 . . . . . . . . 105
A.12 Comparison of model dynamics in clusters from Reduced-3 by posterior parameter clustering 106
A.13 Comparison of parameter distributions across clusters for Reduced-3 . . . . . . . . . . . . . 107
A.14 Comparison of model dynamics in clusters from Random-2 by posterior parameter clustering 108
A.15 Clustering cells by Ca2+ response profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
A.16 Posterior parameter clustering for a cell chain constructed via Ca2+ signaling similarity . . 110
Abbreviations
AIC: Akaike information criterion
AICc: Akaike information criterion with correction
AR: autoregressive
ARIMA: autoregressive integrated moving average
ARMA: autoregressive moving average
ATP: adenosine triphosphate
BVP: boundary value problem
CARMA: continuous autoregressive moving average
CNN: convolutional neural network
DAG: diacylglycerol
DFS: depth-first search
DPGP: Dirichlet process Gaussian process mixture model
ER: endoplasmic reticulum
HMC: Hamiltonian Monte Carlo
IP3: inositol 1,4,5-trisphosphate
IVP: initial value problem
MA: moving average
MCMC: Markov chain Monte Carlo
MERFISH: multiplexed error-robust fluorescence in situ hybridization
MLP: multilayer perceptron
MSE: mean squared error
NUTS: No-U-Turn Sampler
ODE: ordinary differential equation
PDE: partial differential equation
PINN: physics-informed neural network
PIP2: phosphatidylinositol 4,5-bisphosphate
PLC: phospholipase C
RNN: recurrent neural network
RSS: residual sum of squares
SERCA: sarco/ER Ca2+-ATPase
SINDy: sparse identification of nonlinear dynamics
SIR: susceptible-infected-removed
SMC: sequential Monte Carlo
STLSQ: sequentially thresholded least squares
SR3: sparse relaxed regularized regression
UDE: universal differential equation
Abstract
Dynamics of biological processes are often modeled by ordinary differential equations (ODEs). Data-driven
methods for learning properties of known ODE models have evolved for decades, yet the growing amount of
biological data necessitates the invention of novel solutions. In this work, we develop methods for learning
cell-specific parameters for an intracellular Ca2+ signaling pathway model from single-cell Ca2+ signaling
dynamics. The method effectively accelerates parameter inference for a sequence of arbitrarily ordered
cells via transfer learning. Moreover, if the transfer of information between cells is guided by cell-cell
similarity according to single-cell transcriptomic data, both Ca2+ signaling responses and gene expression
profiles are embedded within the inferred parameters. This lets us create a map from transcriptional states
to cell dynamics. We go on to propose a framework that recovers the symbolic form of ODEs from data,
developed from existing model discovery methods. In our framework, we first learn a hybrid formulation of
the dynamical model, with symbolic terms given by finite prior knowledge and neural network approximators
for latent dynamics. We then perform sparse regression on learned latent dynamics to identify missing
symbolic terms in the hybrid dynamical model. Using this framework, we successfully recover ODE models
for two complex biological systems given only limited data with high levels of noise. Both methods show
the benefit of knowledge integration for data-driven learning of dynamical systems in biology.
Chapter 1
Introduction
Biological systems can be considered dynamical systems if their internal states change, and influence one another,
over time. For centuries, ordinary differential equations (ODEs) have been a powerful mathematical tool for
modeling continuous-time dynamical systems. They were first used to describe physical and astronomical
phenomena and later adopted in mathematical biology [1, 2, 3]. Having an ODE model for a dynamical
system allows us to elucidate interactions between model states and understand how the system may
evolve over time. Many ODE models have been proposed to characterize dynamical systems in biology at
different scales, from predator-prey relationships in ecosystems [1, 2] to gene regulatory networks in single
cells [4, 5]. Yet, finding a working ODE model that fits data may still require extensive knowledge of and
reasonable assumptions for the underlying system. Depending on our understanding of the dynamical
system of interest, here we consider two classes of data-driven problems. The first class is parameter
inference for ODE models that are already constructed. The second class is referred to as model discovery
or system identification. Problems in this class emerge when only a few terms in the ODE model can be
specified from prior knowledge and the remaining terms need to be determined from data.
1.1 Ordinary differential equation models for dynamical systems in biology
Applications of ODEs in studying biological dynamical systems date back to the early twentieth century.
One of the earliest examples is the Lotka-Volterra model for predator-prey dynamics in an ecosystem,
named after two mathematicians Alfred J. Lotka and Vito Volterra who published relevant works separately
in the 1920s [1, 2]. In the same decade, two scientists in collaboration proposed a theory for infectious disease transmission: Kermack and McKendrick [3] introduced the susceptible-infected-removed (SIR) model,
which describes the dynamics of three populations, namely susceptible, infected, and removed, after an
outbreak of infectious diseases. Variants of the SIR model are still widely used today, for example, during
the recent COVID-19 pandemic [6, 7, 8]. ODE models have also been developed for many different biological systems, from the cell cycle [9, 10, 11] to a host of intracellular signaling pathways [12, 13, 14, 15]. Li and
Rinzel [16] and Lemon et al. [17] developed ODE models for intracellular Ca2+ signaling pathways, which
will be studied in Chapter 2. Elowitz and Leibler [4] designed a synthetic gene regulatory network called
the repressilator system, which can also be modeled by ODEs and will be studied in Chapter 3.
1.2 Time series methods for data-driven learning of dynamical systems
Although ODEs are the go-to option for studying dynamical systems, in some cases a method for studying
general time series is sufficient, since observations from dynamical systems are essentially time series.
Whereas ODEs describe how states in a dynamical system evolve on continuous time, many methods for
learning time series may assume either discrete time points like days and months or continuous time.
In this section, we will survey methods for three major learning tasks for time series data, which are
smoothing, forecasting, and clustering.
Time series smoothing is a common topic in the field of signal processing, where smoothing methods are
referred to as filters. Since the purpose of smoothing is often to reduce noise in data, the term denoising
is also used in place of smoothing. The simplest example of filters is moving average, which estimates
the value at each time point by the average value inside a time window [18]. Methods built upon simple
moving average include weighted moving average and exponential smoothing [18]. Data that conforms
to a linear or polynomial model can be fit by least squares methods, from which smoothed values can be
predicted. However, linear or polynomial models may fail to capture the complexity of observations from
biological processes. Local regression is a class of smoothing algorithms which combine the idea of moving
average and polynomial regression by performing regression on local segments of time series data [19],
and can perform better on complex dynamics.
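As a minimal illustration of the moving-average filter described above, the Python sketch below smooths a 1-D signal with a centered window; the window size and the synthetic signal are illustrative and not taken from this dissertation.

```python
import numpy as np

def moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Smooth a 1-D signal with a centered moving average.

    Boundary points are averaged over only the positions the window covers.
    """
    kernel = np.ones(window)
    smoothed = np.convolve(x, kernel, mode="same")
    counts = np.convolve(np.ones_like(x, dtype=float), kernel, mode="same")
    return smoothed / counts

# Illustrative use: a noisy sine wave smoothed with a 20-point window.
t = np.linspace(0, 10, 500)
noisy = np.sin(t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
smooth = moving_average(noisy, window=20)
```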
Many classic time series forecasting methods are developed from the autoregressive moving average
(ARMA) model, initially proposed by Whittle [20]. In an ARMA(p, q) process, the value at each time point
is the sum of three components: an autoregressive process of order p (AR(p)) of values from past p time
points, a moving average process of order q (MA(q)) of white noise from past q time points, and white
noise at the current time point [18]. Once data is fit to an ARMA(p, q) model, it can be used to predict values at
future time points in an autoregressive manner. The autoregressive integrated moving average (ARIMA)
process generalizes the ARMA process by incorporating a differencing process [18]. There also exist other
variants of ARMA models that may be more suitable for observations from complex dynamical systems,
such as multivariate ARMA [18] and continuous ARMA (CARMA) [21]. Some methods for smoothing,
such as exponential smoothing, are also capable of generating future values autoregressively [22].
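For concreteness, an ARMA(p, q) process as described above can be written (in standard notation, assumed here rather than reproduced from the cited works) as

$$X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \sum_{j=1}^{q} \vartheta_j \varepsilon_{t-j} + \varepsilon_t,$$

where the first sum is the AR(p) component over the past $p$ values, the second sum is the MA(q) component over the past $q$ white-noise terms, and $\varepsilon_t$ is white noise at the current time point.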
Modern machine learning methods, including linear models and neural networks, have been adopted
for time series forecasting in recent decades [23, 24]. Neural networks are particularly powerful because
they are the so-called universal approximators. It has been shown that many neural network architectures,
such as multilayer perceptron (MLP), convolutional neural network (CNN), and recurrent neural network
(RNN), can approximate any function arbitrarily well [25, 26, 27, 28, 29, 30]. Those results are commonly
known as the universal approximation theorem. It follows that, if a neural network trained from data
approximates the true underlying function sufficiently well, it can accurately predict future values for a
time series. Parmezan et al. [31] evaluated classic methods and machine learning methods for time series
forecasting on synthetic and real datasets, and showed that several machine learning methods indeed
outperformed classic methods like ARIMA.
Time series clustering refers to the task of separating multiple time series samples into smaller subgroups. Methods for time series clustering can be divided into two major categories, model-free and model-based, each with more subcategories. Subcategories of model-free methods are agglomerative clustering,
k-means clustering, and fuzzy c-means clustering [32], which are also commonly used methods for non-sequential data. Subcategories of model-based methods are ARMA, Markov chain, hidden Markov model, and mixture
models [32], most of which are designed for sequential data. Markov chain models assume that the state at
each time point depends only on the state at the last time point. Hidden Markov models assume that the
state observed at each time point is emitted from a hidden state at the same time point and hidden states
on all time points form a Markov chain. Mixture models describe how clusters and samples inside each
cluster are generated with probabilistic processes. For example, under a Dirichlet process Gaussian process
mixture model (DPGP), a Gaussian process generates trajectories of gene expression level over time within
a cluster, and parameters for the cluster-specific Gaussian process are derived from a Dirichlet process [33].
1.3 Parameter inference for data-driven learning of known ordinary
differential equation models
Parameters of a dynamical system with known ODEs can be learned from data via either optimization
or Bayesian inference. One way to learn parameter values through optimization is to simulate the target ODE system iteratively as an initial value problem (IVP) and update parameters using nonlinear least
squares on simulated trajectories [34, 35, 36]. An alternative optimization-based approach is called multiple shooting, which solves separate boundary value problems (BVPs) on non-overlapping subintervals
instead of solving one IVP on the full time span [36]. Unlike optimization-based methods, which yield
point estimates of parameters, Bayesian approaches instead treat model parameters as random
variables and approximate their posterior distribution from a prior distribution. In many previous works
on Bayesian inference for ODE models in systems biology, posterior distributions are sampled using either
a Markov chain Monte Carlo (MCMC) method [37] or a sequential Monte Carlo (SMC) method [38, 39,
40, 41]. Metropolis-Hastings [42, 43] and Gibbs sampling [44] are two classic MCMC methods. In recent
years, a class of MCMC methods called Hamiltonian Monte Carlo (HMC) has been widely adopted due
to its efficiency and robustness [45]. The No-U-Turn Sampler (NUTS) is an HMC implementation which
simplifies HMC by automatically tuning two HMC parameters [46].
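As a sketch of the optimization-based route (iterative IVP simulation plus nonlinear least squares), assuming SciPy and a toy logistic-growth model whose parameters, initial condition, and noise level are all illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def rhs(t, y, r, K):
    """Toy ODE: logistic growth dy/dt = r * y * (1 - y / K)."""
    return r * y * (1.0 - y / K)

def residuals(params, t_obs, y_obs, y0):
    """Simulate the IVP with candidate parameters and return data residuals."""
    r, K = params
    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), [y0], t_eval=t_obs, args=(r, K))
    return sol.y[0] - y_obs

# Synthetic observations from known parameters, with additive noise.
t_obs = np.linspace(0.0, 10.0, 50)
y_obs = solve_ivp(rhs, (0.0, 10.0), [0.1], t_eval=t_obs, args=(0.8, 5.0)).y[0]
y_obs = y_obs + 0.05 * np.random.default_rng(1).normal(size=t_obs.size)

# Point estimates of (r, K) from nonlinear least squares.
fit = least_squares(residuals, x0=[0.5, 3.0], args=(t_obs, y_obs, 0.1))
print(fit.x)
```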
A novel and efficient data-driven Bayesian parameter inference method developed for a single-cell
Ca2+ signaling pathway model will be presented in Chapter 2. The model uses an ODE system to describe
the rate of change of four quantities, including Ca2+ concentration, in an intracellular Ca2+ signaling pathway triggered by binding of adenosine triphosphate (ATP) to purinergic receptors. The method will be
demonstrated on data collected by Foreman and Wollman [47], in which single-cell Ca2+ response data was
paired with single-cell transcriptomic data from multiplexed error-robust fluorescence in situ hybridization
(MERFISH). By transferring knowledge learned from one cell to another, Bayesian inference of cell-specific
model parameters for a sequence of cells is accelerated. When the knowledge transfer is guided by MERFISH data, the sampled posterior distributions reveal meaningful relationships between Ca2+ signaling
responses and Ca2+-induced gene expressions.
1.4 Data-driven model discovery for unknown or partially known ordinary differential equations
For the last decade or so, artificial neural networks have become increasingly popular for data-driven
learning of biological systems as they excel at tasks like protein structure prediction [48]. As for data-driven learning of dynamical systems, neural networks have proven useful when no ODE model is
available [49, 50]. Using modern machine learning software packages, we can now approximate states in
dynamical systems and simulate differential equations with neural networks. Chen et al. [49] introduced
the concept of neural ordinary differential equations (neural ODEs) and released a PyTorch package which
can simulate any ODE system whose right-hand side is defined by a neural network. Raissi et al.
[51] proposed the physics-informed neural network (PINN), which can approximate solutions of partial differential equations (PDEs). Rackauckas et al. [50] unified all types of differential equations under the term
universal differential equation (UDE), which includes differential equations that are partially or completely
defined by neural networks. Besides replacing variables or expressions in differential equations, neural
networks can also be used to emulate ODE integration schemes, as shown by Qin et al. [52].
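A minimal sketch of the neural ODE idea, assuming the torchdiffeq package released with Chen et al. [49]; the two-state system, network size, and names are illustrative.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # differentiable ODE solvers from Chen et al. [49]

class ODEFunc(nn.Module):
    """Right-hand side dy/dt = f_theta(y), parameterized by a small MLP."""
    def __init__(self, dim: int = 2, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
y0 = torch.tensor([1.0, 0.5])
t = torch.linspace(0.0, 5.0, 100)

# Integrate the learned ODE; gradients flow through the solver, so given an
# observed trajectory y_obs one could train `func` by minimizing
# ((pred - y_obs) ** 2).mean() with backpropagation.
pred = odeint(func, y0, t)  # shape: (len(t), 2)
```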
However, for dynamical systems, neural networks have a notable drawback when compared to ODE
models: neural networks are black-box models, whereas ODE models are mechanistic models that describe
the precise interactions between states in a dynamical system. Thus, finding an ODE model from scratch
is still desirable and preferred in certain circumstances. Fortunately, a framework called sparse identification of nonlinear dynamics (SINDy) has been developed to reconstruct ODE systems in closed-form
expressions [53]. Instead of learning a black-box model, a SINDy algorithm learns the possible symbolic
terms in the ODE model from data. Simply put, SINDy treats the derivatives of data as the product of a set
of basis functions and a coefficient matrix, and finds the coefficients using a sparse regression algorithm.
Such algorithm usually employs a combination of regularization and thresholding techniques to make
coefficient matrix sparse. For example, the STLSQ (sequentially thresholded least squares) algorithm iteratively performs ridge regression and eliminates basis functions with coefficients below a threshold [53].
Besides SINDy, there exists another type of model discovery methods, known as symbolic regression. A
typical symbolic regression method builds expression trees for the right-hand side of an ODE system and
selects the one that best fits data [54, 55].
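To make the sparse-regression idea concrete, here is a plain NumPy sketch of sequentially thresholded (ridge) least squares; it is a simplified illustration of STLSQ, not the implementation used in Chapter 3, and the threshold, ridge penalty, and iteration count are arbitrary.

```python
import numpy as np

def stlsq(theta: np.ndarray, dxdt: np.ndarray, threshold: float = 0.1,
          ridge: float = 1e-5, n_iter: int = 10) -> np.ndarray:
    """Sequentially thresholded least squares, in the spirit of SINDy [53].

    theta: (n_samples, n_features) library of candidate basis functions
    dxdt:  (n_samples, n_states)   estimated time derivatives
    Returns a sparse coefficient matrix of shape (n_features, n_states).
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold      # basis functions to eliminate
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                A = theta[:, big]
                # Ridge-regularized least squares on the surviving terms.
                xi[big, k] = np.linalg.solve(A.T @ A + ridge * np.eye(big.sum()),
                                             A.T @ dxdt[:, k])
    return xi
```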
A robust data-driven method for discovering ODE models using neural networks and SINDy will be
presented in Chapter 3. The method first learns a hybrid dynamical model from data, which is made up
of a known part with closed-form terms and a latent part approximated by a neural network. Once the
hybrid dynamical model is learned, the method recovers the closed-form expression for the latent part
using SINDy. To reduce generalization error of recovered models, a model selection strategy is designed
for both hybrid dynamical model learning and SINDy learning.
1.5 Multimodal learning for biological data
We are seeing greater usage of multimodal data in biological research today thanks to innovations in
technologies for collecting biological data. The most commonly seen type of multimodal biological data
is multi-omics data, which can include measurements from transcriptomics, epigenomics, proteomics, and
metabolomics [56, 57, 58, 59]. These omics technologies are capable of sampling data at bulk and single-cell
levels. Depending on the problem of interest, different combinations of omics data may be used. Tatarakis
et al. [60] used both bulk RNA-seq data and single-cell RNA-seq data to investigate cranial neural crest
development in zebrafish. Xiong et al. [61] revealed sexual dimorphism in mouse kidney using integrated
single-nuclear RNA/ATAC-seq data which contains a pair of measurements — gene expression level from
RNA-seq and chromatin accessibility from ATAC-seq — for each cell.
Multimodal biological data can also include non-omics modalities. As mentioned in Section 1.3, we will
learn cell-specific parameters for an ODE model characterizing intracellular Ca2+ signaling activities from
single-cell Ca2+ response trajectories plus single-cell transcriptomic data from the MERFISH (multiplexed
error-robust fluorescence in situ hybridization) protocol. MERFISH is a type of single-cell spatial transcriptomics technology, which simultaneously detects the expression levels of mRNA transcripts and their
positions by fluorescence imaging [62]. The use of MERFISH enabled measurement of gene expression
levels for a large number of cells right after the recording of their ATP-induced Ca2+ responses [47].
1.6 Dissertation outline
This chapter serves as a general introduction to works that will be presented in Chapter 2 and Chapter 3.
In Chapter 2, we will outline a transfer learning scheme for more efficient Bayesian inference of ODE
model parameters in an intracellular Ca2+ signaling pathway model. Model parameters learned by the
method will be analyzed with Ca2+ signaling data and transcriptomic data. In Chapter 3, we will introduce
a robust model discovery method and showcase it with two dynamical systems from biology. We show how
the correct ODE models can be recovered from data on a limited time span with restricted prior knowledge
about the systems. Chapter 4 will summarize the previous chapters and discuss future directions.
Chapter 2
Efficient data-driven parameter inference for Ca2+ signaling
Parts of this chapter are included in a manuscript published in [63].
2.1 Abstract
Single-cell genomic technologies offer vast new resources with which to study cells, but their potential
to inform parameter inference of cell dynamics has yet to be fully realized. Here we develop methods
for Bayesian parameter inference with data that jointly measure gene expression and Ca2+ dynamics in
single cells. We propose to share information between cells via transfer learning: for a sequence of cells,
the posterior distribution of one cell is used to inform the prior distribution of the next. In application to
intracellular Ca2+ signaling dynamics, we fit the parameters of a dynamical model for thousands of cells
with variable single-cell responses. We show that transfer learning accelerates inference with sequences
of cells regardless of how the cells are ordered. However, only by ordering cells based on their transcriptional similarity can we distinguish Ca2+ dynamic profiles and associated marker genes from the posterior
distributions. Inference results reveal complex and competing sources of cell heterogeneity: parameter
covariation can diverge between the intracellular and intercellular contexts. Overall, we discuss the extent
to which single-cell parameter inference informed by transcriptional similarity can quantify relationships
between gene expression states and signaling dynamics in single cells.
2.2 Introduction
Models in systems biology span systems from the scale of protein/DNA interactions to cellular, organ,
and whole organism phenotypes. Their assumptions and validity are assessed through their ability to
describe biological observations, often accomplished by simulating models and fitting them to data [64,
65, 66, 67]. Under the framework of Bayesian parameter inference and model selection, the available data
is used along with prior knowledge to infer a posterior parameter distribution for the model [38]. The
posterior distribution characterizes the most likely parameter values to give rise to the data as well as the
uncertainty that we have regarding those parameters. Thus, parameter inference provides a map from the
dynamic phenotypes that we observe in experiments to the parameters of a mathematical model.
Single-cell genomics technologies have revealed a wealth of information about the states of single
cells that was not previously accessible [68]. This ought to assist with the characterization of dynamic
phenotypes. However, it is much less clear how to draw maps between dynamic phenotypes of the cell and
single-cell states as quantified via genomic measurements. The challenge in part lies in the combinatorial
complexity: even if a small fraction of genes contain information regarding the phenotype of interest, say
a few hundred, this is more than enough to characterize any feasible number of states of an arbitrarily
complex dynamical process.
This leads us to a central question: can the integration of single-cell gene expression data into a framework for parameter inference improve our understanding of the cellular phenotypes of interest? Here,
various sources of transcriptional noise must be taken into account [69, 70, 71], which we propose to address by taking a global view and comparing cells first by their similarity across many genes, and, after
inference, by their similarity in posterior parameter distributions. Our previous work provides the ideal
data for this approach: we jointly measured dynamics and gene expression in the same single cells [47].
Here, we apply our new parameter inference framework to study Ca2+ signaling dynamics and signal
transduction in response to adenosine triphosphate (ATP) in human mammary epithelial (MCF10A) cells.
Ca2+ signaling regulates a host of cellular responses in epithelial cells, from death and division to migration and molecular secretion, as well as collective behaviors, such as organogenesis and wound healing [72, 73, 74]. In response to ATP binding to purinergic receptors, a signaling cascade is initiated whereby
phospholipase C (PLC) is activated and in turn hydrolyzes phosphatidylinositol 4,5-bisphosphate (PIP2),
producing inositol 1,4,5-trisphosphate (IP3) and diacylglycerol (DAG). The endoplasmic reticulum (ER) responds to IP3 by the activation of Ca2+ channels: the subsequent release of calcium from the ER into the
cytosol produces a spiked calcium response. To complete the cycle and return cytosolic calcium levels to
steady state, the sarco/ER Ca2+-ATPase (SERCA) channel pumps the Ca2+ from the cytosol back into the
ER [75, 76]. Ca2+ signaling is highly conserved, regulating cell phenotypic responses across mammals,
fish, and flies [73, 77, 78] as well as in prokaryotes [79]. Since the Ca2+ response to ATP occurs quickly in
epithelial cells, on a timescale that is almost certainly faster than gene transcription, we work under the
assumption that the transcriptional state of the cell does not change in the duration of the experiment.
Our ability to measure gene expression in thousands of single cells has not only led to new discoveries
but has also fundamentally changed how we identify and characterize cell states [80]. Technologies used
to quantify gene expression in single cells include sequencing and fluorescent imaging. The latter permits
the measurement of hundreds of genes in spatially resolved populations of single cells. Single-molecule
fluorescence in situ hybridization (smFISH) can be multiplexed to achieve this high resolution by protocols
such as MERFISH [62] and seqFISH [81]. Moreover, by coupling multiplexed smFISH with fluorescent
imaging of Ca2+ dynamics using a GFP reporter in MCF10A cells, we are able to jointly capture the dynamic
cell responses and the single-cell gene expression in the same single cells [47]. These data offer new
potential to study the relationships between transcriptional states of cells and the dynamic phenotypes
these may produce.
Models of gene regulatory networks and cellular signaling pathways described by ordinary differential
equations (ODEs) capture the interactions between gene transcripts, proteins, or other molecular species
and their impact on cellular dynamics. Well-established dynamical systems theory offers a range of tools
with which to analyze transient and equilibrium behavior of ODE models [82]; it remains an open question
whether or not it is appropriate to make equilibrium assumptions of living cells [83]. Constraining
dynamic models of cellular/molecular processes with single-cell data via inference offers much potential
to gain new insight into dynamics, albeit coming with many challenges given, among other things, the
complex sources of noise in these data and the lack of explicit temporal information in (“snapshot”) datasets
gathered at one time point [84]. Parameter inference has provided insight into clonal relationships of
single cells [85, 86] and stem cell differentiation/cell state transitions [87, 88]. Inference methods have also
been applied to single-cell data for the discovery of new properties of single-cell oscillations [89, 90] and
cell-cell variability [91, 92, 93], as well as to study cell-cell communication [94]. New methods to infer the
parameters of models of stochastic gene expression provide means to study single-cell dynamics in greater
depth [95, 96].
Here, we model Ca2+ dynamics via ODEs based on previous work [17, 41]. We develop a parameter
inference framework to fit Ca2+ response dynamics in many single cells. We perform inference of multiple
cells sequentially, through the construction of “cell chains.” A cell chain is an ordering of cells, which can
be random or directed by some measure, e.g. by similarity of gene expression or of Ca2+ dynamic response.
Given a cell chain, we propose to infer the parameters of the Ca2+ ODE model in a single cell via a transfer
of information from its cell predecessor in the chain. We achieve this by setting the prior of the current
cell in the chain informed by the posterior of its predecessor. We will use this framework to assess the
extent to which transcriptional cell states inform dynamic cell responses.
In the next section, we present the model and the methods implemented for parameter inference using
Hamiltonian Monte Carlo in Stan [97]. We go on to study the results of inference: we discover that priors
informed by cell predecessors accelerate parameter inference, but that cell chains with randomly sampled
predecessors perform as well as those with transcriptional similarity-informed predecessors. Analysis of
hundreds of fitted single cells reveals that cell-intrinsic vs. cell-extrinsic posterior parameter relationships
can differ widely, indicative of fundamentally different sources of underlying variability. By perturbing the
posterior distributions, we assess model parameter sensitivities in light of Ca2+ dynamics. We also find that
variability in single-cell gene expression is associated with variability in posterior parameter distributions,
both for individual gene-parameter pairs and globally, via principal component analysis. We go on to
cluster cells by their posterior parameter distributions, and discover that only for gene expression-based
cell chains are there clear relationships between gene expression states and dynamic cell phenotypes.
2.3 Materials and methods
2.3.1 A model of Ca2+ dynamics in response to ATP
We model Ca2+ signaling pathway responses in MCF10A human epithelial cells using nonlinear ordinary
differential equations (ODEs), as previously developed [17, 41]. The model consists of four state variables:
phospholipase C (PLC), inositol 1,4,5-trisphosphate (IP3), the fraction of IP3-activated receptor (h), and
cytoplasmic Ca2+. The four variables are associated with a system of four nonlinear ODEs describing the
rates of change of the Ca2+ pathway species following ATP stimulation, to characterize dynamic responses
in MCF10A cells. The equations are given by:
$$
\begin{aligned}
\frac{d[\mathrm{PLC}]}{dt} &= \mathrm{ATP}\, e^{-K_{\mathrm{ATP}} t} - K_{\mathrm{off,ATP}}\,[\mathrm{PLC}] \\
\frac{d[\mathrm{IP_3}]}{dt} &= V_{\mathrm{PLC}}\,\frac{[\mathrm{PLC}]^2}{K_{\mathrm{IP_3}}^2 + [\mathrm{PLC}]^2} - K_{\mathrm{off,IP_3}}\,[\mathrm{IP_3}] \\
\frac{dh}{dt} &= a\,\bigl([\mathrm{Ca^{2+}}] + d_{\mathrm{inh}}\bigr)\left(\frac{d_{\mathrm{inh}}}{[\mathrm{Ca^{2+}}] + d_{\mathrm{inh}}} - h\right) \\
\frac{d[\mathrm{Ca^{2+}}]}{dt} &= \beta\left[\varepsilon\bigl(\eta_1 m_{\infty}^3 h^3 + \eta_2\bigr)\bigl(c_0 - (1+\varepsilon)[\mathrm{Ca^{2+}}]\bigr) - \eta_3\,\frac{[\mathrm{Ca^{2+}}]^2}{k_3^2 + [\mathrm{Ca^{2+}}]^2}\right]
\end{aligned}
\tag{2.1}
$$

$$
\beta = \left(1 + \frac{K_e [B_e]}{(K_e + [\mathrm{Ca^{2+}}])^2}\right)^{-1},
\qquad
m_{\infty} = \frac{[\mathrm{IP_3}]}{d_1 + [\mathrm{IP_3}]}\cdot\frac{[\mathrm{Ca^{2+}}]}{d_5 + [\mathrm{Ca^{2+}}]}.
$$
The equations describe a chain of responses following ATP binding to purinergic receptors: the activations
of PLC, IP3, the IP3R channel on the surface of the ER, and finally the release of Ca2+ from the ER into
the cytoplasm [41]. Ca2+ may also enter the ER through the IP3R channel and the SERCA pump [41]. Our
model differs from Yao et al. [41] in that we combine the product of two parameters in the previous model,
Kon, ATP and ATP, into a single parameter, ATP. This reduction of the model parameter space removed the
redundancy that would otherwise exist in the distributions of Kon, ATP and ATP. A description of each of
the parameters in the model is given in Table 2.1, where reference values for each of the model parameters
are found in Lemon et al. [17] and Yao et al. [41].
2.3.2 Data collection and preprocessing
The data consist of a joint assay measuring Ca2+ dynamics and gene expression via multiplexed error-robust fluorescence in situ hybridization (MERFISH) [62]. Ca2+ dynamics in a total of 5128 human MCF10A
cells are measured via imaging for 1000 seconds (ATP stimulation at 200 seconds) using a GCaMP5 biosensor. Immediately following this step, 336 genes are measured by MERFISH [47].
| Name | Description | Prior distribution | Unit |
|---|---|---|---|
| ATP | Concentration of ATP that activates PLC | N(5, 4) | s⁻¹ |
| K_ATP | ATP decay rate | N(0.0083, 0.0025) | s⁻¹ |
| K_off,ATP | PLC degradation rate | N(1.25, 1) | s⁻¹ |
| V_PLC | Maximum velocity for IP3 generation | N(1, 1) | µM·s⁻¹ |
| K_IP3 | Equilibrium constant for IP3 generation through PLC | N(0.5, 0.01) | µM |
| K_off,IP3 | IP3 degradation rate | N(1.25, 1) | s⁻¹ |
| a | Time constant of IP3 channel | N(1, 1) | s⁻¹ |
| d_inh | Dissociation constant for IP3 channel calcium inhibiting subunit | N(0.4, 0.01) | µM |
| K_e | Dissociation constant for calcium buffer | N(10, 4) | µM |
| B_e | Concentration of calcium buffer | N(150, 25) | µM |
| d_1 | Dissociation constant for IP3 channel IP3 activating subunit | N(0.13, 0.01) | µM |
| d_5 | Dissociation constant for IP3 channel calcium activating subunit | N(0.0823, 0.01) | µM |
| ε | ER to cytosolic volume | N(0.185, 0.01) | – |
| η_1 | IP3 channel permeability constant | N(575, 625) | s⁻¹ |
| η_2 | ER leak permeability constant | N(5.2, 1) | s⁻¹ |
| η_3 | Ca2+ pump permeability constant | N(45, 25) | s⁻¹ |
| c_0 | Concentration of free Ca2+ in the ER | N(4, 1) | µM |
| k_3 | SERCA pump dissociation constant | N(0.4, 0.01) | µM |

Table 2.1: Definition and description of the ODE model parameters. Prior distributions are derived from [17, 16, 41].
The Ca2+ trajectories are smoothed using a moving average filter with a twenty-second window size (Figure A.1). After smoothing, data points occurring before ATP stimulation are removed. Data points for each Ca2+ trajectory after t = 300 are downsampled by a factor of 10; the trajectories are at or close to steady state by this time. After removing the data for the first 200 seconds and downsampling the last 700 seconds, each processed trajectory consists of Ca2+ response data at 171 time points (t = 200, 201, ..., 298, 299, 300, 310, 320, ..., 1000). All numerical experiments in this chapter evaluate the Ca2+ response at these same 171 time points. Single-cell gene expression data is collected using MERFISH after the Ca2+ imaging as previously described [47, 62].
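A minimal sketch of this preprocessing, assuming a 1 Hz sampling rate (so a twenty-second window is twenty samples); the function and variable names are illustrative and the boundary handling only approximates the exact 171-point grid described above.

```python
import numpy as np

def preprocess_trajectory(ca, t, stim_time=200.0, downsample_after=300.0,
                          factor=10, window=20):
    """Smooth, truncate, and downsample one Ca2+ trajectory (illustrative)."""
    # Moving-average smoothing with a fixed window.
    kernel = np.ones(window) / window
    ca_smooth = np.convolve(ca, kernel, mode="same")
    # Keep only time points at or after ATP stimulation.
    keep = t >= stim_time
    t_kept, ca_kept = t[keep], ca_smooth[keep]
    # Downsample the near-steady-state tail of the trajectory by `factor`.
    head = t_kept <= downsample_after
    t_out = np.concatenate([t_kept[head], t_kept[~head][::factor]])
    ca_out = np.concatenate([ca_kept[head], ca_kept[~head][::factor]])
    return t_out, ca_out
```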
2.3.3 Generating cell chains via cell-cell similarity
Cell-cell similarity is quantified via single-cell transcriptional states, i.e. by comparing $x_m^i$ and $x_m^j$, the
expression of $m$ genes in cells $i$ and $j$. We obtain a symmetric cell-cell similarity matrix, $W$, from the log-transformed MERFISH expression data via optimization in SoptSC [98]: entries $W_{i,j}$ denote the similarity
between cells $i$ and $j$. To create a chain of cells linked through their similarity in gene expression space,
we:
1. Construct a graph G = (V, E); each node is a cell and an edge is placed between two cells if they
have a similarity score above zero;
2. For a choice of initial (root) cell, traverse G and record the order of cells traversed.
Ideally, each cell would be visited exactly once; however this amounts to finding a Hamiltonian path in
G, an NP-complete problem. Therefore, as a heuristic solution we use a depth-first search (DFS), which
can be completed in linear time. From the current node, we randomly select an unvisited neighbor node
and set this as the next current node, recording it once visited (pre-order DFS). If the current node has
no unvisited neighbors, it backtracks until a node with unvisited neighbors is found. When there is no
unvisited node left, every node in the graph has been visited exactly once. Given cases where the similarity
matrix is sparse (as we have here), the DFS generates a tree that is very close to a straight path.
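A minimal sketch of the chain construction, assuming the similarity matrix W is available as a NumPy array; the iterative, stack-based traversal below records cells in pre-order and randomizes neighbor order, mirroring the DFS described above.

```python
import numpy as np

def cell_chain_from_similarity(W: np.ndarray, root: int, seed: int = 0) -> list:
    """Pre-order depth-first traversal of the cell-cell similarity graph.

    An edge exists between cells i and j wherever W[i, j] > 0. Returns the
    order in which cells are first visited (the cell chain).
    """
    rng = np.random.default_rng(seed)
    visited = np.zeros(W.shape[0], dtype=bool)
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        if visited[node]:
            continue
        visited[node] = True
        order.append(node)
        neighbors = np.flatnonzero(W[node] > 0)
        rng.shuffle(neighbors)  # break ties between unvisited neighbors at random
        stack.extend(int(j) for j in neighbors if not visited[j])
    return order
```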
2.3.4 Bayesian parameter inference with posterior-informed prior distributions
We seek to infer dynamic model parameters in single cells, informed by cell-cell similarity via the position
of a cell in a cell chain. We use the Markov chain Monte Carlo (MCMC) implementation: Hamiltonian
Monte Carlo (HMC) and the No-U-Turn Sampler (NUTS) in Stan [46, 97]. HMC improves upon the efficiency of other MCMC algorithms by treating the sampling process as a physical system and employing
conservation laws [45]. From an initial distribution, the algorithm proceeds through an intermediate phase of
sampling (warmup) until (one hopes) convergence to the stationary distribution. During warmup, NUTS
adjusts the HMC hyperparameters automatically [46].
The prior distribution over parameters is a multivariate normal distribution, with dimensions $\theta_j$, $j = 1, \ldots, m$, where $m$ is the number of parameters. This choice of prior makes it straightforward to pass
information from the inferred posterior distribution of one cell to the next cell in line to be sampled,
which will be described in Section 2.3.5. Let $f$ be a numerical solution of the ODE model, and $y_0$ be the
initial condition. Then, in each single cell, the Ca2+ response to ATP is generated by the following process:

$$
\begin{aligned}
\theta_j &\sim \mathcal{N}(\mu_{\theta_j}, \sigma_{\theta_j}) \\
\hat{y}(t) &= f(y_0, t; \theta) \\
\sigma &\sim \mathrm{Cauchy}(0, 0.05) \\
y(t) &\sim \mathcal{N}(\hat{y}(t), \sigma),
\end{aligned}
$$
where we truncate the prior so that each $\theta_j$ is bounded by 0 from below. The Cauchy distribution is chosen
to generate the noise for the observed Ca2+ response as it contains greater probability mass in its tails, thus
encouraging NUTS to explore extreme values of the parameter space more frequently.
For the first cell in a chain, we use a relatively uninformative prior, the “Lemon” prior (Table 2.1),
derived from parameter value estimates in previous work [17, 16, 41]. For the $i$th cell in a chain ($i > 1$),
the prior distribution is constructed from the posterior distribution of the $(i-1)$th cell (Section 2.3.5). For
each cell, NUTS is run for four independent chains with the same initialization. To simulate $\hat{y}(t)$ during
sampling, we use the implementation of fourth- and fifth-order Runge-Kutta in Stan [97]. For each output
trajectory $y$, its error is the Euclidean distance between $y$ and data $y^*$ over all 171 data points:

$$\epsilon(y, y^*) := \left(\sum_{k=0}^{170} \bigl(y(t_k) - y^*(t_k)\bigr)^2\right)^{1/2}.$$
The error of a posterior sample for a cell is the mean error of trajectories simulated from all draws in the
sample:

$$\epsilon_{\mathrm{sample}} := \frac{1}{N}\sum_{i=1}^{N} \epsilon(y_i, y^*),$$

where $N$ is the sample size and $y_i$ is the output trajectory from the $i$th draw in the sample.
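In code, the two error metrics could be computed as in the sketch below (NumPy arrays assumed; names are illustrative):

```python
import numpy as np

def trajectory_error(y_sim: np.ndarray, y_obs: np.ndarray) -> float:
    """Euclidean distance between simulated and observed trajectories,
    both evaluated on the same 171 time points."""
    return float(np.sqrt(np.sum((y_sim - y_obs) ** 2)))

def sample_error(y_sims: np.ndarray, y_obs: np.ndarray) -> float:
    """Mean trajectory error over all posterior draws;
    y_sims has shape (n_draws, n_time_points)."""
    return float(np.mean([trajectory_error(y, y_obs) for y in y_sims]))
```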
Convergence of NUTS chains is evaluated using the $\hat{R}$ statistic: the ratio of between-chain variance
to within-chain variance [97, 99]. A typical heuristic used is $\hat{R}$ between 0.9 and 1.1, indicating that for
this set of chains the stationary distributions reached for a given parameter are well-mixed. There are two
caveats on our use of $\hat{R}$ in practice:

1. For our model, we observe that well-fit (i.e. not overfit) Ca2+ trajectories did not require $\hat{R} \in (0.9, 1.1)$ for all parameters. Thus we assess $\hat{R}$ only for the log posterior, using a more tolerant upper bound of 4.0.
2. There are cases where one chain diverges but three of the four are well-mixed. In such cases, we choose to retain the three well-mixed chains as a sufficiently successful run. Thus if $\hat{R}$ is above the threshold, before discarding the run, we compute $\hat{R}$ for all three-wise combinations of chains, and retain the run if there exist three well-mixed chains.
2.3.5 Constructing and constraining prior distributions
We construct the prior distribution of the $i$th cell from the posterior of the $(i-1)$th cell. The prior mean
for each parameter $\theta_j$ for the $i$th cell is set to $\mu_j^{(i-1)}$, the posterior mean of $\theta_j$ from the $(i-1)$th cell. The
variance of the prior for $\theta_j$ is derived from $\sigma_j^{(i-1)}$, the posterior variance of $\theta_j$ from the $(i-1)$th cell. To a)
sufficiently explore the parameter space, and b) prevent instabilities (rapid growth or decline) in marginal
parameter posterior values along the cell chain, we scale each $\sigma_j^{(i-1)}$ by a factor of 1.5 and clip the scaled
value to be between 0.001 and 5. The scaled and clipped value is then set as the prior variance for $\theta_j$ for
the $i$th cell.
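A minimal sketch of this prior-construction step; the helper name and array representation are illustrative, not the implementation used for the inference runs.

```python
import numpy as np

def next_prior(post_mean: np.ndarray, post_var: np.ndarray,
               scale: float = 1.5, var_bounds: tuple = (0.001, 5.0)):
    """Build the prior of cell i from the posterior of cell i-1.

    The prior mean is the predecessor's posterior mean; the prior variance is
    the predecessor's posterior variance scaled by 1.5 and clipped to
    [0.001, 5], as described above.
    """
    prior_mean = post_mean.copy()
    prior_var = np.clip(scale * post_var, *var_bounds)
    return prior_mean, prior_var
```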
2.3.6 Dimensionality reduction and sensitivity analyses
To compare posterior samples from different cells, we use principal component analysis (PCA). Posterior
samples are projected onto a subspace by first choosing a cell (the focal cell) and normalizing the posterior
samples from other cells against the focal cell, either by min-max or z-score normalization. Min-max normalization transforms a vector $x$ to $\frac{x - x_{\min}}{x_{\max} - x_{\min}}$, where $x_{\min}$ is the minimum and $x_{\max}$ the maximum of $x$.
z-score normalization transforms $x$ to $\frac{x - \mu_x}{\sigma_x}$, where $\mu_x$ is the mean and $\sigma_x$ is the standard deviation of $x$.
Normalizing to the focal cell amounts to setting $x_{\min}$, $x_{\max}$, $\mu_x$, $\sigma_x$ to be the values corresponding to the
focal cell for all cells normalized. We perform PCA (implemented by scikit-learn 0.24 [100]) on the normalized focal cell posterior samples and project them into the subspace spanned by the first two principal
components. The normalized samples from all other cells are projected onto the PC1-PC2 subspace of the
focal cell.
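A sketch of the focal-cell projection using min-max normalization and scikit-learn's PCA; the function name and the guard against zero-range parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_onto_focal_cell(focal: np.ndarray, others: list):
    """Normalize posterior samples against a focal cell and project all cells
    onto the focal cell's PC1-PC2 subspace.

    Each array has shape (n_draws, n_parameters).
    """
    x_min, x_max = focal.min(axis=0), focal.max(axis=0)
    scale = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero

    def normalize(x):
        return (x - x_min) / scale

    pca = PCA(n_components=2).fit(normalize(focal))
    focal_2d = pca.transform(normalize(focal))
    others_2d = [pca.transform(normalize(x)) for x in others]
    return focal_2d, others_2d
```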
We develop methods for within-posterior sensitivity analysis to assess how perturbations of model
parameters within the bounds of the posterior distribution affect Ca2+ responses. Given $\tilde{\theta}$, the posterior
distribution of a cell, each parameter $\theta_j$ is perturbed to two extreme values: $\tilde{\theta}_j^{(0.01)}$, the 0.01-quantile of
$\tilde{\theta}_{\cdot,j}$, and $\tilde{\theta}_j^{(0.99)}$, the 0.99-quantile of $\tilde{\theta}_{\cdot,j}$. Nine “evenly spaced” samples are drawn from the posterior range
of $\tilde{\theta}$ for the parameter of interest, $\tilde{\theta}_{\cdot,j}$: the $k$th draw corresponds to a sample $\tilde{\theta}_{i,\cdot}$ such that $\tilde{\theta}_{i,j} = \tilde{\theta}_j^{(0.1k)}$,
the $0.1k$-quantile of $\tilde{\theta}_{\cdot,j}$. For each draw $\tilde{\theta}_{i,\cdot}$, we replace $\tilde{\theta}_{i,j}$ by either $\tilde{\theta}_j^{(0.01)}$ or $\tilde{\theta}_j^{(0.99)}$ and then simulate
a Ca2+ response. The mean Euclidean distances between trajectories simulated from the evenly spaced
samples and the perturbed samples are used to quantify the sensitivity of each parameter perturbation.
2.3.7 Correlation analysis and cell clustering of MERFISH data
Correlations between single-cell gene expression values and posterior parameters from the Ca2+ pathway
model are determined for variable genes. We calculate the z-scores of posterior means for each parameter
of a cell sampled from a population, and remove that cell if any of its parameters has a posterior mean
z-score smaller than −3.0 or greater than 3.0. PCA is performed on log-normalized gene expression of
remaining cells using scikit-learn 0.24 [100], which yields a loadings matrix $A$ such that $A_{i,j}$ represents the
“contribution” of gene $i$ to component $j$. We designate gene $i$ as variable if $A_{i,j}$ is ranked top 10 or bottom
10 in the $j$th column of $A$ for any $j \le 10$. For each variable gene, we calculate the Pearson correlation
between its log-normalized expression value and the posterior means of individual model parameters.
Gene-parameter pairs are ranked by their absolute Pearson correlations and the top 30 are selected for
analysis. Gene-parameter pair relationships are quantified by linear regression using a Huber loss, which
is more robust to outliers than mean squared error.
To cluster cells using their single-cell gene expression, raw count matrices are normalized, log transformed, and scaled to zero mean and unit variance before clustering using the Leiden algorithm at 0.5
resolution [101], implemented in Scanpy 1.8 [102]. Marker genes for each cluster are determined by a
t-test.
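A sketch of this clustering step with Scanpy; `adata` is a hypothetical AnnData object holding the raw MERFISH counts, and the PCA/neighbor-graph calls are assumptions of a typical Scanpy workflow rather than steps spelled out above.

```python
import scanpy as sc

# adata: AnnData with raw MERFISH counts (assumed already loaded).
sc.pp.normalize_total(adata)   # normalize counts per cell
sc.pp.log1p(adata)             # log-transform
sc.pp.scale(adata)             # scale genes to zero mean and unit variance
sc.pp.pca(adata)
sc.pp.neighbors(adata)         # k-NN graph required by the Leiden algorithm
sc.tl.leiden(adata, resolution=0.5)
# Marker genes for each cluster via a t-test.
sc.tl.rank_genes_groups(adata, groupby="leiden", method="t-test")
```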
2.3.8 Clustering of cell posterior parameter distributions
Cells are clustered according to their posterior distributions. For each parameter, the posterior means for
each cell are computed and scaled to [0, 1]. The distance between two cells is defined as the m-dimensional
Euclidean distance between their posterior means (where m is the number of parameters). Given distances
calculated between all pairs of cells, agglomerative clustering with Ward linkage is performed using SciPy
1.7 [103]. Marker genes for each cluster identified are determined using a t-test.
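A compact sketch of this clustering step, assuming the posterior means are collected in an array `post_means` of shape (cells × parameters):

```python
# Sketch of the posterior-mean clustering described above; `post_means` is an array
# of shape (n_cells, n_parameters) holding each cell's posterior means.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_posteriors(post_means, n_clusters=3):
    lo, hi = post_means.min(axis=0), post_means.max(axis=0)
    scaled = (post_means - lo) / (hi - lo)          # scale each parameter to [0, 1]
    dists = pdist(scaled, metric="euclidean")       # pairwise m-dimensional distances
    tree = linkage(dists, method="ward")            # agglomerative clustering, Ward linkage
    return fcluster(tree, t=n_clusters, criterion="maxclust")  # cluster label per cell
```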
2.4 Results
2.4.1 Single-cell priors informed by cell predecessors enable computationally efficient
parameter inference
To study the dynamic Ca2+ responses of cells to ATP stimulation, we fit the ODE model (Equations 2.1) to
data in single cells using Bayesian parameter inference (Figure 2.1A). Only those MCF10A cells classified as
“responders” to ATP were included — cells with very low overall responses (less than 1.8 Ca2+ peak height)
were filtered out. To assess whether cell chains improve inference, we performed parameter inference of
the ODE model in single cells fit either individually, each from the same prior (we used the “Lemon” prior
(Table 2.1)), or fit via the construction of a cell chain. In a cell chain there is a transfer of information,
whereby the posterior parameter distribution of one cell informs the prior distribution of the next cell in
the chain. The first cell in the chain was fit using the Lemon prior. We are primarily interested in cell
chains constructed using transcriptional similarity: we constructed cell chains based on a single-cell gene
Figure 2.1: Cell chains improve performance of single-cell parameter inference. A: Workflow for
single-cell parameter inference along a cell chain. πi and pi respectively denote the prior and posterior
distributions of cell i. MAP: maximum a posteriori value. B–C: Comparison of inference for gene expression similarity-based cell chain (Similar-r1; first column) vs. cells fit individually, i.e. without a cell chain
(second two columns). Output metrics are the mean log posterior, where higher values denote better fits;
and the sampling times. Each row represents a cell. For Similar-r1, one cell was sampled from a Lemon
prior before Cell 5121 to inform the prior of that cell (not shown). Run 1 for the individually fitted cells has
500 warmup steps and run 2 has 1000. D: Comparison of predictions of the dynamics of an unfitted cell
using: samples from the posterior distribution of a fitted cell with similar gene expression, samples from
the posterior distribution of a random fitted cell, and reference values from literature (“Lemon” values; see
Table 2.1). E: Comparison of the effects of HMC parameters. Parameters (num. warmup steps; max. tree
depth) are for r1: (500, 10), r2: (1000, 15), r3: (500, 15).
expression similarity metric (Section 2.3.3) and compared them with alternatives (Table A.1). We studied
the effects of different choices of g, where πi+1 = g(pi); pi
is the posterior distribution for cell i, and
πi+1 is the prior distribution for the following cell. We found that transformations via scaling and clipping
were necessary to sufficiently explore the parameter space for each cell while maintaining stable marginal
posterior distributions along a cell chain (Section A.1.1, Figure A.2). We tested various numerical methods
to solve the ODE system (stiff and non-stiff), and found that we could simulate Ca2+ responses sufficiently
well using a non-stiff solver (Figure A.3), so for inference runs with hundreds of single cells we proceeded
to use a non-stiff solver.
Parameter inference of the ODE model via a cell chain (denoted Similar-r1) was more efficient and
gave more accurate results than individually fit cells (Figure 2.1B–C), with shorter overall computational
run times and higher posterior model probabilities (Table A.2). The model fit quality was also higher for
the cell chain vs. individually fit cells as assessed by the R̂ (R-hat) statistic (Table A.3). To test whether these
improved model fits are in part due to longer fitting times rather than the construction of the cell chain
directly, we fit the same cell consecutively ten times: the fits improved over the ten repeat epochs, but
the only substantial improvements were seen for the first couple of epochs, after which improvements
were minimal and the overall fit quality was comparable to the same cell fitted in the chain (Section A.1.2,
Figure A.4A–F), albeit with some evidence for overfitting in individual parameters (Figure A.4G–H). Thus,
the quality of fits obtained from fitting in a cell chain are not inherently due to more time spent running
inference but are due to the transfer of information between different cells along the chain.
The advantage of transfer learning in cell chains can be demonstrated by the higher predictive power
of sampled posterior distributions. We predicted Ca2+ responses of test cells for which the parameters have
not been inferred (i.e. cells not in Similar-r1), using the initial conditions of the test cells but parameters
from elsewhere. Each test cell was simulated using parameters sampled from: the posterior of a fitted cell
with similar gene expression to the test cell; the posterior of a random fitted cell; and reference values from
literature (“Lemon” values; see Table 2.1). To compare predicted Ca2+ responses, we used the Euclidean
distance between a predicted Ca2+ trajectory and data to quantify the prediction error. We found that
posteriors of similar cells and posteriors of random cells had equally good predictive power on test cells:
in both cases better than using Lemon values for prediction (Figure 2.1D). These results illustrate how
constructing priors for cells using posterior information from other cells offers greater ability to capture
the dynamics of a new cell not previously modeled.
To assess whether cell chains ordered using gene expression information improve inference performance over cell chains ordered randomly, we compared inference runs of at least 500 cells in a chain,
with priors informed by cell predecessor, where the chain construction was either random or gene expression similarity-based. The performance of cell chains ordered randomly — evaluated by computational
efficiency (sampling times) and accuracy of fits (model posterior probabilities) — was not significantly
different than that of the similarity-based chains (Table A.4). Therefore, although the use of a cell chain
(priors informed by cell predecessors) improved inference relative to individually fit cells, the choice of
cell predecessors (similarity-based vs. randomly assigned) did not affect computational efficiency or the
accuracy of fits.
We also studied the effects of HMC parameters on sampling. We found that sampling times were faster
without loss of fit quality when we reduced the maximum tree depth (a parameter controlling the size of
the search space) from 15 to 10: a tree depth greater than 10 was rarely used in practice, so this reduction did not negatively impact the model fits (Table A.5). We also found that a warmup period of 500 steps was sufficient
for convergence of MCMC chains for most cells. Setting the maximum tree depth to 10 and the number of
warmup steps to 500 led to much faster sampling times for large populations of cells (Figure 2.1E).
Figure 2.2: Parameter dependencies revealed by the analysis of marginal posterior distributions
along cell chains. A: Marginal parameter posterior distributions for the PLC degradation rate (Koff, ATP)
in two parallel runs of the same cell chain. Box plots of the first–third quartiles of the distribution with
whiskers denoting its full range (upper). Marginal posterior mean fold changes of consecutive cells in
the chain (lower). B: As for (A), with the IP3 channel dissociation parameter (d5). C: Left: intercellular
variability. Scatter plot of the maximum a posteriori (MAP) values of 500 cells for parameters η3 and c0.
Color indicates position along the chain. Right: intracellular variability. Scatter plots of 500 samples from
the posterior of one cell from the chain; three representative cells are shown. D: As for (C), with parameters
ϵ and η2. E: As for (C), with parameters Koff, ATP and KATP.
2.4.2 Analysis of single-cell posteriors reveals divergent intracellular and intercellular
sources of variability
The posterior distributions of hundreds of cells show striking differences between marginal parameters:
some are consistent across cells in a chain while others vary widely. To quantitatively assess this, we ran
two similarity-based cell chains with identical cell ordering for the final 100 cells but with different initial
cells. We found that while some marginal posterior parameters were similar for all cells (e.g. Koff, ATP,
Figure 2.2A), others diverged for the same set of cells along a chain (e.g. d5, Figure 2.2B). Relative changes
in marginal posteriors were seen to be tightly correlated. We computed the fold change in mean marginal
posterior parameter values between consecutive cells along the chain (Figure 2.2A–B, second row): the
majority of consecutive cell pairs were tightly correlated both in direction and magnitude, even when the
absolute values diverged. We obtained similar results for random cell chains run in parallel with different
initial cells (Figure A.5). Analysis of the posterior values of parameters relative to their “Lemon” values
revealed large distances between them in some cases (e.g. d5 varies from 0.08 (Lemon) to posterior
values greater than 3). There are several possible causes for these prior–posterior discrepancies, including
differences in the biological system and differences in experimental inputs, e.g. the stimulus used or the
amount of stimulus that cells receive.
Further analysis of the marginal posterior distributions revealed two uninformative (“sloppy” [104])
parameters. The posterior distributions of Be and η1 drifted, i.e. varied along the chain independent of
the particular cell (Figure A.6A–B). Given these insensitivities, we studied model variants where either
one or both of these parameters were set to a constant. Comparing chains of 500 cells each, the reduced
models performed as well as the original in terms of sampling efficiency and convergence (Figure A.6C–
E, Table A.6). Posterior predictive checks of the reduced models showed no significant differences in
simulated Ca2+ trajectories. Thus, for further investigation into the parameters underlying single-cell
Ca2+ dynamics, we analyzed the model with both Be and η1 set to a constant. This cell chain is referred
to as Reduced-3.
We discovered striking differences between intracellular and intercellular variability through analysis of the joint posterior distributions of parameters in chain Reduced-3. Several parameter pairs were
highly correlated, as can be expected given their roles in the Ca2+ pathway, e.g. as activators or inhibitors
of the same species. However, comparison of parameter correlations within (intra) and between (inter)
cells yielded stark differences. Some parameter pairs showed consistent directions of correlation intercellularly (along the chain) and within single cells. The Ca2+ pump permeability (η3) and the concentration of
free Ca2+ (c0) were positively correlated both inter- and intracellularly (Figure 2.2C). Similarly, the ER-to-cytosolic volume (ϵ) and the ER permeability (η2) were negatively correlated in both cases (Figure 2.2D).
However, the ATP decay rate (KATP) and the PLC degradation rate (Koff, ATP) were positively correlated
along the chain (posterior means) but — for many cells — negatively correlated within the cell (Figure 2.2E).
The distribution of MAP values is well-mixed, i.e. there is no evidence of biases arising due to a cell’s position in the chain: the variation observed in the posterior distributions represents biological differences in
the population. These differences may be in part explained by differences in scale: intercellular parameter ranges are necessarily at least as large as (and sometimes many times larger than) intracellular ranges. On
these different scales, parameters can be positively correlated over the large scale but negatively correlated
locally, or vice versa. These divergent sources of variability at the inter- and intracellular levels highlight
the complexity of the dynamics arising from a relatively simple model of Ca2+ pathway activation.
2.4.3 Quantifying the sensitivity of Ca2+ responses in a population of heterogeneous
single cells
We conducted analysis of the sensitivity of Ca2+ responses to the model parameters. Typically, one defines
a parameter sensitivity as the derivative of state variables with respect to that parameter [105, 106]. Here,
Figure 2.3: Sensitivity of Ca2+ responses to parameter perturbations. A: Schematic diagram of approach to studying model sensitivities with respect to parameters sampled across the posterior range. The
parameter to be perturbed (θj ) is set to extreme values (0.01- or 0.99-quantile of the marginal distribution).
B: Parameter sensitivities for a population of fitted cells from a gene expression similarity-based chain
(Reduced-3 model). Sensitivities are reported in terms of model distances between baseline and perturbed
trajectories. std: standard deviation. C: Simulated model trajectories in response to perturbations in ATP
concentration. D: Simulated model trajectories in response to perturbations in Koff, IP3.
we are most interested in how the dynamics are affected by parameter perturbations over the range of
their marginal posterior distributions. Thus, we evaluate the model response to a given parameter perturbation across its marginal posterior distribution in a population of cells as follows. First, sample from a
cell’s posterior distribution, and alter each sample such that the parameter of interest is set to an extreme
value according to its marginal posterior distribution (0.01-quantile or 0.99-quantile). We then simulate
trajectories from these altered samples (Figure 2.3A), and use the distance between unperturbed and perturbed model trajectories to define the sensitivity of model output to that parameter, taking the mean of
nine simulated trajectories.
We find considerable variation in the sensitivity of the Ca2+ responses: they are sensitive to some model parameters and
insensitive to others (Figure 2.3B). Notably, the sensitivities of the least sensitive parameters had mean
values of close to 1.0: similar to the distances obtained from the best-fit posterior values (Table A.5), i.e.
the Ca2+ response is insensitive to these parameters across the whole posterior range. The insensitive
parameters were not simply those which had the highest posterior variance: there was little correlation
between the inferred sensitivity and the posterior variance (Table A.7); compare, e.g., parameters d1 and
d5.
Analysis of the Ca2+ responses to parameter perturbations provides means to predict how much Ca2+
responses are affected by changes in extracellular and intracellular dynamics (Figure 2.3C–D). For example,
low concentrations of ATP result in very low Ca2+ responses; increasing the concentration of ATP can
more than double the peak response (Figure 2.3C). The importance of IP3 in Ca2+ signal transduction is
in agreement with the results of Yao et al. [41]; here we go further in that we can quantify the particular
properties of the Ca2+ response affected by each parameter. In the case of Koff, IP3, the main effect is also
in the peak height of the Ca2+ response (Figure 2.3D).
2.4.4 Variability in gene expression is associated with variability in Ca2+ dynamics
We studied variation between pairs of genes and parameters sampled from a cell population to assess
whether relationships between them might exist. We found that several gene-parameter pairs were correlated. In general, the proportion of variance explained between a gene-parameter pair was low; this is
to be expected given the many sources of variability in both the single-cell gene expression and the Ca2+
responses.
Analysis of the most highly correlated gene-parameter pairs (Section 2.3.7; Table A.8) identified a number of genes that were correlated with multiple parameters, e.g. PPP1CC, as well as parameters that were
correlated with multiple genes, e.g. η3. Pairwise relationships were analyzed via linear regression. The top
four correlated gene-parameter pairs from a similarity-based cell chain are shown in Figure 2.4A–D: cells
are well-mixed according to their positions along the chain, i.e. correlations are not due to local effects.
The pairwise correlations overall are low, which we expect given single-gene inputs. Performing multiple regression could improve predictive power; however, our goal here is to study whether any evidence
supports the existence of individual gene-parameter relationships. We performed the same analysis on a
randomly ordered cell chain, where the same gene-parameter relationships were recapitulated, albeit with
lower absolute correlation values (Figure 2.4E–H, Table A.9). There is no discernible influence of a cell's position in a chain on the gene-parameter relationship, confirming that these correlations among a cell population reflect the variability in the population rather than any sampling artefacts.
We compared the top genes ranked by gene-parameter correlations for four populations: from two
randomly sampled and two similarity-informed cell chains. Gene-parameter pairs were sorted by their
absolute Pearson correlation coefficients, and the genes ranked by their positions among sorted pairs.
In total, we identified 75 correlated gene-parameter pairs for the Reduced-3 chain, applying a Bonferroni
correction for multiple testing (Figure A.7). Out of the top 30 genes, 25 appeared in the top 30 in at least
three of the four cell chains studied (Figure 2.4I). Of these 25 genes, 20 also appeared as top-10 marker genes
from unsupervised clustering (into 3 clusters) of the gene expression data directly (Figure 2.4I). The high
degree of overlap between these gene sets demonstrates that a subset of genes expressed in MCF10A cells
explain not only their overall transcriptional variability but also their variability in Ca2+ model dynamics.
These results are also suggestive of how information content pertaining to the heterogeneous Ca2+ cellular
responses is encapsulated in the parameter posterior distributions.
Next, we turn our attention from the level of individual genes/parameters to that of the whole: what is
the relationship between the posterior parameter distribution of a cell and its global transcriptional state?
We used principal component analysis (PCA) for dimensionality reduction of the posterior distributions
to address this question. We selected a cell (denoted the “focal cell”) from a similarity-based cell chain
(Reduced-3) and decomposed its posterior distribution using PCA. We projected the posterior distributions
of other cells onto the first two components of the focal cell (Figure 2.4J–K and Figure A.8A–B) to evaluate
the overall similarity between the posterior distributions of cells relative to the focal cell. On PCA projection plots, posterior samples are colored based on gene expression: samples are derived from cells that
are either transcriptionally similar to the focal cell, or share no transcriptional similarity. Comparison of
similar and dissimilar cells from the same population showed that cells that were transcriptionally similar
were located closer to the focal cell than dissimilar cells (Figure 2.4L–M and Figure A.8C–D). By contrast,
similar analysis of a random cell chain showed that transcriptionally similar cells were not located closer
to the focal cell than dissimilar cells (Figure A.9). Notably, proximity of posterior samples derived from
transcriptionally similar cells was not driven by a cell’s position along the chain (no block structure observed; Figure A.10). Similarities between posterior distributions of transcriptionally similar cells were
thus not driven by local cell-cell similarity, but rather reflect a global effect, denoting a relationship
between the transcriptional states of cells and the Ca2+ pathway dynamics that they produce.
[Figure 2.4I gene list: SLC25A6, RRM1, RCN1, PRKCI, PPP3CA, PPP2CA, PPP1CC, GNAS, CCDC47, CAPN1, TOP2A, PPP2CB, PLK1, MSH2, MCM2, KIF20A, ITPRIPL2, DHFR, CENPF, CDC25A, CCNA2, ATP2B1, PIP4K2C, CCNF, ATP2C1; presence/absence shown across chains Reduced-3, Similar-r1, Random-1, Random-2 and Leiden clustering.]
Figure 2.4: Variability in Ca2+ model dynamics is associated with variability in gene expression.
A–D: Top ranked gene-parameter pairs by Pearson correlation from similarity-based chain (Reduced-3).
Log gene expression is plotted against marginal posterior means. Each dot represents a cell; color denotes
cell position along the chain. E–H: The same gene-parameter pairs shown in (A), for cells sampled from a
randomly ordered cell chain (Random-2). I: Comparison of genes identified as present/absent in the top 30
genes from gene-parameter correlation analysis across different chains, and the top 30 marker genes from
Leiden clustering directly on the gene expression space. J: Projection of posterior samples (500 per cell)
onto the first two principal components of the focal cell (Cell 5106), shown in blue. Posterior samples from
dissimilar cells shown in gray; posterior samples from similar cells shown in yellow-red, where the shade
indicates degree of transcriptional similarity. K: As for (J), with PCA performed on a different focal cell
(Cell 4940). L: Histogram corresponding to (J): mean distances from the origin (focal cell 5106) to samples
from surrounding cells. M: Histogram corresponding to (K): mean distances from the origin (focal cell
4940) to samples from surrounding cells.
2.4.5 Similarity-based posterior cell clustering reveals distinct transcriptional states
underlying Ca2+ dynamics
To characterize the extent to which we can predict Ca2+ responses from knowledge of the model dynamics,
we clustered 500 cells from a similarity-based cell chain (Reduced-3) based on the single-cell posterior
distributions using hierarchical clustering (Section 2.3.8). Three clusters were obtained (Figure 2.5A). Each
cluster showed distinct Ca2+ dynamics: “low-responders” exhibited lower overall Ca2+ peaks in response
to ATP (Figure 2.5B); “early-responders” exhibited earlier overall Ca2+ peaks in response to ATP; and “late-high-responders” exhibited robust Ca2+ responses with peaks that were later and higher than cells from
other clusters (Figure A.11). The distinct dynamic profiles can be explained by the model parameters that
give rise to them: low-responders were characterized by high concentration of free Ca2+ in the ER (c0) and
low activation rates of IP3R (Figure A.11, Figure A.12, Figure A.13). Early-responders were characterized by
parameters leading to faster and earlier IP3 and PLC dynamics. Late-high-responders were characterized
by small values of d1 (Figure A.13).
Comparison of posterior parameter clustering with the clustering done by Yao et al. [41] highlights
similarities and distinctions. In both cases, one of the three clusters was characterized by larger responses
to ATP and correspondingly higher values of dinh (Figure A.13). In Yao et al., both d1 and d5 were smaller
in cells with stronger Ca2+ responses; we found that d1 was smaller in the late-high-responder cluster,
but not in the early responders. In our results, d5 was higher for the early-responders, in contrast with
Yao et al. (Figure A.13). We note that we set a stringent threshold for minimum peak Ca2+ response, i.e.
we excluded non-responding cells, unlike Yao et al., thus in a direct comparison most of the cells in our
population would belong to the “strong positive” cluster in [41].
To assess the Ca2+ dynamic clusters we obtain in light of single-cell gene expression, we performed
two analyses for comparison. We clustered the same 500 cells based solely on their gene expression using
community detection (Leiden algorithm in Scanpy [101]); and we clustered cells from a randomly ordered
Figure 2.5: Clustering of cell posterior distributions reveals marker genes for Ca2+ states. A: Agglomerative clustering on posterior means from a similarity-based chain (Reduced-3) using Ward linkage
(k = 3 clusters). B: Kernel density estimate of Ca2+ peak height from posterior clustering of a similarity-based chain. C: Kernel density estimate of Ca2+ peak height from gene expression clustering (Leiden) for
the same cell population as shown in (A–B). D: Kernel density estimate of Ca2+ peak height from posterior clustering of a randomly ordered cell chain (Random-2). E: Top ten marker genes per cluster from
similarity-based posterior clustering. F: Top ten marker genes per cluster from gene expression clustering
of cells. G: Top ten marker genes per cluster from posterior clustering of randomly ordered cell chain.
cell chain using the same approach as above for hierarchical clustering of posterior parameters. The cell
clusters obtained based solely on gene expression can be distinguished based on the Ca2+ profiles observed:
“Ca-low”, “Ca-mid”, and “Ca-high” responses (Figure 2.5C); overlapping partially with the similarity-based
clusters obtained (Figure 2.5B). By contrast, no distinct Ca2+ dynamic responses could be observed for the
posterior clustering based on the random cell chain (Figure 2.5D and Figure A.14).
We analyzed differential gene expression from each set of clusters obtained, from the similarity-based
chain (Figure 2.5E), the gene expression-based clustering (Figure 2.5F), and the randomly ordered chain
(Figure 2.5G). Distinct markers for each cluster were observed for the similarity-based clustering and the
gene expression-based clustering, but were not discernible for cell clustering on the random chain. Clustering of cell posteriors from the randomly ordered chain was thus unable to distinguish Ca2+ dynamic profiles or gene expression differences. On the other hand, clustering posteriors from a similarity-based chain identified distinct gene expression profiles, and these overlapped with the marker gene profiles obtained by clustering on the gene expression directly. That is, parameter inference of single-cell Ca2+ dynamics using a gene expression similarity-based chain enables the identification of cell clusters with
distinct transcriptional profiles and distinct responses to ATP stimulation.
Analysis of the genes that are associated with each Ca2+ profile showed that low-responder cells were
characterized by upregulation of CCDC47 and PP1 family genes (PPP1CC and PPP2CA). Early-responder
cells were characterized by upregulation of CAPN1 and CHP1, among others. The late-high responder
cells were characterized by increased expression of CALM3, among others, although the marker genes for this
cluster were less evident than the others. By comparison of marker genes, we saw considerable overlap
between the early-responders (similarity-based clustering) and the Ca-mid responders (gene expression
clustering). We also saw overlap between the low-responder and the Ca-low marker gene signatures.
These results highlight that the posterior distributions of cells fit from similarity-based cell chains captured
information about the underlying transcriptional states of the cells, linking cellular response parameters
directly to transcriptional states. For example, the low-responder cells (similarity-based clustering) were
distinguished by parameter d5, characterizing the dynamics of IP3 dissociation, and were marked by high
PPP1CC and CCDC47 expression.
Finally, we considered whether alternative means for cell chain construction could provide similar
information. We constructed a cell chain using cells from Reduced-3, denoted “Ca-similarity” for which
consecutive cells displayed similar Ca2+ responses (see Section A.1.3). Clustering cells from Ca-similarity
based on their Ca2+ responses (via k-means) showed that cells with different Ca2+ responses had distinct
gene expression profiles (Figure A.15). However, hierarchical clustering on the parameter posterior distributions from these cells after performing inference on Ca-similarity did not separate cells into clusters
with distinct Ca2+ responses or distinct gene expression profiles (Figure A.16A–B). This result makes sense
when analyzed via the similarity matrices obtained for Ca-similarity vs. a gene expression similarity-based
chain (Figure A.16C–D). For the former, almost all pairs of neighboring cells did not share similarity in
gene expression despite their similarity in Ca2+ response, whereas for Reduced-3, the gene expression similarity-based chain, all pairs of consecutive cells were similar in gene expression (Figure A.16D).
2.5 Discussion
We have presented methods for inferring the parameters of a signaling pathway model, given data describing dynamics in single cells coupled with subsequent gene expression profiling. We hypothesized
that via transfer learning we could use posterior information from a cell to inform the prior distribution of
its neighbor along a “cell chain” of transcriptionally similar cells. Implemented using Hamiltonian Monte
Carlo sampling [46], we discovered that the cell chain construction for prior distributions did indeed lead
to faster sampling of parameters. However, these improvements did not rely on the use of gene expression
to construct priors: the performance of inference on cells in a chain ordered randomly was equally good.
However, cell chains constructed using gene expression similarity contained more information (as defined
by their posterior parameter distributions) about Ca2+ signaling responses. Clustering the posterior parameters identified important relationships between gene expression and the dynamic Ca2+ phenotypes,
thus providing mappings from state to dynamic cell fate.
The model studied here is described by ODEs to characterize the Ca2+ signaling pathway, adapted from
[16, 17], consisting of 12 variables and (originally) a 40-dimensional parameter space. This was reduced to
19 parameters in Yao et al. [41] and 16 parameters in our work. Analysis of even a single 16-dimensional
posterior distribution requires dimensionality reduction techniques, let alone the analysis of the posterior
distributions obtained for populations of hundreds of single cells. Parameter sensitivity analysis highlighted the effects of specific parameter perturbations on the Ca2+ dynamic responses. Indeed, we advocate for the use of sensitivity analysis more generally as a means to distinguish and pinpoint the effects of
different parameter combinations for models of complex biochemical signaling pathways.
By unsupervised clustering of the posterior distributions, we found that distinct patterns of Ca2+ in
response to ATP could be mapped to specific variation in the single-cell gene expression. In previous work
using similar approaches for clustering [41], posterior parameter clusters predominantly revealed response
patterns consisting of responders and non-responders; here we excluded those cells that did not exhibit a
robust response to ATP. We were able to characterize subtler Ca2+ response dynamics (described by “early”,
“low”, and “late-high” responders) and predict which transcriptional states give rise to each. This approach
is limited since relatively little gene expression variance is explained by an individual model parameter:
it may be possible to address this in future work by surveying a larger range of cell behaviors, e.g. by
including a wider range of cellular responses or by considering higher-level co-variance in the posterior
parameter space. It also remains to be tested whether the given model of Ca2+ dynamics is appropriate to
describe the signaling responses in cell types other than MCF10A cells.
Our ability to fit each of the single cells tested came potentially at the expense of an unwieldy model size.
With four variables and a 16-dimensional parameter space, the dimension of the model far exceeds that
of the data: time series of Ca2+ responses in single cells. Without data with which to constrain the three
additional model species, we needed to constrain the model in other ways. We used an approach of “scaling
and clipping” for construction of the priors, i.e. setting ad hoc limits to control posterior variance. More
effective (and less ad hoc) techniques could improve inference overall and may become necessary in the
case of larger models. These include (in order of sophistication): tailoring the scaling/clipping choices to be
parameter-specific; tailoring the choice of prior variance based on additional sources of data; or performing
model reduction/identifiability analysis to further constrain the prior space before inference. Constructing
priors from cells with similar gene expression also helped to curb the curse of dimensionality: sampling
cells sequentially places a constraint on the model. Nonetheless, in the future more directed approaches
to tackle model identifiability ought to be considered.
Connecting dynamic cell phenotypes to transcriptional states remains a grand challenge in systems biology. The limitations of deriving knowledge from gene expression data alone [84] have led to the proposal
of new methods seeking to bridge the gap between states and fates [107]. Here, making use of technology
that jointly measures Ca2+ dynamics and gene expression in single cells, we have shown that parameter
inference informed by transcriptional similarity represents another way by which we can connect gene expression states to dynamic cellular phenotypes. The statistical framework employed improved our ability
to perform parameter inference for single-cell models. We expect that improvements to statistical inference
frameworks could be gained by similar approaches applied to other models of nonlinear cellular response
dynamics. Current and future technologies combining higher-resolution dynamic cell measurements with
high-throughput genomics will provide new data to inform these parameter inference methods and, we
expect, lead to new discoveries of means of transcriptional control of biological dynamics.
Chapter 3
Robust data-driven model discovery methods for biological systems
3.1 Abstract
Biological systems exhibit complex dynamics that can be well-characterized by differential equations. Ordinary differential equations (ODEs) are commonplace and compelling tools; however, until recently their
construction has required extensive prior knowledge of the system. Recently, machine learning methods
have been developed for learning differential equation models from data (via sparse identification of nonlinear dynamics; SINDy), but these tools struggle to cope with realistic levels of biological noise. Here
we propose a data-driven framework built upon a previously developed hybrid formulation of ODE models,
in which neural networks are used to approximate latent dynamics. By learning the hybrid dynamical
model from data, we can effectively denoise the data and simultaneously learn latent dynamics. We show
that model learning by SINDy is greatly improved using the time derivatives interpolated from the fitted
latent dynamics. We also demonstrate the necessity of a robust model selection scheme for finding the
best-fit dynamical model when knowledge of the system of interest is limited. Overall, this work provides
a practical framework for discovering underlying mechanisms of biological systems and predicting future
states from noisy observations with at-best partial prior knowledge.
3.2 Introduction
Ordinary differential equations (ODEs) have long been used to model dynamical systems in biology, but
finding an ODE model that fits data is still challenging today. For over a century, ODE models have been
built manually upon previously acquired knowledge and then validated on data [1, 2, 3]. For example, to
learn rate constants between species in a chemical reaction network, we need to know the network topology beforehand, from which we can then write down a set of governing ODEs for the network according
to the law of mass action, and fit data to the ODEs [108]. It would be ideal to construct an ODE model directly from data with only limited prior knowledge and assumptions. Among other similar designations, the problem of finding ODE models from data is often referred to as model discovery or system identification [53, 51, 109, 50, 54]. Two classes of methods have been developed to tackle the difficult problem of
model discovery in recent years, namely symbolic regression and SINDy (sparse identification of nonlinear
dynamics).
A symbolic regression algorithm builds an expression tree by genetic programming, where the tree
nodes can be variables, operators, or functions [54, 55]. Recent symbolic regression methods, such as
AI Feynman, have incorporated techniques from machine learning [54, 55, 110]. The SINDy framework
assumes that the right-hand side of an ODE system can be expressed as the product of basis functions
and a coefficient matrix [53]. The goal of any SINDy algorithm is to find a sparse coefficient matrix that
fits given data. Many SINDy algorithms, such as STLSQ (sequentially thresholded least squares) and SR3
(sparse relaxed regularized regression), enforce sparsity by means of regularization and thresholding [53,
111, 112]. A variant of SINDy called Weak SINDy was introduced to recover partial differential equations
(PDEs), which is also applicable to ODEs [113, 114]. Both classes of model discovery methods have practical drawbacks. Symbolic regression methods can be computationally expensive for larger dynamical systems. SINDy methods require derivatives of the data as input, which are often not directly observed and thus must be estimated. As is often the case in biology, numerical estimation of derivatives from noisy time series can be inaccurate, especially when the noise is large [115].
Recent advancement in machine learning, especially in deep learning and automatic differentiation, has
opened up new ways to learn differential equations from data. Chen et al. [49] invented neural ordinary differential equations (neural ODEs) and showed that backpropagation could be performed on ODE solutions
using automatic differentiation. Raissi et al. [51] introduced physics-informed neural networks (PINNs)
which are solutions to partial differential equations (PDEs) in the form of neural networks. Rackauckas
et al. [50] coined the term universal differential equations (UDEs) that include all differential equations
that are entirely or partially defined by some universal approximators including neural networks. In the
case of ODEs, we can write the right-hand side as the sum of a closed-form function and a neural network,
where the closed-form function represents our prior knowledge of the underlying dynamical system and
the neural network approximates latent dynamics [50]. We will refer to such an ODE system as a hybrid dynamical system. Rackauckas et al. [50] demonstrated a pipeline for recovering ODE models in closed form
starting from a hybrid dynamical system. First, the neural network in a hybrid dynamical model is learned
from data with an appropriate objective function and numerical optimization algorithms [50]. An ODE model
is then recovered by SINDy from derivatives estimated by the learned hybrid dynamical model [50].
In this chapter, we will propose a robust method for recovering ODE models from noisy measurements.
Our method derives its robustness against generalization error from a straightforward yet effective model
selection scheme integrated into the aforementioned model recovery pipeline from Rackauckas et al. [50].
In this model selection scheme, noisy samples are first split into training samples and validation samples. Both hybrid dynamical models and SINDy models are then learned with different combinations of
hyperparameters from training samples and evaluated on validation samples. The trained hybrid dynamical model with the lowest validation loss is used to approximate derivatives of training samples for SINDy.
ODE models recovered by SINDy are ranked by a metric called Akaike information criterion with correction (AICc), which evaluates the recovered models by their fitting error and complexity [116]. We will
show how our method can be implemented to recover ODE models from synthetic data for two biological
systems, assuming partial knowledge of respective systems, and compare recovered models with ground
truth.
3.3 Methods
We develop a method for recovering governing ODEs for dynamical systems from n noisy observations X = {X^(1), . . . , X^(n)} sampled on a time span [0, T] (Figure 3.1A). We learn the derivatives of X with a hybrid formulation of the underlying ODEs and recover the full closed-form ODEs from the learned derivatives using SINDy. We perform grid search on hyperparameters when learning hybrid dynamical models and recovering closed-form ODEs with SINDy.
Throughout this chapter, we will use lowercase letters for variables (e.g. x) and uppercase letters for data (e.g. X). Subscripts are used for indices of multidimensional variables, data, and functions, for example, xi for the i-th element of x, and gi(x) for the i-th component of the function g(x). One exception is x0, which is reserved for initial conditions. Superscripts with a pair of parentheses are labels for data, for example, X^(i) for the i-th sample and X^(true) for true values.
3.3.1 Formulation of hybrid dynamical model
A hybrid dynamical model is defined as an ODE system of the form
x′ = f(x) = g(x) + NN(x),    (3.1)
where g(x) is a closed-form function and NN(x) is a neural network.
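As an illustration, a hybrid right-hand side of this form can be written as a small PyTorch module; this is a sketch rather than the exact implementation used here. The layer sizes follow the architecture described later in Section 3.4.1 (one hidden layer of 8 neurons), and the ReLU activation is an assumption.

```python
# Sketch of a hybrid right-hand side (Equation 3.1) as a PyTorch module. `g` is the
# known closed-form part; the activation choice is an assumption.
import torch
import torch.nn as nn

class HybridODE(nn.Module):
    def __init__(self, g, num_states: int, hidden: int = 8):
        super().__init__()
        self.g = g  # known terms; use (lambda x: torch.zeros_like(x)) if nothing is known
        self.nn = nn.Sequential(
            nn.Linear(num_states, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_states),
        )

    def forward(self, t, x):
        # x' = f(x) = g(x) + NN(x); t is accepted for compatibility with ODE solvers
        return self.g(x) + self.nn(x)
```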
Figure 3.1: Pipeline for data-driven model discovery for dynamical systems with model selection.
A. Overview of the pipeline. Raw data is converted to batch training data for learning hybrid dynamical
models using different hyperparameters. Dynamics approximated by the best hybrid dynamical model are used for recovering ODE models with SINDy. Recovered ODE models are evaluated on how well they fit the data. B. We slide through each sample in the raw training data with a window to obtain samples on a shorter time span. Those samples are then assembled into batches of the same size. C. A hybrid dynamical model is trained by simulating it on the training time span and backpropagating through the loss between simulated data and training data. D. The STLSQ algorithm recovers ODE models from dynamics simulated from the trained hybrid dynamical model.
3.3.2 Data generation
For a dynamical model with governing equations x′ = f(x), we generate three types of synthetic samples for a dataset: training, validation, and test. We first simulate the model from x0 using an ODE integration method from SciPy and obtain a noise-free time series X^(true) on [0, T] with step size ∆t [103]. Each noisy observation X^(i) is then generated according to the noise type (additive or multiplicative) and noise level ϵ. For additive noise, the observation for state k at time tj is generated by X^(i)_{j,k} ∼ N(X^(true)_{j,k}, ϵ X̄^(true)_{·,k}), where X̄^(true)_{·,k} is the mean of state k over X^(true); that is, the noise scale is constant at all time points for a given state k. For multiplicative noise, the same observation is generated by X^(i)_{j,k} ∼ N(X^(true)_{j,k}, ϵ X^(true)_{j,k}); that is, the noise scale of X^(i)_{j,k} depends only on the noise-free value of state k at time tj.
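A sketch of this data-generation procedure is shown below; function and variable names are illustrative, and `f(t, x)` is the true right-hand side.

```python
# Sketch of the data generation in Section 3.3.2 (illustrative names). `x0` is the
# initial condition and `eps` the noise level.
import numpy as np
from scipy.integrate import solve_ivp

def generate_samples(f, x0, t_span, dt, n, eps, noise="additive", seed=0):
    rng = np.random.default_rng(seed)
    t_eval = np.arange(t_span[0], t_span[1] + dt / 2, dt)
    x_true = solve_ivp(f, t_span, x0, t_eval=t_eval).y.T   # (n_times, n_states)
    if noise == "additive":
        scale = np.broadcast_to(eps * x_true.mean(axis=0), x_true.shape)  # constant per state
    else:  # multiplicative: scale depends on the noise-free value at each time point
        scale = eps * np.abs(x_true)    # abs() guards against negative scales
    noisy = [rng.normal(loc=x_true, scale=scale) for _ in range(n)]
    return noisy, x_true
```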
3.3.3 Data preprocessing for learning hybrid dynamical models
To facilitate learning on local segments with gradient-based optimization in Section 3.3.4, we convert raw training samples X = {X^(1), . . . , X^(n)} into batches of shorter time series, denoted X̃ (Figure 3.1B). We slide through each X^(i) with a window of size w and append the data inside the window to X̃. All samples in X̃ are then shuffled and split into batches of size b before training.
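For illustration, the sliding-window step can be sketched as follows (names are illustrative).

```python
# Sketch of the sliding-window preprocessing (illustrative names). `samples` is a list
# of arrays of shape (n_times, n_states) observed on a shared time grid `t`.
import random

def make_batches(samples, t, window, batch_size):
    segments = []
    for x in samples:
        for j in range(len(t) - window + 1):
            # keep each segment's time points so it can be re-simulated during training
            segments.append((t[j:j + window], x[j:j + window]))
    random.shuffle(segments)
    return [segments[k:k + batch_size] for k in range(0, len(segments), batch_size)]
```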
3.3.4 Gradient-based learning of hybrid dynamical models
Given a hybrid dynamical model (Equation 3.1) implemented as a PyTorch module [117], we learn the neural network approximator NN(x) from the preprocessed training samples X̃ (Figure 3.1C). The objective here is to find weights of NN(x) that minimize the batch training loss. We iterate through each batch in X̃, compute the batch loss L from the dynamics predicted by f(x), and update the weights of NN(x). More specifically, for each sample X̃^(i) on time points {tj, . . . , tj+w−1} in a batch, we simulate f(x) from the initial condition X̃^(i)_{j,·} using the Dopri5 integrator from the torchode package [118], and obtain predicted dynamics X̂^(i) on the same time points. The mean squared error (MSE) between X̃^(i) and X̂^(i) is used as the sample loss, and the mean loss over all samples in the batch is used as the batch loss L. We compute the gradient of L with respect to the weights of NN(x) by performing backpropagation on the gradients of intermediate tensors, which are retained during the computation of L. We use the Adam algorithm to update the weights of NN(x) from the gradient of L [119].
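A minimal sketch of one training epoch is shown below; torchdiffeq's `odeint` is used as a stand-in for the torchode Dopri5 solver used in this work, and `batches` matches the output of the sliding-window sketch above.

```python
# Sketch of one training epoch (Section 3.3.4). torchdiffeq's odeint (default method:
# dopri5) substitutes for torchode here; `model` is a module with signature model(t, x).
import torch
from torchdiffeq import odeint

def train_epoch(model, batches, optimizer):
    for batch in batches:
        optimizer.zero_grad()
        losses = []
        for t_seg, x_seg in batch:
            t = torch.as_tensor(t_seg, dtype=torch.float32)
            x = torch.as_tensor(x_seg, dtype=torch.float32)
            x_pred = odeint(model, x[0], t)           # simulate from the segment's first point
            losses.append(torch.mean((x_pred - x) ** 2))
        loss = torch.stack(losses).mean()             # batch loss L = mean sample MSE
        loss.backward()                               # backpropagate through the ODE solution
        optimizer.step()
```

An optimizer such as torch.optim.Adam(model.parameters(), lr=0.01) would be constructed once and passed in; the per-epoch validation loss is computed analogously by simulating each validation sample on its full time span.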
3.3.5 Model selection for hybrid dynamical models
To choose the best NN(x) that fits training data X, we perform grid search on three hyperparameters
used to learn NN(x): the window size (w) and batch size (b) of X̃, and the learning rate of the Adam algorithm
(lr). That is, we determine a set of possible values for each hyperparameter and learn hybrid dynamical
models on every combination of hyperparameter values (w, b, lr). For each combination, we train the
model for 10 epochs such that all batches in X̃ are learned once in one epoch. At the end of each epoch,
we evaluate the current model on validation samples and compute the loss between validation samples
and dynamics predicted by the trained model. More specifically, for each validation sample, we simulate
the trained hybrid dynamical model f(x) on its full time span and obtain predicted dynamics on the same
time points. We then compute the MSE between the data and the predicted dynamics as the validation loss
of the sample. The hybrid dynamical model with the lowest mean validation loss is selected as the best model
for generating SINDy inputs in the next step (Section 3.3.6).
3.3.6 Sparse regression for model recovery from hybrid dynamics
We use a modified SINDy framework to recover the closed-form expression of f(x) (Figure 3.1D). First, we simulate the trained dynamics f(x) with initial conditions from training samples on the training time span [0, T] with some step size ∆t to obtain the SINDy input X̂. We then select a library of basis functions Θ(x) and find a sparse matrix Ξ̂ that best fits the regression model NN(X̂) = Θ(X̂)Ξ. We use PySINDy's implementation of multi-sample STLSQ to determine Ξ̂, which iteratively performs ridge regression on basis functions in Θ(x) with nonzero coefficients and sets any coefficient to 0 once it drops below a threshold λ [53, 120, 121]. The full recovered ODE system is f(x) = g(x) + Θ(x)Ξ̂.
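A sketch of this regression step with PySINDy is given below; the names are illustrative, `x_hat` is a trajectory simulated from the trained hybrid model, and `nn_dot` holds NN(X̂) evaluated along it.

```python
# Sketch of the sparse-regression step (Section 3.3.6) with PySINDy. `x_hat` and
# `nn_dot` both have shape (n_times, n_states).
import pysindy as ps

def recover_latent_terms(x_hat, nn_dot, dt, degree=2, alpha=0.05, threshold=0.1):
    model = ps.SINDy(
        optimizer=ps.STLSQ(threshold=threshold, alpha=alpha),
        feature_library=ps.PolynomialLibrary(degree=degree, include_bias=False),
    )
    # Supplying x_dot directly makes SINDy regress NN(x_hat) = Theta(x_hat) Xi
    model.fit(x_hat, t=dt, x_dot=nn_dot)
    return model  # model.coefficients() is the sparse matrix Xi-hat
```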
3.3.7 Model selection for sparse regression
To find the best ODE model recovered by our modified SINDy framework, we formulate a grid search strategy that goes over the following set of hyperparameters: the step size ∆t for X̂ simulated from the trained hybrid dynamical model f(x), the library of basis functions, and the regularization parameter α for STLSQ. For a recovered model with k parameters, we compute its Akaike information criterion with correction (AICc) on n validation samples [116]:
AICc = n ln(RSS/n) + 2k + 2(k + 1)(k + 2) / (n − k − 2),    (3.2)
where RSS = Σ_{i=1}^{n} ||X^(i) − X̂^(i)||_2^2 is the residual sum of squares between the data X^(i) and the model-predicted dynamics X̂^(i). We will refer to n ln(RSS/n) as the AIC term and the remaining term as the correction term. The recovered model with the lowest AICc is chosen as the best model for the dataset.
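Equation 3.2 translates directly into a short scoring function; the sketch below assumes validation samples and model-predicted trajectories are supplied as matching lists of arrays.

```python
# Direct transcription of Equation 3.2. `x_val` and `x_pred` are matching lists of
# validation samples and model-predicted trajectories; `k` is the number of parameters.
import numpy as np

def aicc(x_val, x_pred, k):
    n = len(x_val)
    rss = sum(np.sum((xv - xp) ** 2) for xv, xp in zip(x_val, x_pred))
    return n * np.log(rss / n) + 2 * k + 2 * (k + 1) * (k + 2) / (n - k - 2)
```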
3.4 Results
3.4.1 Robust model selection scheme recovers dynamical models using a hybrid formulation
For a dynamical model x′ = f(x) such that x ∈ R^d and the complete closed-form expression of f(x) is not known, we propose a hybrid formulation of the model defined in Equation 3.1:
x′ = f(x) = g(x) + NN(x)
In the above equation, g : R^d → R^d is a function with a known closed-form expression, and NN : R^d → R^d is a neural network for latent dynamics which is to be learned from data. In the case that no terms in f(x) are known, we simply have g(x) = 0 and learn the entire right-hand side with NN(x). To learn NN(x) from raw training samples X, we first transform X into batch samples X̃ (Section 3.3.3; Figure 3.1B). We then learn NN(x) from X̃ by minimizing the batch loss between the hybrid dynamics predicted by f(x) and the data (Section 3.3.4; Figure 3.1C). Having learned NN(x), we run the STLSQ algorithm on the SINDy regression model NN(x) = Θ(x)Ξ to find a sparse coefficient matrix Ξ̂ (Section 3.3.6; Figure 3.1D). The recovered ODE model is given by x′ = g(x) + Θ(x)Ξ̂.
There are a number of factors in the model recovery process that can affect the final recovered ODEs,
including hyperparameters in gradient-based learning of hybrid dynamical models and SINDy learning.
Here, we demonstrate our model selection scheme which addresses those factors, using the Lotka-Volterra
model as an example. The Lotka-Volterra model is a classic dynamical model for predator-prey relationship
in an ecosystem (Figure 3.2A), described by the following equations [1, 2]:
x1′ = αx1 − βx1x2
x2′ = γx1x2 − δx2    (3.3)
In Equations 3.3, x1 is the population size of the prey and x2 is the population size of the predators. We
generated datasets with true parameters {α, β, γ, δ} = {1.3, 0.9, 0.8, 1.8} and initial conditions x0 =
[0.4425, 4.6281] (Section 3.3.2). Given those parameter values and initial conditions, the system exhibits sustained periodic oscillations (Figure 3.2B). Each dataset included 200 training samples, 50 validation
samples, and 50 test samples. Both training samples and validation samples were generated on t = [0, 4]
with step size ∆t = 0.1; test samples were generated on t = [0, 20] with step size ∆t = 0.1. All samples
in a dataset were generated with the same noise type (additive or multiplicative) at the same noise level
(0.1%, 0.5%, 1%, 5%, 10%, or 20%). Figure 3.2C–D show an overview of training samples with 5% additive
Figure 3.2: Data generated to study the Lotka-Volterra model and hybrid dynamical model fits.
A. The Lotka-Volterra model describes a predator-prey relationship in an ecosystem. B. The population sizes of both predators and prey oscillate periodically. C–D. In our synthetic datasets, we add either additive or multiplicative noise with different noise scales. Mean trajectories of all samples in each dataset are shown with bands for three times the standard deviation. E. An example of a trained hybrid model with high validation loss. F. The trained hybrid model with the lowest validation loss fits the data well.
Learning step              Hyperparameter                          Values/options
Hybrid dynamical model     Window size (data preprocessing)        5, 10
                           Batch size (data preprocessing)         5, 10, 20
                           Learning rate (Adam optimization)       0.001, 0.01, 0.1
SINDy                      Step size (for generating X̂)            0.05, 0.1
                           Basis library                           poly_max_2, poly_2_3
                           α (STLSQ regularization parameter)      0.05, 0.1, 0.5, 1, 5, 10
                           λ (STLSQ coefficient threshold)         0.1
Table 3.1: Model selection configuration for Lotka-Volterra data. Refer to Section 3.4.1 for descriptions of the basis libraries.
noise and 5% multiplicative noise, respectively. The same model was studied in Rackauckas et al. [50] to
show how a hybrid formulation could be used to approximate derivatives for SINDy, but there were two key differences. First, we sought to characterize noisy dynamics for large datasets containing multiple trajectories rather than individual samples. Second, we used a model validation and selection scheme to find models that would generalize to both seen and unseen data.
Our model selection strategy consists of two stages. In the first stage, we determine the best NN(x)
in Equation 3.1 that fits data (Section 3.3.5). For the Lotka-Volterra model, we assumed that only the growth
terms were known and the other terms were to be learned by NN : R^2 → R^2:
x1′ = αx1 + NN1(x)
x2′ = −δx2 + NN2(x)    (3.4)
The neural network NN(x) in Equations 3.4 had an input layer of length 2 and an output layer of length
2. Between those two layers, we only added one fully-connected hidden linear layer of 8 neurons to avoid
overfitting due to model complexity. For each dataset, we trained NN(x) from samples preprocessed with
different window sizes and batch sizes using different learning rates, as described in Section 3.3.5. The
hyperparameter values used for training are specified in Table 3.1. Training results varied with the combination of hyperparameter values. Using the dataset with 1% additive noise as an example, most hybrid dynamical models trained with a learning rate of 0.001 had high validation loss and underfit the training data (Figure 3.2E). For the same dataset, hybrid dynamical models trained with other learning rates could attain low validation loss and fit the training data very well (Figure 3.2F). The same observation was made for most other datasets (except a few with very high noise): high validation loss usually corresponded to poor approximation of the data, whereas low validation loss usually led to decent approximation. We examined the hyperparameter values that attained the lowest validation loss for all datasets and found that the best
hyperparameter combination varied by dataset (Table B.1). The results here suggest that the choice of
hyperparameter values should not be arbitrary. Therefore, our model selection strategy was effective in
finding better hybrid dynamical models and spared us from randomly guessing which hyperparameter
values to use when initializing training of NN(x).
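The first-stage grid search can be sketched as below, reusing the `make_batches` and `train_epoch` sketches above; `build_model` (constructs a fresh hybrid model) and `validation_loss` (full-span MSE per validation sample) are assumed helpers.

```python
# Sketch of the first-stage grid search over (window size, batch size, learning rate)
# using the hyperparameter values listed in Table 3.1.
import copy
import itertools
import torch

def select_hybrid_model(train_samples, val_samples, t, build_model, epochs=10):
    best_state, best_loss = None, float("inf")
    for w, b, lr in itertools.product([5, 10], [5, 10, 20], [0.001, 0.01, 0.1]):
        model = build_model()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        batches = make_batches(train_samples, t, window=w, batch_size=b)
        for _ in range(epochs):
            train_epoch(model, batches, optimizer)
            loss = validation_loss(model, val_samples, t)   # evaluated after every epoch
            if loss < best_loss:
                best_state, best_loss = copy.deepcopy(model.state_dict()), loss
    return best_state, best_loss
```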
Noise type / Noise level / Lowest AICc for given noise model / Recovered model

Additive, 0.1% (AICc = −368.5463682):
x1' = 1.3 x1 − 0.907 x1 x2
x2' = −1.8 x2 + 0.804 x1 x2

Additive, 0.5% (AICc = −365.3647057):
x1' = 1.3 x1 − 0.911 x1 x2
x2' = −1.8 x2 + 0.780 x1 x2

Additive, 1% (AICc = −367.6000685):
x1' = 1.3 x1 − 0.906 x1 x2
x2' = −1.8 x2 + 0.781 x1 x2

Additive, 5% (AICc = −216.0545979):
x1' = 1.3 x1 − 0.893 x1 x2
x2' = −1.8 x2 + 0.814 x1 x2

Additive, 10% (AICc = −205.9683278):
x1' = 1.3 x1 + 0.206 x2^2 − 0.842 x1 x2 − 1.044 x1 x2^2
x2' = −1.8 x2 + 0.924 x1 x2

Additive, 20% (AICc = −138.8480075):
x1' = 1.3 x1 + 0.199 x2^2 − 0.525 x1^2 x2 − 1.707 x1 x2^2
x2' = −1.8 x2 + 0.798 x1 x2 + 0.237 x1 x2^2

Multiplicative, 0.1% (AICc = −371.6843486):
x1' = 1.3 x1 − 0.889 x1 x2
x2' = −1.8 x2 + 0.826 x1 x2

Multiplicative, 0.5% (AICc = −392.8762013):
x1' = 1.3 x1 − 0.910 x1 x2
x2' = −1.8 x2 + 0.761 x1 x2

Multiplicative, 1% (AICc = −208.0347844):
x1' = 1.3 x1 − 0.935 x1 x2
x2' = −1.8 x2 + 0.814 x1 x2

Multiplicative, 5% (AICc = −137.2343019):
x1' = 1.3 x1 − 0.836 x1 x2
x2' = −1.8 x2 + 0.846 x1 x2

Multiplicative, 10% (AICc = −91.80296946):
x1' = 1.3 x1 − 0.884 x1 x2 − 0.333 x1^2 x2
x2' = −1.8 x2 + 0.883 x1 x2

Multiplicative, 20% (AICc = −19.96711175):
x1' = 1.3 x1 + 0.127 x1 − 1.194 x1 x2
x2' = −1.8 x2 + 0.502 x2 + 0.247 x1 x2 − 0.198 x2^2

Table 3.2: Best (according to AICc) ODE models recovered from Lotka-Volterra datasets using hybrid formulation. The terms 1.3 x1 and −1.8 x2 (colored green in the original) are the known terms in Equations 3.4.
[Figure 3.3: predicted versus test trajectories (population size over time) for prey and predator, with the end of the training time span marked. Panels A–F show the lowest-AICc models listed in Table 3.2 for 1%, 5%, and 10% additive noise (A–C) and 1%, 5%, and 10% multiplicative noise (D–F). The correct-term models shown in panels G and H are x1' = 1.3 x1 − 0.841 x1 x2, x2' = −1.8 x2 + 0.924 x1 x2 and x1' = 1.3 x1 − 1.063 x1 x2, x2' = −1.8 x2 + 0.877 x1 x2.]
Figure 3.3: ODE models recovered from Lotka-Volterra datasets. A–C. Recovered models with lowest AICc for datasets with additive noise. D–F. Recovered models with lowest AICc for datasets with multiplicative noise. G. Recovered model with correct terms but not lowest AICc for 10% additive noise. H. Recovered model with correct terms but not lowest AICc for 10% multiplicative noise.
In the second stage of our model selection strategy, we compare ODE models recovered by SINDy
(Section 3.3.7). We opt for a metric called AICc (Akaike information criterion with correction; Equation 3.2)
for evaluating recovered models, which estimates error of model prediction with a penalty term based on
the number of model parameters [116]. In general, a model with lower AICc is better than one with higher
AICc. For the Lotka-Volterra model, we considered three hyperparameters. The first hyperparameter
was the step size for generating SINDy training samples Xb from trained hybrid dynamical model f(x).
Note that generating samples with arbitrary step size would only be possible because a hybrid dynamical
model was trained — methods like finite difference could compute derivatives only on given time points.
The second hyperparameter was the library of basis functions. Since simple interaction terms in such
models can be represented as polynomials, we constructed two basis libraries with polynomial terms only.
One basis library included all polynomial terms up to order 2, which will be denoted as poly_max_2.
Since the first-order terms were already known in the hybrid formulation, the other basis library included
polynomial terms of orders 2 and 3 only, which will be denoted as poly_2_3. The third hyperparameter
was the regularization parameter α for the STLSQ algorithm. Table 3.1 shows the detailed configuration for all three hyperparameters. For the remaining STLSQ hyperparameter λ, the threshold below which basis-function coefficients are set to zero, we used a single fixed value across all runs.
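As a sketch of this second stage, the snippet below (building on the hybrid model sketched earlier in this section, assumed here to be a trained instance named hybrid) generates Xb by simulating the hybrid model with torchdiffeq at a chosen step size, evaluates the neural-network part on those states, and fits a SINDy model with PySINDy. The threshold and α values shown are illustrative; the values actually searched are those in Table 3.1.

    import pysindy as ps
    import torch
    from torchdiffeq import odeint

    # hybrid: a trained HybridLotkaVolterra instance from the first stage (assumed)
    dt_hat = 0.05                                   # step size for generating Xb
    t_hat = torch.linspace(0.0, 4.0, int(4.0 / dt_hat) + 1)
    x0 = torch.tensor([1.0, 1.0])                   # illustrative initial condition
    with torch.no_grad():
        x_hat = odeint(hybrid, x0, t_hat)           # states Xb on a dense grid
        nn_part = hybrid.net(x_hat)                 # derivatives attributed to NN(x)

    # Recover a symbolic form of NN(x) with STLSQ over the poly_max_2 basis.
    model = ps.SINDy(
        optimizer=ps.STLSQ(threshold=0.1, alpha=0.05),   # illustrative values
        feature_library=ps.PolynomialLibrary(degree=2),
        feature_names=["x1", "x2"],
    )
    model.fit(x_hat.numpy(), t=t_hat.numpy(), x_dot=nn_part.numpy())
    model.print()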
Given the model selection configuration, 24 ODE models were recovered (counting duplicates) from
SINDy. We considered a recovered model correct if it had all correct terms in Equations 3.3 and no extra
terms; parameter values did not have to be exactly correct. For datasets of both additive and multiplicative
noises up to 5% noise level, the ODE models with lowest AICc were all correct (Table 3.2). Those models
could be used to accurately predict future states for test data on t = [0, 20] (Figure 3.3A–B, D–E). For 10%
and 20% noise levels, extra terms were recovered in the models with lowest AICc (Table 3.2); those models failed to extrapolate beyond t = 4 (Figure 3.3C, F). The ODE model with lowest AICc for 10% multiplicative noise was stiff and we had to terminate its simulation early (Figure 3.3F), even though the ground truth (Equations 3.3) was not stiff given the true parameter values and initial conditions. Nonetheless, correct models were still successfully recovered at the 10% noise level (Table B.2). Those models were able to predict stable oscillation beyond t = 4, albeit with inaccurate periodicity and amplitude (Figure 3.3G–H). We also saw that, for some datasets, the best models were recovered from hybrid dynamics simulated at the smaller step size (∆t = 0.05), which shows that training a hybrid dynamical model indeed allowed us to obtain better results, as it enabled data interpolation with an arbitrary step size (Table B.3).
3.4.2 Prior knowledge incorporated into hybrid formulation improves SINDy learning
We next studied whether the hybrid formulation with partial knowledge of dynamical systems has an
advantage for SINDy learning compared to methods that do not leverage any knowledge. The baseline
method for the comparison was the base SINDy method provided by PySINDy, which takes in training data X and estimates the derivatives X′ by finite differences [120, 121]. We applied our model selection strategy to the baseline method by searching over different basis libraries and STLSQ regularization parameters, but we could not try different step sizes because base SINDy does not interpolate input data. In addition to the baseline method, all datasets were also learned with a pure neural network formulation, i.e. x′ = f(x) = NN(x), using the model selection configuration specified in Table 3.1.
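A minimal sketch of this baseline is shown below; X_train stands for a list of noisy trajectories (one array per sample), dt for their sampling interval, and the threshold and α values are illustrative. The exact call for fitting multiple trajectories depends on the PySINDy version.

    import pysindy as ps

    baseline = ps.SINDy(
        optimizer=ps.STLSQ(threshold=0.1, alpha=0.05),       # illustrative values
        feature_library=ps.PolynomialLibrary(degree=2),      # poly_max_2
        differentiation_method=ps.FiniteDifference(),        # derivatives from the data itself
        feature_names=["x1", "x2"],
    )
    # X_train: list of (n_time_points, 2) arrays; dt: sampling interval of the data
    baseline.fit(X_train, t=dt, multiple_trajectories=True)
    baseline.print()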
We compared the best correct models recovered by each method, i.e. correct models with lowest AICc.
For the baseline method, correct ODE models were recovered for datasets with additive noise at all noise
levels and datasets with multiplicative noise up to 1% noise level (Table 3.3, Table B.4). Interestingly, the
baseline method was the only method that could recover a correct model for 20% additive noise. For the pure
neural network formulation, correct ODE models were recovered for datasets with additive noise at 4 noise
levels (0.1%, 1%, 5%, 10%) and multiplicative noise up to 1% (Table 3.3, Table B.5). The hybrid formulation
achieved the best performance overall. It successfully recovered correct models for 10 datasets out of 12,
which was more than either of the other two methods (Table 3.3). In particular, the hybrid formulation
approach was able to handle higher noise levels for both additive and multiplicative noises, whereas the
other two methods failed to recover correct models for datasets with 5% and 10% multiplicative noises.
In terms of the quality of recovered models, the best correct models recovered using hybrid formulation
Noise type Noise level Base SINDy Pure NN Hybrid
Additive 0.1% −309.3006393 −432.5755145 −368.5463682
Additive 0.5% −299.044635 N/A −365.3647057
Additive 1% −283.3833402 −299.504492 -367.6000685
Additive 5% −178.2411226 −39.08409104 −216.0545979
Additive 10% −104.6566267 −147.4397894 −145.6818318
Additive 20% −26.46272851 N/A N/A
Multiplicative 0.1% −297.8189774 −332.4479059 −371.6843486
Multiplicative 0.5% −254.144023 −318.4897128 −392.8762013
Multiplicative 1% −214.856131 −81.38255749 −208.0347844
Multiplicative 5% N/A N/A −137.2343019
Multiplicative 10% N/A N/A −58.92769301
Multiplicative 20% N/A N/A N/A
Table 3.3: Lowest AICcs of correct models recovered from Lotka-Volterra datasets using different
methods. Base SINDy estimates derivatives of data by finite differences. Pure NN approximates the entire
right-hand side of an ODE system by a neural network. Hybrid refers to the hybrid formulation specified
by Equations 3.4. Number in bold is the lowest AICc of correct models across all methods for the same
dataset.
had the lowest AICcs or close to the lowest for 9 out of 12 datasets (Table 3.3). This suggests that hybrid
formulation could deliver favorable results more consistently.
3.4.3 Model selection enables discovery of dynamical models from complex dynamics
for biological networks
In previous sections, we showed that our proposed method could enhance the results of SINDy learning
for a dynamical system of two states with simple interactions between them. Here, we will demonstrate
how the method can be extended to recover a biological network called the repressilator system with
more complex dynamics [4]. There are three proteins in the repressilator system, which form a negative
feedback loop that can be described by the following equations (Figure 3.4A):
x1' = β / (1 + x3^n) − x1
x2' = β / (1 + x1^n) − x2
x3' = β / (1 + x2^n) − x3    (3.5)
The i-th equation in Equations 3.5 characterizes the rate of change of xi, the concentration of protein i. This rate reflects the combined effect of inhibition of protein i by another protein, described by a Hill function of order n, and self-degradation, described by a linear term. For the rest of the section, we study the repressilator system with true parameters n = 3, β = 10 and initial conditions x0 = [1, 1, 1.2]. In this system, the concentrations of all proteins oscillate with increasing amplitude until reaching a limit cycle (Figure 3.4B). We shall see whether it is possible to recover Equations 3.5 from data on t = [0, 10], a time interval before the limit cycle is reached.
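For reference, the repressilator system with these parameter values can be simulated as in the sketch below; this snippet uses scipy and is only meant to reproduce the ground-truth trajectories described here.

    import numpy as np
    from scipy.integrate import solve_ivp

    def repressilator(t, x, beta=10.0, n=3):
        # Equations 3.5: each protein is inhibited by one other protein
        # (protein 3 inhibits 1, 1 inhibits 2, 2 inhibits 3) and degrades linearly
        x1, x2, x3 = x
        return [
            beta / (1.0 + x3**n) - x1,
            beta / (1.0 + x1**n) - x2,
            beta / (1.0 + x2**n) - x3,
        ]

    t_eval = np.arange(0.0, 10.0 + 1e-9, 0.2)          # training time span, step 0.2
    sol = solve_ivp(repressilator, (0.0, 10.0), [1.0, 1.0, 1.2], t_eval=t_eval)
    x_true = sol.y.T                                    # shape (n_time_points, 3)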
We began our attempt to recover Equations 3.5 with base SINDy and ignored hybrid formulation and
model selection for the moment. We assumed that we had the prior knowledge that the degradation term
for each protein is linear and the only possible interaction between a pair of proteins is inhibition. Such
[Figure 3.4: panels show protein concentration versus time. Panel C displays the model recovered by base SINDy from noise-free data: x1' = −0.909 x1 − 1.595/(1 + x3) + 3.756/(1 + x3^2) + 6.967/(1 + x3^3); x2' = −0.816 x2 − 0.889/(1 + x3) + 1.419/(1 + x1^2) + 8.051/(1 + x1^3); x3' = −0.899 x3 − 3.022/(1 + x2) + 7.391/(1 + x2^2) + 4.610/(1 + x2^3).]
Figure 3.4: Data generated to study the repressilator model and base SINDy fits. A. The repressilator system is a negative feedback loop consisting of three proteins. B. The concentrations of the three proteins oscillate and eventually reach a stable limit cycle. C. Base SINDy fails to recover a correct ODE model from noise-free data on t = [0, 10]. D–E. In our synthetic datasets, we add either additive or multiplicative noise with different noise scales. Mean trajectories of all samples in each dataset are shown with bands for three times the standard deviation.
inhibition is characterized by a Hill function 1/(1 + u^n) of order n such that n is a constant for the system and n ∈ {1, 2, 3}. However, for any two proteins i and j, we did not know whether one inhibits the other. If one of them does inhibit the other, we were uncertain whether i inhibits j, or j inhibits i, or both. Under our assumptions, the minimal basis library for base SINDy would include Hill functions {1/(1 + xi), 1/(1 + xi^2), 1/(1 + xi^3)} and linear functions {xi} for i = 1, 2, 3. In total, there were 12
basis functions in the basis library and 36 coefficients to be determined. When we considered performing
the STLSQ algorithm with this basis, we realized that it was difficult to find a proper value for λ, the
threshold for basis coefficients. We could not set λ to a value close to or greater than 1, since the linear
terms would either be eliminated or have coefficients much larger than the true value −1 in magnitude.
However, if λ ≪ 1, the recovered ODE models would admit some extra Hill terms with wrong orders
and small coefficients. For instance, we ran STLSQ with λ = 0.5 on noise-free data sampled at every 0.2
seconds on t = [0, 10]. The recovered model had Hill terms of orders n = 1 and n = 2 and thus could not
accurately predict future states (Figure 3.4C). If we had repeated the experiment for noisy data, the results
would not have been better.
Our proposed method would address the above issue of base SINDy for the repressilator system. We
could first separate the linear terms in a hybrid formulation where NN : R^3 → R^3 is a neural network to be learned from data:

x1' = NN1(x) − x1
x2' = NN2(x) − x2
x3' = NN3(x) − x3    (3.6)
Once the linear terms are separated, the choice of λ would no longer affect them, since we only needed to
recover the Hill terms from latent dynamics approximated by NN(x). Thus, at the SINDy step, we could
use a basis library that only includes Hill functions {1/(1 + xi), 1/(1 + xi^2), 1/(1 + xi^3)}. There was still
Learning step: Hybrid dynamical model
    Window size (data preprocessing): 5, 10
    Batch size (data preprocessing): 5, 10, 20
    Learning rate (Adam optimization): 0.001, 0.01, 0.1
Learning step: SINDy
    Step size (for generating Xb): 0.1, 0.2
    Basis library: hill_1, hill_2, hill_3, hill_max_3
    α (STLSQ regularization parameter): 0.05, 0.1, 0.5, 1.0, 5.0, 10.0
    λ (STLSQ coefficient threshold): 1.0
Table 3.4: Model selection configuration for repressilator data. Refer to Section 3.4.3 for descriptions of basis libraries.
one potential issue, though, with this basis library: the assumption that all Hill terms have the same order
n could not be enforced as Hill terms of different orders might appear in the recovered model. Our model
selection scheme for SINDy (Section 3.3.7) would resolve this issue by considering models recovered using
basis libraries of Hill functions of the same order. We will use hill_n for a basis library of Hill functions
of order n only and hill_max_n for a basis library of all Hill functions up to order n.
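One way to construct such libraries, sketched below with PySINDy's CustomLibrary, is to supply one Hill function per order; the helper name hill_library and the name formatting are illustrative rather than a fixed convention of the implementation.

    import pysindy as ps

    def hill_library(orders):
        # Basis library of Hill functions 1/(1 + x^n) for the given orders, applied
        # to every state variable; e.g. hill_3 = hill_library([3]),
        # hill_max_3 = hill_library([1, 2, 3])
        funcs = [(lambda n: (lambda x: 1.0 / (1.0 + x**n)))(n) for n in orders]
        names = [(lambda n: (lambda x: f"1/(1+{x}^{n})"))(n) for n in orders]
        return ps.CustomLibrary(library_functions=funcs, function_names=names)

    libraries = {
        "hill_1": hill_library([1]),
        "hill_2": hill_library([2]),
        "hill_3": hill_library([3]),
        "hill_max_3": hill_library([1, 2, 3]),
    }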
Now we present the full recipe for recovering Equations 3.5 using our method as an alternative to base
SINDy. We first learned the neural network approximator NN(x) in Equations 3.6, the hybrid dynamical
model for the repressilator system. Between the input layer and output layer of NN(x), we added two fully-connected hidden linear layers of 8 neurons each. To find the best NN(x) that fits the data, we trained NN(x) with
various learning rates from data preprocessed with different window sizes and batch sizes (Section 3.3.5;
Figure 3.1B–C). As discussed earlier, we used basis libraries of Hill functions of a single order or mixed
orders for SINDy model selection (Section 3.3.7; Figure 3.1D). In addition, we generated SINDy input Xb
from NN(x) with different step sizes and ran the STLSQ algorithm with threshold λ = 1.0 and different
values of the regularization parameter α. The detailed configuration is shown in Table 3.4.
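Schematically, the first-stage search over these hyperparameters can be written as a simple grid search, as in the sketch below; train_hybrid_model and validation_loss are hypothetical placeholders for the training and evaluation routines of Section 3.3.5, and train_data and valid_data stand for the preprocessed datasets.

    import itertools

    window_sizes = [5, 10]
    batch_sizes = [5, 10, 20]
    learning_rates = [0.001, 0.01, 0.1]

    best_model, best_loss = None, float("inf")
    for window, batch, lr in itertools.product(window_sizes, batch_sizes, learning_rates):
        # train_hybrid_model and validation_loss are placeholders, not real APIs
        model = train_hybrid_model(train_data, window_size=window,
                                   batch_size=batch, learning_rate=lr)
        loss = validation_loss(model, valid_data)
        if loss < best_loss:
            best_model, best_loss = model, loss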
To assess how well our method can recover the repressilator model from data, we generated datasets
with additive or multiplicative noise at 6 different noise levels (0.1%, 0.5%, 1%, 5%, 10%, 20%) using the
data generation method described in Section 3.3.2 (see examples for 5% noise level in Figure 3.4D–E). Each
dataset had 200 training samples and 50 validation samples on t = [0, 10], as well as 50 test samples on
t = [0, 30], all sampled with step size ∆t = 0.2. Similar to the Lotka-Volterra example, we considered a
recovered ODE model correct if it had all terms in Equations 3.5 and no other terms; parameter values did
not have to be exactly correct. Our method worked well for low noise datasets. For additive noise up to
1% noise level and multiplicative noise up to 5% noise level, recovered ODE models with lowest AICc were
all correct (Table 3.5) and could accurately predict future states (Figure 3.5A–B, D–E).
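For illustration, noisy samples of this kind might be generated as in the sketch below (using x_true from the simulation sketch above). The exact definition of the noise level is given in Section 3.3.2 and is not restated here; the scaling used below, relative to the per-variable standard deviation for additive noise and relative to each value for multiplicative noise, is an assumption.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_noise(x_clean, level, kind="additive"):
        # zero-mean Gaussian noise; the scaling convention here is an assumption
        eps = rng.standard_normal(x_clean.shape)
        if kind == "additive":
            return x_clean + level * x_clean.std(axis=0) * eps
        return x_clean * (1.0 + level * eps)

    # e.g. 200 training samples at the 5% multiplicative noise level
    samples = [add_noise(x_true, 0.05, kind="multiplicative") for _ in range(200)]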
For all other datasets, the models with lowest AICc did not correctly recover Equations 3.5. Those models could not extrapolate beyond the training time span (Figure 3.5C, F). We surveyed the top 10 recovered models according to AICc for each of those datasets, as listed in Tables B.6, B.7, B.8, B.9, B.10. For additive noise at the 5% and 10% noise levels and multiplicative noise at the 10% noise level, we were actually able to recover correct ODE models, including Models 4 and 5 in Table B.6, Models 6 and 7 in Table B.7, and Models 7 and 8 in Table B.8. Trajectories simulated from those models showed oscillation similar to the true data (Figure 3.5G–H). At the 20% noise level, we were not able to find any correct model for either additive or multiplicative noise. However, recovered models with relatively low AICc still learned some aspects of the true model. Some recovered models learned the correct order of Hill terms, i.e. n = 3, but had one extra term (Models 4 and 5 in Table B.9, Models 7 and 8 in Table B.10). In those models, correct Hill terms had coefficients close to the true value β = 10, and extra terms all had coefficients small in magnitude. Models with the wrong order of Hill terms may still be used for recovering the repressilator network: if a Hill function 1/(1 + xi^n) appears in the equation for xj', then we say that protein i inhibits protein j regardless of the value of n. As an example, Model 1 in Table B.9 recovered the correct network topology (protein 1 inhibits protein 2, protein 2 inhibits protein 3, and protein 3 inhibits protein 1), although it was obtained from a basis library of incorrect Hill order (hill_2). For some models with the wrong order of Hill terms, the reconstructed networks would have extra edges, such as Model 1 in Table B.10. Again, in those models, the Hill terms corresponding to existing edges in the true network
Noise type / Noise level / Lowest AICc for given noise model / Recovered model

Additive, 0.1% (AICc = −278.8432192):
x1' = 10.017/(1 + x3^3) − x1
x2' = 9.988/(1 + x1^3) − x2
x3' = 10.223/(1 + x2^3) − x3

Additive, 0.5% (AICc = −212.6177866):
x1' = 9.981/(1 + x3^3) − x1
x2' = 9.948/(1 + x1^3) − x2
x3' = 9.821/(1 + x2^3) − x3

Additive, 1% (AICc = −145.2121059):
x1' = 10.358/(1 + x3^3) − x1
x2' = 10.012/(1 + x1^3) − x2
x3' = 10.158/(1 + x2^3) − x3

Additive, 5% (AICc = −23.58369235):
x1' = −2.152/(1 + x2^2) + 9.303/(1 + x3^2) − x1
x2' = 7.755/(1 + x1^2) − x2
x3' = −1.491/(1 + x1^2) + 10.563/(1 + x2^2) − 1.897/(1 + x3^2) − x3

Additive, 10% (AICc = −24.58871036):
x1' = −2.129/(1 + x2^2) + 9.345/(1 + x3^2) − x1
x2' = 9.502/(1 + x1^2) − 2.427/(1 + x3^2) − x2
x3' = −2.177/(1 + x1^2) + 10.042/(1 + x2^2) − x3

Additive, 20% (AICc = −16.96979235):
x1' = 7.800/(1 + x3^2) − x1
x2' = 7.514/(1 + x1^2) − x2
x3' = 7.401/(1 + x2^2) − x3

Multiplicative, 0.1% (AICc = −330.8308192):
x1' = 10.150/(1 + x3^3) − x1
x2' = 9.986/(1 + x1^3) − x2
x3' = 10.133/(1 + x2^3) − x3

Multiplicative, 0.5% (AICc = −271.4031143):
x1' = 9.931/(1 + x3^3) − x1
x2' = 9.944/(1 + x1^3) − x2
x3' = 9.876/(1 + x2^3) − x3

Multiplicative, 1% (AICc = −172.8338723):
x1' = 9.841/(1 + x3^3) − x1
x2' = 9.464/(1 + x1^3) − x2
x3' = 9.705/(1 + x2^3) − x3

Multiplicative, 5% (AICc = −66.13494624):
x1' = 10.560/(1 + x3^3) − x1
x2' = 9.824/(1 + x1^3) − x2
x3' = 10.386/(1 + x2^3) − x3

Multiplicative, 10% (AICc = −25.74424163):
x1' = −1.873/(1 + x2^2) + 9.551/(1 + x3^2) − x1
x2' = 10.506/(1 + x1^2) − 1.660/(1 + x2^2) − 1.610/(1 + x3^2) − x2
x3' = −1.821/(1 + x1^2) + 10.355/(1 + x2^2) − 1.087/(1 + x3^2) − x3

Multiplicative, 20% (AICc = −13.97239256):
x1' = −1.887/(1 + x2^2) + 9.666/(1 + x2^2) − x1
x2' = 8.569/(1 + x1^2) − x2
x3' = −3.325/(1 + x1^2) + 11.144/(1 + x2^2) − x3

Table 3.5: Best (according to AICc) ODE models recovered from repressilator datasets using hybrid formulation. The −xi terms (colored green in the original) are the known terms in Equations 3.6.
[Figure 3.5: predicted versus test trajectories (protein concentration over time), with the end of the training time span marked. Panels A–F show the lowest-AICc models listed in Table 3.5 for 0.1%, 1%, and 10% additive noise (A–C) and 0.1%, 1%, and 10% multiplicative noise (D–F). The correct-term models shown in panels G and H are x1' = 10.404/(1 + x3^3) − x1, x2' = 9.797/(1 + x1^3) − x2, x3' = 10.722/(1 + x2^3) − x3 and x1' = 11.224/(1 + x3^3) − x1, x2' = 10.628/(1 + x1^3) − x2, x3' = 10.740/(1 + x2^3) − x3.]
Figure 3.5: ODE models recovered from repressilator datasets. A–C. Recovered models with lowest AICc for datasets with additive noise. D–F. Recovered models with lowest AICc for datasets with multiplicative noise. G. Recovered model with correct terms but not lowest AICc for 10% additive noise. H. Recovered model with correct terms but not lowest AICc for 10% multiplicative noise.
(protein 1 inhibits protein 2, protein 2 inhibits protein 3, and protein 3 inhibits protein 1) had coefficients closer to the true value β = 10, and extra Hill terms had much smaller coefficients.
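The topology-reconstruction rule stated above can be written down directly; the sketch below assumes the recovered equations are available as strings whose Hill terms are printed in the form 1/(1+xi^n), which is an assumption about the output format rather than a fixed convention of any library.

    import re

    def inhibition_edges(equations):
        # equations: dict mapping the index j of the target variable xj' to the
        # recovered right-hand side as a string; returns inhibition edges (i, j),
        # meaning protein i inhibits protein j, ignoring the Hill order n
        edges = set()
        for target, eq in equations.items():
            for source in re.findall(r"1/\(1\+x(\d)\^\d\)", eq.replace(" ", "")):
                edges.add((int(source), target))
        return edges

    recovered = {
        1: "10.0 * 1/(1+x3^3) - x1",
        2: "9.9 * 1/(1+x1^3) - x2",
        3: "10.2 * 1/(1+x2^3) - x3",
    }
    print(inhibition_edges(recovered))   # {(3, 1), (1, 2), (2, 3)}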
3.5 Discussion
In this chapter, we proposed a robust data-driven method for recovering ODE models. Our method was
built upon available methods for learning dynamical models with neural networks and recovering closed-form expressions of ODEs. Similar to the pipeline in Rackauckas et al. [50], there are two main steps in
our method. In step one, we learn the right-hand side of a hybrid dynamical model from data, which is
partially represented by a neural network approximator. In step two, we reconstruct the symbolic form of
the neural network approximator using SINDy. Unlike the pipeline in Rackauckas et al. [50] which only
learns one model per sample, our method is capable of generating a single model from multiple samples.
By integrating model validation and selection into each step, we address a practical concern in this data-driven model discovery pipeline: how do we make sure that the recovered ODE model is as close to the true underlying model as possible without overfitting the given data? That is, we discover models that generalize well on seen and unseen data by systematically searching the space of possible hyperparameters. To avoid overfitting, we compare models using metrics evaluated on validation data held separate from the training data. We use AICc (Equation 3.2) for ODE models recovered by SINDy, which favors models that fit the data well while remaining parsimonious.
We applied our method to two dynamical models from biology, namely the Lotka-Volterra model and
the repressilator model. Limited prior knowledge was incorporated into different parts of the actual implementation of our method. In the hybrid dynamical models (Equations 3.4 and Equations 3.6), only simple
growth and decay terms were assumed to be known while the interaction terms of each model were inferred. At the SINDy step, system-informed restrictions on basis libraries were made. For datasets with
5% or lower noise, the models with lowest AICc would usually be the correct models for both examples.
Although those models were recovered from data on relatively short time spans, they were able to accurately predict future states. At the 10% noise level, correct models were still recovered, but there were incorrect models with lower AICc. Correct models were not found at the 20% noise level, but we were still able to learn certain aspects of the true models from the incorrect ones.
There are many possible ways to manage data noise without changing core learning steps in our
pipeline. Before training a hybrid dynamical model, we could have denoised the data first. We could also seek better ways to evaluate learned models. A possible reason why correct models had higher AICcs than some incorrect models for the 10% noise datasets is that the AIC term in Equation 3.2 grows with the noise level but the correction term is insensitive to noise. Therefore, we will need to formulate a better correction term that takes the noise level into account. We should note, though, that model validation and selection do not specifically improve an algorithm's robustness to noise. After all, the primary goal of our
model selection scheme is to identify models with low generalization error.
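Equation 3.2 is not restated in this discussion, but a common least-squares form of AICc, which we assume is close to what is used here, makes the point explicit: the correction term depends only on the number of data points m and the number of nonzero parameters k, not on the noise in the data.

    import numpy as np

    def aicc(y_true, y_pred, k):
        # standard least-squares AICc (an assumed form, not necessarily Equation 3.2):
        # AIC = m * ln(RSS / m) + 2k, plus a small-sample correction 2k(k+1)/(m-k-1)
        m = y_true.size
        rss = float(np.sum((y_true - y_pred) ** 2))
        aic = m * np.log(rss / m) + 2 * k
        return aic + (2 * k * (k + 1)) / (m - k - 1)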
Apart from worse performance for higher noise levels, there are several other issues that we would like
to point out here. A notable one is the computational cost of grid search of hyperparameters, as computational time grows exponentially with the number of hyperparameters to be searched. We can use existing
methods for hyperparameter optimization that do not exhaustively search over hyperparameter space,
which may require less computational time than our grid search implementation does [122, 123]. Those
methods work not only for continuous hyperparameters but also for discrete and categorical ones [122,
123]. Thus, they should be applicable to model selection for hybrid dynamical models and SINDy models.
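As one possible alternative to grid search (not necessarily one of the methods described in [122, 123]), a sketch using the Optuna library is shown below; train_hybrid_model and validation_loss are the same hypothetical placeholders as in the earlier grid-search sketch.

    import optuna

    def objective(trial):
        # search space mirroring the hybrid-model hyperparameters in Table 3.4
        window = trial.suggest_categorical("window_size", [5, 10])
        batch = trial.suggest_categorical("batch_size", [5, 10, 20])
        lr = trial.suggest_float("learning_rate", 1e-3, 1e-1, log=True)
        model = train_hybrid_model(train_data, window_size=window,
                                   batch_size=batch, learning_rate=lr)
        return validation_loss(model, valid_data)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=30)
    print(study.best_params)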
We also saw in Section 3.4.2 that results from the hybrid formulation were significantly better than results from methods that did not incorporate prior knowledge, even though an extensive search of the hyperparameter space was carried out for all of those methods. Therefore, the use of model selection does not compensate for the absence of prior knowledge. We have not explored the full range of possible neural network architectures, nor have we tested alternative SINDy implementations (e.g. Weak SINDy) or SINDy optimizers (e.g. SR3) [113, 114, 111, 112], which may learn well with less knowledge. We could add neural network
architectures and SINDy algorithms as categorical hyperparameters in model selection, but again, having
more hyperparameters would increase computational time considerably.
We noticed a few issues that were not emphasized in many previous works on SINDy-based data-driven
model discovery. Many SINDy algorithms, such as STLSQ and SR3, were tested on data with Gaussian noise
only [53, 111, 112]. One benefit of using zero-mean Gaussian noise is that the term for fitting error in an
objective function can be as simple as the L2 norm. However, such an objective function may be unsuitable
for data with asymmetric noise. Since we used the STLSQ algorithm for SINDy, we also generated datasets
with zero-mean Gaussian noise only. It would be interesting to learn noise explicitly without assuming
the distribution of noise. Kaheman et al. [124] proposed a SINDy-based method that learns noise as a
parameter on individual time points and can recover ODE models from asymmetric noise. However, the noise is parameterized per time point, so the number of noise parameters grows with the amount of training data. A possible direction for future work is to approximate the noise distribution with
Bayesian or variational inference so that the space complexity for learning noise is constrained. We also
found that many ODE models recovered by SINDy, despite fitting training data closely, had unexpected
properties. For one dataset, the recovered model with lowest AICc was stiff outside the training time span
(Figure 3.3F), but the true model could be simulated with a non-stiff ODE solver on any time span.
Data-driven learning of ODE models is an exciting subject of study in that it draws on computational techniques from many related fields, be it scientific computing, machine learning, or dynamical systems. In this chapter, we have presented a framework for inferring ODE models from noisy time series data observed from biological systems, along with a comprehensive evaluation scheme that, via model selection, allows us to select models that explain biological mechanisms and predict long-term evolution. We have shown that incomplete models can be fully recovered with only limited prior knowledge, provided that the knowledge is properly utilized during the model recovery process. We hope that our work will motivate further development of data-driven model discovery methods that focus on robustness and knowledge integration, which in turn will accelerate the discovery of biological mechanisms yet to be understood.
Chapter 4
Conclusions and future works
Data-driven learning of dynamical systems has long been a topic of interest in computational biology, and it has been revitalized in the last decade by the increasing amount of data and newly emerging computational methods. In this work, we have presented two frameworks for learning dynamical systems from
different types of biological data, built on methods developed recently. Both frameworks were enhanced
with the incorporation of extra knowledge.
4.1 Future works for Bayesian parameter inference for large ordinary
differential equation systems
In Chapter 2, we demonstrated the utility of additional knowledge for Bayesian parameter inference for
multiple instances of the same ODE model. Based on the hypothesis that parameter distributions should
be statistically similar for ODE models of cells from the same cell line, we decided to transfer knowledge
learned from one cell to the learning of another cell. By doing so, we successfully reduced average computational time for a sequence of cells, which formed a cell chain in which two consecutive cells were linked
by transfer learning (Section 2.4.1). In addition, the average posterior likelihood of sampled parameters
was elevated (Section 2.4.1). To improve our results even further, we incorporated transcriptomic data from MERFISH into the transfer learning. By ordering cells based on their transcriptional similarity, we ensured that the prior for one cell was induced by the posterior of a cell with a similar gene expression profile. Clustering on posterior distributions sampled from the similarity-guided cell chain separated cells into clusters with distinct Ca2+ dynamics and distinct gene expression patterns, while clustering on posterior distributions from a randomly ordered cell chain failed to do so (Section 2.4.5).
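A schematic of the similarity-guided ordering is sketched below: given a symmetric cell-cell similarity matrix, cells are chained greedily so that consecutive cells are transcriptionally similar. This is a sketch of the idea only, not the exact procedure used in Chapter 2.

    import numpy as np

    def build_cell_chain(similarity, start=0):
        # similarity: symmetric (n_cells, n_cells) matrix of cell-cell similarities;
        # returns an ordering in which consecutive cells are highly similar
        n = similarity.shape[0]
        chain, visited = [start], {start}
        while len(chain) < n:
            sims = similarity[chain[-1]].copy()
            sims[list(visited)] = -np.inf          # never revisit a cell
            nxt = int(np.argmax(sims))
            chain.append(nxt)
            visited.add(nxt)
        return chain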
Our approach to enhance Bayesian parameter inference was conceptually simple, but as often is the
case in systems biology, the curse of dimensionality makes implementation and subsequent analyses challenging. The full Ca2+ signaling model had 18 parameters, but only a single Ca2+ response variable was
measured. To constrain posterior distributions, we had to introduce a “scaling and clipping” technique
when constructing the prior of a cell from the posterior of its predecessor (Section A.1.1). Even so, inspection of the marginal posterior distribution of each parameter identified two "sloppy" parameters, which were then set to constants in reduced models (Section 2.4.2). Ordering cells by transcriptional
similarity might have served as a constraint since the posterior distributions from similarity-induced cell
chains were more informative. As such, incorporating more relevant data, for instance, spatial information
from MERFISH, may help further constrain the problem.
Another challenge due to high dimensionality was that posterior distributions of parameters were
output as empirical samples rather than scalar values, making it difficult to use conventional approaches to
analyze them. For example, sensitivity of each parameter was assessed by perturbing it across its marginal
distribution instead of the more common forward and adjoint sensitivities (Section 2.4.3). Also, the clustering in Section 2.4.5 was performed on posterior means rather than full posterior samples, as no suitable
method for clustering on full samples was found. Thus, one next step for Bayesian parameter inference
for large ODEs is to develop better tools for analyzing posterior samples, including but not limited to
sensitivity, clustering, and visualization.
4.2 Future works for data-driven model discovery for complex dynamical systems
In Chapter 3, we presented a robust method for discovering ODE models from data with limited prior
knowledge. The method was developed on top of a model discovery pipeline introduced in Rackauckas
et al. [50] for dynamical systems, which fits a hybrid dynamical model to data and uses SINDy to restore
symbolic terms. The hybrid model reflects our understanding of the dynamical system from which data
was collected: explicit terms in the model characterize prior knowledge, and neural networks approximate
latent dynamics, i.e. those which we must infer from data. To lower generalization error of trained neural
networks and recovered ODEs, we introduce techniques commonly used in modern machine learning. That
is, we split data into training samples and validation samples, and learn models from training data with
different hyperparameters. For synthetic datasets with Gaussian noise, our method recovered governing
ODEs with correct terms up to 10% noise (Section 3.4.1 and Section 3.4.3). Those correctly recovered
models were able to extrapolate data beyond the time span on which the models were trained (Figure 3.3
and Figure 3.5).
The robustness of our method against generalization error comes at the cost of computational complexity. The computational time needed to perform a systematic search of hyperparameters under our current implementation grows exponentially as more hyperparameters are included. In practice, it is strongly recommended to search only over hyperparameters that have a larger impact on model learning. To determine
optimal hyperparameters efficiently, we will need to use a hyperparameter optimization strategy in place
of grid search [122, 123]. We would also like to note that we did not specifically attempt to make our
method robust against noise, but it can be improved to handle higher noise levels and different types of
noise. For data with larger noise, we can either denoise the data at the very beginning using methods
introduced in Section 1.2, or use Weak SINDy instead of base SINDy [113, 114]. If given data with nonzero
mean or asymmetric noise, we may learn the noise explicitly by Bayesian and variational inference.
4.3 Conclusions
We have seen how the integration of supplemental knowledge into data-driven learning of dynamical systems can contribute to improved learning outcomes. It is worth noting that how pieces of knowledge
can be integrated depends on the form in which the knowledge is presented. In Chapter 2, the extra
information about the ATP-stimulated cells, i.e. transcriptomic data from MERFISH, was quantitative to
begin with and remained quantitative when being used to guide transfer learning of model parameters. It
was converted into cell-cell similarity scores, which were then employed to inform the construction of cell
chains. In Chapter 3, the assumed prior knowledge about the Lotka-Volterra model and the repressilator
model was mostly qualitative. Each state variable’s growth or decay was turned into symbolic terms in
hybrid dynamical models, and assumptions about the unknown interactions between state variables became
candidate basis functions for SINDy regression.
Overall, by incorporating prior knowledge into data-driven learning frameworks, methods proposed
in this work have expanded our toolkit for learning ODE models from data, offering new means by which
to infer dynamic properties of biological systems. As scientists expect to learn more from growing amounts of data and as multimodal biological data becomes increasingly prevalent, the importance of knowledge integration becomes even more pronounced. Application of our methods to new datasets should benefit
researchers in fundamental biology as well as in clinical studies and enable the discovery of new knowledge. It is our goal to propose more computational methods for learning dynamical systems, which are not
only efficient and robust but also scalable for large, noisy, and complex biological data. Those methods will
leverage existing knowledge, extract a tremendous amount of information from data, and greatly enhance our ability
to learn new biology.
Bibliography
[1] Alfred J. Lotka. “Analytical Note on Certain Rhythmic Relations in Organic Systems”. In:
Proceedings of the National Academy of Sciences 6.7 (July 1920), pp. 410–415. issn: 0027-8424,
1091-6490. doi: 10.1073/pnas.6.7.410.
[2] Vito Volterra. “Fluctuations in the Abundance of a Species Considered Mathematically”. In:
Nature 118.2972 (Oct. 1926), pp. 558–560. issn: 0028-0836, 1476-4687. doi: 10.1038/118558a0.
[3] William O. Kermack and A.G. McKendrick. “A Contribution to the Mathematical Theory of
Epidemics”. In: Proceedings of the Royal Society of London. Series A, Containing Papers of a
Mathematical and Physical Character 115.772 (Aug. 1927), pp. 700–721. issn: 0950-1207,
2053-9150. doi: 10.1098/rspa.1927.0118.
[4] Michael B. Elowitz and Stanislas Leibler. “A Synthetic Oscillatory Network of Transcriptional
Regulators”. In: Nature 403.6767 (Jan. 2000), pp. 335–338. issn: 0028-0836, 1476-4687. doi:
10.1038/35002125.
[5] Megan K. Rommelfanger and Adam L. MacLean. “A Single-Cell Resolved Cell-Cell
Communication Model Explains Lineage Commitment in Hematopoiesis”. In: Development 148.24
(Dec. 2021), dev199779. issn: 0950-1991, 1477-9129. doi: 10.1242/dev.199779.
[6] Ian Cooper, Argha Mondal, and Chris G. Antonopoulos. “A SIR Model Assumption for the Spread
of COVID-19 in Different Communities”. In: Chaos, Solitons & Fractals 139 (Oct. 2020), p. 110057.
issn: 09600779. doi: 10.1016/j.chaos.2020.110057.
[7] Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu. “A Time-Dependent SIR
Model for COVID-19 With Undetectable Infected Persons”. In: IEEE Transactions on Network
Science and Engineering 7.4 (Oct. 2020), pp. 3279–3294. issn: 2327-4697, 2334-329X. doi:
10.1109/TNSE.2020.3024723.
[8] Rui Wang, Danielle Maddix, Christos Faloutsos, Yuyang Wang, and Rose Yu. “Bridging
Physics-Based and Data-Driven Modeling for Learning Dynamical Systems”. In: Proceedings of the
3rd Conference on Learning for Dynamics and Control. Ed. by Ali Jadbabaie, John Lygeros,
George J. Pappas, Pablo A. Parrilo, Benjamin Recht, Claire J. Tomlin, and
Melanie N. Zeilinger. Vol. 144. Proceedings of Machine Learning Research. PMLR, June 2021,
pp. 385–398. url: https://proceedings.mlr.press/v144/wang21a.html.
[9] J J Tyson. “Modeling the Cell Division Cycle: Cdc2 and Cyclin Interactions.” In: Proceedings of the
National Academy of Sciences 88.16 (Aug. 1991), pp. 7328–7332. issn: 0027-8424, 1091-6490. doi:
10.1073/pnas.88.16.7328.
[10] T. Alarcón, H.M. Byrne, and P.K. Maini. “A Mathematical Model of the Effects of Hypoxia on the
Cell-Cycle of Normal and Cancer Cells”. In: Journal of Theoretical Biology 229.3 (Aug. 2004),
pp. 395–411. issn: 00225193. doi: 10.1016/j.jtbi.2004.04.016.
[11] John J. Tyson and Béla Novák. “Time-Keeping and Decision-Making in the Cell Cycle”. In:
Interface Focus 12.4 (Aug. 2022), p. 20210075. issn: 2042-8901. doi: 10.1098/rsfs.2021.0075.
[12] Mary Lee, George T Chen, Eric Puttock, Kehui Wang, Robert A Edwards, Marian L Waterman,
and John Lowengrub. “Mathematical Modeling Links Wnt Signaling to Emergent Patterns of
Metabolism in Colon Cancer”. In: Molecular Systems Biology 13.2 (Feb. 2017), p. 912. issn:
1744-4292, 1744-4292. doi: 10.15252/msb.20167386.
[13] William R. Holmes, JinSeok Park, Andre Levchenko, and Leah Edelstein-Keshet. “A Mathematical
Model Coupling Polarity Signaling to Cell Adhesion Explains Diverse Cell Migration Patterns”.
In: PLOS Computational Biology 13.5 (May 2017). Ed. by Qing Nie, e1005524. issn: 1553-7358. doi:
10.1371/journal.pcbi.1005524.
[14] Tao Peng, Linan Liu, Adam L MacLean, Chi Wut Wong, Weian Zhao, and Qing Nie. “A
Mathematical Model of Mechanotransduction Reveals How Mechanical Memory Regulates
Mesenchymal Stem Cell Fate Decisions”. In: BMC Systems Biology 11.1 (Dec. 2017), p. 55. issn:
1752-0509. doi: 10.1186/s12918-017-0429-x.
[15] Carolin Loos and Jan Hasenauer. “Mathematical Modeling of Variability in Intracellular
Signaling”. In: Current Opinion in Systems Biology 16 (Aug. 2019), pp. 17–24. issn: 24523100. doi:
10.1016/j.coisb.2019.10.020.
[16] Yue-Xian Li and John Rinzel. “Equations for InsP3 Receptor-mediated [Ca2+]i Oscillations
Derived from a Detailed Kinetic Model: A Hodgkin-Huxley Like Formalism”. In: Journal of
Theoretical Biology 166.4 (Feb. 1994), pp. 461–473. issn: 00225193. doi: 10.1006/jtbi.1994.1041.
[17] G. Lemon, W.G. Gibson, and M.R. Bennett. “Metabotropic Receptor Activation, Desensitization
and Sequestration—I: Modelling Calcium and Inositol 1,4,5-Trisphosphate Dynamics Following
Receptor Activation”. In: Journal of Theoretical Biology 223.1 (July 2003), pp. 93–111. issn:
00225193. doi: 10.1016/S0022-5193(03)00079-1.
[18] Peter J. Brockwell and Richard A. Davis. Time Series: Theory and Methods. 2. ed. Springer Series in
Statistics. New York Berlin Heidelberg: Springer, 1991. isbn: 978-1-4419-0320-4 978-0-387-97429-3
978-3-540-97429-1 978-1-4419-0319-8.
[19] Clive Loader. Local Regression and Likelihood. Statistics and Computing. New York Heidelberg:
Springer, 1999. isbn: 978-0-387-98775-0.
[20] P. Whittle. Hypothesis Testing in Time Series Analysis. Statistics / Uppsala Universitet. Almqvist &
Wiksells boktr., 1951. isbn: 978-0-598-91982-3.
[21] P.J. Brockwell. “Continuous-Time ARMA Processes”. In: Handbook of Statistics. Vol. 19. Elsevier,
2001, pp. 249–276. isbn: 978-0-444-50014-4. doi: 10.1016/S0169-7161(01)19011-5.
[22] Jan G. De Gooijer and Rob J. Hyndman. “25 Years of Time Series Forecasting”. In: International
Journal of Forecasting 22.3 (Jan. 2006), pp. 443–473. issn: 01692070. doi:
10.1016/j.ijforecast.2006.01.001.
[23] Bryan Lim and Stefan Zohren. “Time-Series Forecasting with Deep Learning: A Survey”. In:
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
379.2194 (Apr. 2021), p. 20200209. issn: 1364-503X, 1471-2962. doi: 10.1098/rsta.2020.0209.
[24] Ricardo P. Masini, Marcelo C. Medeiros, and Eduardo F. Mendes. “Machine Learning Advances for
Time Series Forecasting”. In: Journal of Economic Surveys 37.1 (Feb. 2023), pp. 76–111. issn:
0950-0804, 1467-6419. doi: 10.1111/joes.12429.
[25] G. Cybenko. “Approximation by Superpositions of a Sigmoidal Function”. In: Mathematics of
Control, Signals, and Systems 2.4 (Dec. 1989), pp. 303–314. issn: 0932-4194, 1435-568X. doi:
10.1007/BF02551274.
[26] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. “Multilayer Feedforward Networks Are
Universal Approximators”. In: Neural Networks 2.5 (Jan. 1989), pp. 359–366. issn: 08936080. doi:
10.1016/0893-6080(89)90020-8.
[27] Kurt Hornik. “Approximation Capabilities of Multilayer Feedforward Networks”. In: Neural
Networks 4.2 (1991), pp. 251–257. issn: 08936080. doi: 10.1016/0893-6080(91)90009-T.
[28] Moshe Leshno, Vladimir Ya. Lin, Allan Pinkus, and Shimon Schocken. “Multilayer Feedforward
Networks with a Nonpolynomial Activation Function Can Approximate Any Function”. In:
Neural Networks 6.6 (Jan. 1993), pp. 861–867. issn: 08936080. doi: 10.1016/S0893-6080(05)80131-5.
[29] Ding-Xuan Zhou. “Universality of Deep Convolutional Neural Networks”. In: Applied and
Computational Harmonic Analysis 48.2 (Mar. 2020), pp. 787–794. issn: 10635203. doi:
10.1016/j.acha.2019.06.004.
[30] Anton Maximilian Schäfer and Hans Georg Zimmermann. “Recurrent Neural Networks Are
Universal Approximators”. In: Artificial Neural Networks – ICANN 2006. Ed. by David Hutchison,
Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor,
Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos,
Dough Tygar, Moshe Y. Vardi, Gerhard Weikum, Stefanos D. Kollias, Andreas Stafylopatis,
Włodzisław Duch, and Erkki Oja. Vol. 4131. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006,
pp. 632–640. isbn: 978-3-540-38625-4 978-3-540-38627-8. doi: 10.1007/11840817_66.
[31] Antonio Rafael Sabino Parmezan, Vinicius M.A. Souza, and Gustavo E.A.P.A. Batista. “Evaluation
of Statistical and Machine Learning Models for Time Series Prediction: Identifying the
State-of-the-Art and the Best Conditions for the Use of Each Model”. In: Information Sciences 484
(May 2019), pp. 302–337. issn: 00200255. doi: 10.1016/j.ins.2019.01.076.
[32] T. Warren Liao. “Clustering of Time Series Data—a Survey”. In: Pattern Recognition 38.11 (Nov.
2005), pp. 1857–1874. issn: 00313203. doi: 10.1016/j.patcog.2005.01.025.
[33] Ian C. McDowell, Dinesh Manandhar, Christopher M. Vockley, Amy K. Schmid,
Timothy E. Reddy, and Barbara E. Engelhardt. “Clustering Gene Expression Time Series Data
Using an Infinite Gaussian Process Mixture Model”. In: PLOS Computational Biology 14.1 (Jan.
2018). Ed. by Qing Nie, e1005896. issn: 1553-7358. doi: 10.1371/journal.pcbi.1005896.
[34] PW Hemker. “Numerical Methods for Differential Equations in System Simulation and in
Parameter Estimation”. In: Analysis and Simulation of biochemical systems 28 (1972), pp. 59–80.
[35] Yonathan Bard. Nonlinear Parameter Estimation. New York: Academic Press, 1974. isbn:
978-0-12-078250-5.
[36] H. G. Bock. “Recent Advances in Parameteridentification Techniques for O.D.E.” In: Numerical
Treatment of Inverse Problems in Differential and Integral Equations: Proceedings of an International
Workshop, Heidelberg, Fed. Rep. of Germany, August 30 — September 3, 1982. Boston, MA:
Birkhäuser Boston, 1983, pp. 95–121. isbn: 978-1-4684-7324-7. doi: 10.1007/978-1-4684-7324-7_7.
[37] Gloria I. Valderrama-Bahamóndez and Holger Fröhlich. “MCMC Techniques for Parameter
Estimation of ODE Based Models in Systems Biology”. In: Frontiers in Applied Mathematics and
Statistics 5 (Nov. 2019), p. 55. issn: 2297-4687. doi: 10.3389/fams.2019.00055.
[38] Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen, and Michael P.H Stumpf.
“Approximate Bayesian Computation Scheme for Parameter Inference and Model Selection in
Dynamical Systems”. In: Journal of The Royal Society Interface 6.31 (Feb. 2009), pp. 187–202. issn:
1742-5689, 1742-5662. doi: 10.1098/rsif.2008.0172.
[39] Adam L. MacLean, Cristina Lo Celso, and Michael P. H. Stumpf. “Population Dynamics of Normal
and Leukaemia Stem Cells in the Haematopoietic Stem Cell Niche Show Distinct Regimes Where
Leukaemia Will Be Controlled”. In: Journal of The Royal Society Interface 10.81 (Apr. 2013),
p. 20120968. issn: 1742-5689, 1742-5662. doi: 10.1098/rsif.2012.0968.
[40] Adam L. MacLean, Sarah Filippi, and Michael P. H. Stumpf. “The Ecology in the Hematopoietic
Stem Cell Niche Determines the Clinical Outcome in Chronic Myeloid Leukemia”. In: Proceedings
of the National Academy of Sciences 111.10 (Mar. 2014), pp. 3883–3888. issn: 0027-8424, 1091-6490.
doi: 10.1073/pnas.1317072111.
[41] Jason Yao, Anna Pilko, and Roy Wollman. “Distinct Cellular States Determine Calcium Signaling
Response”. In: Molecular Systems Biology 12.12 (Dec. 2016), p. 894. issn: 1744-4292, 1744-4292. doi:
10.15252/msb.20167137.
[42] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and
Edward Teller. “Equation of State Calculations by Fast Computing Machines”. In: The Journal of
Chemical Physics 21.6 (June 1953), pp. 1087–1092. issn: 0021-9606, 1089-7690. doi:
10.1063/1.1699114.
[43] W. K. Hastings. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications”.
In: Biometrika 57.1 (Apr. 1970), pp. 97–109. issn: 1464-3510, 0006-3444. doi:
10.1093/biomet/57.1.97.
[44] Stuart Geman and Donald Geman. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian
Restoration of Images”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence
PAMI-6.6 (Nov. 1984), pp. 721–741. issn: 0162-8828. doi: 10.1109/TPAMI.1984.4767596.
[45] Michael Betancourt. “A Conceptual Introduction to Hamiltonian Monte Carlo”. In:
arXiv:1701.02434 [stat] (July 2018). arXiv: 1701.02434 [stat]. url:
http://arxiv.org/abs/1701.02434.
[46] Matthew D. Hoffman and Andrew Gelman. “The No-U-Turn Sampler: Adaptively Setting Path
Lengths in Hamiltonian Monte Carlo”. In: Journal of Machine Learning Research 15.47 (2014),
pp. 1593–1623. url: http://jmlr.org/papers/v15/hoffman14a.html.
[47] Robert Foreman and Roy Wollman. “Mammalian Gene Expression Variability Is Explained by
Underlying Cell State”. In: Molecular Systems Biology 16.2 (Feb. 2020). issn: 1744-4292, 1744-4292.
doi: 10.15252/msb.20199146.
[48] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger,
Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland,
Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie,
Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back,
Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger,
Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals,
Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, and Demis Hassabis. “Highly Accurate
Protein Structure Prediction with AlphaFold”. In: Nature 596.7873 (Aug. 2021), pp. 583–589. issn:
0028-0836, 1476-4687. doi: 10.1038/s41586-021-03819-2.
[49] Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. “Neural Ordinary
Differential Equations”. In: CoRR abs/1806.07366 (2018). url: http://arxiv.org/abs/1806.07366.
[50] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov,
Rohit Supekar, Dominic Skinner, and Ali Jasim Ramadhan. “Universal Differential Equations for
Scientific Machine Learning”. In: CoRR abs/2001.04385 (2020). url:
https://arxiv.org/abs/2001.04385.
[51] M. Raissi, P. Perdikaris, and G. E. Karniadakis. “Physics-Informed Neural Networks: A Deep
Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial
Differential Equations”. In: Journal of Computational Physics 378 (2019), pp. 686–707. issn:
0021-9991. doi: 10.1016/j.jcp.2018.10.045.
[52] Tong Qin, Kailiang Wu, and Dongbin Xiu. “Data Driven Governing Equations Approximation
Using Deep Neural Networks”. In: Journal of Computational Physics 395 (Oct. 2019), pp. 620–635.
issn: 0021-9991. doi: 10.1016/j.jcp.2019.06.042.
[53] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. “Discovering Governing Equations from
Data by Sparse Identification of Nonlinear Dynamical Systems”. In: Proceedings of the National
Academy of Sciences 113.15 (Apr. 2016), pp. 3932–3937. issn: 0027-8424, 1091-6490. doi:
10.1073/pnas.1517384113.
[54] Joshua S. North, Christopher K. Wikle, and Erin M. Schliep. “A Review of Data-Driven Discovery
for Dynamic Systems”. In: International Statistical Review 91.3 (Dec. 2023), pp. 464–492. issn:
0306-7734, 1751-5823. doi: 10.1111/insr.12554.
[55] Dimitrios Angelis, Filippos Sofos, and Theodoros E. Karakasidis. “Artificial Intelligence in
Physical Sciences: Symbolic Regression Trends and Perspectives”. In: Archives of Computational
Methods in Engineering 30.6 (July 2023), pp. 3845–3865. issn: 1134-3060, 1886-1784. doi:
10.1007/s11831-023-09922-z.
[56] Iain C. Macaulay, Chris P. Ponting, and Thierry Voet. “Single-Cell Multiomics: Multiple
Measurements from Single Cells”. In: Trends in Genetics 33.2 (Feb. 2017), pp. 155–168. issn:
01689525. doi: 10.1016/j.tig.2016.12.003.
[57] Lia Chappell, Andrew J.C. Russell, and Thierry Voet. “Single-Cell (Multi)Omics Technologies”. In:
Annual Review of Genomics and Human Genetics 19.1 (Aug. 2018), pp. 15–41. issn: 1527-8204,
1545-293X. doi: 10.1146/annurev-genom-091416-035324.
[58] Jeongwoo Lee, Do Young Hyeon, and Daehee Hwang. “Single-Cell Multiomics: Technologies and
Data Analysis Methods”. In: Experimental & Molecular Medicine 52.9 (Sept. 2020), pp. 1428–1442.
issn: 1226-3613, 2092-6413. doi: 10.1038/s12276-020-0420-2.
[59] Alev Baysoy, Zhiliang Bai, Rahul Satija, and Rong Fan. “The Technological Landscape and
Applications of Single-Cell Multi-Omics”. In: Nature Reviews Molecular Cell Biology 24.10 (Oct.
2023), pp. 695–713. issn: 1471-0072, 1471-0080. doi: 10.1038/s41580-023-00615-w.
[60] David Tatarakis, Zixuan Cang, Xiaojun Wu, Praveer P. Sharma, Matthew Karikomi,
Adam L. MacLean, Qing Nie, and Thomas F. Schilling. “Single-Cell Transcriptomic Analysis of
Zebrafish Cranial Neural Crest Reveals Spatiotemporal Regulation of Lineage Decisions during
Development”. In: Cell Reports 37.12 (Dec. 2021), p. 110140. issn: 22111247. doi:
10.1016/j.celrep.2021.110140.
[61] Lingyun Xiong, Jing Liu, Seung Yub Han, Kari Koppitch, Jin-Jin Guo, Megan Rommelfanger,
Zhen Miao, Fan Gao, Ingileif B. Hallgrimsdottir, Lior Pachter, Junhyong Kim, Adam L. MacLean,
and Andrew P. McMahon. “Direct Androgen Receptor Control of Sexually Dimorphic Gene
Expression in the Mammalian Kidney”. In: Developmental Cell 58.21 (Nov. 2023), 2338–2358.e5.
issn: 15345807. doi: 10.1016/j.devcel.2023.08.010.
[62] J.R. Moffitt and X. Zhuang. “RNA Imaging with Multiplexed Error-Robust Fluorescence In Situ
Hybridization (MERFISH)”. In: Methods in Enzymology. Vol. 572. Elsevier, 2016, pp. 1–49. isbn:
978-0-12-802292-4. doi: 10.1016/bs.mie.2016.03.020.
[63] Xiaojun Wu, Roy Wollman, and Adam L. MacLean. “Single-Cell Ca 2+ Parameter Inference
Reveals How Transcriptional States Inform Dynamic Cell Responses”. In: Journal of The Royal
Society Interface 20.203 (June 2023), p. 20230172. issn: 1742-5662. doi: 10.1098/rsif.2023.0172.
[64] Darren James Wilkinson. Stochastic Modelling for Systems Biology. Third edition, first issued in
paperback. Chapman & Hall/CRC Mathematical and Computational Biology. Boca Raton London
New York: CRC Press, Taylor & Francis Group, 2020. isbn: 978-0-367-65693-5.
[65] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and
U. Kummer. “COPASI–a COmplex PAthway SImulator”. In: Bioinformatics 22.24 (Dec. 2006),
pp. 3067–3074. issn: 1367-4803, 1460-2059. doi: 10.1093/bioinformatics/btl485.
[66] Juliane Liepe, Sarah Filippi, Michał Komorowski, and Michael P. H. Stumpf. “Maximizing the
Information Content of Experiments in Systems Biology”. In: PLoS Computational Biology 9.1 (Jan.
2013). Ed. by Andrey Rzhetsky, e1002888. issn: 1553-7358. doi: 10.1371/journal.pcbi.1002888.
[67] David J. Warne, Ruth E. Baker, and Matthew J. Simpson. “Simulation and Inference Algorithms
for Stochastic Biochemical Reaction Networks: From Basic Concepts to State-of-the-Art”. In:
Journal of The Royal Society Interface 16.151 (Feb. 2019), p. 20180943. issn: 1742-5689, 1742-5662.
doi: 10.1098/rsif.2018.0943.
[68] Allon Wagner, Aviv Regev, and Nir Yosef. “Revealing the Vectors of Cellular Identity with
Single-Cell Genomics”. In: Nature Biotechnology 34.11 (Nov. 2016), pp. 1145–1160. issn: 1087-0156,
1546-1696. doi: 10.1038/nbt.3711.
[69] Brian Munsky, Gregor Neuert, and Alexander van Oudenaarden. “Using Gene Expression Noise
to Understand Gene Regulation”. In: Science 336.6078 (Apr. 2012), pp. 183–187. issn: 0036-8075,
1095-9203. doi: 10.1126/science.1216379.
[70] David M. Suter, Nacho Molina, David Gatfield, Kim Schneider, Ueli Schibler, and Felix Naef.
“Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics”. In: Science
332.6028 (Apr. 2011), pp. 472–474. issn: 0036-8075, 1095-9203. doi: 10.1126/science.1198817.
[71] Arjun Raj and Alexander van Oudenaarden. “Nature, Nurture, or Chance: Stochastic Gene
Expression and Its Consequences”. In: Cell 135.2 (Oct. 2008), pp. 216–226. issn: 00928674. doi:
10.1016/j.cell.2008.09.050.
[72] Pavel A. Brodskiy and Jeremiah J. Zartman. “Calcium as a Signal Integrator in Developing
Epithelial Tissues”. In: Physical Biology 15.5 (May 2018), p. 051001. issn: 1478-3975. doi:
10.1088/1478-3975/aabb18.
[73] Lucy J Leiper, Petr Walczysko, Romana Kucerova, Jingxing Ou, Lynne J Shanley, Diane Lawson,
John V Forrester, Colin D McCaig, Min Zhao, and J Martin Collinson. “The Roles of Calcium
Signaling and ERK1/2 Phosphorylation in a Pax6+/-Mouse Model of Epithelial Wound-Healing
Delay”. In: BMC Biology 4.1 (Dec. 2006), p. 27. issn: 1741-7007. doi: 10.1186/1741-7007-4-27.
[74] József Maléth and Péter Hegyi. “Calcium Signaling in Pancreatic Ductal Epithelial Cells: An Old
Friend and a Nasty Enemy”. In: Cell Calcium 55.6 (June 2014), pp. 337–345. issn: 01434160. doi:
10.1016/j.ceca.2014.02.004.
[75] G. Dupont, L. Combettes, G. S. Bird, and J. W. Putney. “Calcium Oscillations”. In: Cold Spring
Harbor Perspectives in Biology 3.3 (Mar. 2011), a004226–a004226. issn: 1943-0264. doi:
10.1101/cshperspect.a004226.
[76] Muthu Periasamy and Anuradha Kalyanasundaram. “SERCA Pump Isoforms: Their Role in
Calcium Transport and Disease”. In: Muscle & Nerve 35.4 (Apr. 2007), pp. 430–442. issn: 0148639X,
10974598. doi: 10.1002/mus.20745.
[77] Leung Hang Ma, Sarah E. Webb, Ching Man Chan, Jiao Zhang, and Andrew L. Miller.
“Establishment of a Transitory Dorsal-Biased Window of Localized Ca2+ Signaling in the
Superficial Epithelium Following the Mid-Blastula Transition in Zebrafish Embryos”. In:
Developmental Biology 327.1 (Mar. 2009), pp. 143–157. issn: 00121606. doi:
10.1016/j.ydbio.2008.12.015.
[78] Ramya Balaji, Christina Bielmeier, Hartmann Harz, Jack Bates, Cornelia Stadler,
Alexander Hildebrand, and Anne-Kathrin Classen. “Calcium Spikes, Waves and Oscillations in a
Large, Patterned Epithelial Tissue”. In: Scientific Reports 7.1 (Feb. 2017), p. 42786. issn: 2045-2322.
doi: 10.1038/srep42786.
[79] Delfina C. Domínguez, Manita Guragain, and Marianna Patrauchan. “Calcium Binding Proteins
and Calcium Signaling in Prokaryotes”. In: Cell Calcium 57.3 (Mar. 2015), pp. 151–165. issn:
01434160. doi: 10.1016/j.ceca.2014.12.006.
[80] Adam L. MacLean, Tian Hong, and Qing Nie. “Exploring Intermediate Cell States through the
Lens of Single Cells”. In: Current Opinion in Systems Biology 9 (June 2018), pp. 32–41. issn:
24523100. doi: 10.1016/j.coisb.2018.02.009.
[81] Chee-Huat Linus Eng, Michael Lawson, Qian Zhu, Ruben Dries, Noushin Koulena, Yodai Takei,
Jina Yun, Christopher Cronin, Christoph Karp, Guo-Cheng Yuan, and Long Cai.
“Transcriptome-Scale Super-Resolved Imaging in Tissues by RNA seqFISH+”. In: Nature 568.7751
(Apr. 2019), pp. 235–239. issn: 0028-0836, 1476-4687. doi: 10.1038/s41586-019-1049-y.
[82] Steven H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology,
Chemistry, and Engineering. Second edition. Boulder, CO: Westview Press, a member of the
Perseus Books Group, 2015. isbn: 978-0-8133-4910-7.
[83] Amy Y. Chang and Wallace F. Marshall. “Dynamics of Living Cells in a Cytomorphological State
Space”. In: Proceedings of the National Academy of Sciences 116.43 (Oct. 2019), pp. 21556–21562.
issn: 0027-8424, 1091-6490. doi: 10.1073/pnas.1902849116.
[84] Caleb Weinreb, Samuel Wolock, Betsabeh K. Tusi, Merav Socolovsky, and Allon M. Klein.
“Fundamental Limits on Dynamic Inference from Single-Cell Snapshots”. In: Proceedings of the
National Academy of Sciences 115.10 (Mar. 2018), E2467–E2476. issn: 0027-8424, 1091-6490. doi:
10.1073/pnas.1714723115.
[85] Irena Kuzmanovska, Andreas Milias-Argeitis, Jan Mikelson, Christoph Zechner, and
Mustafa Khammash. “Parameter Inference for Stochastic Single-Cell Dynamics from Lineage Tree
Data”. In: BMC Systems Biology 11.1 (Dec. 2017), p. 52. issn: 1752-0509. doi:
10.1186/s12918-017-0425-1.
[86] Liam J Ruske, Jochen Kursawe, Anestis Tsakiridis, Valerie Wilson, Alexander G Fletcher,
Richard A Blythe, and Linus J Schumacher. “Coupled Differentiation and Division of Embryonic
Stem Cells Inferred from Clonal Snapshots”. In: Physical Biology 17.6 (Nov. 2020), p. 065009. issn:
1478-3967, 1478-3975. doi: 10.1088/1478-3975/aba041.
[87] Patrick S. Stumpf, Rosanna C.G. Smith, Michael Lenz, Andreas Schuppert, Franz-Josef Müller,
Ann Babtie, Thalia E. Chan, Michael P.H. Stumpf, Colin P. Please, Sam D. Howison, Fumio Arai,
and Ben D. MacArthur. “Stem Cell Differentiation as a Non-Markov Stochastic Process”. In: Cell
Systems 5.3 (Sept. 2017), 268–282.e7. issn: 24054712. doi: 10.1016/j.cels.2017.08.009.
[88] Yutong Sha, Shuxiong Wang, Peijie Zhou, and Qing Nie. “Inference and Multiscale Model of
Epithelial-to-Mesenchymal Transition via Single-Cell Transcriptomic Data”. In: Nucleic Acids
Research 48.17 (Sept. 2020), pp. 9505–9520. issn: 0305-1048. doi: 10.1093/nar/gkaa725.
[89] Cerys S. Manning, Veronica Biga, James Boyd, Jochen Kursawe, Bodvar Ymisson, David G. Spiller,
Christopher M. Sanderson, Tobias Galla, Magnus Rattray, and Nancy Papalopulu. “Quantitative
Single-Cell Live Imaging Links HES5 Dynamics with Cell-State and Fate in Murine
Neurogenesis”. In: Nature Communications 10.1 (June 2019), p. 2835. issn: 2041-1723. doi:
10.1038/s41467-019-10734-8.
[90] Joshua Burton, Cerys S. Manning, Magnus Rattray, Nancy Papalopulu, and Jochen Kursawe.
“Inferring Kinetic Parameters of Oscillatory Gene Regulation from Single Cell Time-Series Data”.
In: Journal of The Royal Society Interface 18.182 (Sept. 2021), p. 20210393. issn: 1742-5662. doi:
10.1098/rsif.2021.0393.
[91] Lekshmi Dharmarajan, Hans-Michael Kaltenbach, Fabian Rudolf, and Joerg Stelling. “A Simple
and Flexible Computational Framework for Inferring Sources of Heterogeneity from Single-Cell
Dynamics”. In: Cell Systems 8.1 (Jan. 2019), 15–26.e11. issn: 24054712. doi:
10.1016/j.cels.2018.12.007.
[92] Sebastian Persson, Niek Welkenhuysen, Sviatlana Shashkova, Samuel Wiqvist, Patrick Reith,
Gregor W. Schmidt, Umberto Picchini, and Marija Cvijovic. “Scalable and Flexible Inference
Framework for Stochastic Dynamic Single-Cell Models”. In: PLOS Computational Biology 18.5
(May 2022). Ed. by James R. Faeder, e1010082. issn: 1553-7358. doi: 10.1371/journal.pcbi.1010082.
[93] Carolin Loos, Katharina Moeller, Fabian Fröhlich, Tim Hucho, and Jan Hasenauer. “A
Hierarchical, Data-Driven Approach to Modeling Single-Cell Populations Predicts Latent Causes
of Cell-To-Cell Variability”. In: Cell Systems 6.5 (May 2018), 593–603.e13. issn: 24054712. doi:
10.1016/j.cels.2018.04.008.
[94] Zixuan Cang and Qing Nie. “Inferring Spatial and Signaling Relationships between Cells from
Single Cell Transcriptomic Data”. In: Nature Communications 11.1 (Apr. 2020), p. 2084. issn:
2041-1723. doi: 10.1038/s41467-020-15968-5.
[95] Kaan Öcal, Michael U. Gutmann, Guido Sanguinetti, and Ramon Grima. “Inference and
Uncertainty Quantification of Stochastic Gene Expression via Synthetic Models”. In: Journal of
The Royal Society Interface 19.192 (July 2022), p. 20220153. issn: 1742-5662. doi:
10.1098/rsif.2022.0153.
[96] Qingchao Jiang, Xiaoming Fu, Shifu Yan, Runlai Li, Wenli Du, Zhixing Cao, Feng Qian, and
Ramon Grima. “Neural Network Aided Approximation and Parameter Inference of
Non-Markovian Models of Gene Expression”. In: Nature Communications 12.1 (May 2021), p. 2618.
issn: 2041-1723. doi: 10.1038/s41467-021-22919-1.
[97] Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich,
Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. “Stan : A
Probabilistic Programming Language”. In: Journal of Statistical Software 76.1 (2017). issn:
1548-7660. doi: 10.18637/jss.v076.i01.
[98] Shuxiong Wang, Matthew Karikomi, Adam L MacLean, and Qing Nie. “Cell Lineage and
Communication Network Inference via Optimization for Single-Cell Transcriptomics”. In: Nucleic
Acids Research 47.11 (Mar. 2019), e66–e66. issn: 0305-1048. doi: 10.1093/nar/gkz204.
[99] Ravin Kumar, Colin Carroll, Ari Hartikainen, and Osvaldo Martin. “ArviZ a Unified Library for
Exploratory Analysis of Bayesian Models in Python”. In: Journal of Open Source Software 4.33
(Jan. 2019), p. 1143. issn: 2475-9066. doi: 10.21105/joss.01143.
[100] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion,
Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg,
Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and
Édouard Duchesnay. “Scikit-Learn: Machine Learning in Python”. In: Journal of Machine Learning
Research 12.85 (2011), pp. 2825–2830. url: http://jmlr.org/papers/v12/pedregosa11a.html.
[101] V. A. Traag, L. Waltman, and N. J. van Eck. “From Louvain to Leiden: Guaranteeing
Well-Connected Communities”. In: Scientific Reports 9.1 (Dec. 2019), p. 5233. issn: 2045-2322. doi:
10.1038/s41598-019-41695-z.
[102] F. Alexander Wolf, Philipp Angerer, and Fabian J. Theis. “SCANPY: Large-Scale Single-Cell Gene
Expression Data Analysis”. In: Genome Biology 19.1 (Dec. 2018), p. 15. issn: 1474-760X. doi:
10.1186/s13059-017-1382-0.
[103] Pauli Virtanen et al. “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python”. In:
Nature Methods 17.3 (Mar. 2020), pp. 261–272. issn: 1548-7091, 1548-7105. doi:
10.1038/s41592-019-0686-2.
[104] Ryan N Gutenkunst, Joshua J Waterfall, Fergal P Casey, Kevin S Brown, Christopher R Myers, and
James P Sethna. “Universally Sloppy Parameter Sensitivities in Systems Biology Models”. In: PLoS
Computational Biology 3.10 (Oct. 2007). Ed. by Adam P Arkin, e189. issn: 1553-7358. doi:
10.1371/journal.pcbi.0030189.
[105] Arvind Varma, Massimo Morbidelli, and Hua Wu. Parametric Sensitivity in Chemical Systems.
1st ed. Cambridge University Press, Mar. 1999. isbn: 978-0-521-62171-7 978-0-521-01984-2
978-0-511-72177-9. doi: 10.1017/CBO9780511721779.
[106] Michał Komorowski, Maria J. Costa, David A. Rand, and Michael P. H. Stumpf. “Sensitivity,
Robustness, and Identifiability in Stochastic Chemical Kinetics Models”. In: Proceedings of the
National Academy of Sciences 108.21 (2011), pp. 8645–8650. issn: 0027-8424. doi:
10.1073/pnas.1015814108.
[107] Xiaojie Qiu, Yan Zhang, Jorge D. Martin-Rufino, Chen Weng, Shayan Hosseinzadeh, Dian Yang,
Angela N. Pogson, Marco Y. Hein, Kyung Hoi (Joseph) Min, Li Wang, Emanuelle I. Grody,
Matthew J. Shurtleff, Ruoshi Yuan, Song Xu, Yian Ma, Joseph M. Replogle, Eric S. Lander,
Spyros Darmanis, Ivet Bahar, Vijay G. Sankaran, Jianhua Xing, and Jonathan S. Weissman.
“Mapping Transcriptomic Vector Fields of Single Cells”. In: Cell (Feb. 2022), S0092867421015774.
issn: 00928674. doi: 10.1016/j.cell.2021.12.045.
[108] Martin Feinberg. Foundations of Chemical Reaction Network Theory. 1st ed. 2019. Applied
Mathematical Sciences 202. Cham: Springer International Publishing : Imprint: Springer, 2019.
isbn: 978-3-030-03858-8. doi: 10.1007/978-3-030-03858-8.
[109] Gert-Jan Both, Subham Choudhury, Pierre Sens, and Remy Kusters. “DeepMoD: Deep Learning
for Model Discovery in Noisy Data”. In: Journal of Computational Physics 428 (Mar. 2021),
p. 109985. issn: 00219991. doi: 10.1016/j.jcp.2020.109985.
[110] Silviu-Marian Udrescu and Max Tegmark. “AI Feynman: A Physics-Inspired Method for Symbolic
Regression”. In: Science Advances 6.16 (Apr. 2020), eaay2631. issn: 2375-2548. doi:
10.1126/sciadv.aay2631.
[111] Peng Zheng, Travis Askham, Steven L. Brunton, J. Nathan Kutz, and Aleksandr Y. Aravkin. “A
Unified Framework for Sparse Relaxed Regularized Regression: SR3”. In: IEEE Access 7 (2019),
pp. 1404–1423. issn: 2169-3536. doi: 10.1109/ACCESS.2018.2886528.
[112] Kathleen Champion, Peng Zheng, Aleksandr Y. Aravkin, Steven L. Brunton, and J. Nathan Kutz.
“A Unified Sparse Optimization Framework to Learn Parsimonious Physics-Informed Models
From Data”. In: IEEE Access 8 (2020), pp. 169259–169271. issn: 2169-3536. doi:
10.1109/ACCESS.2020.3023625.
[113] Patrick A. K. Reinbold, Daniel R. Gurevich, and Roman O. Grigoriev. “Using Noisy or Incomplete
Data to Discover Models of Spatiotemporal Dynamics”. In: Physical Review E 101.1 (Jan. 2020),
p. 010203. doi: 10.1103/PhysRevE.101.010203.
[114] Daniel A. Messenger and David M. Bortz. “Weak SINDy: Galerkin-Based Data-Driven Model
Selection”. In: Multiscale Modeling & Simulation 19.3 (Jan. 2021), pp. 1474–1497. issn: 1540-3459,
1540-3467. doi: 10.1137/20M1343166.
[115] Karsten Ahnert and Markus Abel. “Numerical Differentiation of Experimental Data: Local versus
Global Methods”. In: Computer Physics Communications 177.10 (Nov. 2007), pp. 764–774. issn:
00104655. doi: 10.1016/j.cpc.2007.03.009.
[116] N. M. Mangan, J. N. Kutz, S. L. Brunton, and J. L. Proctor. “Model Selection for Dynamical
Systems via Sparse Regression and Information Criteria”. In: Proceedings of the Royal Society A:
Mathematical, Physical and Engineering Sciences 473.2204 (Aug. 2017), p. 20170009. issn:
1364-5021, 1471-2946. doi: 10.1098/rspa.2017.0009.
[117] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan,
Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf,
Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,
Lu Fang, Junjie Bai, and Soumith Chintala. “PyTorch: An Imperative Style, High-Performance
Deep Learning Library”. In: Proceedings of the 33rd International Conference on Neural Information
Processing Systems. Red Hook, NY, USA: Curran Associates Inc., 2019.
[118] Marten Lienen and Stephan Günnemann. “Torchode: A Parallel ODE Solver for PyTorch”. In: The
Symbiosis of Deep Learning and Differential Equations II. 2022. url:
https://openreview.net/forum?id=uiKVKTiUYB0.
[119] Diederik P. Kingma and Jimmy Ba. “Adam: A Method for Stochastic Optimization”. In: (2014).
doi: 10.48550/ARXIV.1412.6980.
[120] Brian M. de Silva, Kathleen Champion, Markus Quade, Jean-Christophe Loiseau, J. Nathan Kutz,
and Steven L. Brunton. PySINDy: A Python Package for the Sparse Identification of Nonlinear
Dynamics from Data. Apr. 2020. arXiv: 2004.08424 [physics]. url:
http://arxiv.org/abs/2004.08424.
[121] Alan A. Kaptanoglu, Brian M. de Silva, Urban Fasel, Kadierdan Kaheman, Andy J. Goldschmidt,
Jared L. Callaham, Charles B. Delahunt, Zachary G. Nicolaou, Kathleen Champion,
Jean-Christophe Loiseau, J. Nathan Kutz, and Steven L. Brunton. “PySINDy: A Comprehensive
Python Package for Robust Sparse System Identification”. In: Journal of Open Source Software 7.69
(Jan. 2022), p. 3994. issn: 2475-9066. doi: 10.21105/joss.03994. arXiv: 2111.08481 [physics].
[122] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. “Algorithms for
Hyper-Parameter Optimization”. In: Proceedings of the 24th International Conference on Neural
Information Processing Systems. NIPS’11. Red Hook, NY, USA: Curran Associates Inc., 2011,
pp. 2546–2554. isbn: 978-1-61839-599-3.
[123] Matthias Feurer and Frank Hutter. “Hyperparameter Optimization”. In: Automated Machine
Learning. Ed. by Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Cham: Springer
International Publishing, 2019, pp. 3–33. isbn: 978-3-030-05317-8 978-3-030-05318-5. doi:
10.1007/978-3-030-05318-5_1.
[124] Kadierdan Kaheman, Steven L. Brunton, and J. Nathan Kutz. “Automatic Differentiation to
Simultaneously Identify Nonlinear Dynamics and Extract Noise Probability Distributions from
Data”. In: Machine Learning: Science and Technology 3.1 (Mar. 2022), p. 015031. issn: 2632-2153.
doi: 10.1088/2632-2153/ac567a.
Appendices
Appendix A: Supplementary material for Chapter 2
A.1 Supplementary results
A.1.1 Analysis of scaling and clipping of prior distributions along cell chains
To assess the effect of scaling and clipping the prior standard deviation, we studied to what extent different scaling/clipping settings control the variance of the posterior distribution. Specifically, we ran sampling along the same cell chain using priors derived in three different ways:
1. Use the posterior standard deviation of each parameter from the previous cell as the prior standard
deviation for the current cell (Scale-1.0)
2. Double the posterior standard deviation from the previous cell (Scale-2.0)
3. Scale the posterior standard deviation by 1.5 and clip it to [0.001, 5] (Similar-r1)
For Scale-1.0, we observed overfitting of Ca2+ trajectories. Trajectories simulated from sampled parameters
were close to the input trajectories (Figure A.2A), but the mean log posterior likelihood kept dropping as
more cells were sampled along the cell chain (Figure A.2B). Meanwhile, most parameters shrank quickly
in standard deviation (Figure A.2F–G). For Scale-2.0, the marginal distributions of some parameters were
stable (Figure A.2H), but others grew exponentially in mean and standard deviation (Figure A.2I). After
the 30th cell, the sampling time also began to increase rapidly (Figure A.2C), so we terminated sampling at
the 36th cell. Similar-r1 had stable log posteriors and mean errors (Figure A.2A–B). Some parameters still
tended to grow or shrink for certain parts of the cell chains, but the clipping technique prevented them
from growing or shrinking indefinitely (Figure A.2D–E). In other words, the combination of scaling and clipping stabilized the parameter distributions along the cell chain. The subsequent cell chain runs in this study adopted these scaling/clipping settings.
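The prior-update rule used here can be written down compactly. The following is a minimal sketch, assuming the posterior draws for a cell are stored as a NumPy array with one row per draw and one column per parameter; the function name and defaults are ours, not those of the released code.

    import numpy as np

    def next_cell_prior(posterior_samples, scale=1.5, sd_bounds=(0.001, 5.0)):
        # Prior mean for the next cell: the posterior mean of the current cell.
        prior_mean = posterior_samples.mean(axis=0)
        # Prior standard deviation: the posterior standard deviation scaled (by 1.5 here)
        # and clipped, so it can neither collapse to zero nor grow without bound.
        prior_sd = np.clip(scale * posterior_samples.std(axis=0), *sd_bounds)
        return prior_mean, prior_sd

Setting scale=1.0 or scale=2.0 and removing the clipping bounds corresponds, roughly, to the Scale-1.0 and Scale-2.0 variants compared above.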
A.1.2 Parameter inference for a single cell with multiple epochs
For each cell in Figure 2.1B–C, we also performed parameter inference with multiple epochs. For the
first epoch, we used the Lemon prior. After finishing one epoch, we used the posterior of the epoch to
construct the prior for the next epoch as described in Section 2.3.5. We found that, under the same NUTS
configuration, it usually took two to three epochs to reach the same posterior probabilities and fitting
errors that were attained by learning from the posterior of a different cell (Figure A.4A–F). Improvement
in posterior probabilities and fitting errors was usually negligible after the second or third epoch (Figure A.4A–F), although the marginal posterior distributions of some parameters continued to narrow (Figure A.4G–H).
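A minimal sketch of the multi-epoch loop is given below. The sampler call is deliberately stubbed out (it stands in for the Stan/NUTS run), and the 16-parameter prior arrays are placeholders for the Lemon prior, so the loop is illustrative rather than the released implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def run_nuts_stub(prior_mean, prior_sd, n_draws=1000):
        # Stand-in for the Stan/NUTS sampling call; it simply draws from the prior
        # so that the epoch loop below runs on its own.
        return rng.normal(prior_mean, prior_sd, size=(n_draws, prior_mean.size))

    prior_mean = np.ones(16)   # placeholder for the Lemon prior means (cf. the 16 parameters in Table A.7)
    prior_sd = np.ones(16)     # placeholder for the Lemon prior standard deviations
    for epoch in range(3):     # two to three epochs were usually sufficient (see above)
        posterior = run_nuts_stub(prior_mean, prior_sd)
        # The posterior of this epoch becomes the prior of the next epoch (scale and clip).
        prior_mean = posterior.mean(axis=0)
        prior_sd = np.clip(1.5 * posterior.std(axis=0), 0.001, 5.0)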
A.1.3 Parameter inference for a cell chain induced by Ca2+ signaling similarity
In addition to sampling along cell chains induced by cell-cell similarity in gene expression, we sampled a cell chain induced by signaling similarity, denoted Ca-similarity. Such a cell chain may be useful when gene expression data are not available. To construct this cell chain from the cells already sampled in Reduced-3, we computed the Euclidean distance between the Ca2+ response trajectories for each pair of cells, built a graph in which each node is a cell and an edge is placed between two cells if their distance falls below a threshold, and ran a depth-first search (DFS) on the graph. The threshold was chosen to be 5.0 so that most cells were included in the same connected component and the resulting DFS tree was close to a straight chain. We sampled the first 250 cells in Ca-similarity and performed clustering of cells on their posterior means as with Reduced-3 and Random-2 (Section 2.4.5). The clustering was not able to separate cells by their Ca2+ signaling profiles or by their gene expression profiles (Figure A.16A–B). One possible reason is that the Ca-similarity cell chain failed to transfer information about gene expression from cell to cell: most cells in Ca-similarity were not similar in gene expression to their predecessor or successor (Figure A.16C), whereas every cell in Reduced-3 was, at a minimum, similar in gene expression to both of its immediate neighbors (Figure A.16D). Thus, in the case of Ca-similarity, information about transcriptional states encoded within the parameter posteriors might have been lost as we sampled cells along the cell chain.
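A minimal sketch of the Ca-similarity chain construction follows, assuming the Ca2+ response trajectories are stored as an array with one row per cell; the function name, the iterative DFS, and the random stand-in data are ours.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def ca_similarity_chain(trajectories, threshold=5.0, start=0):
        # Pairwise Euclidean distances between Ca2+ response trajectories.
        dist = squareform(pdist(trajectories))
        # Connect two cells if their trajectory distance falls below the threshold.
        adjacency = (dist < threshold) & ~np.eye(len(trajectories), dtype=bool)
        # Depth-first search from the starting cell; the visiting order is the cell chain.
        order, visited, stack = [], set(), [start]
        while stack:
            cell = stack.pop()
            if cell in visited:
                continue
            visited.add(cell)
            order.append(cell)
            stack.extend(int(n) for n in np.flatnonzero(adjacency[cell]) if n not in visited)
        return order

    # Illustrative usage with random stand-in trajectories (250 cells, 800 time points).
    chain = ca_similarity_chain(np.random.rand(250, 800))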
A.2 Supplementary tables
Run name | Sampled in chain? | Similarity-based? | Reduced model | NUTS hyperparameters (warmup, max_tree_depth)
Indiv. cells run 1 | N | – | N | (500, 15)
Indiv. cells run 2 | N | – | N | (1000, 15)
Similar-r1 | Y | Expression-based | N | (500, 10)
Similar-r2 | Y | Expression-based | N | (1000, 15)
Similar-r3 | Y | Expression-based | N | (500, 15)
Reduced-1 | Y | Expression-based | Constant η1 | (500, 10)
Reduced-2 | Y | Expression-based | Constant Be | (500, 10)
Reduced-3 | Y | Expression-based | Constant η1 and Be | (500, 10)
Random-1 | Y | N | Constant η1 and Be | (500, 10)
Random-2 | Y | N | Constant η1 and Be | (500, 10)
Ca-similarity | Y | Ca2+-based | Constant η1 and Be | (500, 10)
Table A.1: Summary of inference runs referred to in Chapter 2. “Sampled in chain” refers to whether
the run is cell predecessor-based (cell prior informed by the previous cell’s posterior). “Similarity-based”
refers to whether the cell chain ordering is informed by cell-cell similarity in either gene expression or
Ca2+ signaling.
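For orientation, a NUTS run with the hyperparameters listed above (e.g. 500 warmup steps and a maximum tree depth of 10) can be configured through, for example, the PyStan 2 interface (Appendix A.4 notes that the released code uses Python and Stan 2.19); the toy model and data below are placeholders, not the Ca2+ model.

    import pystan  # PyStan 2.x interface, assumed for illustration

    model_code = """
    data { int<lower=1> N; vector[N] y; }
    parameters { real mu; real<lower=0> sigma; }
    model { y ~ normal(mu, sigma); }
    """
    stan_data = {"N": 5, "y": [1.0, 2.0, 1.5, 2.2, 1.8]}

    sm = pystan.StanModel(model_code=model_code)
    fit = sm.sampling(
        data=stan_data,
        chains=4,
        warmup=500,
        iter=1500,                      # 500 warmup + 1000 posterior draws per chain
        control={"max_treedepth": 10},  # maximum NUTS tree depth, as in Table A.1
    )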
Cell ID | Similar-r1 | Indiv. cells run 1 | Indiv. cells run 2
5121 | 441.07 ± 17.23 | 449.61 ± 3.26 | 449.55 ± 3.39
5125 | 440.46 ± 4.43 | 337.23 ± 21.41 | 326.50 ± 3.52
5116 | 398.47 ± 3.45 | 344.46 ± 3.39 | 343.90 ± 3.54
5107 | 378.01 ± 21.79 | 322.17 ± 58.75 | 337.87 ± 69.66
5093 | 382.29 ± 3.39 | 362.15 ± 2.97 | 358.37 ± 7.01
5127 | 408.79 ± 3.48 | 302.19 ± 68.33 | 365.09 ± 14.78
5117 | 408.77 ± 3.33 | 363.86 ± 9.18 | 359.55 ± 8.14
5046 | 385.26 ± 3.16 | 218.81 ± 7.49 | 224.66 ± 3.34
5091 | 356.95 ± 3.33 | 311.55 ± 17.02 | 322.25 ± 10.83
5082 | 329.24 ± 3.29 | 322.96 ± 4.15 | 323.99 ± 3.84
Table A.2: Comparison of mean log posterior for ten cells from the similarity-based chain (Similar-r1) with the same cells sampled individually using an uninformative prior with either 500 or 1000 warmup steps (Indiv. cells run 1 and Indiv. cells run 2). For each fitted cell, the mean log posterior is given ± the standard deviation.
Run | Maximum tree depth | Number of warmup steps | Number of converged cells | Number of mixed chains | R̂ (mixed only)
Indiv. cells run 1 | 15 | 500 | 10 | 3.80 ± 0.40 | 1.77 ± 0.79
Indiv. cells run 2 | 15 | 1000 | 9 | 3.89 ± 0.31 | 1.70 ± 0.86
Similar-r1 | 10 | 500 | 10 | 3.80 ± 0.40 | 1.04 ± 0.10
Table A.3: Sampling performance of the similarity-based chain (Similar-r1) and two runs in which the same cells were sampled individually using the uninformative Lemon prior with either 500 or 1000 warmup steps (Indiv. cells run 1 and Indiv. cells run 2). The number of mixed chains (out of 4) and the average R̂ are used to compare performance; each is given ± the standard deviation.
Run | Sampling time (minutes) | Mean error
Reduced-3 | 67.25 ± 83.62 | 0.79 ± 0.37
Random-1 | 64.84 ± 46.43 | 0.82 ± 0.47
Table A.4: Comparison between cell chains sampled using similarity-informed priors (Reduced-3)
or randomly ordered (Random-1). The mean sampling times and mean errors between fitted trajectories
and data are given ± the standard deviation. No significant differences are observed between the two
chains. Both cell chains use a reduced model in which Be and η1 were set to constant values.
Run | Number of warmup steps | Max tree depth | Tree depth | Sampling time (minutes) | R̂ | Mean error
Similar-r1 | 500 | 10 | 9.76 ± 0.43 | 68.11 ± 38.75 | 1.11 ± 0.33 | 0.72 ± 0.32
Similar-r2 | 1000 | 15 | 10.36 ± 0.80 | 253.23 ± 151.75 | 1.11 ± 0.40 | 0.65 ± 0.29
Similar-r3 | 500 | 15 | 10.72 ± 1.21 | 189.48 ± 154.54 | 1.08 ± 0.36 | 0.66 ± 0.29
Table A.5: Comparison between cell chains run with different hyperparameters, all based on
similarity-informed cell chains. The tree depth reported is the mean over the population of 500 fitted
cells. All statistics are given ± the standard deviation.
Run | Tree depth | Sampling time (minutes) | R̂ | Mean error
Similar-r1 | 9.76 ± 0.44 | 55.54 ± 34.67 | 1.12 ± 0.37 | 0.79 ± 0.37
Reduced-1 | 9.61 ± 0.66 | 52.73 ± 39.89 | 1.14 ± 0.41 | 0.84 ± 0.42
Reduced-2 | 9.61 ± 0.74 | 58.27 ± 66.76 | 1.18 ± 0.50 | 0.84 ± 0.55
Reduced-3 | 9.86 ± 0.35 | 67.25 ± 83.62 | 1.17 ± 0.47 | 0.79 ± 0.37
Table A.6: Comparison between cell chains run on the full model (Similar-r1) and models reduced
by setting one or two parameters to a constant. All statistics are given ± the standard deviation. No
significant differences are observed between the chains.
Parameter | Mean | Standard deviation | Mean sensitivity (0.01-quantile) | Mean sensitivity (0.99-quantile)
ATP | 0.007 | 0.010 | 8.98 | 14.0
KATP | 0.032 | 0.017 | 5.43 | 4.47
Koff, ATP | 0.031 | 0.016 | 5.57 | 4.50
VPLC | 1.46 | 3.75 | 8.93 | 13.5
KIP3 | 0.207 | 0.548 | 17.3 | 6.86
Koff, IP3 | 0.043 | 0.038 | 4.31 | 3.52
a | 0.017 | 0.008 | 2.87 | 1.78
dinh | 0.810 | 2.80 | 1.28 | 2.01
Ke | 0.005 | 0.014 | 0.47 | 0.60
d1 | 2.62 | 3.50 | 25.0 | 7.40
d5 | 0.870 | 1.01 | 2.64 | 2.68
ϵ | 0.347 | 0.283 | 13.3 | 49.5
η2 | 0.223 | 0.224 | 13.8 | 58.2
η3 | 0.726 | 0.667 | 32.2 | 9.77
c0 | 20.0 | 14.2 | 9.65 | 38.1
k3 | 0.075 | 0.140 | 1.31 | 1.26
Table A.7: Statistics of posterior distributions for cells fit to the similarity-based chain and mean parameter sensitivities, reduced model (Reduced-3). The marginal posterior mean of each parameter is first computed for each cell; we then compute the mean and standard deviation of these posterior means for each parameter over all cells, i.e., across the cell population. Mean sensitivities are taken from Figure 2.3B.
Gene Parameter Correlation p-value
PPP1CC η3 0.382423 1.75 × 10−15
CCDC47 η3 0.373109 9.30 × 10−15
PPP1CC a 0.355153 2.01 × 10−13
MCM6 Koff, IP3 −0.343088 1.42 × 10−12
PPP2CA η3 0.337860 3.23 × 10−12
ITPRIPL2 η3 0.335172 4.90 × 10−12
RCN1 η3 0.333198 6.64 × 10−12
MCM6 c0 0.329770 1.12 × 10−11
MCM6 η3 0.326605 1.80 × 10−11
CCDC47 Koff, IP3 −0.324868 2.33 × 10−11
PPP1CC Koff, IP3 −0.319847 4.88 × 10−11
PPP3CA η3 0.319310 5.28 × 10−11
RRM1 Koff, IP3 −0.318196 6.21 × 10−11
PRKCI η3 0.308928 2.33 × 10−10
PIP4K2C η3 0.308875 2.34 × 10−10
PPP1CC c0 0.308291 2.54 × 10−10
RRM1 η3 0.303851 4.70 × 10−10
E2F1 Koff, IP3 −0.301569 6.43 × 10−10
CCNE1 Koff, IP3 −0.292604 2.13 × 10−9
CCNE1 c0 0.284548 6.06 × 10−9
Table A.8: Gene-parameter pairs ranked by correlation coefficient in a similarity-based chain
(Reduced-3). Top variable genes were determined (see Section 2.3.7) and for all possible pairs of variable
genes and model parameters we computed the Pearson correlation between the gene expression values
and the posterior parameter means for all cells in the population. Then the gene-parameter pairs were
ranked by their absolute Pearson correlation coefficients (top 20 pairs listed).
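The ranking described in the caption can be sketched as follows; the arrays, gene names, and parameter names below are illustrative stand-ins rather than the processed data.

    import numpy as np
    from scipy.stats import pearsonr

    def rank_gene_parameter_pairs(expression, posterior_means, genes, params):
        # expression: (n_cells, n_genes); posterior_means: (n_cells, n_params).
        pairs = []
        for i, gene in enumerate(genes):
            for j, param in enumerate(params):
                r, p = pearsonr(expression[:, i], posterior_means[:, j])
                pairs.append((gene, param, r, p))
        # Rank all gene-parameter pairs by absolute Pearson correlation.
        return sorted(pairs, key=lambda rec: abs(rec[2]), reverse=True)

    # Illustrative usage with random stand-in data for 500 cells.
    rng = np.random.default_rng(0)
    ranked = rank_gene_parameter_pairs(
        rng.random((500, 3)), rng.random((500, 2)),
        genes=["PPP1CC", "CCDC47", "MCM6"], params=["eta3", "Koff_IP3"],
    )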
Gene Parameter Correlation p-value
SLC25A6 KATP 0.200664 3.11 × 10−19
SLC25A6 Koff, ATP 0.180955 7.11 × 10−16
CCDC47 d5 0.175142 5.93 × 10−15
ATP2B1 d5 0.165159 1.92 × 10−13
PPP1CC d5 0.155976 3.92 × 10−12
PRKCI d5 0.150759 2.01 × 10−11
RCN1 η3 0.150009 2.54 × 10−11
RCN1 d5 0.149303 3.15 × 10−11
ATP2B1 η3 0.147382 5.63 × 10−11
PPP1CC η3 0.144494 1.33 × 10−10
PPP2CA a 0.144204 1.45 × 10−10
CCDC47 a 0.143404 1.84 × 10−10
PPP3CA d5 0.143021 2.05 × 10−10
PPM1F d5 0.141377 3.31 × 10−10
PPP3CA η3 0.139341 5.94 × 10−10
PPP1CC a 0.137844 9.08 × 10−10
RCN1 a 0.137340 1.05 × 10−9
CCDC47 k3 0.137064 1.13 × 10−9
ATP2C1 d5 0.136529 1.31 × 10−9
SLC25A6 d5 −0.136192 1.44 × 10−9
Table A.9: Gene-parameter pairs ranked by correlation coefficient in a randomly ordered chain
(Random-2). Top variable genes were determined (see Section 2.3.7) and for all possible pairs of variable
genes and model parameters we computed the Pearson correlation between the gene expression values
and the posterior parameter means for all cells in the population. Then the gene-parameter pairs were
ranked by their absolute Pearson correlation coefficients (top 20 pairs listed).
A.3 Supplementary figures
Figure A.1: Ca2+ dynamic response trajectories to stimulus by ATP. Responses for 500 cells are shown
over a time frame of 800 seconds. Cells were stimulated by ATP at 200 seconds; the first 200 seconds prior
to stimulation are not shown. The raw Ca2+ trajectories were smoothed using a moving average filter with
a window size of twenty seconds.
Figure A.2: Analysis of scaling and clipping of prior distributions along cell chains. A: Mean error of trajectories simulated from the parameter posteriors of 36 cells sampled from one of three chains: Similar-r1, in which the posterior parameter standard deviation is scaled by a factor of 1.5 and clipped; Scale-1.0, in which the posterior parameter standard deviation is scaled by a factor of 1.0; and Scale-2.0, in which the posterior parameter standard deviation is scaled by a factor of 2.0. Means and ranges are shown on the violin plots. Only 36 cells are compared per chain because at this point in the chain the run times for Scale-2.0 became prohibitive (see C). B: Mean log posterior values per cell, between sampled trajectories and the data, for each of the three chains. The mean value over a whole chain is shown as a horizontal line. C: Mean sampling times per cell. The mean value over a whole chain is shown as a horizontal line. D–I: Marginal parameter distributions for KATP and d1 for Similar-r1 (D–E), Scale-1.0 (F–G), and Scale-2.0 (H–I).
Figure A.3: Comparison of stiff and non-stiff solvers for Ca2+ response simulation. We simulated
Ca2+ responses from sampled posterior for each individual cell with a non-stiff ODE solver (Runge-Kutta,
RK) and a stiff solver (based on backward differentiation formula, BDF). A–B: Examples of Ca2+ response
simulated by a non-stiff solver (RK scheme). C–D: Examples of Ca2+ response simulated by a stiff solver
(BDF scheme). E: Mean error from data using RK and BDF solvers for all cells. F: Mean Euclidean distance
between simulated Ca2+ trajectories from both solvers for the same cells.
Figure A.4: Comparison of parameter inference of cells in a chain vs. multiple inference runs on
the same cell. We trained individual cells with multiple epochs, starting with the Lemon prior for the first
epoch. A: Posterior probabilities of sampled parameters for Cell 5117, multi-epoch learning (blue points)
vs. the cell as fit in the chain Similar-r1. B: Mean error of trajectories simulated from sampled posterior
parameters for Cell 5117, multi-epoch learning vs. the cell as fit in the chain Similar-r1. C: As for (A) with
Cell 5082. D: As for (B) with Cell 5082. E: As for (A) with Cell 5091. F: As for (B) with Cell 5091. G:
Marginal posterior distributions of VPLC from multi-epoch training for cell 5121 vs. the cell as fit in the
chain Similar-r1 (“Similar”). H: As for (G) with d5.
Figure A.5: Comparison of replicate runs of random cell chains. A: Two randomly ordered cell
chains were run with different initial cells (6 unique initial cells for the blue chain and 3 for the orange)
followed by the same 100 cells. Marginal parameter posterior distributions for Koff, ATP (PLC degradation
rate) are shown (top), and the corresponding fold changes between consecutive cells (bottom). B: Marginal
posteriors and fold changes as for (A), with d5, the IP3 channel dissociation constant.
Figure A.6: Comparison of model reduction for insensitive parameters. A–B: Marginal posterior
values for Be (A) and for η1 (B) from the chain Similar-r1. In both cases ‘drift’ is observed over 100 fitted
cells (positions 401–500 in the cell chain in (A) and 301–400 for (B)): these parameters are insensitive
given the data. C: Comparison of the full model (Similar-r1) with three reduced models. In Reduced-1, η1
is set to a constant value; in Reduced-2, Be is set to a constant value; in Reduced-3, both η1 and Be are
set to constant values. Constants are taken from previous parameter estimation in Lemon et al. [17]. Mean log posterior values are shown for all 500 cells; the mean over each cell chain is shown as a horizontal line. No significant differences are observed. D: Mean errors between sampled trajectories and the data. E: Mean sampling times for each of the four chains. No significant differences are observed.
Figure A.7: Distribution of Pearson correlations between genes and parameters from the chain
Reduced-3. For all significant gene-parameter pairs (adjusted p-value < 0.05), the absolute Pearson correlation was > 0.2.
Figure A.8: Comparison of global similarity via PCA using min-max normalization for chain
Reduced-3. A: Projection of all cells from the Reduced-3 chain onto the first two PCs of cell 5106 via min-max normalization (cf. z-score normalization used in Fig 4). B: As for (A), with focal cell 4940. C: Mean distances between focal cell 5106 and projected samples from (A). D: Mean distances between focal cell 4940 and projected samples from (B).
Figure A.9: Comparison of global similarity via PCA using min-max normalization for chain
Random-2. A: Projection of all cells from the Random-2 chain onto the first two PCs of cell 5106 under z-score normalization. B: Projection of all cells onto the first two PCs of cell 5106 under min-max normalization. C: Mean distances between focal cell 5106 and projected samples from (A). D: Mean distances between focal cell 5106 and projected samples from (B).
Figure A.10: Comparison of cell-cell similarity along cell chains. A: Black spots in the lower triangle
indicate that the cell pair has gene expression similarity > 0 for the Reduced-3 chain. Red lines mark
the row/column of the 101st cell. Small clusters of similar cells are observed; however, these do not locate preferentially near the diagonal, i.e., we do not see evidence of block structure. B: As for (A), for chain
Random-2. Similar cells are randomly distributed along the chain.
Figure A.11: Comparison of Ca2+ dynamics for various cell clusterings of chain Reduced-3. A: Simulated Ca2+ dynamics sampled from the posterior distributions of cells clustered by posterior parameters.
Solid lines mark the mean Ca2+ response and shaded region marks one standard deviation. Distinct peak
heights and peak times are observed. Dashed lines mark the timing of the Ca2+ peak. B: Simulated Ca2+
dynamics sampled from the posterior distributions of cells clustered by gene expression. Distinct Ca2+
peaks are observed (low, mid, high).
Figure A.12: Comparison of model dynamics in clusters from Reduced-3 by posterior parameter
clustering. For each of the clusters identified (early, low, and late-high) samples from the posterior distributions of cells belonging to those clusters are plotted for the three model species: PLC, IP3, and h. Colors
denote samples from different cells. Early responders are characterized by transient peaked trajectories
for PLC and IP3. Low responders are characterized by low activation of the IP3 receptor (h).
[Figure A.13 panels: one boxplot per model parameter (ATP, KATP, Koff,ATP, VPLC, KIP3, Koff,IP3, a, dinh, Ke, d1, d5, ϵ, η2, η3, c0, k3), compared across the Early, Low, and Late-high clusters.]
Figure A.13: Comparison of parameter distributions across clusters for Reduced-3. To compare
parameter distribution between clusters (for Reduced-3 clustered by posterior parameters), we performed
bootstrap sampling to obtain samples of the same size for each cluster. Boxplots for each parameter in
each cluster are shown. We observed different distributions of the bootstrapped samples in each cluster.
For instance, low responders had higher concentrations of free Ca2+ in ER (c0); early responders differed
from others in those parameters controlling the PLC and IP3 dynamics.
Figure A.14: Comparison of model dynamics in clusters from Random-2 by posterior parameter
clustering. For each of the clusters identified (C1, C2, C3) samples from the posterior distributions of
cells belonging to those clusters are plotted for the three model species: PLC, IP3, and h. Colors denote
samples from different cells. C2 is characterized by more variable responses than the other clusters; no
clearly discernible patterns in the responses are observed.
[Figure A.15 panels: (A) Ca2+ response (AU) vs. time for the Low, High, and Medium clusters; (B) kernel density of peak Ca2+ response (AU) per cluster; (C) marker-gene dot plot (mean expression in group) featuring RCN1, PPP1CC, CCDC47, PPP3CA, ATP2B1, PPP2CA, ATP2C1, PRKCI, ITPRIPL2, ITPR2, MCM2, LETM1, PLK1, CABIN1, KIF20A, CAPN1, SPHK1, CALML4, PPP3R2, PPIF, SLC25A6, GNAS, ITPR3, TRPC4, CDH1, AVPR1B, CPPED1, HPCAL1, and PPM1H.]
Figure A.15: Clustering cells by Ca2+ response profiles. For 500 cells (same cells as from the chains
“Reduced-3” and “Ca-similarity”) we performed k-means clustering with k=3 on the Ca2+ response trajectories of the cells. A: Ca2+ response trajectories for cells from each of the three clusters identified. The
mean Ca2+ response (solid line) and the standard deviation (shaded region) are shown. B: Kernel density
estimation of the distribution of peak Ca2+ responses in each cluster. C: Marker genes corresponding to
each k-means cluster.
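A minimal sketch of the clustering in this figure, using scikit-learn's k-means on the trajectory matrix (one row per cell); the random stand-in data and variable names are ours.

    import numpy as np
    from sklearn.cluster import KMeans

    trajectories = np.random.rand(500, 800)    # stand-in for 500 Ca2+ response trajectories
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(trajectories)
    peak_responses = trajectories.max(axis=1)  # peak Ca2+ response per cell (cf. panel B)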
Figure A.16: Posterior parameter clustering for a cell chain constructed via Ca2+ signaling similarity. Inference was performed on cells sampled along a cell chain constructed by Ca2+ signaling similarity (“Ca-similarity”), followed by hierarchical clustering on the posterior parameter means of each cell.
A: Kernel density estimation of peak Ca2+ response in each posterior parameter cluster. B: Top marker
genes associated with each cluster. C: Gene expression similarity between cells, for cells plotted from
“Ca-similarity,” a cell chain based on similarity in Ca2+ signaling. Black pixel indicates that the two corresponding cells are similar in gene expression. D: Gene expression similarity between cells, for cells plotted
from “Reduced-3,” a cell chain based on similarity in gene expression.
A.4 Data availability
Parameter inference was developed in Python 3.6 and Stan 2.19. Posterior analyses were developed in
Python 3.8. All code developed to simulate models and run parameter inference is released under an MIT
license at: https://github.com/maclean-lab/singlecell-parinf. Ca2+ data and MERFISH data (processed
files) are also available at the GitHub repository.
Appendix B: Supplementary material for Chapter 3
B.1 Supplementary tables
Noise type Noise level Window size Batch size Learning rate
Additive 0.1% 10 5 0.01
Additive 0.5% 5 5 0.01
Additive 1% 5 5 0.01
Additive 5% 5 20 0.01
Additive 10% 10 10 0.01
Additive 20% 5 10 0.01
Multiplicative 0.1% 10 10 0.1
Multiplicative 0.5% 10 5 0.01
Multiplicative 1% 10 5 0.01
Multiplicative 5% 10 5 0.01
Multiplicative 10% 10 5 0.01
Multiplicative 20% 5 5 0.1
Table B.1: Hyperparameters used to attain lowest mean squared losses for hybrid dynamical models from Lotka-Volterra datasets.
Noise type | Noise level | Lowest AICc for given noise model | Recovered model
Additive | 10% | −145.6818318 | x1′ = 1.3x1 − 0.841x1x2; x2′ = −1.8x2 + 0.924x1x2
Additive | 20% | N/A | Not found
Multiplicative | 10% | −58.92769301 | x1′ = 1.3x1 − 1.063x1x2; x2′ = −1.8x2 + 0.877x1x2
Multiplicative | 20% | N/A | Not found
Table B.2: Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets using the hybrid formulation. Terms colored in green are the known terms of Equation 3.4. Models already listed in Table 3.2 are not shown.
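For reference, for a least-squares fit with k nonzero coefficients and n data points, one standard form of the corrected Akaike information criterion used in sparse model selection (cf. [116]) is AICc = n ln(RSS/n) + 2k + 2k(k + 1)/(n − k − 1), where RSS is the residual sum of squares; this is given only for orientation and is not necessarily the exact formula used to produce the values reported in these tables.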
Noise type | Noise level | Step size | Basis library (α values, STLSQ regularization)
Additive | 0.1% | 0.1 | poly_max_2 (0.05); poly_2_3 (0.05, 0.1, 0.5, 1, 5)
Additive | 0.5% | 0.1 | poly_max_2 (0.05, 0.1); poly_2_3 (0.05, 0.1, 0.5, 1, 5)
Additive | 1% | 0.05 | poly_max_2 (0.05, 0.1); poly_2_3 (0.05, 0.1, 0.5)
Additive | 5% | 0.05 | poly_max_2 (0.05, 0.1, 0.5, 1); poly_2_3 (5, 10)
Additive | 10% | 0.05 | poly_2_3 (0.05, 0.1, 0.5)
Additive | 20% | 0.1 | poly_2_3 (1)
Multiplicative | 0.1% | 0.05 | poly_2_3 (0.05, 0.1, 0.5, 1, 5, 10)
Multiplicative | 0.5% | 0.1 | poly_max_2 (0.05, 0.1); poly_2_3 (0.05, 0.1, 0.5, 1, 5)
Multiplicative | 1% | 0.1 | poly_max_2 (0.05, 0.1, 0.5); poly_2_3 (0.05, 0.1, 0.5, 1, 5, 10)
Multiplicative | 5% | 0.1 | poly_max_2 (0.05, 0.1, 0.5, 1, 5, 10)
Multiplicative | 10% | 0.1 | poly_2_3 (0.05, 0.1, 0.5)
Multiplicative | 20% | 0.1 | poly_max_2 (5)
Table B.3: Hyperparameters used to attain lowest AICcs for SINDy regression on Lotka-Volterra
datasets.
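For orientation, the SINDy fits summarized above can be reproduced in spirit with PySINDy [120, 121]; the sketch below uses a simulated Lotka-Volterra trajectory with illustrative coefficients (close to those recovered in Tables B.2 and B.4) and a plain degree-2 polynomial library, so it is a stand-in for the actual datasets and the poly_max_2/poly_2_3 libraries defined in Chapter 3.

    import numpy as np
    from scipy.integrate import solve_ivp
    import pysindy as ps

    # Simulate a Lotka-Volterra trajectory as stand-in training data (illustrative parameters).
    def lotka_volterra(t, x):
        return [1.3 * x[0] - 0.9 * x[0] * x[1], -1.8 * x[1] + 0.8 * x[0] * x[1]]

    dt = 0.1
    t_eval = np.arange(0.0, 10.0, dt)
    x = solve_ivp(lotka_volterra, (0.0, 10.0), [1.0, 1.0], t_eval=t_eval).y.T

    model = ps.SINDy(
        optimizer=ps.STLSQ(threshold=0.1, alpha=0.05),   # alpha: the STLSQ regularization swept in Table B.3
        feature_library=ps.PolynomialLibrary(degree=2),  # simple degree-2 polynomial basis
    )
    model.fit(x, t=dt)
    model.print()  # prints the recovered terms, e.g. x0' close to 1.3 x0 - 0.9 x0 x1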
Noise type | Noise level | Lowest AICc for given noise model | Recovered model
Additive | 0.1% | −309.3006393 | x1′ = 1.301x1 − 0.892x1x2; x2′ = −1.801x2 + 0.792x1x2
Additive | 0.5% | −299.044635 | x1′ = 1.302x1 − 0.892x1x2; x2′ = −1.800x2 + 0.790x1x2
Additive | 1% | −283.3833402 | x1′ = 1.303x1 − 0.892x1x2; x2′ = −1.799x2 + 0.786x1x2
Additive | 5% | −178.2411226 | x1′ = 1.310x1 − 0.883x1x2; x2′ = −1.782x2 + 0.728x1x2
Additive | 10% | −104.6566267 | x1′ = 1.317x1 − 0.859x1x2; x2′ = −1.759x2 + 0.653x1x2
Additive | 20% | −26.46272851 | x1′ = 1.323x1 − 0.803x1x2; x2′ = −1.732x2 + 0.585x1x2
Multiplicative | 0.1% | −297.8189774 | x1′ = 1.301x1 − 0.892x1x2; x2′ = −1.801x2 + 0.791x1x2
Multiplicative | 0.5% | −254.144023 | x1′ = 1.305x1 − 0.890x1x2; x2′ = −1.800x2 + 0.784x1x2
Multiplicative | 1% | −214.856131 | x1′ = 1.310x1 − 0.888x1x2; x2′ = −1.799x2 + 0.775x1x2
Multiplicative | 5% | N/A | Not found
Multiplicative | 10% | N/A | Not found
Multiplicative | 20% | N/A | Not found
Table B.4: Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets
using base SINDy.
Noise type | Noise level | Lowest AICc for given noise model | Recovered model
Additive | 0.1% | −432.5755145 | x1′ = 1.297x1 − 0.909x1x2; x2′ = −1.812x2 + 0.798x1x2
Additive | 0.5% | N/A | Not found
Additive | 1% | −299.504492 | x1′ = 1.274x1 − 0.869x1x2; x2′ = −1.780x2 + 0.658x1x2
Additive | 5% | −39.08409104 | x1′ = 1.296x1 − 0.715x1x2; x2′ = −1.772x2 + 0.746x1x2
Additive | 10% | −147.4397894 | x1′ = 1.329x1 − 0.969x1x2; x2′ = −1.942x2 + 1.057x1x2
Additive | 20% | N/A | Not found
Multiplicative | 0.1% | −332.4479059 | x1′ = 1.301x1 − 0.896x1x2; x2′ = −1.801x2 + 0.789x1x2
Multiplicative | 0.5% | −318.4897128 | x1′ = 1.306x1 − 0.903x1x2; x2′ = −1.821x2 + 0.842x1x2
Multiplicative | 1% | −81.38255749 | x1′ = 1.310x1 − 0.817x1x2; x2′ = −1.903x2 + 0.955x1x2
Multiplicative | 5% | N/A | Not found
Multiplicative | 10% | N/A | Not found
Multiplicative | 20% | N/A | Not found
Table B.5: Best (according to AICc) correct ODE models recovered from Lotka-Volterra datasets
using pure neural network formulation.
Rank 1. Step size: 0.2; basis library: hill_2; α: 10; AICc: −22.67366208. Recovered model:
    x1′ = −2.152/(1 + x2²) + 9.303/(1 + x3²)
    x2′ = 7.755/(1 + x1²)
    x3′ = −1.491/(1 + x1²) + 10.563/(1 + x2²) − 1.897/(1 + x3²)
Rank 2. Step size: 0.1; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −20.7715785. Recovered model:
    x1′ = −3.861/(1 + x1) − 4.308/(1 + x2) + 13.194/(1 + x3)
    x2′ = 12.594/(1 + x1) − 4.110/(1 + x2) − 3.268/(1 + x3)
    x3′ = −4.412/(1 + x1) + 14.670/(1 + x2) − 5.194/(1 + x3)
Rank 3. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −20.67030125. Recovered model:
    x1′ = −3.850/(1 + x1) − 4.320/(1 + x2) + 13.193/(1 + x3)
    x2′ = 12.583/(1 + x1) − 4.096/(1 + x2) − 3.277/(1 + x3)
    x3′ = −4.365/(1 + x1) + 14.618/(1 + x2) − 5.186/(1 + x3)
Rank 4. Step size: 0.2; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −17.81670648. Recovered model:
    x1′ = 9.725/(1 + x3³)
    x2′ = 9.577/(1 + x1³)
    x3′ = 10.247/(1 + x2³)
Rank 5. Step size: 0.1; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −17.68727673. Recovered model:
    x1′ = 9.724/(1 + x3³)
    x2′ = 9.615/(1 + x1³)
    x3′ = 10.253/(1 + x2³)
Rank 6. Step size: 0.1; basis library: hill_2; α: 10; AICc: −8.603097632. Recovered model:
    x1′ = −1.143/(1 + x1²) − 1.591/(1 + x2²) + 9.762/(1 + x3²)
    x2′ = 7.759/(1 + x1²)
    x3′ = −1.515/(1 + x1²) + 10.586/(1 + x2²) − 1.896/(1 + x3²)
Rank 7. Step size: 0.2; basis library: hill_2; α: 5; AICc: −8.236993872. Recovered model:
    x1′ = −1.134/(1 + x1²) − 1.600/(1 + x2²) + 9.762/(1 + x3²)
    x2′ = 7.755/(1 + x1²)
    x3′ = −1.491/(1 + x1²) + 10.563/(1 + x2²) − 1.897/(1 + x3²)
Rank 8. Step size: 0.2; basis library: hill_2; α: 0.05, 0.1, 0.5, 1; AICc: 50.57451531. Recovered model:
    x1′ = −1.134/(1 + x1²) − 1.600/(1 + x2²) + 9.762/(1 + x3²)
    x2′ = 8.906/(1 + x1²) − 1.477/(1 + x2²)
    x3′ = −1.491/(1 + x1²) + 10.563/(1 + x2²) − 1.897/(1 + x3²)
Rank 9. Step size: 0.1; basis library: hill_2; α: 0.05, 0.1, 0.5, 1, 5; AICc: 51.21561571. Recovered model:
    x1′ = −1.143/(1 + x1²) − 1.591/(1 + x2²) + 9.762/(1 + x3²)
    x2′ = 8.926/(1 + x1²) − 1.487/(1 + x2²)
    x3′ = −1.515/(1 + x1²) + 10.586/(1 + x2²) − 1.896/(1 + x3²)
Rank 10. Step size: 0.1; basis library: hill_max_3; α: 5; AICc: 51.56505241. Recovered model:
    x1′ = 9.724/(1 + x3³)
    x2′ = 6.307/(1 + x1) − 1.794/(1 + x1²) − 2.104/(1 + x3²) + 5.203/(1 + x1³) − 1.997/(1 + x2³)
    x3′ = 1.386/(1 + x2) − 5.322/(1 + x2²) + 14.700/(1 + x2³)
Table B.6: Top 10 (according to AICc) models recovered from repressilator data with 5% additive
noise.
Rank 1. Step size: 0.2; basis library: hill_2; α: 10; AICc: −24.58871036. Recovered model:
    x1′ = −2.129/(1 + x2²) + 9.345/(1 + x3²)
    x2′ = 9.502/(1 + x1²) − 2.427/(1 + x3²)
    x3′ = −2.177/(1 + x1²) + 10.042/(1 + x2²)
Rank 2. Step size: 0.1; basis library: hill_2; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −20.5825071. Recovered model:
    x1′ = −2.123/(1 + x2²) + 9.340/(1 + x3²)
    x2′ = 10.004/(1 + x1²) − 1.203/(1 + x2²) − 1.877/(1 + x3²)
    x3′ = −1.692/(1 + x1²) + 10.513/(1 + x2²) − 1.152/(1 + x3²)
Rank 3. Step size: 0.2; basis library: hill_2; α: 0.05, 0.1, 0.5, 1, 5; AICc: −20.57472423. Recovered model:
    x1′ = −2.129/(1 + x2²) + 9.345/(1 + x3²)
    x2′ = 9.993/(1 + x1²) − 1.201/(1 + x2²) − 1.874/(1 + x3²)
    x3′ = −1.682/(1 + x1²) + 10.501/(1 + x2²) − 1.151/(1 + x3²)
Rank 4. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −2.956417432. Recovered model:
    x1′ = −3.618/(1 + x1) − 4.380/(1 + x2) + 13.044/(1 + x3)
    x2′ = 14.126/(1 + x1) − 4.181/(1 + x2) − 5.105/(1 + x3)
    x3′ = −4.782/(1 + x1) + 14.346/(1 + x2) − 4.176/(1 + x3)
Rank 5. Step size: 0.1; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −1.955841306. Recovered model:
    x1′ = −3.621/(1 + x1) − 4.361/(1 + x2) + 13.031/(1 + x3)
    x2′ = 14.139/(1 + x1) − 4.189/(1 + x2) − 5.107/(1 + x3)
    x3′ = −4.805/(1 + x1) + 14.372/(1 + x2) − 4.180/(1 + x3)
Rank 6. Step size: 0.2; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: 28.74298589. Recovered model:
    x1′ = 10.404/(1 + x3³)
    x2′ = 9.797/(1 + x1³)
    x3′ = 10.722/(1 + x2³)
Rank 7. Step size: 0.1; basis library: hill_3; α:; AICc: 28.81695585. Recovered model:
    x1′ = 10.404/(1 + x3³)
    x2′ = 9.809/(1 + x1³)
    x3′ = 10.723/(1 + x2³)
Rank 8. Step size: 0.1; basis library: hill_max_3; α: 0.1; AICc: 79.43703754. Recovered model:
    x1′ = 5.295/(1 + x1) − 4.002/(1 + x1²) − 5.449/(1 + x2²) + 3.413/(1 + x2³) + 9.543/(1 + x3³)
    x2′ = −34.742/(1 + x1) + 45.179/(1 + x2) + 4.220/(1 + x3) + 45.946/(1 + x1²) − 49.284/(1 + x2²) − 16.262/(1 + x3²) − 15.055/(1 + x1³) + 15.100/(1 + x2³) + 8.016/(1 + x3³)
    x3′ = 3.746/(1 + x1) − 6.886/(1 + x1²) − 2.487/(1 + x3²) + 3.432/(1 + x1³) + 10.515/(1 + x2³) + 2.621/(1 + x3³)
Rank 9. Step size: 0.2; basis library: hill_max_3; α: 0.05; AICc: 79.73919266. Recovered model:
    x1′ = 5.332/(1 + x1) − 4.026/(1 + x1²) − 5.516/(1 + x2²) + 3.466/(1 + x2³) + 9.541/(1 + x3³)
    x2′ = −34.252/(1 + x1) + 44.754/(1 + x2) + 4.193/(1 + x3) + 45.436/(1 + x1²) − 48.847/(1 + x2²) − 16.224/(1 + x3²) − 14.887/(1 + x1³) + 14.936/(1 + x2³) + 7.987/(1 + x3³)
    x3′ = 3.737/(1 + x1) − 6.872/(1 + x1²) − 2.497/(1 + x3²) + 3.429/(1 + x1³) + 10.516/(1 + x2³) + 2.633/(1 + x3³)
Rank 10. Step size: 0.2; basis library: hill_max_3; α: 0.1; AICc: 85.97584009. Recovered model:
    x1′ = 6.907/(1 + x1) − 1.736/(1 + x3) − 4.838/(1 + x1²) − 5.065/(1 + x2²) + 3.142/(1 + x2³) + 10.705/(1 + x3³)
    x2′ = −34.252/(1 + x1) + 44.754/(1 + x2) + 4.193/(1 + x3) + 45.436/(1 + x1²) − 48.847/(1 + x2²) − 16.224/(1 + x3²) − 14.887/(1 + x1³) + 14.936/(1 + x2³) + 7.987/(1 + x3³)
    x3′ = 3.737/(1 + x1) − 6.872/(1 + x1²) − 2.497/(1 + x3²) + 3.429/(1 + x1³) + 10.516/(1 + x2³) + 2.633/(1 + x3³)
Table B.7: Top 10 (according to AICc) models recovered from repressilator data with 10% additive
noise.
Rank 1. Step size: 0.2; basis library: hill_2; α: 0.05, 0.1, 0.5, 1; AICc: −25.74424163. Recovered model:
    x1′ = −1.873/(1 + x2²) + 9.551/(1 + x3²)
    x2′ = 10.506/(1 + x1²) − 1.660/(1 + x2²) − 1.610/(1 + x3²)
    x3′ = −1.821/(1 + x1²) + 10.355/(1 + x2²) − 1.087/(1 + x3²)
Rank 2. Step size: 0.1; basis library: hill_2; α: 5, 10; AICc: −25.71786398. Recovered model:
    x1′ = −1.874/(1 + x2²) + 9.555/(1 + x3²)
    x2′ = 10.521/(1 + x1²) − 1.675/(1 + x2²) − 1.605/(1 + x3²)
    x3′ = −2.307/(1 + x1²) + 9.942/(1 + x2²)
Rank 3. Step size: 0.2; basis library: hill_2; α: 5, 10; AICc: −25.71164361. Recovered model:
    x1′ = −1.873/(1 + x2²) + 9.551/(1 + x3²)
    x2′ = 10.506/(1 + x1²) − 1.660/(1 + x2²) − 1.610/(1 + x3²)
    x3′ = −2.290/(1 + x1²) + 9.923/(1 + x2²)
Rank 4. Step size: 0.1; basis library: hill_2; α: 0.05, 0.1, 0.5, 1; AICc: −25.63659228. Recovered model:
    x1′ = −1.874/(1 + x2²) + 9.555/(1 + x3²)
    x2′ = 10.521/(1 + x1²) − 1.675/(1 + x2²) − 1.605/(1 + x3²)
    x3′ = −1.838/(1 + x1²) + 10.372/(1 + x2²) − 1.090/(1 + x3²)
Rank 5. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −16.34122650. Recovered model:
    x1′ = −3.689/(1 + x1) − 4.070/(1 + x2) + 13.008/(1 + x3)
    x2′ = 14.532/(1 + x1) − 4.931/(1 + x2) − 4.623/(1 + x3)
    x3′ = −4.997/(1 + x1) + 14.225/(1 + x2) − 4.097/(1 + x3)
Rank 6. Step size: 0.1; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −16.13434648. Recovered model:
    x1′ = −3.678/(1 + x1) − 4.070/(1 + x2) + 13.000/(1 + x3)
    x2′ = 14.541/(1 + x1) − 4.952/(1 + x2) − 4.609/(1 + x3)
    x3′ = −5.034/(1 + x1) + 14.270/(1 + x2) − 4.108/(1 + x3)
Rank 7. Step size: 0.2; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −0.17817327. Recovered model:
    x1′ = 11.224/(1 + x3³)
    x2′ = 10.628/(1 + x1³)
    x3′ = 10.740/(1 + x2³)
Rank 8. Step size: 0.1; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −0.12239612. Recovered model:
    x1′ = 11.226/(1 + x3³)
    x2′ = 10.648/(1 + x1³)
    x3′ = 10.740/(1 + x2³)
Rank 9. Step size: 0.2; basis library: hill_max_3; α: 0.1; AICc: 12.20357058. Recovered model:
    x1′ = 5.562/(1 + x1) − 2.773/(1 + x3) − 4.082/(1 + x1³) − 1.689/(1 + x2³) + 11.105/(1 + x3³)
    x2′ = −58.308/(1 + x1) + 70.190/(1 + x2) − 0.715/(1 + x3) + 64.896/(1 + x1²) − 76.735/(1 + x2²) − 3.163/(1 + x3²) − 17.729/(1 + x1³) + 26.598/(1 + x2³)
    x3′ = 1.973/(1 + x1) − 2.301/(1 + x1²) + 10.076/(1 + x2³)
Rank 10. Step size: 0.1; basis library: hill_max_3; α: 0.5; AICc: 58.68262370. Recovered model:
    x1′ = 5.187/(1 + x1) − 1.217/(1 + x3) − 2.274/(1 + x3²) − 4.063/(1 + x1³) − 1.861/(1 + x2³) + 12.083/(1 + x3³)
    x2′ = −57.836/(1 + x1) + 69.989/(1 + x2) − 0.730/(1 + x3) + 64.231/(1 + x1²) − 76.587/(1 + x2²) − 3.195/(1 + x3²) − 17.422/(1 + x1³) + 26.548/(1 + x2³)
    x3′ = 10.740/(1 + x2³)
Table B.8: Top 10 (according to AICc) models recovered from repressilator data with 10% multiplicative noise.
Rank 1. Step size: 0.2; basis library: hill_2; α: 10; AICc: −16.96979235. Recovered model:
    x1′ = 7.800/(1 + x3²)
    x2′ = 7.514/(1 + x1²)
    x3′ = 7.401/(1 + x2²)
Rank 2. Step size: 0.2; basis library: hill_2; α: 1, 5; AICc: −1.435309974. Recovered model:
    x1′ = 7.800/(1 + x3²)
    x2′ = 10.740/(1 + x1²) − 1.543/(1 + x2²) − 1.966/(1 + x3²)
    x3′ = −1.750/(1 + x1²) + 8.924/(1 + x2²)
Rank 3. Step size: 0.1; basis library: hill_2; α: 5, 10; AICc: −1.424593855. Recovered model:
    x1′ = 7.799/(1 + x3²)
    x2′ = 10.740/(1 + x1²) − 1.543/(1 + x2²) − 1.967/(1 + x3²)
    x3′ = −1.753/(1 + x1²) + 8.926/(1 + x2²)
Rank 4. Step size: 0.1; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: 10.76638232. Recovered model:
    x1′ = 11.973/(1 + x3³)
    x2′ = 11.165/(1 + x1³)
    x3′ = 9.406/(1 + x2³) + 1.521/(1 + x3³)
Rank 5. Step size: 0.2; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5; AICc: 10.79154978. Recovered model:
    x1′ = 11.975/(1 + x3³)
    x2′ = 11.166/(1 + x1³)
    x3′ = 9.407/(1 + x2³) + 1.520/(1 + x3³)
Rank 6. Step size: 0.2; basis library: hill_max_3; α: 10; AICc: 22.40843998. Recovered model:
    x1′ = 3.012/(1 + x3²) + 7.410/(1 + x3³)
    x2′ = 13.640/(1 + x1) − 30.949/(1 + x1²) + 30.783/(1 + x1³) − 1.220/(1 + x2³) − 1.957/(1 + x3³)
    x3′ = 1.476/(1 + x2) + 2.962/(1 + x2²) − 1.557/(1 + x1³) + 4.631/(1 + x2³)
Rank 7. Step size: 0.1; basis library: hill_max_3; α: 5; AICc: 22.46295063. Recovered model:
    x1′ = 3.007/(1 + x3²) + 7.416/(1 + x3³)
    x2′ = 13.655/(1 + x1) − 31.008/(1 + x1²) + 30.838/(1 + x1³) − 1.219/(1 + x2³) − 1.956/(1 + x3³)
    x3′ = 1.493/(1 + x2) + 2.920/(1 + x2²) − 1.560/(1 + x1³) + 4.659/(1 + x2³)
Rank 8. Step size: 0.1; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: 39.68654699. Recovered model:
    x1′ = −4.009/(1 + x1) − 3.588/(1 + x2) + 12.897/(1 + x3)
    x2′ = 14.773/(1 + x1) − 4.593/(1 + x2) − 5.291/(1 + x3)
    x3′ = −4.879/(1 + x1) + 12.309/(1 + x2) − 2.569/(1 + x3)
Rank 9. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: 39.83585410. Recovered model:
    x1′ = −4.012/(1 + x1) − 3.585/(1 + x2) + 12.896/(1 + x3)
    x2′ = 14.774/(1 + x1) − 4.593/(1 + x2) − 5.291/(1 + x3)
    x3′ = −4.875/(1 + x1) + 12.303/(1 + x2) − 2.568/(1 + x3)
Rank 10. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5; AICc: 41.12183361. Recovered model:
    x1′ = −1.620/(1 + x1²) + 9.418/(1 + x3²)
    x2′ = 10.740/(1 + x1²) − 1.543/(1 + x2²) − 1.966/(1 + x3²)
    x3′ = −1.750/(1 + x1²) + 8.924/(1 + x2²)
Table B.9: Top 10 (according to AICc) models recovered from repressilator data with 20% additive
noise.
Rank 1. Step size: 0.1; basis library: hill_2; α: 10; AICc: −13.97239256. Recovered model:
    x1′ = −1.887/(1 + x2²) + 9.666/(1 + x3²)
    x2′ = 8.569/(1 + x1²)
    x3′ = −3.325/(1 + x1²) + 11.144/(1 + x2²)
Rank 2. Step size: 0.2; basis library: hill_2; α: 5; AICc: −13.94812401. Recovered model:
    x1′ = −1.884/(1 + x2²) + 9.657/(1 + x3²)
    x2′ = 8.564/(1 + x1²)
    x3′ = −3.311/(1 + x1²) + 11.124/(1 + x2²)
Rank 3. Step size: 0.2; basis library: hill_2; α: 10; AICc: −13.84497117. Recovered model:
    x1′ = 7.949/(1 + x3²)
    x2′ = 8.564/(1 + x1²)
    x3′ = −3.311/(1 + x1²) + 11.124/(1 + x2²)
Rank 4. Step size: 0.1; basis library: hill_2; α: 5; AICc: −12.96165323. Recovered model:
    x1′ = −1.887/(1 + x2²) + 9.666/(1 + x3²)
    x2′ = 10.405/(1 + x1²) − 1.898/(1 + x2²)
    x3′ = −3.325/(1 + x1²) + 11.144/(1 + x2²)
Rank 5. Step size: 0.1; basis library: hill_2; α: 0.05, 0.1, 0.5, 1; AICc: −5.807378842. Recovered model:
    x1′ = −1.887/(1 + x2²) + 9.666/(1 + x3²)
    x2′ = 10.948/(1 + x1²) − 1.333/(1 + x2²) − 1.132/(1 + x3²)
    x3′ = −2.764/(1 + x1²) + 11.728/(1 + x2²) − 1.168/(1 + x3²)
Rank 6. Step size: 0.2; basis library: hill_2; α: 0.05, 0.1, 0.5, 1; AICc: −5.800124182. Recovered model:
    x1′ = −1.884/(1 + x2²) + 9.657/(1 + x3²)
    x2′ = −1.884/(1 + x2²) + 9.657/(1 + x3²)
    x3′ = −2.747/(1 + x1²) + 11.714/(1 + x2²) − 1.174/(1 + x3²)
Rank 7. Step size: 0.2; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5; AICc: −3.961642952. Recovered model:
    x1′ = 1.731/(1 + x1³) + 10.462/(1 + x3³)
    x2′ = 13.276/(1 + x1³)
    x3′ = 12.384/(1 + x2³)
Rank 8. Step size: 0.1; basis library: hill_3; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −3.468926352. Recovered model:
    x1′ = 1.712/(1 + x1³) + 10.492/(1 + x3³)
    x2′ = 13.280/(1 + x1³)
    x3′ = 12.394/(1 + x2³)
Rank 9. Step size: 0.2; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −1.26505732. Recovered model:
    x1′ = −2.457/(1 + x1) − 5.191/(1 + x2) + 12.801/(1 + x3)
    x2′ = 14.107/(1 + x1) − 4.479/(1 + x2) − 4.039/(1 + x3)
    x3′ = −6.695/(1 + x1) + 15.906/(1 + x2) − 4.067/(1 + x3)
Rank 10. Step size: 0.1; basis library: hill_1; α: 0.05, 0.1, 0.5, 1, 5, 10; AICc: −1.229384092. Recovered model:
    x1′ = −2.460/(1 + x1) − 5.191/(1 + x2) + 12.807/(1 + x3)
    x2′ = 14.102/(1 + x1) − 4.480/(1 + x2) − 4.032/(1 + x3)
    x3′ = −6.713/(1 + x1) + 15.923/(1 + x2) − 4.065/(1 + x3)
Table B.10: Top 10 (according to AICc) models recovered from repressilator data with 20% multiplicative noise.