Switching Dynamical Systems with Poisson and Multiscale Observations
with Applications to Neural Population Activity
by
Christian Yao Song
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL AND COMPUTER ENGINEERING)
December 2023
Copyright 2023 Christian Yao Song
In dedication to my family, to my friends,
whose love and support I will always cherish.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Modeling and inference methods for switching dynamical systems . . . . . . 1
1.2 Unsupervised learning of stationary and switching dynamical systems from
Poisson observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Chapter 2: Inference for Switching Multiscale Dynamical Systems . . . . . . . . . . . 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Switching multiscale dynamical system (SMDS) . . . . . . . . . . . . 9
2.2.2 Inference algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.4 Numerical simulation framework . . . . . . . . . . . . . . . . . . . . . 25
2.2.5 Experimental data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.1 Switching inference improves estimation by tracking regime-dependent
non-stationarity in simulations . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Estimation performance improves by combining multiscale observations 35
2.3.3 Inference with switching multiscale smoother (SMS) is reliable across
conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.4 SMF detects task-related regimes in spike-field NHP data . . . . . . . 38
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.1 Details on simulation settings . . . . . . . . . . . . . . . . . . . . . . 43
Chapter 3: Unsupervised learning of stationary and switching dynamical system models from Poisson observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Background: Numerical integration based on spherical-radial cubature
rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.2.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.3 Derivation of the novel Poisson Cubature Filter (PCF) . . . . . . . . 56
3.2.4 Unsupervised Learning with Expectation-Maximization . . . . . . . . 62
3.2.5 Extending to Switching Dynamical Systems . . . . . . . . . . . . . . 63
3.2.6 Numerical Simulation Framework . . . . . . . . . . . . . . . . . . . . 67
3.2.7 Experimental Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.3.1 PCF yields high accuracy parameter learning in stationary simulations 76
3.3.2 PCF yields high accuracy parameter learning in switching simulations 76
3.3.3 PCF learning enables better prediction and movement decoding in
motor cortical data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.4 PCF enables discovering interpretable task-relevant neural population
regimes in motor cortical data . . . . . . . . . . . . . . . . . . . . . . 83
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.5 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.5.1 Example cubature point set . . . . . . . . . . . . . . . . . . . . . . . 90
3.5.2 Derivation of cross-covariance term in PCF . . . . . . . . . . . . . . . 91
3.5.3 Implementation details and the information form of PCF . . . . . . . 91
3.5.4 M-step equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5.5 Additional equations for switching smoother for M-step terms . . . . 94
Chapter 4: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
List of Figures
2.1 Visualization of models of dynamical systems . . . . . . . . . . . . . . . . . 7
2.2 Illustration of inference methods for SMDS models . . . . . . . . . . . . . . 12
2.3 Behavioral task overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 SMF and SMS estimation performance in simulation . . . . . . . . . . . . . 31
2.5 Detecting regime changes from switching firing rate magnitude . . . . . . . . 32
2.6 SMF data fusion capability in simulation . . . . . . . . . . . . . . . . . . . . 33
2.7 SMS generalizability compared to prior methods . . . . . . . . . . . . . . . . 36
2.8 Real-time decoding in experimental data recordings . . . . . . . . . . . . . . 38
2.9 SMF and SMS performance in CC . . . . . . . . . . . . . . . . . . . . . . . . 44
2.10 SMF data fusion in CC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.11 Conditional probabilities from each individual recording session . . . . . . . 46
2.12 SMF data fusion in low SNR . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1 Overview of unsupervised learning with PCF . . . . . . . . . . . . . . . . . . 52
3.2 Behavioral task overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 PCF learning in simulated stationary systems . . . . . . . . . . . . . . . . . 75
3.4 PCF learning in simulated switching systems . . . . . . . . . . . . . . . . . . 79
3.5 PCF learning in experimental data . . . . . . . . . . . . . . . . . . . . . . . 80
3.6 PCF uncovers regimes in experimental data . . . . . . . . . . . . . . . . . . 82
3.7 Visualization of decoded regimes learned by sPCF-EM . . . . . . . . . . . . 83
Abstract
Switching dynamical system models are powerful tools for investigations of neural population
dynamics and applications in brain-machine interfaces, as they can model switches between
distinct regimes, e.g., due to changes in task or lapses in engagement. However, realizing
neurotechnologies that can use such models requires overcoming challenges on multiple fronts
from modelling and inference to learning system parameters.
First, the distributions that best capture the fast timescale and discrete nature of spikes
differ significantly from those that capture slower large-scale population activity measured
with continuous local field potential (LFP) signals. Prior real-time inference methods that
can estimate both behavior and regimes focus on continuous Gaussian observations which
cannot be applied to discrete spiking activity. Further, the heterogeneity in LFP and spiking distributions causes a challenge in fusing information in what we call multiscale observations. To address these challenges, we develop a novel Switching Multiscale Dynamical
System model and the associated filtering and smoothing methods. These methods solve the
challenge of estimating both behavior and regimes from multiscale observations comprised
of both spikes and LFP. We validate the methods in numerical simulations and prefrontal
spike-field data and show that the methods can successfully combine the spiking and field
potential observations to simultaneously track the regime and behavior states accurately.
Next, we focus on discrete Poisson observations. Experimental data often lacks reliable
trial or regime labels, necessitating accurate unsupervised learning methods to find model
parameters. While prior learning methods exist, their inference step relies on the Laplace
approximation, which may fail to capture broader properties of densities and lead to inaccurate model learning. To achieve accurate model learning, we derive a novel inference method
based on deterministic sampling for Poisson observations called the Poisson Cubature Filter
(PCF) and embed it in unsupervised learning frameworks. We perform validation in numerical simulations and motor spiking data and show that the developed methods are more
accurate and data efficient than prior methods and can uncover behavior-relevant regimes
completely unsupervised.
Combined, the developed modeling, inference, and learning methods solve challenges on
multiple fronts to enable switching dynamical systems for non-Gaussian observations. This
complete framework can facilitate basic neuroscience investigations of switching neural population dynamics by uncovering latent regimes with inference methods that use accurately
learned model parameters. Further, the developed methods have the potential to improve future brain-machine interfaces by enabling real-time decoding of both spiking and multiscale
data in more naturalistic scenarios where regime-dependent behavior can arise.
Chapter 1
Introduction
Investigating neural dynamics underpinning naturalistic behaviors requires models of neural population activity that can balance explanatory power and interpretability. One class
of models that can offer such a balance is a switching dynamical system model1–5. These
switching models often use linear dynamics to describe the time evolution of neural population activity in terms of an underlying latent state; however, they allow for a switch
non-stationarity such that a series of distinct linear dynamics called regimes can be pieced
together to capture more complex neural dynamics. Such models can have important applications in both basic neuroscience and neurotechnology. However, how and which neural
observations are incorporated into these models can lead to challenges on multiple fronts.
These range from challenges in inferring the desired behavior and regime states from neural
activity, which is the focus of Chapter 2, to challenges in learning the model parameters
that inference methods need, which is the focus of Chapter 3.
1.1 Modeling and inference methods for switching
dynamical systems
With respect to neural observations, it is known that behavior is encoded across multiple
spatiotemporal scales of neural activity6–12. One such scale is field potentials – for example
in the form of local field potentials (LFP) or electrocorticogram (ECoG) – which reflect the
activity of larger neural populations. For these continuous signals, extensive prior methods
for switching systems with continuous Gaussian observations, also known as switching linear
dynamical systems (SLDS)3,5,13,14, exist. However, neural population activity can also be
measured in the form of spiking activity15 which is the activity of individual neurons. Spiking
activity can be modelled as either a binary point-process given small enough time bins or
otherwise a discrete Poisson process16–22. Regardless of which, such an observation modality
is generally non-conjugate with the underlying brain/latent state of dynamical system models
which is assumed to be Gaussian, leading to challenges in both inference and learning.
Further, advances in neural devices have enabled the simultaneous recording of both spiking
activity and field potentials [23], which we call multiscale observations. This raises the
additional challenge of how to effectively combine information across scales in both the
model and the inference methods. In Chapter 2, we discuss the work in [1], where we develop
new modeling and inference methods for switching systems that can accommodate not only
spikes but also multiscale observations.
1.2 Unsupervised learning of stationary and switching
dynamical systems from Poisson observations
With respect to model parameters, non-conjugate observations and switching dynamics can
also lead to challenges in learning. Learning accurate model parameters is critical not only
for accurate inference but also for gaining insight into neural population dynamics during
naturalistic behavior. However, recordings of naturalistic behavior may not have reliable trial
or regime labels, which would normally greatly simplify the learning process for switching
systems. This motivates the need for accurate unsupervised learning methods for switching
systems. Prior learning methods do exist for both stationary and switching dynamical systems
that model spikes as discrete Poisson observations, which are non-conjugate with the Gaussian
latent states; however, these methods use the Laplace approximation for inference [4,16,18,24–27].
As we also show in Chapter 3, this approximation can lead to inaccurate learning because
it may fail to capture broader properties of the density it approximates [28]. Thus, there
is a need for more accurate methods that can better learn the model parameters of both
stationary and switching dynamical system models and in turn facilitate investigations of the
neural dynamics underlying behavior. In Chapter 3, we discuss the work in [2], where we
address this challenge by developing a novel deterministic sampling-based filter for Poisson
observations and embedding it in unsupervised learning frameworks. With validation in both
simulated and experimental neural data, we show that the developed methods not only yield
more accurate model parameter estimates but are also critical for uncovering regimes when
data is limited, as is often the case with neural datasets.
Chapter 2
Inference for Switching Multiscale Dynamical Systems
2.1 Introduction
Linear dynamical systems (LDS) have served as the foundational models for brain-machine
interfaces (BMIs) to control neural prosthetic devices and for the study of neural population
dynamics underlying behavior7,15,29–31. An LDS assumes that an unobserved continuous-valued state related to behavior is linearly encoded in the neural observations and that this
state evolves over time with time-invariant dynamics. We refer to this state as the brain
state, which can either directly represent behavior parameters such as movement kinematics, or be a latent state that gives rise to neural activity and/or behavior15,30,32. Inference
algorithms such as the Kalman filter then rely on these LDS model assumptions to efficiently
estimate brain states from neural observations. However, these model assumptions become
less appropriate as neurotechnologies measure multiple spatiotemporal scales of neural activity using simultaneous field potential and spiking modalities, and as they do so during
more complex and naturalistic behaviors.
First, when both spiking and field potentials are being recorded, the linear observation
model in LDS will need to be modified to accommodate for the statistical and time-scale
differences of these modalities, with field potentials being continuous signals and spikes being
binary action potential events with faster time-scales. We refer to the combination of both
spiking and field potential signals with their different statistical and timescale profiles as
multiscale observations. Second, for more naturalistic or complex behaviors, neural dynamics
may change over time depending on the regime of behavior. We refer to periods of time with
unique brain state dynamics and encoding as regimes, which we label at each time point by
a discrete-valued variable termed the regime state. Thus, there is an unmet need to develop
modeling and inference methods that not only use multiscale observations but also relax
stationarity assumptions to include regime-dependent dynamics and encoding.
This need is motivated by the observation that behavior is encoded across multiple spatiotemporal scales of neural activity6–12. It is also motivated by advances in neural devices
that can simultaneously record spiking activity from individual neurons and field potentials
– in the form of local field potentials (LFP) or electrocorticogram (ECoG) – that reflect the
activity of larger neural populations23. Prior studies have developed modeling and filtering
methods for multiscale spike-field observations called the multiscale dynamical system model
and the multiscale filter, respectively6,8,25, which model the field features as a Gaussian process as in a traditional LDS model but also model the spiking activity as an additional point
process that nonlinearly encodes the brain state. These studies found that spiking and LFP
reflect a shared low-dimensional neural state that explains naturalistic behavior [8]. They
also found that the multiscale filter improves movement decoding compared to both a Kalman
filter and a point process filter [6], which both use only a single scale of activity in the
form of field features or spikes, respectively. Thus, explanatory power and decoding performance
neural activity scales. However, these studies also assumed stationarity over time. Thus,
a major question is how to extend these multiscale methods to allow for non-stationarity,
which is especially critical in more naturalistic setups.
One way to relax the stationarity assumption is to develop adaptive methods, which
has been done for single-scale state-space models in BMIs (e.g.,17,33–39). Adaptive methods
are suitable for cases where the model might slowly change over time due to for example
neural plasticity or motor learning but not for all cases where the stationarity assumption
must be relaxed. Specifically, sometimes the model could abruptly switch parameters due
to a regime change, where a regime can represent, for example, the stage of a task, or
an internal state such as stress, engagement, or attention40–43. For the case of single-scale
LDS, prior works in neural engineering and computer vision14,44 have handled cases with a
switching regime-dependent nature using the switching linear dynamical system (switching
LDS) model. A switching LDS augments an LDS by assuming that there exists an additional
governing regime state dictating which parameters to use based on which regime is in power.
The regime state can then switch between a discrete finite set of regimes with Markovian
dynamics. However, this extension has not been done for cases with multiscale observations
that do not fit within the LDS framework and remains a challenge.
Figure 2.1 depicts the appropriate use case for a unified model where the behavior-related brain state is encoded in multiscale observations with an overarching switching regime
state that governs the dynamical and encoding parameters. Current inference methods for
switching LDS can only handle a single scale of observations3,45,46, while current methods
for multiscale dynamical systems assume the model is stationary8,25. Thus, novel inference
algorithms must be developed to estimate both the behavior-related brain state and the
regime state given simultaneous continuous-valued Gaussian observations and binary-valued
point process observations.
Here, we develop the modeling and inference methods for the switching multiscale dynamical system (SMDS) and demonstrate them in both extensive numerical simulations and
monkey spike-LFP activity during a saccadic eye movement task with various regimes. The
SMDS model extends the LDS model by appending an additional binary-valued point process observation modality and an additional discrete-valued regime state, which dictates the
dynamical and encoding parameters. The brain state is encoded nonlinearly in the instantaneous firing rate of the point process and linearly in the Gaussian process while the regime
state evolves as a first order Markov chain. We also derive both a filter and a smoother
for the SMDS model. The filter termed the Switching Multiscale Filter (SMF) is causal
and computationally tractable and thus practical to use in real-time decoding and BMI applications. The smoother termed the Switching Multiscale Smoother (SMS) can leverage
the entire length of data to reliably perform non-causal and more accurate estimation when
real-time processing is not needed.
We first validate the developed methods with extensive numerical spike-field simulations
and show their success and robustness in estimating both brain and regime states across
diverse settings and parameter ranges. Specifically, SMF and SMS more accurately estimate
the brain and regime states compared to multiscale inference methods that use the same
Figure 2.1: Visualization of models of dynamical systems. (a) Model of a Switching Multiscale Dynamical System (SMDS) in which the regime in power s dictates both the dynamics of the brain state x and the encoding in multiscale spike-field observations y and n. (b) Model of a Linear Dynamical System (LDS) with stationary brain state dynamics and encoding in only a single continuous observation scale.
spike-field activity but assume stationarity. Also, the new methods successfully combine
information from multiple neural observation scales to better estimate the brain and regime
states compared to when using a single scale of observation. Finally, the SMS smoother
reliably yields more accurate estimation of both brain and regime states compared to the
SMF filter across every single parameter setting tested. Indeed, we show that even in the
single-scale case, prior smoothing methods do not generalize as well as SMS across system
settings. We then show the success of the SMF method in multiscale spike-LFP data recorded
from a monkey performing saccades for fluid rewards. Specifically, SMF can detect different
regimes in multiscale spike-LFP dynamics that are associated with different task stages,
and the decoded brain states can accurately classify the direction of the executed saccade.
Finally, SMF successfully combines information from spiking activity and LFP features to
improve estimation of regime and brain states in terms of the accuracy of the detected
regimes (using the accuracy measure) and the accuracy of saccade direction classification
(using the area under the curve, or AUC, measure), respectively.
2.2 Methods
In this section, we present the modeling and inference methods for the SMDS model with an
illustration of the developed inference methods in Figure 2.2. We then set up the validation
methods consisting of extensive numerical simulations of SMDS models and applications of
our methods in data collected from a non-human primate (NHP). We denote column vector
values with lowercase bold letters, matrices with uppercase bold letters, continuous-valued
densities with f(·), and discrete-valued distributions with P(·).
2.2.1 Switching multiscale dynamical system (SMDS)
We first detail the switching multiscale dynamical system (SMDS) model. The regime state $s_t$ is a discrete random variable that can take one of $M$ possible values from $1$ to $M$ and describes the current regime of the system. For example, this regime state can represent the current stage within a multi-stage task. This regime state evolves with first order Markovian dynamics as:

$$P(s_t^{(j)} \mid s_{t-1}^{(i)}) = \Phi_{j,i} \tag{2.1}$$

where $\Phi$ is the transition matrix, and $s_t^{(j)}$ and $s_{t-1}^{(i)}$ respectively denote $s_t = j$ and $s_{t-1} = i$. The diagonal elements $\Phi_{j,j}$ describe the probability of staying in the same regime $j$, while the off-diagonal elements $\Phi_{j,i}$ describe the probability of transitioning from regime $i$ to $j$ at every time evolution. We denote the initial distribution of the regime state $s_1$ as $\pi$. The regime state then dictates the parameters of the dynamical model at every point in time.
The brain state represents the state of neural activity that relates to the behavior being performed, e.g., eye movement direction or arm movement kinematics. We model the brain state $x_t$ as a random walk driven by zero-mean Gaussian noise $w_t$ with covariance $Q(s_t)$:

$$x_t = A(s_t) x_{t-1} + w_t \tag{2.2}$$

The dynamics $A(s_t)$ can take on one of $M$ possibilities $\{A^{(1)}, \ldots, A^{(M)}\}$, where $A(s_t^{(j)}) = A^{(j)}$. The same notation holds for the noise covariance $Q(s_t)$. The initial brain state $x_0$ is assumed to be Gaussian with mean $\mu_0$ and covariance $\Lambda_0$. The above constitutes the brain state transition model. We refer to the parameters associated with the brain state dynamics in (2.2) as the dynamics equation parameters.
We model the encoding of $x_t$ in simultaneously recorded field potential features and spiking activity. Prior work has shown that behavior can be linearly mapped to the log-power of various frequency bands of field potential activity [8–10,32,47–53]. Thus, we model the field features $y_t$ as:

$$y_t = C(s_t) x_t + v_t \tag{2.3}$$

where $v_t$ is Gaussian noise independent of the brain state noise $w_t$, with zero mean and covariance $R(s_t)$. The parameters $C(s_t)$ and $R(s_t)$ are chosen from $M$ possibilities by the regime state.
The spiking activity for neuron $c$ is then modeled as a point process with $x_t$ encoded in the instantaneous firing rate $\lambda_c$ [6,8,17,19–22,25,35]. We denote the time-step by $\Delta$ and choose it small enough to contain at most one spike, thus generating a binary-valued signal indicating the presence or absence of a spike. The point process likelihood distribution of the spiking activity from $C$ neurons $n_t = [n_t^1, \cdots, n_t^C]^T$ is:

$$P(n_t \mid x_t, s_t) = \prod_{c=1}^{C} \left( \lambda_c(x_t, s_t) \Delta \right)^{n_t^c} \exp\left( -\lambda_c(x_t, s_t) \Delta \right) \tag{2.4}$$

with the firing rate in the form of an exponential function:

$$\lambda_c(x_t, s_t) \Delta = \exp\left( \alpha_c(s_t) + \beta_c(s_t)^T x_t \right) \tag{2.5}$$

The parameters $\alpha_c(s_t)$ and $\beta_c(s_t)$ are each chosen from $M$ possibilities by the regime state. Here, all neurons are modeled to encode the same brain state $x_t$ and regime state $s_t$. Prior works have shown that the activity of neurons can be approximately modeled as conditionally independent given the brain states [17,20–22]. Consistent with these works, and to enable tractable recursions for the estimators, we also assume that the activities of the neurons are conditionally independent of each other given the brain state $x_t$ and regime state $s_t$. This means that, conditioned on knowing the values of $x_t$ and $s_t$, how likely neurons are to spike depends only on these common encoded states. We refer to the parameters related to observation encoding in (2.3) and (2.5) as the observation equation parameters.
We can now combine the two observation modalities into a multiscale observation modality with a joint likelihood density, which is a key feature of the SMDS model. Prior studies have shown that a reasonable assumption is that the field features and spiking activity are conditionally independent given $x_t$ [6,8]. So, we can write the joint likelihood density as

$$f(n_t, y_t \mid x_t, s_t) = P(n_t \mid x_t, s_t) f(y_t \mid x_t, s_t) \tag{2.6}$$

We emphasize that the two observation modalities – spiking and field potential signals – encode the same brain state $x_t$ and the same regime state $s_t$ and are thus coupled/dependent through these states. However, they are assumed conditionally independent given these states.
The SMDS model also allows for the possibility that the time-scale of field features may be longer than that of the recorded spiking activity, meaning that the former may have a longer time-period between consecutive observation samples. In this case, since we have to set the time-step to the shorter time-period between consecutive spike events, there exist time-steps with no field observation. We thus model unobserved field features by setting $C = 0$ or $R = \infty$. Finally, we denote the multiscale spike-field observation at time $t$ as $h_t = [y_t, n_t]^T$. Together, (2.1)–(2.6) constitute the SMDS model. We highlight that this model extends both the switching LDS and the multiscale dynamical system by accommodating not only multiscale observations but also regime-dependent non-stationarity.
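Putting (2.1)–(2.6) together, a self-contained generative sketch of an SMDS is shown below. All parameter values (transition matrix, dynamics, encoding matrices, firing-rate parameters) are illustrative assumptions, not the models fit in this work.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, C_spk, C_fld, M = 200, 2, 4, 3, 2  # horizon, state dim, neurons, field features, regimes

# Illustrative regime-dependent parameters; Phi[j, i] = P(s_t = j | s_{t-1} = i).
Phi = np.array([[0.98, 0.05],
                [0.02, 0.95]])
A = [0.95 * np.eye(d), np.array([[0.9, 0.2], [-0.2, 0.9]])]      # dynamics (2.2)
Q = [0.01 * np.eye(d), 0.05 * np.eye(d)]
C = [rng.normal(size=(C_fld, d)), rng.normal(size=(C_fld, d))]   # field encoding (2.3)
R = [0.1 * np.eye(C_fld), 0.2 * np.eye(C_fld)]
alpha = [np.full(C_spk, -4.0), np.full(C_spk, -3.0)]             # spike baselines (2.5)
beta = [rng.normal(size=(C_spk, d)), rng.normal(size=(C_spk, d))]

x = np.zeros(d)
s = 0
ss, xs, ys, ns = [], [], [], []
for t in range(T):
    s = rng.choice(M, p=Phi[:, s])                                 # regime step (2.1)
    x = A[s] @ x + rng.multivariate_normal(np.zeros(d), Q[s])      # brain state (2.2)
    y = C[s] @ x + rng.multivariate_normal(np.zeros(C_fld), R[s])  # field features (2.3)
    p_spk = np.clip(np.exp(alpha[s] + beta[s] @ x), 0.0, 1.0)      # rate * Delta (2.5)
    n = (rng.random(C_spk) < p_spk).astype(int)                    # binary spikes (2.4)
    ss.append(s); xs.append(x); ys.append(y); ns.append(n)
```

A slower field-feature time-scale could be mimicked in this sketch by simply omitting $y_t$ on bins with no field sample, matching the $C = 0$ or $R = \infty$ convention above.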
2.2.2 Inference algorithms
We develop inference algorithms for the SMDS model that aim to obtain the minimum mean squared error (MMSE) estimate of the brain and regime states given available multiscale observations. The MMSE state estimator based on observations up to time $\tau$ is known to be the mean of the posterior density [3], and thus we aim to find the following terms for brain and regime state estimation, respectively:

$$E[x_t \mid h_{1:\tau}] \tag{2.7}$$

$$P(s_t^{(j)} \mid h_{1:\tau}) \tag{2.8}$$

where $\tau$ is equal to the present time $t$ for filters and equal to some future time $T$ for smoothers.
Computing the above terms exactly, however, requires an exponentially increasing computational cost as time grows. To see this, focusing on the decoding problem for (2.7), we can expand it to consider each potential regime state sequence:

$$E[x_t \mid h_{1:t}] = \sum_{s_1=1}^{M} \cdots \sum_{s_t=1}^{M} E[x_t \mid h_{1:t}, s_{1:t}] \, P(s_{1:t} \mid h_{1:t}) \tag{2.9}$$

Figure 2.2: Illustration of inference methods for SMDS models. The switching multiscale filter (SMF) takes spike-field activity to causally decode unobserved brain and regime states. The values calculated in SMF can then be used in the switching multiscale smoother (SMS) to further smooth estimates of brain and regime states non-causally using future spike-field activity.
In the inner expectation term $E[x_t \mid h_{1:t}, s_{1:t}]$, the regime is known at each time for any given sequence $s_{1:t}$, so finding the inner expected brain state term is a simple filtering problem. However, the number of potential sequences, and thus the number of computations, grows exponentially with time: at time $t$ with $M$ possible regimes, $M^t$ sequences must be considered. Thus, the inference algorithms that we develop must approximate the above terms to reconcile this challenge of intractability and ensure practicality.
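The blow-up in (2.9) is easy to quantify: exact evaluation sums over every regime sequence $s_{1:t}$, and that count is $M^t$. A quick check with illustrative numbers:

```python
from itertools import product

M, t = 3, 5
# Every possible regime sequence s_{1:t}; exact evaluation of (2.9) sums over all of them.
sequences = list(product(range(M), repeat=t))
print(len(sequences))  # 3**5 = 243 terms already at t = 5

# At realistic horizons the count is astronomical: 3**1000 has 478 decimal digits,
# which is why a practical filter must approximate rather than enumerate.
digits = len(str(M ** 1000))
```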
2.2.2.1 Switching multiscale filter
We first formulate the overall recursive approach that the switching multiscale filter takes and introduce the following notation:

$$\hat{x}_{t-1|t-1}^{(i)} \triangleq E[x_{t-1} \mid h_{1:t-1}, s_{t-1}^{(i)}] \tag{2.10}$$

$$\hat{x}_{t-1|t-1}^{(-,j)} \triangleq E[x_{t-1} \mid h_{1:t-1}, s_t^{(j)}] \tag{2.11}$$

$$\hat{x}_{t|t-1}^{(j)} \triangleq E[x_t \mid h_{1:t-1}, s_t^{(j)}] \tag{2.12}$$

Through the rest of this work, $i$ will be used to index the regime state at time $t-1$ and $j$ for time $t$.
We start by expanding the posterior density of the brain state over the potential regimes at time t − 1:

f(x_{t-1} | h_{1:t-1}) = \sum_{i=1}^{M} f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}) P(s^{(i)}_{t-1} | h_{1:t-1})    (2.13)

where the mean of (2.13) would be the MMSE estimate of the brain state at time t − 1. We assume that each f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}) is a Gaussian with mean \hat{x}^{(i)}_{t-1|t-1} and covariance \hat{\Lambda}^{(i)}_{t-1|t-1}. Thus, (2.13) can be viewed as a mixture of Gaussians whose density is completely described by the sufficient statistics below, comprised of the mean and covariance of each Gaussian and its relative weight P(s^{(i)}_{t-1} | h_{1:t-1}):

{\hat{x}^{(i)}_{t-1|t-1}, \hat{\Lambda}^{(i)}_{t-1|t-1}, P(s^{(i)}_{t-1} | h_{1:t-1})}_{i ∈ 1:M}    (2.14)
This weight is also critical in finding the estimate of the regime state. We then perform the same expansion of the brain state density at time t:

f(x_t | h_{1:t}) = \sum_{j=1}^{M} f(x_t | h_{1:t}, s^{(j)}_{t}) P(s^{(j)}_{t} | h_{1:t})    (2.15)

We also assume that each f(x_t | h_{1:t}, s^{(j)}_{t}) is a Gaussian and that (2.15) is a mixture of Gaussians with sufficient statistics:

{\hat{x}^{(j)}_{t|t}, \hat{\Lambda}^{(j)}_{t|t}, P(s^{(j)}_{t} | h_{1:t})}_{j ∈ 1:M}    (2.16)
The inference goal is to compute (2.16) from (2.14) recursively. Each recursion of SMF thus starts with the terms in (2.14) known and incorporates the new multiscale observations at time t, i.e., h_t, to calculate the next set of terms (2.16), completing the recursion. Through this causal approach, SMF can decode brain states and regime states at time t using only spike and field potential observations up to the present time t, without requiring observations from times greater than t. A summary of SMF can be seen in Algorithm 1, which we now derive.
We begin the derivation by incrementing the regime state for each f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}) so that it is conditioned on s^{(j)}_{t} instead of the current s^{(i)}_{t-1}, yielding the density f(x_{t-1} | h_{1:t-1}, s^{(j)}_{t}). This is done through marginalization and Markov conditional independence properties:

f(x_{t-1} | h_{1:t-1}, s^{(j)}_{t}) = \sum_{i=1}^{M} f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}) P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:t-1})    (2.17a)

P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:t-1}) ∝ P(s^{(j)}_{t} | s^{(i)}_{t-1}) P(s^{(i)}_{t-1} | h_{1:t-1})    (2.17b)

We can see that (2.17a) is a mixture of Gaussians because we assumed f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}) is a Gaussian in (2.13), and the weights of the mixture (2.17b) can be calculated from (2.1) and (2.14). To ensure tractability, we approximate this mixture of Gaussians as a single Gaussian whose mean and covariance equal those of the mixture.
The overall mean and covariance, \mu and \Lambda, of a general mixture of M Gaussians with individual means, covariances, and weights denoted by {\mu^{(i)}, \Lambda^{(i)}, p_i}_{i ∈ [1,M]} can be found through^{54}:

\mu = \sum_{i=1}^{M} \mu^{(i)} p_i,    \Lambda = \sum_{i=1}^{M} [\Lambda^{(i)} + (\mu^{(i)} − \mu)(\mu^{(i)} − \mu)^T] p_i    (2.18)

We will denote this process of finding the first and second moments of a mixture of Gaussians as:

{\mu, \Lambda} = getMoments(\phi^{(1)}, ..., \phi^{(M)}),    \phi^{(i)} ≡ {\mu^{(i)}, \Lambda^{(i)}, p_i}    (2.19)
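The moment matching in (2.18)-(2.19) can be sketched in a few lines of NumPy (a minimal illustration; the function name and array shapes are our own conventions, not from the thesis):

```python
import numpy as np

def get_moments(means, covs, weights):
    """Collapse a Gaussian mixture to a single Gaussian by moment matching (Eq. 2.18).

    means: (M, d) component means; covs: (M, d, d) component covariances;
    weights: (M,) mixture weights that sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    mu = weights @ means                                 # overall mean
    diff = means - mu                                    # deviation of each component mean
    # weighted within-component covariance plus between-component spread
    cov = np.einsum("i,ijk->jk", weights, covs) \
        + np.einsum("i,ij,ik->jk", weights, diff, diff)
    return mu, cov
```

Note that the second term in the covariance captures how far apart the component means are, so the collapsed Gaussian is wider than any single component when the means disagree.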
Thus, we approximate (2.17a) with a single Gaussian with mean and covariance:

{\hat{x}^{(-,j)}_{t-1|t-1}, \hat{\Lambda}^{(-,j)}_{t-1|t-1}} = getMoments(\phi^{(1)}, ..., \phi^{(M)}),    \phi^{(i)} ≡ {\hat{x}^{(i)}_{t-1|t-1}, \hat{\Lambda}^{(i)}_{t-1|t-1}, P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:t-1})}    (2.20)
This is repeated for each regime j ∈ [1, M].
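For illustration, the regime-incrementing step (2.17) together with the collapse (2.20) can be sketched as follows, assuming Phi holds the regime transition probabilities from (2.1) (array shapes and names are illustrative):

```python
import numpy as np

def mix_regimes(means, covs, probs, Phi):
    """Mix per-regime filtered moments toward each destination regime (Eqs. 2.17, 2.20).

    means: (M, d), covs: (M, d, d): filtered moments per source regime at t-1;
    probs: (M,) P(s_{t-1}^{(i)} | h_{1:t-1}); Phi[i, j] = P(s_t = j | s_{t-1} = i).
    Returns one mixed (mean, covariance) pair per destination regime j.
    """
    M, d = means.shape
    mixed_means = np.empty((M, d))
    mixed_covs = np.empty((M, d, d))
    for j in range(M):
        # (2.17b): weights over source regimes i, given destination regime j
        w = Phi[:, j] * probs
        w = w / w.sum()
        # (2.20): collapse the resulting mixture by moment matching
        mu = w @ means
        diff = means - mu
        mixed_means[j] = mu
        mixed_covs[j] = np.einsum("i,ijk->jk", w, covs) \
            + np.einsum("i,ij,ik->jk", w, diff, diff)
    return mixed_means, mixed_covs
```

With an identity transition matrix the mixing is a no-op, since each destination regime can only be reached from itself.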
We now incorporate the next observation h_t and update the brain state density from f(x_{t-1} | h_{1:t-1}, s^{(j)}_{t}) to f(x_t | h_{1:t}, s^{(j)}_{t}) for each regime j. Because the regime is fixed at j for these two densities, we do this by adapting a recently developed filter for stationary multiscale observations^{6}, owing to the multiscale nature of h_t, and using the model parameters
associated with regime j. The first step involves a prediction of the brain state to get the prediction density f(x_t | h_{1:t-1}, s^{(j)}_{t}) with mean and covariance:

\hat{x}^{(j)}_{t|t-1} = A^{(j)} \hat{x}^{(-,j)}_{t-1|t-1},    \hat{\Lambda}^{(j)}_{t|t-1} = A^{(j)} \hat{\Lambda}^{(-,j)}_{t-1|t-1} A^{(j)T} + Q^{(j)}    (2.21)
We then update the brain state to f(x_t | h_{1:t}, s^{(j)}_{t}) by expanding the density with Bayes' rule:

f(x_t | h_{1:t}, s^{(j)}_{t}) = f(h_t | x_t, s^{(j)}_{t}) f(x_t | h_{1:t-1}, s^{(j)}_{t}) / f(h_t | h_{1:t-1}, s^{(j)}_{t})    (2.22)

and approximating the density as a Gaussian via Laplace's approximation, with mean and covariance:

\hat{x}^{(j)}_{t|t} = \hat{x}^{(j)}_{t|t-1} + \hat{\Lambda}^{(j)}_{t|t} C^{(j)T} R^{(j)-1} (y_t − C^{(j)} \hat{x}^{(j)}_{t|t-1}) + \hat{\Lambda}^{(j)}_{t|t} \sum_{c=1}^{C} \beta^{(j)}_{c} (n^{c}_{t} − \lambda_c(\hat{x}^{(j)}_{t|t-1}, j) \Delta)    (2.23a)

\hat{\Lambda}^{(j)-1}_{t|t} = \hat{\Lambda}^{(j)-1}_{t|t-1} + C^{(j)T} R^{(j)-1} C^{(j)} + \sum_{c=1}^{C} \beta^{(j)}_{c} \beta^{(j)T}_{c} \lambda_c(\hat{x}^{(j)}_{t|t-1}, j) \Delta    (2.23b)
We note that without first introducing the s^{(j)}_{t} dependence while removing the s^{(i)}_{t-1} dependence from f(x_t | h_{1:t-1}, s^{(i)}_{t-1}) in (2.17) and (2.20), this filtering step would result in a brain state density dependent on both s^{(i)}_{t-1} and s^{(j)}_{t}, and thus would require additional steps to later remove the s^{(i)}_{t-1} dependence. Here, by contrast, the above steps result in the desired mean and covariance terms in (2.16) without such a dependence.
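As a rough sketch, the regime-conditioned prediction (2.21) and Laplace-style multiscale update (2.23) might look like the following, assuming a log-linear conditional intensity λ_c(x) = exp(μ_c + β_c^T x) for the spikes (the variable names here are illustrative, and the intensity model details may differ from the adapted filter's exact form):

```python
import numpy as np

def predict(x_mix, P_mix, A, Q):
    """Prediction step under regime j (Eq. 2.21)."""
    return A @ x_mix, A @ P_mix @ A.T + Q

def multiscale_update(x_pred, P_pred, y, n, C, R, beta, mu0, dt):
    """Regime-conditioned multiscale update, a sketch of Eq. (2.23).

    Assumes lambda_c(x) = exp(mu0[c] + beta[c] @ x), evaluated at the
    predicted mean. y: field features with y = C x + noise(R); n: spike
    counts in the bin; dt: bin width. Names mu0, dt are illustrative.
    """
    rates = np.exp(mu0 + beta @ x_pred)           # lambda_c at the predicted mean
    Rinv = np.linalg.inv(R)
    # (2.23b): information-form covariance update, fusing both observation scales
    info = np.linalg.inv(P_pred) + C.T @ Rinv @ C + (beta.T * (rates * dt)) @ beta
    P_upd = np.linalg.inv(info)
    # (2.23a): innovations from the field features and from the spikes
    innov = C.T @ Rinv @ (y - C @ x_pred) + beta.T @ (n - rates * dt)
    return x_pred + P_upd @ innov, P_upd
```

With zero spike modulation (beta = 0) this reduces to the standard Kalman update on the field features alone, which is a useful sanity check.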
We now focus on calculating the regime state distribution P(s^{(j)}_{t} | h_{1:t}). We first apply Bayes' rule to the distribution:

P(s^{(j)}_{t} | h_{1:t}) ∝ P(s^{(j)}_{t} | h_{1:t-1}) f(h_t | h_{1:t-1}, s^{(j)}_{t})    (2.24a)

P(s^{(j)}_{t} | h_{1:t-1}) = \sum_{i=1}^{M} P(s^{(j)}_{t} | s^{(i)}_{t-1}) P(s^{(i)}_{t-1} | h_{1:t-1})    (2.24b)
where (2.24b), the first term in (2.24a), is found from (2.1) and (2.14). However, the second term in (2.24a) is difficult to find due to the multiscale nature of the observations in the SMDS model. To overcome this challenge, we recognize that this second term is also the partition function (i.e., the denominator) of the update density (2.22). Thus, we solve for it using (2.22) in terms of the prediction and update densities, which were both approximated as Gaussians in the Laplace approximation with means and covariances given by (2.21) and (2.23). We then evaluate this second term at \hat{x}^{(j)}_{t|t}, and after some basic algebraic operations we obtain:

f(h_t | h_{1:t-1}, s^{(j)}_{t}) = f(y_t | \hat{x}^{(j)}_{t|t}, s^{(j)}_{t}) P(n_t | \hat{x}^{(j)}_{t|t}, s^{(j)}_{t}) \sqrt{\det \hat{\Lambda}^{(j)}_{t|t} / \det \hat{\Lambda}^{(j)}_{t|t-1}} \exp(−(1/2) (\hat{x}^{(j)}_{t|t} − \hat{x}^{(j)}_{t|t-1})^T \hat{\Lambda}^{(j)-1}_{t|t-1} (\hat{x}^{(j)}_{t|t} − \hat{x}^{(j)}_{t|t-1}))    (2.25)
We use (2.25) with (2.24b) to calculate (2.24a) for each regime j to obtain the remaining
terms in (2.16) and complete the recursion.
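The regime posterior recursion (2.24) is numerically delicate because the likelihoods (2.25) can be very small; a log-domain sketch (with illustrative names) is:

```python
import numpy as np

def regime_posterior(prev_probs, Phi, log_liks):
    """Regime posterior update (Eq. 2.24), a sketch.

    prev_probs: (M,) P(s_{t-1}^{(i)} | h_{1:t-1}); Phi[i, j] = P(s_t = j | s_{t-1} = i);
    log_liks: (M,) log f(h_t | h_{1:t-1}, s_t^{(j)}) from Eq. (2.25).
    """
    pred = prev_probs @ Phi                 # (2.24b): one-step regime prediction
    log_post = np.log(pred) + log_liks      # (2.24a), in the log domain
    log_post -= log_post.max()              # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()                # normalize the proportionality
```

Subtracting the maximum before exponentiating avoids underflow when all likelihoods are tiny, a standard trick in switching filters.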
Finally, we calculate the MMSE estimate of the brain state from f(x_t | h_{1:t}) as in (2.7) by obtaining the mean of the mixture (2.15):

{\hat{x}_{t|t}, \hat{\Lambda}_{t|t}} = getMoments(\phi^{(1)}, ..., \phi^{(M)}),    \phi^{(j)} ≡ {\hat{x}^{(j)}_{t|t}, \hat{\Lambda}^{(j)}_{t|t}, P(s^{(j)}_{t} | h_{1:t})}    (2.26)

where each \phi^{(j)} is already available from above. The regime state estimate is then set to the regime with the largest posterior:

\hat{s}_{t|t} = argmax_{j ∈ [1,M]} P(s^{(j)}_{t} | h_{1:t})    (2.27)
Together, (2.17)-(2.27) detail the SMF algorithm. The architecture for this method was inspired by a decoding algorithm for switching linear dynamical systems (LDS) called the interacting multiple model (IMM)^{3,45} algorithm. However, the IMM method only accepts Gaussian observations and thus is not suited for the SMDS model with multiscale observations.
Algorithm 1 Switching Multiscale Filter
1: for t = 1 to T do
2:   // Approximate
3:   for j = 1 to M do
4:     Find {\hat{x}^{(-,j)}_{t-1|t-1}, \hat{\Lambda}^{(-,j)}_{t-1|t-1}}  ▷ (2.17), (2.20)
5:   end for
6:   Get h_t
7:   // Filter
8:   for j = 1 to M do
9:     Find {\hat{x}^{(j)}_{t|t}, \hat{\Lambda}^{(j)}_{t|t}}  ▷ (2.21), (2.23)
10:    Find P(s^{(j)}_{t} | h_{1:t})  ▷ (2.24), (2.25)
11:  end for
12:  // Estimate
13:  Find {\hat{x}_{t|t}, \hat{\Lambda}_{t|t}}  ▷ (2.26)
14:  Find \hat{s}_{t|t}  ▷ (2.27)
15: end for
2.2.2.2 Switching multiscale smoother
We now present a fixed-interval smoother for the SMDS model termed the switching multiscale smoother (SMS). We run this method after SMF to obtain estimates of the brain and
regime states at time t as in (2.7) and (2.8) given observations up to a fixed future time
T ≥ t.
Compared to prior smoothing methods for switching LDS with single-scale observations, SMS introduces a key additional adjustment, described below. These prior smoothers, Kim's method^{46} and Expectation Correction (EC)^{55}, do not directly require any of the observations, whether multiscale or not. Instead, they rely only on the sufficient statistics computed by a filter such as SMF and the assumption that (2.15) is a mixture of Gaussians. However, we find that directly applying these methods to SMDS models, or even to single-scale models (whether point process or Gaussian), leads to inconsistent and unreliable results across different system settings, with the prior smoothers performing worse than even filtering methods in some settings (see Results Section 2.3.3). Thus, we introduce a key additional adjustment in SMS that empirically yields a more reliable method which generalizes well across diverse settings (Results Section 2.3.3).
We introduce the overall approach of SMS first with the following notation:
\hat{x}^{(j)}_{t|T} ≜ E[x_t | h_{1:T}, s^{(j)}_{t}]    (2.28)

\hat{x}^{(i,j)}_{t|T} ≜ E[x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}]    (2.29)

\hat{x}^{(i,j)}_{t-1|T} ≜ E[x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}]    (2.30)
We begin by expanding the density of the brain state at time t:

f(x_t | h_{1:T}) = \sum_{j=1}^{M} f(x_t | h_{1:T}, s^{(j)}_{t}) P(s^{(j)}_{t} | h_{1:T})    (2.31)

where the mean of (2.31) would be the MMSE estimate of the brain state given observations up to a fixed future time T. Similar to (2.13), we assume each f(x_t | h_{1:T}, s^{(j)}_{t}) is a Gaussian with mean \hat{x}^{(j)}_{t|T} and covariance \hat{\Lambda}^{(j)}_{t|T}, and thus we can view (2.31) as a mixture of Gaussians with sufficient statistics:

{\hat{x}^{(j)}_{t|T}, \hat{\Lambda}^{(j)}_{t|T}, P(s^{(j)}_{t} | h_{1:T})}_{j ∈ 1:M}    (2.32)

We then repeat the expansion on the brain state density at time t − 1 with similar Gaussian assumptions, whose density and sufficient statistics are:

f(x_{t-1} | h_{1:T}) = \sum_{i=1}^{M} f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}) P(s^{(i)}_{t-1} | h_{1:T})    (2.33)

{\hat{x}^{(i)}_{t-1|T}, \hat{\Lambda}^{(i)}_{t-1|T}, P(s^{(i)}_{t-1} | h_{1:T})}_{i ∈ 1:M}    (2.34)
The inference goal is then to take the set of sufficient statistic terms in (2.32) and calculate
the next set in (2.34) to complete the recursion. We note that the set (2.32) is calculated
in the final recursion of SMF when t reaches T. Thus, after SMF is run in the forward time
direction, SMS can then be run non-causally in the backwards time direction to yield more
accurate estimates of the brain and regime states.
We start the derivation of SMS by focusing on the brain state. With the ultimate goal of finding the mean and covariance of f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}) as in (2.34), we begin by finding the mean and covariance of an intermediate density f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) such that we can later remove the s^{(j)}_{t} dependence. We first follow the derivation of the Rauch-Tung-Striebel (RTS) smoother and marginalize the intermediate density over x_t:

f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) = ∫ f(x_{t-1} | x_t, h_{1:t-1}, s^{(i)}_{t-1}, s^{(j)}_{t}) f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) dx_t    (2.35a)

= ∫ f(x_t | x_{t-1}, s^{(j)}_{t}) f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1})    (2.35b)
  × [ f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) / f(x_t | h_{1:t-1}, s^{(i)}_{t-1}, s^{(j)}_{t}) ] dx_t    (2.35c)

where (2.35b) and (2.35c) come from applying Bayes' rule to the first density in (2.35a). As in the RTS smoother, the density's mean and covariance can then be found with multivariate Gaussian properties^{28}:

\hat{x}^{(i,j)}_{t-1|T} = \hat{x}^{(i)}_{t-1|t-1} + J_{t-1} (\hat{x}^{(i,j)}_{t|T} − \hat{x}^{(i,j)}_{t|t-1})    (2.36a)

\hat{\Lambda}^{(i,j)}_{t-1|T} = \hat{\Lambda}^{(i)}_{t-1|t-1} + J_{t-1} (\hat{\Lambda}^{(i,j)}_{t|T} − \hat{\Lambda}^{(i,j)}_{t|t-1}) J^{T}_{t-1}    (2.36b)

J_{t-1} = \hat{\Lambda}^{(i)}_{t-1|t-1} A^{(j)T} \hat{\Lambda}^{(i,j)-1}_{t|t-1}    (2.36c)
where, in (2.35b), A^{(j)} is the dynamics matrix associated with f(x_t | x_{t-1}, s^{(j)}_{t}), and \hat{x}^{(i)}_{t-1|t-1} and \hat{\Lambda}^{(i)}_{t-1|t-1} are the moments of f(x_{t-1} | h_{1:t-1}, s^{(i)}_{t-1}). For (2.35c), \hat{x}^{(i,j)}_{t|T} and \hat{\Lambda}^{(i,j)}_{t|T} would be the mean and covariance of the smoothed density f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) in the numerator, while \hat{x}^{(i,j)}_{t|t-1} and \hat{\Lambda}^{(i,j)}_{t|t-1} would be the moments of the prediction density f(x_t | h_{1:t-1}, s^{(i)}_{t-1}, s^{(j)}_{t}) in the denominator. However, the challenge associated with these equations is that the moments of the smoothed density f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) in (2.35c) are not actually known nor trivially computable, so we make the following assumption:

f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) ≈ f(x_t | h_{1:T}, s^{(j)}_{t})    (2.37)
where the right-hand density is known from (2.31) with mean \hat{x}^{(j)}_{t|T} and covariance \hat{\Lambda}^{(j)}_{t|T} as in (2.32), and we use these moments to replace \hat{x}^{(i,j)}_{t|T} and \hat{\Lambda}^{(i,j)}_{t|T}. This assumption is also used in Kim's method and EC, and it allows the mean and covariance of the intermediate density to now be computable.
However, we find that with certain model parameters, using these calculations as is can lead to poor estimation performance regardless of observation modality, whether single-scale or multiscale, as detailed in Section 2.3.3, and this motivated our key new additional adjustment. We hypothesized that the problem lies in a discrepancy in (2.35c): the prediction density f(x_t | h_{1:t-1}, s^{(i)}_{t-1}, s^{(j)}_{t}) retains an s^{(i)}_{t-1} dependence, while the smoothed density f(x_t | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) has its s^{(i)}_{t-1} dependence assumed away in (2.37). Thus, to remedy this discrepancy, we make the prediction and smoothed densities consistent by making an additional equivalent assumption for the prediction density:

f(x_t | h_{1:t-1}, s^{(i)}_{t-1}, s^{(j)}_{t}) ≈ f(x_t | h_{1:t-1}, s^{(j)}_{t})    (2.38)
and replace the moment terms \hat{x}^{(i,j)}_{t|t-1} and \hat{\Lambda}^{(i,j)}_{t|t-1} with those of f(x_t | h_{1:t-1}, s^{(j)}_{t}), i.e., \hat{x}^{(j)}_{t|t-1} and \hat{\Lambda}^{(j)}_{t|t-1}, which we find in (2.21). The final equations for finding the mean and covariance of the intermediate density f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) in SMS are then:

\hat{x}^{(i,j)}_{t-1|T} = \hat{x}^{(i)}_{t-1|t-1} + J_{t-1} (\hat{x}^{(j)}_{t|T} − \hat{x}^{(j)}_{t|t-1})    (2.39a)

\hat{\Lambda}^{(i,j)}_{t-1|T} = \hat{\Lambda}^{(i)}_{t-1|t-1} + J_{t-1} (\hat{\Lambda}^{(j)}_{t|T} − \hat{\Lambda}^{(j)}_{t|t-1}) J^{T}_{t-1}    (2.39b)

J_{t-1} = \hat{\Lambda}^{(i)}_{t-1|t-1} A^{(j)T} \hat{\Lambda}^{(j)-1}_{t|t-1}    (2.39c)

Thus, all (i,j) terms on the right-hand side of (2.36) with s^{(i)}_{t-1} and s^{(j)}_{t} dependence become (j) terms with only s^{(j)}_{t} dependence. We show in Section 2.3.3 that this adjustment allows SMS to reliably smooth decoded estimates in cases where prior methods (Kim's method and EC) could perform worse than even a causal filter.
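One (i, j) backward step of (2.39) can be sketched as follows (a minimal illustration with assumed array shapes):

```python
import numpy as np

def sms_backward_step(x_f, P_f, x_pred, P_pred, x_smooth, P_smooth, A):
    """One (i, j) backward recursion of SMS (Eq. 2.39), a sketch.

    x_f, P_f: filtered moments of regime i at t-1 (from SMF);
    x_pred, P_pred: predicted moments under regime j (Eq. 2.21, per the
    adjustment in Eq. 2.38); x_smooth, P_smooth: smoothed moments of
    regime j at time t; A: dynamics matrix of regime j.
    """
    J = P_f @ A.T @ np.linalg.inv(P_pred)           # smoother gain (2.39c)
    x_out = x_f + J @ (x_smooth - x_pred)           # (2.39a)
    P_out = P_f + J @ (P_smooth - P_pred) @ J.T     # (2.39b)
    return x_out, P_out
```

Note that, because of the adjustment in (2.38), only the filtered moments carry the (i) index while the predicted and smoothed moments carry only (j).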
We then continue the derivation by changing focus to the regime state. With the goal of finding P(s^{(i)}_{t-1} | h_{1:T}) as in (2.34), we first find the joint distribution P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T}):

P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T}) = P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:T}) P(s^{(j)}_{t} | h_{1:T})    (2.40)

and make the following assumption to simplify computation:

P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:T}) ≈ P(s^{(i)}_{t-1} | s^{(j)}_{t}, h_{1:t-1})    (2.41)
with the term on the right found in (2.17b). We repeat this for each (i, j) pair.
Finally, with P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T}) in (2.41) and the moments of f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) in (2.39) known, we remove their s^{(j)}_{t} dependence to find the terms in (2.33) and ultimately the sufficient statistics (2.34) to complete the recursion. For the regime state, we marginalize the distribution over s^{(j)}_{t}:

P(s^{(i)}_{t-1} | h_{1:T}) = \sum_{j=1}^{M} P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T})    (2.42)
We similarly marginalize the brain state over s^{(j)}_{t}:

f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}) = \sum_{j=1}^{M} f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) P(s^{(j)}_{t} | s^{(i)}_{t-1}, h_{1:T})    (2.43a)

P(s^{(j)}_{t} | s^{(i)}_{t-1}, h_{1:T}) = P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T}) / P(s^{(i)}_{t-1} | h_{1:T})    (2.43b)

We assume f(x_{t-1} | h_{1:T}, s^{(i)}_{t-1}, s^{(j)}_{t}) is Gaussian with moments \hat{x}^{(i,j)}_{t-1|T} and \hat{\Lambda}^{(i,j)}_{t-1|T} from (2.39). Thus, (2.43a) is a mixture of Gaussians, which we approximate with a single Gaussian with the same mean and covariance, \hat{x}^{(i)}_{t-1|T} and \hat{\Lambda}^{(i)}_{t-1|T}:
{\hat{x}^{(i)}_{t-1|T}, \hat{\Lambda}^{(i)}_{t-1|T}} = getMoments(\phi^{(1)}, ..., \phi^{(M)}),    \phi^{(j)} ≡ {\hat{x}^{(i,j)}_{t-1|T}, \hat{\Lambda}^{(i,j)}_{t-1|T}, P(s^{(j)}_{t} | s^{(i)}_{t-1}, h_{1:T})}    (2.44)

where the terms in \phi^{(j)} are found from (2.39a), (2.39b), and (2.43b). We repeat this for each regime i ∈ [1, M] to obtain all terms in (2.34) from (2.42) and (2.44) and thus complete the recursion.
Finally, we calculate the MMSE estimate of the brain state from the density f(x_{t-1} | h_{1:T}) in (2.33) as:

{\hat{x}_{t-1|T}, \hat{\Lambda}_{t-1|T}} = getMoments(\phi^{(1)}, ..., \phi^{(M)}),    \phi^{(i)} ≡ {\hat{x}^{(i)}_{t-1|T}, \hat{\Lambda}^{(i)}_{t-1|T}, P(s^{(i)}_{t-1} | h_{1:T})}    (2.45)

where the \phi^{(i)} are found from (2.44) and (2.42). We then estimate the regime state as that with the largest posterior:

\hat{s}_{t-1|T} = argmax_{i ∈ [1,M]} P(s^{(i)}_{t-1} | h_{1:T})    (2.46)
A summary of the algorithm can be seen in Algorithm 2. The main difference between SMS and prior smoothing methods is the new additional assumption made in (2.38), which, as we show in Section 2.3.3, is critical to enable reliable performance and generalizability under diverse system settings. We note that EC makes a different assumption in the joint regime state calculation in (2.41), with details in^{55}.
Algorithm 2 Switching Multiscale Smoother
1: Run filter for SMDS  ▷ Alg. 1
2: for t = T down to 1 do
3:   // Smooth
4:   for j = 1 to M do
5:     for i = 1 to M do
6:       Find {\hat{x}^{(i,j)}_{t-1|T}, \hat{\Lambda}^{(i,j)}_{t-1|T}}  ▷ (2.39)
7:       Find P(s^{(i)}_{t-1}, s^{(j)}_{t} | h_{1:T})  ▷ (2.40), (2.41)
8:     end for
9:   end for
10:  // Approximate
11:  for i = 1 to M do
12:    Find P(s^{(i)}_{t-1} | h_{1:T})  ▷ (2.42)
13:    Find {\hat{x}^{(i)}_{t-1|T}, \hat{\Lambda}^{(i)}_{t-1|T}}  ▷ (2.44)
14:  end for
15:  // Estimate
16:  Find {\hat{x}_{t-1|T}, \hat{\Lambda}_{t-1|T}}  ▷ (2.45)
17:  Find \hat{s}_{t-1|T}  ▷ (2.46)
18: end for
2.2.3 Performance Metrics
In simulations where the ground-truth brain state is known, we compare the estimated brain states to the true brain states using the normalized root mean-square error (NRMSE) metric^{25}:

NRMSE(x_{1:T}, \hat{x}_{1:T}) = \sqrt{\sum_{t=1}^{T} ||x_t − \hat{x}_t||_2^2} / \sqrt{\sum_{t=1}^{T} ||x_t − \bar{x}||_2^2}    (2.47)

where x_{1:T} and \hat{x}_{1:T} are the true and estimated brain states respectively, || · ||_2 is the Euclidean norm, and \bar{x} is the mean of the true brain states. Lower values reflect more accurate estimation, with perfect estimation at 0 and a chance level of 1; values above 1 represent worse-than-chance performance. To more clearly illustrate differences with prior works, we also evaluate the accuracy of brain state estimation using
the average correlation coefficient (CC) metric. We find this metric by calculating the CC
between the time series of the estimated and true brain states for each brain state dimension
and then averaging over dimensions.
We evaluate the estimated regime states based on accuracy, i.e., the proportion of estimated regime states that match the true regime states:

Accuracy(s_{1:T}, \hat{s}_{1:T}) = (1/T) \sum_{t=1}^{T} [s_t = \hat{s}_t]    (2.48)

where [s_t = \hat{s}_t] is 1 if the values match and 0 otherwise. Accuracy ranges from 0 to 1 and has a chance level of 1/M, where M is the number of regimes.
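Both metrics are straightforward to compute; a sketch in NumPy, assuming (T, d) brain state arrays and length-T regime label sequences:

```python
import numpy as np

def nrmse(x_true, x_hat):
    """Normalized RMSE (Eq. 2.47): 0 is perfect, 1 is chance level."""
    num = np.sqrt(np.sum(np.linalg.norm(x_true - x_hat, axis=1) ** 2))
    den = np.sqrt(np.sum(np.linalg.norm(x_true - x_true.mean(axis=0), axis=1) ** 2))
    return num / den

def regime_accuracy(s_true, s_hat):
    """Regime accuracy (Eq. 2.48): fraction of matching regime labels."""
    return np.mean(np.asarray(s_true) == np.asarray(s_hat))
```

Predicting the true mean \bar{x} at every time step yields an NRMSE of exactly 1, which is why 1 serves as the chance level.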
2.2.4 Numerical simulation framework
We validate the developed inference algorithms using extensive Monte-Carlo numerical simulations. We simulate spike-field activity from randomly generated SMDS models under
various simulation settings. We then measure the performance of the inference algorithms
on the simulated data with the metrics in Section 2.2.3 and perform comparisons to prior
methods, which either assume stationarity or allow switching but only use a single scale of
observations. We ensure fair comparison of stationary and switching methods by estimating
the parameters of both the stationary models and the switching models using maximum
likelihood techniques56. For stationary methods, we fit a single set of parameters based on
the entire training data, and for switching methods we fit the model parameters for each
regime separately.
2.2.4.1 Numerical simulation settings
For numerical simulations, we randomly form SMDS models under diverse simulation settings
that we sweep. Details on specific values can be found in the Appendix and important
parameters we sweep are summarized below. Across simulations, we assume the time bin ∆
is 2 ms, and for each simulation setting, we randomly form 40 separate systems and simulate 40 × M seconds of data, where M is the true number of regimes.
For the regime state settings, we sweep the number of regimes M and the dwell time t_dwell, which is the average time spent within the same regime before a switch occurs. The dwell time for regime j is t_dwell = ∆ / (1 − Φ_{j,j}), where Φ_{j,j} is from (2.1).
We then assume that the probability of switching out of regime j to i is equal for all i ̸= j
and simulate regime state realizations using (2.1).
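As an illustration of these regime state settings, transition probabilities with a target dwell time and a sampled regime sequence might be generated as follows (a sketch; function names are our own):

```python
import numpy as np

def make_transition_matrix(M, dwell_time, dt):
    """Build regime transition probabilities from a dwell time, a sketch.

    Self-transition Phi[j, j] = 1 - dt / dwell_time, so that the dwell time
    dt / (1 - Phi[j, j]) matches the target; the remaining probability mass
    is split equally among the other M - 1 regimes.
    """
    p_stay = 1.0 - dt / dwell_time
    Phi = np.full((M, M), (1.0 - p_stay) / (M - 1))
    np.fill_diagonal(Phi, p_stay)
    return Phi

def simulate_regimes(Phi, T, rng=None):
    """Sample a regime state sequence s_1, ..., s_T from the Markov chain."""
    rng = np.random.default_rng(rng)
    M = Phi.shape[0]
    s = np.empty(T, dtype=int)
    s[0] = rng.integers(M)
    for t in range(1, T):
        s[t] = rng.choice(M, p=Phi[s[t - 1]])
    return s
```

For instance, with ∆ = 2 ms and a 2 s dwell time, the self-transition probability is 0.999.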
For the dynamics equation parameters, we sweep the dimension of x_t, dim(x_t) = d, and randomly form the dynamics matrix A^{(j)} and the noise covariance matrix Q^{(j)} per regime j. We repeat the operations below for each regime j, so the notation (j) will be omitted for clarity. Motivated by^{8,25}, we assume the eigenvalues of A are stable and comprised of d/2 complex-conjugate pairs {r_i e^{±jθ_i}}_{i ∈ [1, d/2]}, as r_i and θ_i characterize the steady-state time dynamics of x_t. The term r_i dictates how fast the signal decays; for a purely real eigenvalue, a signal decays exponentially with a half-life of t_{1/2} = −∆ ln 2 / ln r_i. The term θ_i dictates the frequency with which the signal oscillates, given by θ_i / (2π∆). In our simulations, we randomly select eigenvalues with decay half-lives between [27, 277] ms (i.e., a range of [0.95, 0.995] for r_i) and frequencies between [0.8, 5] Hz (i.e., a range of [0.010, 0.063] for θ_i) based on values seen in prior literature^{8,31}.

We then randomly select eigenvectors for each eigenvalue such that the dynamics matrix A can be decomposed into A = UDU^{-1}, where D is a diagonal matrix of our selected eigenvalues and the columns of U are their corresponding eigenvectors. This is also known as the eigendecomposition of A. We call the collection of eigenvectors U for a given regime the eigenbasis and refer to the collection of U matrices across regimes as the eigenbases of the system. Unless otherwise stated, we assume the eigenbases are the same across regimes (i.e., stationary) to minimize transient signals that arise when regimes switch, as the transients can interact with other settings and decrease the interpretability of results. In some cases, though,
we also let the eigenbases switch to explore this effect; when we specify that eigenbases can switch, we form A matrices such that the condition below holds to keep transient effects bounded:

{A : max_t (svd(A^t)) < σ}    (2.49)

where svd gives the singular values of the input matrix and σ is what we call a transient control value that we choose ad hoc. We also randomly form the noise covariance Q with eigenvalues randomly chosen between 0.01 and 0.04. Finally, we simulate the brain states using (2.2) and the previously generated regime states.
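One concrete way to realize a dynamics matrix with the prescribed complex-conjugate eigenvalue pairs is to place 2x2 rotation-scaling blocks on the diagonal, which corresponds to one particular (real) choice of eigenbasis U; a sketch:

```python
import numpy as np

def make_dynamics_matrix(r, theta):
    """Build a real dynamics matrix A from complex-conjugate eigenvalue pairs.

    r, theta: length-d/2 sequences of decay magnitudes and frequencies.
    Each pair r_i * exp(+/- 1j * theta_i) becomes the real 2x2 block
    r_i * [[cos t, -sin t], [sin t, cos t]], assembled block-diagonally.
    A random U would then rotate this into a generic eigenbasis.
    """
    blocks = [ri * np.array([[np.cos(ti), -np.sin(ti)],
                             [np.sin(ti),  np.cos(ti)]])
              for ri, ti in zip(r, theta)]
    d = 2 * len(blocks)
    A = np.zeros((d, d))
    for k, B in enumerate(blocks):
        A[2 * k:2 * k + 2, 2 * k:2 * k + 2] = B  # place each block on the diagonal
    return A
```

By construction, the eigenvalue magnitudes of A are exactly the chosen r_i values, so the desired decay half-lives and oscillation frequencies are preserved.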
For the observation equation parameters, we sweep various settings related to the simulated spike-field observations, including the field feature sampling frequency and signal-to-noise ratio (SNR) and the maximum firing rate of the spiking activity, with ranges that are common in neural data^{6,8}. For the field features, we assume 60 features and randomly select parameters per regime based on the swept SNR to generate the observations using (2.3). For the spiking activity, we assume 60 neurons and randomly choose a base firing rate between 3 and 5 Hz. We then randomly select the parameters based on the base and maximum firing rates and generate observations using (2.5).
2.2.5 Experimental data
To demonstrate how our method performs on regime classification by fusing information from multiscale observations in neurophysiological data, we apply it to spiking and local field potential (LFP) activity in the lateral prefrontal cortex (lPFC) of one male rhesus macaque monkey (Macaca mulatta) performing saccadic eye movements for fluid rewards. All surgical and experimental procedures were performed in compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and were approved by the New York University Institutional Animal Care and Use Committee.
Each trial started with the monkey fixating on a central fixation point for a Baseline
interval (400-800 ms). Following the Baseline interval, the central fixation point was extinguished and two response targets were illuminated at random locations on a circle separated
by at least 60 degrees. Response targets were centered on fixation with eccentricity 10 degrees of visual angle. Following response target onset, which served as a Go cue, the monkey
executed a saccade to one of the two targets for a fluid reward. In the following, we discretize
the direction of the saccadic response by sub-dividing the response target circle into eight
equal-sized bins as seen in Figure 2.3. The Start, End, and Target Onset/Go task events
are defined as the trial start time, trial end time, and target onset time, respectively (Figure
2.3). Thus, each trial has two regimes, corresponding to the time periods before and after
response target onset and referred to as Start-Go and Go-End regimes, respectively. The
average trial length was 1.36 seconds, with the average time from the start of a trial to the Go cue being 0.697 seconds, the average time from the Go cue to the saccadic response being 0.176 seconds, and the average time from the Go cue to the end of a trial being 0.668 seconds.

Figure 2.3: Behavioral task overview
A monkey performed a behavioral task to earn fluid rewards by executing a saccadic eye movement response. The direction of the saccadic response was discretized by sub-dividing the response target circle into eight equal-sized bins as shown in the figure. In each trial, the saccadic response was to one of two peripheral targets presented following fixation of a central target for 500-800 ms. This saccade direction was then discretized into the eight bins as described above. Each trial concluded 400-500 ms following the saccadic eye movement. The period between the start of a trial, labeled Start, and the Go cue is denoted as the Start-Go period, and the period between the Go cue and the end of the trial, labeled End, is denoted as the Go-End period.
2.2.5.1 Data preprocessing
During task performance, extracellular potentials were acquired by a 32-electrode microdrive (Gray Matter Research)^{57} and sampled at 30 kHz. Eye position was continuously tracked by an infrared eye-tracking system (ISCAN, USA) and sampled at 120 Hz. After common average referencing, we extracted LFP power features from six frequency bands: delta-alpha (2-12 Hz), beta 1 (12-24 Hz), beta 2 (24-34 Hz), gamma 1 (34-55 Hz), gamma 2 (65-95 Hz), and gamma 3 (130-170 Hz)^{8,32}, using short-time Fourier transform estimates from 100 ms causal moving windows every 40 ms. We extracted spiking activity by band-pass filtering the raw neural signals between 300-6600 Hz and detecting threshold crossings 3.5 standard deviations below the signal mean. We then downsampled the spiking activity to 250 Hz, as there was rarely more than one spike per time bin. The resulting data thus had 4 ms time bins ∆, while the field features were observed at a slower 40 ms time scale. We analyzed data from eight experimental sessions, which contained a combined 3411 trials.
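A simplified sketch of the band-power feature extraction described above (the thesis uses 100 ms causal windows every 40 ms; the windowing and normalization details here are illustrative assumptions):

```python
import numpy as np

# band edges in Hz, following the list in the text
BANDS = {"delta-alpha": (2, 12), "beta1": (12, 24), "beta2": (24, 34),
         "gamma1": (34, 55), "gamma2": (65, 95), "gamma3": (130, 170)}

def band_powers(window, fs):
    """LFP band-power features from one causal window, a simplified sketch.

    window: 1-D array of LFP samples; fs: sampling rate in Hz.
    Returns log power per band from an FFT-based spectral estimate.
    """
    spec = np.abs(np.fft.rfft(window * np.hanning(len(window)))) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    feats = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        feats[name] = np.log(spec[mask].mean() + 1e-12)  # avoid log(0)
    return feats
```

Sliding such a window forward every 40 ms would produce one six-dimensional feature vector per step per channel.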
We analyzed each session individually with five-fold cross-validation. Here, we separated each session into five folds. We estimated the model parameters using a training set comprised of four folds, constructed the SMF with these parameters, and then applied this SMF to the test set comprised of the remaining withheld fold, which was in no way used in parameter estimation. For each training and test set pair, we selected the top 40 LFP power features and the top 20 spiking channels based on saccade direction classification in the training set. Specifically, for each individual feature/channel in the training set, we used linear discriminant analysis (LDA) with an inner four-fold cross-validation scheme to classify saccade direction, and we selected the features/channels with the highest average classification accuracy; note that this selection was purely based on the training set. In the training set, the LDA classifier used either the average value of the LFP power feature or the average spiking rate over the Go-End regime in a one-versus-rest scheme, i.e., classifying the correct direction among the 8 possible directions^{58}. We use this same LDA classification to decode the brain states in the test set within cross-validation, as detailed in Section 2.2.5.3.
2.2.5.2 Model training
The behavioral task featured a switch between two regimes, which we term the Start-Go
and the Go-End regimes, respectively. Thus, in the training set, we label these periods
within each trial. We then assume we do not know the labels in the test set and when
performing regime and brain state decoding. Having regime labels in the training set, we
use an expectation-maximization (EM) learning algorithm for multiscale observations25 to
learn the dynamics equation parameters per regime and the observation equation parameters
across regimes. We take the dimension of the brain state as d = 10 based on common values
used in prior works8,32. We then learn the regime state parameters using the regime labels
concatenated over trials and maximum-likelihood techniques, and assume the initial regime
state distribution is uniform for fairness.
2.2.5.3 Switching multiscale filter (SMF) evaluation and performance metrics
We then apply SMF to the observations in the test sets to show that it can decode regimes
and fuse information from multiscale observations. We quantify the regime state decoding
using the accuracy metric in (2.48) to calculate how well the estimated regime states match
the withheld regime labels. To assess the quality of brain state decoding, we see how well the
decoded state averaged over the Go-End period could classify the direction of the saccadic
eye movement response using LDA – note that the ground-truth brain state itself is latent
and thus we quantify quality based on observed saccade directions. We do so in a one-versus-rest scheme^{58}: for each possible direction, we classify which of the saccades in the test set were in that direction to get an AUC metric, and then repeat this for each of the 8 directions and average the AUC scores.
2.3 Results
We show in both numerical simulations and in neural activity recorded from prefrontal cortex
during behavior that the developed inference methods are capable of estimating both brain
and regime states and fusing information from both Gaussian and point process observations.
We use the Wilcoxon signed-rank test for all statistical comparisons and control for false
discovery rate using the Benjamini-Hochberg Procedure59. Details on simulation settings for
[Figure panels: brain state estimation NRMSE (lower is better) and regime state estimation accuracy (higher is better), comparing field features, spiking activity, and multiscale observations under stationary and switching filters and smoothers. Base simulation settings: x_t dimension d = 8, number of regimes M = 3, dwell time t = 2 s.]
Figure 2.4: SMF and SMS estimation performance in simulation
SMF and SMS improve estimation by tracking regime-dependent non-stationarity in simulations. (a) Brain state estimation comparison between stationary and switching inference
methods across observation modalities: only field features, only spiking activity, and multiscale. Bars indicate the median NRMSE metric, box edges indicate the 25th and 75th
percentiles, and whiskers indicate minimum and maximum values excluding outliers that
exceed 1.5 times the interquartile distance. Base simulation settings are used in simulations throughout the figure unless otherwise overwritten by settings listed underneath bars.
Asterisks indicate statistical significance with ∗ ∗ ∗p < 0.0005. (b) Comparison between
multiscale stationary and multiscale switching inference NRMSE as brain state dimension
sizes increase. Trend lines connect at the median. (c) Same as (b) but when the number
of regimes M increases. (d) Same as (b) but when the dwell time t increases. For each
statistical test, n = 40 and p = 3.89 × 10⁻⁸.
each section can be found in the appendix. Results figures with statistical tests performed
contain p-values in their captions.
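As an aside on the statistical procedure: the Benjamini-Hochberg step-up correction referenced above can be sketched as follows. This is a generic implementation for illustration, not the exact analysis code used in this work.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha
    using the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                    # indices of p-values, ascending
    ranked = p[order]
    # BH critical values: (k/m) * alpha for k = 1..m
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k_max = np.max(np.where(below)[0])   # largest k satisfying the bound
        reject[order[:k_max + 1]] = True     # reject all hypotheses up to k_max
    return reject

# Example: six p-values, e.g. from Wilcoxon signed-rank tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.60]
print(benjamini_hochberg(pvals, alpha=0.05))
```

Note that BH rejects every hypothesis up to the largest rank whose p-value falls below its critical value, even if an intermediate p-value exceeds its own threshold.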
2.3.1 Switching inference improves estimation by tracking regime-dependent non-stationarity in simulations
We first find in simulations that the developed switching inference methods, SMF and SMS,
significantly outperform their stationary counterparts across all swept simulation settings
(Figure 2.4; p = 3.89 × 10⁻⁸). Compared to prior stationary multiscale methods, the median
reduction in error (NRMSE) when using the novel switching methods is 23.7% and could
Figure 2.5: Detecting regime changes from switching firing rate magnitude
SMF can detect changes in regime corresponding to switches in even just the magnitude of
spike firing rate in simulations. We set the dimension of the brain state xt to 2, the number of
regimes to 2, and the number of neurons to 15. (a) Illustration of how the magnitude of firing
rates switch for the two regimes: orange and red. Top left shows the directional tuning of
neurons, that is the xt vector directions corresponding to maximum firing rate for all neurons,
with each neuron associated with a different vector. The orange and red correspond to the
two different regimes for each neuron, i.e., the two different maximum firing rates. Right and
bottom show the firing rates for example individual neurons at the two regimes at a given
xt magnitude. Spike parameters are chosen for the first regime and then scaled down for the
second such that the ratio of firing rates between the two is a specified percentage for all xt. We refer to this percentage as the percent difference between the regimes. (b) Regime
state estimation as the percent difference in firing rate between regimes increases. Percent
differences are chosen at 10 percent intervals between 0 and 90 with 40 systems simulated
for each difference. Center line shows the median while shaded areas show 25th and 75th
percentiles. (c) Difference in error (NRMSE) between stationary and switching methods as
percent difference in firing rate increases. Positive values indicate better performance with
SMF. We find significant differences between stationary and switching at percent differences
≥ 30.
be as large as 34.4%. Further, the median CC achieved by the switching methods with
multiscale observations across swept simulation settings is 0.820, which is a 25.1% increase
over that from stationary multiscale methods. This reduction in error and increase in CC
is due to the switching methods properly tracking regime-dependent non-stationarity as
[Figure 2.6 panels: brain state NRMSE and regime state accuracy across field features, spiking activity, and multiscale observations, swept over nt maximum firing rate (20–60 Hz), yt signal-to-noise ratio (0.1–0.4), yt sampling frequency (10–100 Hz), and st dwell time (1/25–25 sec). Base simulation settings: nt maximum firing rate 50 Hz, yt SNR 0.2, yt sampling frequency 50 Hz, st dwell time 1 sec.]
Figure 2.6: SMF data fusion capability in simulation
Switching methods improve estimation by combining multiscale observations in simulations.
Bar, box, whisker, outlier, and asterisk conventions are the same as in Figure 2.4. Base
simulation settings in box are used throughout the figure unless otherwise overwritten by
individual settings listed underneath bars. (a) Brain state estimation performance across
observation modalities under various observation parameter settings. Performance with multiscale observations is compared to that with only field features yt and that with only spiking
activity nt. For spike parameters, different maximum firing rates are simulated. For field
feature parameters, different signal-to-noise ratios and different sampling frequency are separately simulated. (b) Regime state estimation performance across observation modalities
under the same observation parameter settings used in (a). The number of regimes in these
simulations is 3 and thus chance accuracy is 1/3. (c) Brain state estimation as regime state
dwell time increases. (d) Regime state estimation as regime state dwell time increases similar
to (c). For each statistical test, n = 40 and p ≤ 3.12 × 10⁻⁷.
can be seen in Figure 2.4(a) and Figure 2.9(a), where the median accuracy of regime state
estimation exceeds 0.9 for the base simulation settings. We also find that when we additionally
switch the eigenbasis as described in the Methods, we get similar significant reductions in
error of 25.0% and as large as 36.6% (p ≤ 3.89 × 10⁻⁸) when compared to stationary methods.
Further, the benefit of switching methods increases as the number of regimes increases, as seen
in Figure 2.4(c), due to the stationary methods having to generalize for all regimes with only
a single set of stationary parameters. Also, as seen in Figure 2.4(d), the performance of the
switching methods improves as the dwell time increases as longer dwell times allow for longer
periods of observations for regime identification. Finally, as seen in Figure 2.4(b), we find
that the benefit of switching methods over stationary methods stays consistent even as the
brain state dimension increases, which increases the difficulty of estimation since the same
number of observation channels must encode increasing amounts of brain state information.
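For readers unfamiliar with the metrics, a plausible way to compute the NRMSE and the average correlation coefficient (CC) over brain state dimensions is sketched below. The exact normalization used in this work may differ; here the RMSE is normalized by the range of the true signal.

```python
import numpy as np

def nrmse(x_true, x_est):
    """Root mean-square error normalized by the range of the true signal,
    averaged over latent dimensions (one plausible convention)."""
    x_true, x_est = np.atleast_2d(x_true), np.atleast_2d(x_est)  # (dims, time)
    rmse = np.sqrt(np.mean((x_true - x_est) ** 2, axis=1))
    rng = x_true.max(axis=1) - x_true.min(axis=1)
    return float(np.mean(rmse / rng))

def avg_cc(x_true, x_est):
    """Pearson correlation per dimension, averaged over dimensions."""
    x_true, x_est = np.atleast_2d(x_true), np.atleast_2d(x_est)
    ccs = [np.corrcoef(t, e)[0, 1] for t, e in zip(x_true, x_est)]
    return float(np.mean(ccs))

t = np.linspace(0, 2 * np.pi, 200)
truth = np.vstack([np.sin(t), np.cos(t)])
est = truth + 0.1 * np.random.default_rng(0).standard_normal(truth.shape)
print(nrmse(truth, est), avg_cc(truth, est))
```

Lower NRMSE and higher CC both indicate better brain state estimation, which is why the trends in Figure 2.4 and Figure 2.9 mirror each other.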
We then find that even with limited differences between regimes, SMF can still meaningfully detect the changes in regime. Specifically, to simulate limited changes, we use only
spiking observations with 2 regimes and restrict what can switch to only the maximum firing
rate as illustrated in Figure 2.5(a) such that the firing rate under the second regime is a
fixed percentage smaller than that under the first regime, which we refer to as the percent
difference between the regimes. This results in median errors in brain state estimation from
a stationary filter that are within 0.03 NRMSE of those from a switching filter – thus this
particular simulation setup generates only limited differences between regimes as intended.
This simulation scenario was motivated by prior evidence [43] of similar switches in firing rates
in spiking data recorded from the medial entorhinal cortex of mice in a random foraging
task. Even with such limited differences between regimes, SMF could still achieve meaningful regime estimation accuracy. Indeed, when the percent difference in firing rate between
regimes is greater than 40%, median accuracy values are greater than 0.8 as seen in Figure
2.5. Thus, SMF can yield useful estimates of the regime state even when differences between
regimes are limited.
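Under a log-linear rate model, scaling the firing rate by a fixed percentage for all xt corresponds to a constant shift of the baseline log-rate term. A minimal sketch of this construction follows; the parameterization is illustrative and the simulation code may differ.

```python
import numpy as np

def scaled_regime_baseline(mu, percent_diff):
    """Given log-linear rate parameters log(rate) = mu + alpha . x for regime 1,
    return the baseline mu' for regime 2 such that the firing rate is
    `percent_diff` percent lower for every x: rate_2 = (1 - p/100) * rate_1."""
    return mu + np.log(1.0 - percent_diff / 100.0)

alpha = np.array([1.0, 0.5])              # direction tuning (hypothetical)
mu1 = np.log(20.0)                        # regime-1 baseline: 20 Hz at x = 0
mu2 = scaled_regime_baseline(mu1, 50.0)   # regime 2: 50% lower rate everywhere

x = np.array([0.3, -0.7])
rate1 = np.exp(mu1 + alpha @ x)
rate2 = np.exp(mu2 + alpha @ x)
print(rate2 / rate1)                      # ≈ 0.5 for any x
```

Because only the baseline shifts, the directional tuning of each neuron is identical across regimes; the regimes differ purely in firing-rate magnitude, as in Figure 2.5(a).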
2.3.2 Estimation performance improves by combining multiscale
observations
We find that the new multiscale switching methods can robustly fuse information from Gaussian and point process observations and improve estimation performance. We sweep various
settings such as maximum firing rate, field feature SNR, field feature sampling frequency,
and regime state dwell time as seen in Figure 2.6. Across all settings, SMF and SMS get
significantly better brain and regime state estimation when provided multiscale observations
compared to that when provided only single-scale observations (p ≤ 3.12 × 10⁻⁷; Figure 2.6).
Specifically, when compared to prior switching linear observation cases (equivalent to SMF
and SMS being provided with only Gaussian observations), the developed switching multiscale methods can reduce error (NRMSE) by a median of 25.1% over the 40 systems with the
lowest Gaussian observation SNR using SMS by fusing the additional information available
from the spiking activity. This improvement over the single-scale linear observation case is
reflected in the CC metric as well where for the lowest SNR systems, CC with multiscale
is 82.8% larger than that with only linear Gaussian observations as seen in Figure 2.10(a).
Overall, we find that even if estimation performance with only field features is considerably
worse than that with only spiking activity, or vice-versa, SMF and SMS can still yield improved performance with multiscale observations. This suggests that as long as there is some
information in the worse performing single-scale modality, the multiscale switching methods
can still fuse it with the better modality to achieve improved performance.
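The fusion behavior described above follows from the conditional-independence assumption (see Discussion): the multiscale observation log-likelihood is simply the sum of the single-scale terms, so an uninformative modality contributes little while an informative one is never discarded. A simplified sketch with diagonal Gaussian noise and log-linear Poisson rates, not the full SMF update, is:

```python
import numpy as np

def multiscale_loglik(x, y, C, R_diag, n, mu, alpha, dt):
    """Joint log-likelihood of Gaussian field features y and Poisson spike
    counts n given latent state x, under conditional independence:
    log p(y, n | x) = log p(y | x) + log p(n | x).
    Constant Poisson terms not depending on x are dropped."""
    resid = y - C @ x                       # Gaussian: y ~ N(Cx, diag(R_diag))
    ll_gauss = -0.5 * np.sum(resid**2 / R_diag + np.log(2 * np.pi * R_diag))
    log_rate = mu + alpha @ x + np.log(dt)  # Poisson: n_c ~ Poisson(exp(.) dt)
    ll_spike = np.sum(n * log_rate - np.exp(log_rate))
    return ll_gauss + ll_spike
```

Since the two terms add, the modality with the sharper likelihood dominates the combined estimate while the other still refines it.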
2.3.3 Inference with switching multiscale smoother (SMS) is
reliable across conditions
Our smoother is different from prior methods in two aspects. First, prior switching methods
were for single-scale observations whereas our methods allow for multiscale observations.
Second, even considering the smoothing inference step alone, we make a different approximation compared to prior methods – Kim's method [46] and Expectation Correction (EC) [55] –
as described in Equation (2.38). We make this new approximation to address the challenge
[Figure 2.7 panels: example brain state (xt) and regime state (st) traces for systems with switching versus stationary eigenbases of the dynamics matrices A, at st dwell times of 2 and 0.2 seconds; brain state NRMSE and regime state accuracy for SMF, SMS, Kim's method, and EC; pair-wise differences in estimation between the filter and the smoothing methods and between SMS and prior smoothers. Base simulation settings: nt maximum firing rate 50 Hz, yt signal-to-noise ratio 0.2, yt sampling frequency 50 Hz, st dwell time 1/5 sec.]
Figure 2.7: SMS generalizability compared to prior methods
SMS smoothing reliably improves estimation compared to filtering across diverse settings,
thus enabling generalizability unlike prior smoothing methods. Bar, box, whisker, and outlier
conventions are the same as in Figure 2.4. (a) Example traces when the eigenbases, comprised
of the eigenvectors of the dynamics matrix A, can switch across regimes. (b) Example regime
and brain state traces when the eigenbases are stationary across regimes. (c) Brain (top)
and regime (bottom) state estimation using the filter SMF and the three smoothing methods,
SMS, Kim’s method, and EC, in the case with switching eigenbases and a dwell time of 0.2
seconds. SMS outperforms Kim’s method and EC in smoothing. (d) Pair-wise differences in
brain and regime state estimation between the filter SMF and the three smoothing methods.
Positive values indicate better performance using a smoother. (e) Pair-wise differences in
estimation between SMS and other smoothing methods. Positive values indicate better
performance using SMS. SMS outperforms prior smoothing methods. (f–h) Equivalent plots
and conventions to (c–e) applied to systems with stationary eigenbases. Insets in (h) show
zoomed in versions of their respective underlying plots as SMS performs very closely to prior
smoothing methods even in this case while allowing for generalizability to other settings as
seen in panels (c–e). Simulation settings used throughout this figure are listed in the Base Simulation Settings box. Asterisks indicate statistical significance (n = 40, *p ≤ 0.05).
of reliable performance across different simulation scenarios, which we refer to as generalizability. In particular, we find that, even with single-scale observations, while prior methods
are accurate in some simulation scenarios, they can lead to unreliable smoothing under other
simulation scenarios. Here, we find that our new switching smoother – whether multiscale or
single-scale – can address this challenge of reliable performance across diverse settings and
thus generalize well.
Under a setting tested from Section 2.3.2 with additionally switching eigenbases, we
find that prior smoothing methods – Kim's method [46] and EC [55] – yield significantly worse error compared to SMS across all observation types, whether single-scale or multiscale (p ≤ 9.83 × 10⁻⁷), with settings and plots in Figure 2.7(e). We note that this is the case even
though the settings are not extreme in value, with all systems being numerically bounded.
Furthermore, we see that prior smoothing methods can yield worse error than even their
filtering counterparts. For example, for the spiking observation modality, this occurs for
more than 50% of the systems tested (25/40 systems for Kim’s method, 29/40 for EC) as seen
in Figure 2.7(d). In comparison, SMS reduces error and increases accuracy compared to its
filtering counterpart (SMF) across all observation types, whether single-scale or multiscale,
and for every system tested not only under these settings but also for all settings in Section
2.3.2. Also importantly, this improvement in generalizability across diverse settings comes
at only a slight price in performance in some specific settings. In particular, for settings
from Section 2.3.2 which have stationary eigenbases and thus yield systems that are easier
to track for prior methods, SMS only has a 4.0 × 10⁻⁴ and 3.16 × 10⁻⁴ median increase in NRMSE compared to Kim's method and EC, respectively. Further, SMS's accuracy in regime estimation in these settings is comparable to Kim's method and only 4.1 × 10⁻⁴ lower compared with EC. A subset of these comparisons is shown on the right side of Figure 2.7. These results show that the SMS smoother reliably improves estimation compared to the
SMF filter and can robustly generalize across diverse conditions unlike prior methods.
2.3.4 SMF detects task-related regimes in spike-field NHP data
We then find that using the developed SMF method with real spike-field neural observations,
we can causally and accurately detect regimes related to the saccade task performed by a
monkey (Figure 2.8). We train and test the switching multiscale dynamical system (SMDS)
Figure 2.8: Real-time decoding in experimental data recordings
The switching multiscale filter (SMF) method can perform real-time multiscale decoding
of task-related brain and regime states in data recordings from a non-human primate and
combine information from spike and LFP scales. (a) Resulting conditional probability of
the Go-End regime from SMF time-locked to the Go cue. Center lines represent the median
across trials from all sessions while shaded areas represent 25th and 75th percentiles. Perfect
decoding would be a step signal 0 valued prior to Go and 1 valued after Go. (b) Individual
trials from the five test folds in one example session with colors indicating the conditional
probability of the Go-End regime according to the color bar. Trials are time-locked to
the Go cue and from SMF being applied to multiscale spike-LFP observations. (c) The
same probability traces as in (a) but time-locked to detected saccade onset at time 0. (d)
Performance in decoding Start-Go and Go-End regimes across observation modalities. Bar,
box, whisker, outlier, and asterisk conventions are the same as in Figure 2.4. Right side
shows pair-wise differences between performance with multiscale and that with LFP (blue) and
spiking (red) with positive values indicating better multiscale performance. (e) Performance
in classifying saccade direction using the mean of decoded brain state signal from Go-End
regime (see Methods) across observation modalities with pair-wise differences on right as in
(d). n = 40 for all statistical tests.
models in cross-validation (see Section 2.2.5.2). We find that the switching multiscale filter
(SMF) can successfully decode whether multiscale spike-LFP activity belongs to the Start-Go or the Go-End regime of the task with a median accuracy of 0.808, as seen in Figure 2.8(d).
This estimation is done based on SMF computing the conditional probabilities of the two
task-related regimes, and then picking the one with the higher conditional probability (i.e.
Start-Go regime and Go-End regime, see Section 2.2.5.2).
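This decision rule can be stated compactly: at each time step, pick the regime with the highest conditional probability. A minimal sketch:

```python
import numpy as np

def classify_regime(posteriors):
    """Pick the regime with the highest conditional probability at each time
    step. `posteriors` has shape (T, M): row t holds P(s_t = m | observations)."""
    return np.asarray(posteriors).argmax(axis=1)

# e.g. two regimes (Start-Go = 0, Go-End = 1) over five time steps
post = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.2, 0.8], [0.1, 0.9]])
print(classify_regime(post))  # [0 0 1 1 1]
```

With two regimes this reduces to thresholding the Go-End probability at 0.5, which is how the probability traces in Figure 2.8 translate into regime decisions.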
Furthermore, the computed conditional probability of the Go-End regime, even for individual trials, clearly and rapidly increases once the saccade targets appear at the Go cue, as seen in Figure 2.8(b) and Figure 2.11. The median time-locked trace of these probabilities is shown in Figure 2.8(a). We note that this increase in probability occurs prior to saccade
initiation as seen in Figure 2.8(c), suggesting that SMF can detect changes in multiscale
neural dynamics prior to any detectable movement.
We also find that not only could we classify regimes related to the behavioral task, but we could also use the decoded brain state from SMF to accurately classify the direction of
the saccade with a median AUC of 0.935 as seen in Figure 2.8(e). This AUC is significantly
better than the AUC from stationary models (n = 40, p = 0.00838). This result suggests that
SMF can accurately find task-relevant brain and regime states from multiscale observations.
Finally, consistent with our results in Section 2.3.2, we find that regime and saccade direction classification performance is significantly better using multiscale spike-LFP observations
compared to using single-scale observations (whether spike or LFP) as seen throughout Figure 2.8. We also find that this result holds even with reduced LFP SNR as seen in Figure
2.12 where we synthetically reduce the SNR of the field features by adding white Gaussian
noise. Here, even with the reduced performance of LFP-only decoding, SMF is still able
to fuse information across LFP and spiking activities to yield higher regime and saccade
direction classification performance compared to using each modality alone (Figure 2.12).
This result is again consistent with our extensive numerical simulation results in Figure 2.6
showing that across a diverse range of field potential SNRs – covering situations with field
potential decoding being significantly better, worse, or about the same as spike decoding –
SMF improves the decoding performance by aggregating information from both field potential and spiking observation modalities. Overall, we find that the performance of SMF on
spike-field activity in prefrontal cortex yields results that are consistent with results based on
simulated data. The results show the potential of the developed switching multiscale inference methods for investigations of neural activity with switching dynamics and/or encoding
and for brain-machine interface applications.
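The synthetic SNR reduction used for these tests (Figure 2.12) can be sketched as follows. Since SNR is the ratio of signal to noise power, dropping the SNR to a fraction `ratio` of the original requires the total noise power to grow by a factor of 1/ratio, i.e., adding white Gaussian noise of variance noise_power × (1/ratio − 1). This sketch assumes the original noise power of each channel is known; the exact procedure used here may differ.

```python
import numpy as np

def reduce_snr(lfp, noise_power, ratio, rng=None):
    """Add white Gaussian noise so the SNR drops to `ratio` times the original.
    SNR = P_signal / P_noise, so reaching ratio * SNR_0 requires total noise
    power P_noise / ratio, i.e. added variance noise_power * (1/ratio - 1)."""
    rng = rng or np.random.default_rng()
    added_var = noise_power * (1.0 / ratio - 1.0)
    return lfp + rng.normal(0.0, np.sqrt(added_var), size=np.shape(lfp))

# e.g. reduce a channel to 20% of its original SNR (adds 4x the noise power)
channel = np.zeros(100_000)
noisy = reduce_snr(channel, noise_power=1.0, ratio=0.2,
                   rng=np.random.default_rng(0))
```

For ratio = 0.2 the added noise variance is four times the original noise power, which matches the 20%-of-original-SNR condition in Figure 2.12.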
2.4 Discussion
We developed the switching multiscale dynamical system (SMDS) model that generalizes
the common linear dynamical system (LDS) and developed companion filtering/decoding
and smoothing algorithms. Our model incorporates two advances: it augments the LDS by
modeling an additional binary-valued point process observation modality; it further relaxes
the stationarity assumption of the resulting multiscale dynamical system model with the
inclusion of an additional regime state that governs the model parameters. We also designed a real-time decoding algorithm for the SMDS model termed the switching multiscale
filter (SMF), which simultaneously estimates regime and brain states from spike-field activity while maintaining a tractable computational cost. We then developed a fixed-interval
smoothing algorithm termed the switching multiscale smoother (SMS), which can further
increase estimation accuracy compared to SMF by utilizing the entire length of data across
a broad range of model settings when causal estimation is not needed.
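A minimal generative sketch of an SMDS-style model, with arbitrary illustrative parameters rather than those learned in this work, is:

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, M = 500, 2, 2

# Regime Markov chain with high self-transition probability (long dwell times)
P = np.array([[0.99, 0.01],
              [0.01, 0.99]])
# Regime-dependent linear dynamics: x_t = A_{s_t} x_{t-1} + w_t
A = [0.95 * np.eye(d), np.array([[0.9, -0.2], [0.2, 0.9]])]
Q = 0.01 * np.eye(d)
# Observation models (shared across regimes here for simplicity; they can switch)
C = rng.standard_normal((4, d))                     # Gaussian field features
R = 0.1 * np.eye(4)
mu, alpha, dt = np.log(20.0), rng.standard_normal((10, d)), 0.01  # spikes

s = np.zeros(T, dtype=int)
x = np.zeros((T, d))
y = np.zeros((T, 4))
n = np.zeros((T, 10), dtype=int)
for t in range(1, T):
    s[t] = rng.choice(M, p=P[s[t - 1]])             # regime state
    x[t] = A[s[t]] @ x[t - 1] + rng.multivariate_normal(np.zeros(d), Q)
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(4), R)  # field features
    n[t] = rng.poisson(np.exp(mu + alpha @ x[t]) * dt)         # spike counts
```

SMF and SMS take the multiscale observations (y, n) and jointly estimate the regime state s and brain state x; the sketch above only illustrates the forward (generative) direction of the model.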
To validate the methods, first, we conducted extensive numerical simulations and compared with stationary multiscale methods or switching single-scale models (e.g. switching
LDS) that use only one observation modality. Doing so, we showed that our framework
improves estimation for spike-field activity with regime-dependent dynamics by successfully leveraging these different observation modalities simultaneously to both track regime-dependent non-stationarity and decode brain states. Second, we tested our methods under
extensive simulation settings to show their robustness and characterize how various settings
can impact estimation performance. We also showed that even in the single-scale case when
just using linear Gaussian observations, SMS can successfully operate under more diverse
settings compared to prior switching smoothing methods. This success is quantified by examining whether SMS can increase accuracy compared to SMF across diverse settings. This
result shows that the new switching smoother even in the case of linear Gaussian observations alone can better generalize to diverse system settings compared to prior switching
smoothers. Finally, we used experimental recordings of neural activity in the prefrontal
cortex of one non-human primate and validated that our methods can perform real-time
decoding and simultaneously estimate task-related brain and regime states from multiscale
spike-LFP activity.
Our work extends traditional regime-detection algorithms for BMIs. Prior work has
used hidden Markov models (HMMs) for task-related regime detection in parallel with a continuous state inference algorithm [60]. However, in this case, the HMM does not influence the
model used by the inference algorithm and so the inference algorithm does not directly
consider the switching. Other prior work has addressed the case where a single but unknown
regime is in effect over the entire course of a trial [61]. This is done by maintaining a finite number of separate filters, each corresponding to one possible regime, and, on a trial-by-trial basis, updating the likelihood of which regime is in effect for that entire trial [61]. However,
this approach is designed for situations where only one regime is in effect over the entirety
of a trial and not for situations where the regime can actually switch over the course of
a trial or during longer naturalistic recordings that lack a controlled trial-based structure.
Applying the approach of this prior work to the latter situations can lead to error in the
filters accumulating since the separate filters do not interact with each other; this error
accumulation has been shown in the classical switching LDS systems [3]. Finally, other prior
work has considered situations where the regimes can switch, but the algorithm only takes
single-scale Gaussian observations and is not applicable to the multiscale case as it does not
perform inference on simultaneous Gaussian and point process observations [14]. In summary,
the modeling and inference framework developed here has the potential to improve BMIs
in more naturalistic scenarios and to further take advantage of multiscale measurements to
enhance performance and robustness.
Motivated by prior work showing that spiking activity and field features could be approximately conditionally independent conditioned on the unobserved brain states [6,8], we made a
similar assumption in our inference algorithms. Consistently, we showed in the experimental
data from prefrontal cortex that our models and inference methods were successful in estimating both task-related regimes as well as saccade direction from spike-field activity. To
develop neurotechnologies, this assumption also yields computational advantages in terms of
limiting the number of parameters to learn and thus reducing the amount of training data
and improving model accuracy and generalizability. Nevertheless, future work that further
models any potential conditional dependencies could provide a scientific tool to explore the
functional connectivity between spiking activity and field features [62–65]. However, any such future work will have to address the issue of having a substantially larger number of parameters that need to be fitted with limited training data, which would make the models prone
to overfitting. Thus, new learning methods may be required to address this issue.
In the experimental data from prefrontal cortex, we could label the regimes based on
external task cues in the training set, and thus learn our model parameters in a supervised manner with
respect to the regime state. However, in other experimental paradigms, task or neural regimes
may be due to internal rather than external cues. In these scenarios without reliable regime
or task labels, there is a need for unsupervised learning methods for switching dynamical
systems. We tackle this challenge in the following chapter. Once accurate model parameters
are learned, our developed filter and smoother will then provide a complementary module
for efficient causal and non-causal brain and regime state estimation using the learned model
parameters.
2.5 Appendix
2.5.1 Details on simulation settings
For the main set of comparisons in Section 2.3.1, we sweep all combinations of xt dimension
d = [4, 8, 12], number of regimes M = [3, 4, 5], and regime dwell times tdwell = [0.2, 2, 20] seconds. For observation equation parameters, we set the yt SNR randomly to a value in [0.2, 0.24],
yt sampling frequency to 50Hz, and the maximum firing rate for nt to 40Hz. For comparisons
where we allow for switching eigenbases, we set observation equation parameters to be stationary and set d = 8, M = 3, the transient control term σ = 3, and swept tdwell = [0.2, 2, 20]
seconds. For comparisons where only the magnitude of spiking activity parameters switch,
we set d = 2, M = 2, tdwell = 2, and the number of neurons to 15.
For the main set of comparisons in Section 2.3.2, as base settings, we set d = 8, M = 3,
tdwell = 1 second, maximum firing rate of nt to 50Hz, yt SNR to 0.2, and yt sampling
frequency to 50Hz. We then individually sweep values for the following settings while holding
the remaining at the base settings: maximum firing rate of nt, yt SNR, yt sampling frequency, and tdwell. For maximum firing rate of nt, we swept frequencies 20–60 Hz in 10 Hz increments.
For yt SNR, we swept values 0.1 – 0.4 in 0.1 increments. For yt sampling frequency, we swept
frequencies [10, 20, 50, 100]Hz. For tdwell, we swept times [1/25, 1/5, 1, 5, 25] seconds.
For Section 2.3.3 where we allow eigenbases to switch, we set d = 8, σ = 3, M = 3,
tdwell = 0.2 seconds, maximum firing rate of nt to 50Hz, yt SNR to 0.2, and yt sampling
frequency to 50Hz.
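For reference, a dwell time of tdwell seconds with time steps of dt seconds corresponds, under a geometric dwell distribution, to a regime self-transition probability of 1 − dt/tdwell. A sketch of constructing such a transition matrix follows; the simulations may parameterize dwell times differently.

```python
import numpy as np

def transition_matrix(M, t_dwell, dt):
    """Regime transition matrix whose expected dwell time is t_dwell seconds,
    assuming dt-second time steps: a geometric dwell distribution with mean
    t_dwell / dt steps gives self-transition probability 1 - dt / t_dwell.
    Leaving probability is split evenly among the other M - 1 regimes."""
    p_stay = 1.0 - dt / t_dwell
    P = np.full((M, M), (1.0 - p_stay) / (M - 1))
    np.fill_diagonal(P, p_stay)
    return P

P = transition_matrix(M=3, t_dwell=2.0, dt=0.01)  # 2-second dwell, 10 ms bins
print(P[0])  # approximately [0.995, 0.0025, 0.0025]
```

Longer dwell times push the self-transition probability toward 1, which is why longer dwells give the filter more observations per regime and improve regime identification, consistent with Figure 2.4(d).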
[Figure 2.9 panels: brain state estimation (CC, higher is better) and regime state estimation (accuracy, higher is better) across field features, spiking activity, and multiscale observations, for stationary and switching filters and smoothers, swept over d = 4, 8, 12; M = 3, 4, 5; and t = 1/2, 2, 20. Base simulation settings: xt dimension d = 8, number of regimes M = 3, st dwell time t = 2 sec.]
Figure 2.9: SMF and SMS performance in CC
SMF and SMS improve estimation based on average correlation coefficient (CC) metric by
tracking regime-dependent non-stationarity in simulations. Plotting conventions are the
same as in Figure 2.4. In contrast to the normalized root mean-square error (NRMSE)
metric where the lower the value the better the estimation, CC is the opposite with the
higher the value the better. Thus, trends in NRMSE seen in Figure 2.4 are consistent and,
as expected, reversed for trends in CC seen here. (a) Brain state estimation comparison
between stationary and switching inference methods across observation modalities: only
field features, only spiking activity, and multiscale. The metric is found by calculating the
CC per brain state dimension and then averaging over dimensions. (b) Comparison between
multiscale stationary and multiscale switching inference CC as brain state dimension sizes
increase. (c) Same as (b) but when the number of regimes M increases. (d) Same as (b) but
when the dwell time t increases.
[Figure 2.10 panels: brain state CC and regime state accuracy across field features, spiking activity, and multiscale observations, swept over nt maximum firing rate (20–60 Hz), yt signal-to-noise ratio (0.1–0.4), yt sampling frequency (10–100 Hz), and st dwell time (1/25–25 sec). Base simulation settings: nt maximum firing rate 50 Hz, yt SNR 0.2, yt sampling frequency 50 Hz, st dwell time 1 sec.]
Figure 2.10: SMF data fusion in CC
Switching methods improve brain state estimation by combining multiscale observations in
simulations based on average correlation coefficient (CC). Plotting conventions and simulation settings used are the same as in Figure 2.6. (a) Brain state estimation performance
across observation modalities under various observation parameter settings. CC is calculated
as described in Figure 2.9. (b) Regime state estimation performance across observation modalities
under the same observation parameter settings used in (a). (c) Brain state estimation as
regime state dwell time increases. (d) Regime state estimation as regime state dwell time
increases similar to (c).
Figure 2.11: Conditional probabilities from each individual recording session
Top panel per session: Conditional probability time-locked to the Go cue across observation
modalities. Bottom panel per session: Individual trials from the five test folds from that
session.
Figure 2.12: SMF data fusion in low SNR
SMF can combine information from spike and LFP scales even with reduced LFP SNR in
recordings from a non-human primate with synthetically added LFP noise. The synthetic
noise was added to each LFP channel such that the resulting low SNR was 20% of the original SNR. (a) Resulting conditional probability of the Go-End regime from SMF time-locked to the Go cue. The reduced SNR LFP features were used along with the original
spiking activity data. (b) Individual trials from the five test folds in one example session
with colors indicating the conditional probability of the Go-End regime according to the
color bar. (c) The same probability traces as in (a) but time-locked to detected saccade
onset at time 0. (d-f) Conditional probabilities found from the original spiking activity and
original LFP features with no added white noise, which is identical to Figure 2.8(a-c). (g)
Performance in decoding Start-Go and Go-End regimes with both the low SNR LFP features
on the left and the original LFP features on the right. (h) Performance in classifying saccade
direction using the mean of decoded brain state signal from Go-End regime (see Methods
2.2). n = 40 for all statistical tests.
Chapter 3
Unsupervised learning of stationary and switching
dynamical system models from Poisson observations
3.1 Introduction
Investigating neural dynamics underpinning naturalistic behaviors requires models of neural
population activity that can balance explanatory power and interpretability. One class of
models that can offer such a balance is a switching dynamical system model1,3–5. These
switching models often use linear dynamics to describe the time evolution of neural population activity in terms of an underlying latent state; however, they allow for a switch
non-stationarity such that a series of distinct linear dynamics called regimes can be pieced
together to capture more complex neural dynamics. The ability of these models to capture switch non-stationarity could enable investigations of complex movement patterns and
switching tasks5,14,66,67, switches in attention and decision-making strategies4,42,68, and even
switches in how behavior is encoded in neural activity43. However, learning accurate switching models based on neural population activity data is a necessity for such endeavors, especially in the absence of reliable trial labels and/or regime labels.
Neural population activity is typically measured in the form of spiking activity15. The
binary nature of a spike event16–22 can introduce challenges in the inference stage of various
learning frameworks aimed at training data-driven models. We define inference as using
observed neural activity to estimate latent states that give rise to both behavior and neural
activity15,30,32,69 given model parameters. Prior learning methods for both stationary and
switching dynamical systems that model spikes as discrete Poisson observations have used
the Laplace approximation for inference4,16,18,24–27. However, this approximation has the
potential to lead to inaccurate learning as we will also show in our Results since it may not
be able to capture the broader properties of the target of approximation28. Thus, there is a
need for more accurate learning methods that can better learn the model parameters for both
stationary and switching dynamical system models, and in turn facilitate the investigations
of neural dynamics underlying behavior.
For unsupervised learning, expectation-maximization (EM) is a commonly used two-step
iterative framework. EM finds the model parameters according to a cost function that
maximizes the log-likelihood of the observed neural activity in the training data70, without
any information about the latent states or regimes. The two steps are the expectation step
(E-step) and the maximization step (M-step), where the E-step involves the inference of
unobserved latent states. For switching systems, such exact inference is inherently intractable regardless of the observation modality3, thus requiring an additional layer of approximation.
Prior work called variational Laplace-EM (vLEM)4 has made great progress using a mean-field variational approximation71,72 in a variational EM framework. However, variational EM
has to modify the cost function to maximize the lower bound to the log-likelihood rather
than the log-likelihood itself58,72,73. More importantly, in the E-step, vLEM also uses the
Laplace approximation, which has the potential to lead to inaccurate learning.
An alternative to a variational EM framework is what we will call a switch EM framework.
The switch EM framework uses switching filters and smoothers1,3,45,46,55 in the E-step to
directly maximize the log-likelihood rather than its lower bound, as is done in variational EM.
Switch EM has been successful for continuous Gaussian observations13,14 because accurate
stationary filters can be incorporated in the switching filter in this case, specifically the
Kalman filter which is optimal for Gaussian observations58,74. Our work in1 has extended
switching filters to allow for Poisson observations using the Laplace approximation in the
spirit of prior point process filters (PPF)17,20,21,24–26. However, because the switch filtering
method in1 relies on incorporating stationary filters that are Laplace approximation-based,
it does not have sufficient accuracy for model learning as we will show in our Results. Thus,
there is a need for a more accurate filter to enable a successful switch EM framework for
Poisson observations.
With the goal to enable accurate unsupervised learning of both stationary and switching
dynamical systems with Poisson observations, here we develop a novel deterministic sampling
filter for Poisson observations called the Poisson Cubature Filter (PCF). The development
of PCF for learning was motivated by the empirical results of using a Laplace approximation-based filter in EM learning frameworks, which suggested the need for a more accurate filter for learning purposes. Prior deterministic sampling filters75–78, however, use a specific non-linear
observation formulation that is not applicable to Poisson observations, thus necessitating a
new filter. We start the development of PCF from a minimum mean-squared error (MMSE)
approach to estimation which consists of various MMSE terms. However, these necessary
MMSE terms are difficult to calculate analytically for Poisson observations. PCF overcomes
this challenge by converting these terms using novel derivations into a form that enables
the use of a numerical integration tool to approximate the terms. We then embed PCF in
both stationary and switch EM frameworks as overviewed in Figure 3.1 to enable accurate
unsupervised learning of model parameters from just spiking activity observations.
We first validate our methods with numerical simulations. We show that EM frameworks
with PCF successfully learn accurate model parameters for both stationary and switching
systems. Compared with learning methods that use the Laplace approximation for inference,
the new method learns the parameters significantly more accurately and improves the neural
self-prediction, behavior decoding, and regime decoding. Also, comparisons with the same switch EM framework that instead uses the Laplace approximation-based PPF show that successful learning depends on using the new PCF. We finally show that switch EM with PCF
is more data-efficient and better identifies the regimes in data compared to prior learning
methods.
We then show the success of both stationary and switch EM learning with PCF in motor
cortical spiking activity data from a non-human primate performing a continuous point-to-point reach task, overviewed in Figure 3.2. We find that the new PCF-based learning method for switching models succeeds in tracking regimes, as evidenced by its significantly better decoding metrics compared to learned stationary models. We also find that PCF-based learning methods outperform Laplace approximation-based methods in terms of the neural self-prediction and behavior decoding ability of learned parameters for both stationary and switching dynamical system models. Interestingly, despite being fully unsupervised with respect to behavior, we find that PCF-based learning uncovers interpretable and behavior-relevant regimes, unlike prior learning methods. These results show the potential of the
PCF-based EM frameworks for learning stationary and switching dynamical system models
from neural datasets.
3.2 Methods
In this section, we introduce the tool of numerical integration based on cubature rules and
provide details on how we use it in our approach to deriving the novel Poisson Cubature
Filter (PCF). We then show how PCF can be embedded in unsupervised learning frameworks
for both stationary and switching dynamical systems with Figure 3.1 serving as a visual
summary. We finally detail the validation methods including performance metrics, simulated
data setup, and applications of our methods on experimental data.
3.2.1 Background: Numerical integration based on spherical-radial cubature rules
We first briefly introduce the tool of numerical integration which we use to approximate
important terms when deriving the PCF (see references in75–77 for details). Numerical integration tools approximate solutions to complex integrals with weighted sums over a sparse
set of points. One use case of this is to approximate expected values of functions of Gaussians
[Figure 3.1 diagram: (a) Causal Filtering, (b) PCF Overview (prediction step followed by a measurement update using cubature weights and points), (c) Stationary EM Learning (PCF paired with an RTS smoother in the E-step), (d) Switch EM Learning (switching filter with PCF embedded, paired with a switching smoother).]
Figure 3.1: Overview of unsupervised learning with PCF
Poisson Cubature Filter (PCF) and how it enables EM learning for stationary and switching systems with Poisson observations. (a) PCF takes spiking activity modeled as Poisson
observations and model parameters to causally decode unobserved latent states. (b) PCF is
a recursive filter that is comprised of a prediction step followed by a measurement update
step that is performed by utilizing deterministically sampled points based on cubature rules
from numerical integration techniques. (c) PCF is embedded in an EM framework to enable
accurate unsupervised learning of model parameters using only spiking activity as input. It
is paired with an RTS smoother in the E-Step of EM. (d) PCF is also embedded in a switch
EM framework to enable unsupervised learning of parameters for a switching dynamical system with Poisson observations. It is first embedded within a switching filter and then paired
with a switching smoother to comprise the E-Step of switch EM.
since expected values of Gaussians are integrals over the Gaussian probability distribution.
For PCF, we use the 5th-degree spherical-radial cubature rule of76 to deterministically generate weights and points. As provided in76, for a Gaussian random variable X ∈ R^d with mean and covariance µ and Λ, the expected value of g(X), where g(·) is an arbitrary function, can be approximated as:
Figure 3.2: Behavioral task overview
Behavioral task overview. The task is a point-to-point task consisting of a continuous series
of reaches to targets. To acquire the current target, the subject must hold on the target for
450 ms. Upon doing so, a new random target from the 8-by-8 grid is then presented, thus
starting the next trial. The task is continuous and contains no pre-movement delay periods.
$$
\mathbb{E}_X[g(X)] \approx \sum_{i=1}^{2d^2+1} w_i\, g\!\left(\mu + \sqrt{\Lambda}\,\xi_i\right)
$$
$$
w_1 = \frac{2}{d+2}, \qquad
w_{2:2d+1} = \frac{4-d}{2(d+2)^2}, \qquad
w_{2d+2:2d^2+1} = \frac{1}{(d+2)^2}
$$
$$
\xi_1 = 0, \qquad
\xi_{2:2d+1} = \left\{\pm\sqrt{d+2}\; e_j : j = 1, \ldots, d\right\}, \qquad
\xi_{2d+2:2d^2+1} = \left\{\pm\sqrt{d+2}\cdot\tfrac{1}{\sqrt{2}}(e_k \pm e_l) : k < l,\; k, l = 1, \ldots, d\right\}
\tag{3.1}
$$
where wi and ξi ∈ R^d are the weights and points from the 5th-degree cubature rule. Here, √Λ can be found from the Cholesky decomposition of Λ, and the ej are standard unit vectors.
Further, these weights and points can be separately generated offline. See Appendix section
3.5.1 for an example point set for dimension d = 3. There exist alternative options for the weights and points, such as the Gauss-Hermite quadrature (GHQ) rule75. However, our choice of the cubature rule is important as the number of points given by GHQ scales exponentially with d, thus making it prohibitively expensive for high latent state dimensions. The 5th-degree spherical cubature rule, in comparison, has polynomial scaling with 2d² + 1 points while offering comparable accuracy. We expand on this in Appendix section 3.5.1.
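For concreteness, the rule in (3.1) can be generated and applied in a few lines of Python. This is our own illustrative sketch, not code from the dissertation; the function names are ours:

```python
import numpy as np

def cubature_5th(d):
    """Weights and points of the 5th-degree spherical-radial rule (2d^2 + 1 points)."""
    s = np.sqrt(d + 2.0)
    eye = np.eye(d)
    pts, wts = [np.zeros(d)], [2.0 / (d + 2)]
    # 2d axis-aligned points, each with weight (4 - d) / (2(d + 2)^2)
    for j in range(d):
        for sg in (1.0, -1.0):
            pts.append(sg * s * eye[j])
            wts.append((4.0 - d) / (2.0 * (d + 2) ** 2))
    # 2d(d - 1) pairwise points, each with weight 1 / (d + 2)^2
    for k in range(d):
        for l in range(k + 1, d):
            for sk in (1.0, -1.0):
                for sl in (1.0, -1.0):
                    pts.append(s / np.sqrt(2.0) * (sk * eye[k] + sl * eye[l]))
                    wts.append(1.0 / (d + 2) ** 2)
    return np.array(wts), np.array(pts)

def gaussian_expectation(g, mu, Lam):
    """Approximate E[g(X)] for X ~ N(mu, Lam) as in (3.1)."""
    w, xi = cubature_5th(len(mu))
    L = np.linalg.cholesky(Lam)  # sqrt(Lam) via the Cholesky decomposition
    return sum(wi * g(mu + L @ x) for wi, x in zip(w, xi))
```

Because the rule is exact for polynomials up to degree five, first and second moments of a Gaussian are recovered exactly, while the point count grows only quadratically in d.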
3.2.2 The Model
We now detail the stationary form of the linear dynamical system model with Poisson observations for which the filter is derived, with extensions to switching systems detailed in Methods section 3.2.5. We start by modelling the latent brain state xt ∈ R^d, which forms the latent representation of a behavior being performed, as a random walk driven by zero-mean Gaussian noise wt with covariance Q:
$$
x_t = Ax_{t-1} + w_t
\tag{3.2}
$$
Here, the dynamics matrix A dictates how the latent state linearly evolves in time. The initial
state x0 is assumed to be Gaussian with mean µ0 and covariance Λ0. The choice of using
linear dynamics is motivated by the benefits they can provide in terms of interpretability,
computational efficiency in both learning and inference, data efficiency, and ability to extend
to real-time applications such as brain-machine interfaces (BMIs)3,7,12,18,29–31,33–35,49,50,53,79. Indeed, prior works in real-time neural engineering applications have shown that models with linear dynamics can enable accurate decoding and modeling of neural data12,33,35,49,53,79.
Moreover, our extension to switching linear systems can still maintain similar benefits due
to linear dynamics per regime, while further increasing explanatory power (as seen in Figure
3.5b) by approximating complex non-linear dynamics through a switching mechanism (see Methods section 3.2.5). We refer to the parameters associated with the latent state dynamics
in (3.2) as the dynamics equation parameters.
We use a Poisson observation modality to model recorded spiking activity. We assume for neuron i that the latent state xt is encoded in the instantaneous firing rate λi and that the number of spikes within a time bin ∆ is Poisson distributed with rate λi(xt)∆. With small-valued ∆, this is equivalent to a point process observation modality1,6,17,19–22,24,25. The conditional likelihood distribution of the spiking activity from C neurons nt = [n_t^1, . . . , n_t^C]^T is then:

$$
P(n_t \mid x_t) = \prod_{i=1}^{C} \frac{\left(\lambda_i(x_t)\Delta\right)^{n_t^i} \exp\!\left(-\lambda_i(x_t)\Delta\right)}{n_t^i!}
\tag{3.3}
$$
Here, we use a log-link function for the firing rate:

$$
\lambda_i(x_t)\Delta = \exp(\alpha_i + \beta_i^T x_t)
\tag{3.4}
$$
We assume the spiking activities of separate neurons are conditionally independent of each other given the latent state xt, consistent with prior works17,20–22. As such, the likelihood mean and covariance of nt are as follows:

$$
\mathbb{E}[n_t \mid x_t] = [\lambda_1(x_t)\Delta, \ldots, \lambda_C(x_t)\Delta]^T, \qquad
\mathbb{V}[n_t \mid x_t] = \mathrm{diag}\!\left(\lambda_1(x_t)\Delta, \ldots, \lambda_C(x_t)\Delta\right)
\tag{3.5}
$$
We refer to the parameters related to observation encoding in (3.3) and (3.4) as the observation equation parameters. Together, (3.2)–(3.4) constitute the model, which is also referred to as a Poisson Linear Dynamical System (PLDS) model4,16,18,24,25,80.
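As a sanity check on the generative model, the PLDS of (3.2)–(3.4) can be simulated directly. The following Python sketch is ours, not the dissertation's, and the parameter values are arbitrary illustrations:

```python
import numpy as np

def simulate_plds(A, Q, alpha, beta, T, seed=0):
    """Simulate x_t = A x_{t-1} + w_t and n_t^i ~ Poisson(exp(alpha_i + beta_i^T x_t))."""
    rng = np.random.default_rng(seed)
    d, C = A.shape[0], len(alpha)
    x = np.zeros((T, d))
    n = np.zeros((T, C), dtype=int)
    for t in range(T):
        prev = x[t - 1] if t > 0 else np.zeros(d)
        x[t] = A @ prev + rng.multivariate_normal(np.zeros(d), Q)
        rate = np.exp(alpha + beta @ x[t])  # lambda_i(x_t) * Delta via the log link (3.4)
        n[t] = rng.poisson(rate)
    return x, n

# Arbitrary example: 2-D latent state, 5 neurons
A = 0.98 * np.eye(2)
Q = 0.01 * np.eye(2)
alpha = np.log(0.1) * np.ones(5)  # baseline log firing rate per bin
beta = np.random.default_rng(1).normal(size=(5, 2))
x, n = simulate_plds(A, Q, alpha, beta, T=500)
```

Simulated spike counts like these are a convenient ground truth for checking that a filter or learning method recovers the latent trajectories and parameters.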
While for continuous observations there are efficient and accurate filters like the unscented
Kalman filter (UKF) and cubature Kalman filter (CKF)58,75–77,81,82, these filters cannot be
applied to Poisson observations. This is because these traditional non-linear filters generally
assume the observation equation is of the form nt = h(xt) + vt, where h(·) non-linearly but deterministically transforms the latent state and vt is some additive noise. However, for Poisson and other discrete observation modalities, the latent state is encoded in a probability distribution as in (3.3), with the actual observation being randomly drawn from that distribution. In other words, these modalities cannot be written in the above form, i.e., h(·) does not exist for Poisson nor is there an easy modification to directly allow for Poisson.
Therefore, we need to derive a new filter starting from fundamental estimation principles to
allow for more accurate unsupervised learning with Poisson observations.
3.2.3 Derivation of the novel Poisson Cubature Filter (PCF)
We now provide the derivation for the Poisson Cubature Filter (PCF). PCF’s goal in causal
inference, also known as filtering, is to find an estimate xˆt of an unobserved state xt given
Poisson observations up to the present time n1:t in a recursive fashion. We denote the estimate of latent state xt given observations n1:τ as x̂t|τ, with estimated state covariance similarly denoted as Λ̂t|τ. We assume at time t − 1 that the mean x̂t−1|t−1 and covariance Λ̂t−1|t−1 of the posterior density f(xt−1|n1:t−1) are known. The goal of a recursive filter
is then to incorporate the next observation nt and find the mean x̂t|t and covariance Λ̂t|t of the posterior density at the next time step f(xt|n1:t) to complete the recursion. As an intermediate step, we first find the mean x̂t|t−1 and covariance Λ̂t|t−1 of the prediction density f(xt|n1:t−1) as follows, because the model in (3.2) uses linear dynamics:

$$
\hat{x}_{t|t-1} = A\hat{x}_{t-1|t-1}, \qquad
\hat{\Lambda}_{t|t-1} = A\hat{\Lambda}_{t-1|t-1}A^T + Q
\tag{3.6}
$$
We note that we approximate f(xt−1|n1:t−1) as Gaussian from the previous recursion, and thus f(xt|n1:t−1) is also approximately Gaussian because the model uses linear dynamics.
The challenge in deriving the PCF then lies in the following measurement update step, where we incorporate the new Poisson observation nt and find the updated state estimate x̂t|t with covariance Λ̂t|t. If nt were Gaussian and thus conjugate with the Gaussian prediction density, we could analytically find x̂t|t and Λ̂t|t using the Kalman update. However, because nt is Poisson and nonconjugate with Gaussians, approximations are necessary. Therefore, we take a minimum mean-squared error approach that is agnostic to the observation modality to find the estimate of the latent state x̂t|t that minimizes the expected mean-squared error with xt. In this approach, we constrain the estimator to be of the form x̂t|t = Mnt + b. Within this class of estimators, a known set of equations minimizes the mean-squared error, given by:

$$
\hat{x}_{t|t} = \hat{x}_{t|t-1} + \Lambda_{xn}\Lambda_{nn}^{-1}\left(n_t - \hat{n}_{t|t-1}\right), \qquad
\hat{\Lambda}_{t|t} = \hat{\Lambda}_{t|t-1} - \Lambda_{xn}\Lambda_{nn}^{-1}\Lambda_{xn}^T
\tag{3.7}
$$
where the individual terms are as follows:

$$
\hat{n}_{t|t-1} = \mathbb{E}[n_t \mid n_{1:t-1}], \qquad
\Lambda_{nn} = \mathbb{V}[n_t \mid n_{1:t-1}], \qquad
\Lambda_{xn} = \mathbb{E}\!\left[(x_t - \hat{x}_{t|t-1})(n_t - \hat{n}_{t|t-1})^T \mid n_{1:t-1}\right]
\tag{3.8}
$$
For an estimator of the above form, these equations are known to minimize the mean-squared error of the estimated state regardless of the observation modality distribution3,58,75,83,84. Thus, now the challenge of incorporating a nonconjugate observation modality becomes a challenge of how to find the terms in (3.8) for the Poisson observations. We emphasize with the subscripts n and x, n in En and Ex,n that these expectations are taken with respect to nt|n1:t−1 and the joint density of xt, nt|n1:t−1, which are generally difficult to analytically calculate for nonconjugate observations. Importantly, however, from the prediction step, the density of xt|n1:t−1 is Gaussian and thus the numerical integration techniques from section 3.2.1 can give accurate approximations of expected value terms for this density. Thus, we will overcome the challenge of incorporating Poisson observations and finding the terms in (3.8) by providing a novel set of derivations to convert the terms in (3.8) to expectations with respect to xt|n1:t−1, thus allowing us to use the numerical integration tool.
Explicitly, the goal with these derivations is to convert each term in (3.8) to an expected value with respect to the Gaussian density of xt|n1:t−1. We define the following notation for the likelihood mean and variance of nt|xt, which we emphasize are functions of xt:

$$
p(x_t) \triangleq \mathbb{E}[n_t \mid x_t], \qquad
q(x_t) \triangleq \mathbb{V}[n_t \mid x_t]
\tag{3.9}
$$
For Poisson observations, these are given in (3.5). We will use the law of total expectation and the law of total variance85, which for random variables X and Y state that $\mathbb{E}_Y[Y] = \mathbb{E}_X[\mathbb{E}[Y|X]]$ and $\mathbb{V}_Y[Y] = \mathbb{E}_X[\mathbb{V}[Y|X]] + \mathbb{V}_X[\mathbb{E}[Y|X]]$, respectively. We start with deriving the first term in (3.8) by using the law of total expectation followed by conditional independence:

$$
\begin{aligned}
\hat{n}_{t|t-1} &= \mathbb{E}_n[n_t \mid n_{1:t-1}] \\
&= \mathbb{E}_x\!\left[\mathbb{E}[n_t \mid x_t, n_{1:t-1}] \mid n_{1:t-1}\right] \\
&= \mathbb{E}_x\!\left[\mathbb{E}[n_t \mid x_t] \mid n_{1:t-1}\right] \\
&= \mathbb{E}_x[p(x_t) \mid n_{1:t-1}]
\end{aligned}
\tag{3.10}
$$
We emphasize that while initially the expectation is taken with respect to nt|n1:t−1 on the first line, the final line is written as an expectation with respect to the Gaussian xt|n1:t−1, which is the goal of this derivation. To derive the second term in (3.8), we start with the law of total variance and conditional independence:

$$
\begin{aligned}
\Lambda_{nn} &= \mathbb{V}_n[n_t \mid n_{1:t-1}] \\
&= \mathbb{E}_x\!\left[\mathbb{V}[n_t \mid x_t] \mid n_{1:t-1}\right] + \mathbb{V}_x\!\left[\mathbb{E}[n_t \mid x_t] \mid n_{1:t-1}\right] \\
&= \mathbb{E}_x[q(x_t) \mid n_{1:t-1}] + \mathbb{V}_x[p(x_t) \mid n_{1:t-1}] \\
&= \mathbb{E}_x[q(x_t) \mid n_{1:t-1}] + \mathbb{E}_x[p(x_t)p(x_t)^T \mid n_{1:t-1}] - \mathbb{E}_x[p(x_t) \mid n_{1:t-1}]\,\mathbb{E}_x[p(x_t) \mid n_{1:t-1}]^T \\
&= \mathbb{E}_x[q(x_t) \mid n_{1:t-1}] + \mathbb{E}_x[p(x_t)p(x_t)^T \mid n_{1:t-1}] - \hat{n}_{t|t-1}\hat{n}_{t|t-1}^T
\end{aligned}
\tag{3.11}
$$
The fourth line results from expanding Vx[p(xt)|n1:t−1] using the definition of variance. Thus, the final line consists of n̂t|t−1, which is found in (3.10), and two expected values with respect to the Gaussian xt|n1:t−1, thus satisfying the goal for this term. To derive the last term in (3.8), we show a full proof in Appendix section 3.5.2 based on the law of total expectation, which gives the following:

$$
\begin{aligned}
\Lambda_{xn} &= \mathbb{E}_{x,n}\!\left[(x_t - \hat{x}_{t|t-1})(n_t - \hat{n}_{t|t-1})^T \mid n_{1:t-1}\right] \\
&= \mathbb{E}_{x,n}[x_t n_t^T \mid n_{1:t-1}] - \hat{x}_{t|t-1}\hat{n}_{t|t-1}^T \\
&= \mathbb{E}_x\!\left[x_t\,\mathbb{E}[n_t \mid x_t]^T \mid n_{1:t-1}\right] - \hat{x}_{t|t-1}\hat{n}_{t|t-1}^T \\
&= \mathbb{E}_x[x_t\, p(x_t)^T \mid n_{1:t-1}] - \hat{x}_{t|t-1}\hat{n}_{t|t-1}^T
\end{aligned}
\tag{3.12}
$$
The final line is an expected value with respect to the Gaussian xt|n1:t−1, satisfying the goal of the derivation for the last term. Together, (3.10)–(3.12) give the final equations to be approximated with the numerical integration tool in 3.2.1. We emphasize that these equations are analytically derived and require only the likelihood mean and covariance of the observation modality as in (3.5), which we know for a Poisson observation model.
3.2.3.1 Completed Poisson Cubature Filter
We now combine the numerical integration tool of 3.2.1 with the newly derived expected value terms of (3.10)–(3.12), which are now all with respect to a Gaussian, to yield the full Poisson Cubature Filter (PCF). The approximate terms for (3.8) are then as follows:

$$
\begin{aligned}
x_i &\triangleq \hat{x}_{t|t-1} + \sqrt{\hat{\Lambda}_{t|t-1}}\,\xi_i \\
\hat{n}_{t|t-1} &\approx \sum_{i=1}^{2d^2+1} w_i \cdot p(x_i) \\
\Lambda_{nn} &\approx \sum_{i=1}^{2d^2+1} w_i \cdot \left(q(x_i) + p(x_i)p(x_i)^T\right) - \hat{n}_{t|t-1}\hat{n}_{t|t-1}^T \\
\Lambda_{xn} &\approx \sum_{i=1}^{2d^2+1} w_i \cdot x_i\, p(x_i)^T - \hat{x}_{t|t-1}\hat{n}_{t|t-1}^T
\end{aligned}
\tag{3.13}
$$
with wi and ξi given in (3.1) and the likelihood mean and covariance of Poisson p(·) and q(·) given in (3.5). Thus, the new filter PCF consists of a prediction step given by the equations in (3.6), followed by the update step which incorporates the next observation nt using the equations in (3.7), with terms approximated here in (3.13) using the deterministically sampled points given in (3.1). We emphasize that prior non-linear filters that use numerical integration75–78 are formulated for non-linear observations with the specific form of nt = h(xt) + vt, where vt is additive noise with covariance R. This form is distinct from the Poisson observation form of (3.3), which does not have an h(·) function nor additive noise with an R term. Thus, neither the equations nor the derivations from these prior works could be used for Poisson observations, and we needed to derive a new filter, which led to PCF. One contribution here is thus in the novel derivations of (3.10)–(3.12), which have not been shown by prior works and show how to more generally expand the terms in (3.8) through analytical techniques to accommodate more general non-linear observations that do not take the specific form of nt = h(xt) + vt. A high-level overview of PCF is visualized in Figure 3.1a-b, with additional implementation details given in Appendix section 3.5.3. Note that PCF is a causal filter.
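One full PCF recursion — prediction (3.6), cubature approximation (3.13), and MMSE update (3.7) — can be sketched in Python as follows. This is our own illustrative code, not the dissertation's implementation; the cubature generator restates the rule in (3.1), and for the Poisson model q(x_i) is the diagonal of p(x_i) per (3.5):

```python
import numpy as np

def cubature_5th(d):
    """5th-degree spherical-radial cubature weights/points (2d^2 + 1 of them)."""
    s = np.sqrt(d + 2.0)
    eye = np.eye(d)
    pts, wts = [np.zeros(d)], [2.0 / (d + 2)]
    for j in range(d):
        for sg in (1.0, -1.0):
            pts.append(sg * s * eye[j])
            wts.append((4.0 - d) / (2.0 * (d + 2) ** 2))
    for k in range(d):
        for l in range(k + 1, d):
            for sk in (1.0, -1.0):
                for sl in (1.0, -1.0):
                    pts.append(s / np.sqrt(2.0) * (sk * eye[k] + sl * eye[l]))
                    wts.append(1.0 / (d + 2) ** 2)
    return np.array(wts), np.array(pts)

def pcf_step(x_prev, P_prev, n_t, A, Q, alpha, beta):
    """One PCF recursion given the previous posterior mean/covariance."""
    d = len(x_prev)
    # Prediction step (3.6)
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + Q
    # Deterministically sampled points x_i = x_pred + sqrt(P_pred) * xi_i
    w, xi = cubature_5th(d)
    X = x_pred + xi @ np.linalg.cholesky(P_pred).T
    rates = np.exp(alpha + X @ beta.T)  # p(x_i); for Poisson, q(x_i) = diag(p(x_i))
    # Approximated terms (3.13)
    n_pred = w @ rates
    L_nn = np.diag(n_pred) + (w[:, None] * rates).T @ rates - np.outer(n_pred, n_pred)
    L_xn = (w[:, None] * X).T @ rates - np.outer(x_pred, n_pred)
    # MMSE measurement update (3.7)
    K = L_xn @ np.linalg.inv(L_nn)
    x_upd = x_pred + K @ (n_t - n_pred)
    P_upd = P_pred - K @ L_xn.T
    return x_upd, P_upd
```

Running this recursion over n1:T yields the causal estimates x̂t|t and Λ̂t|t that the RTS smoother then refines for the E-step.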
Unlike the new PCF, prior inference methods for Poisson observations often rely on the distinct method of Laplace approximation, leading to the widely-used causal point process filter (PPF)17,20,21,26. We find empirically that Laplace-based methods can lead to inaccurate learning, especially in switching systems when the PPF is used in EM learning, as we show in our below Results. We hypothesize that this inaccurate learning may be due to the inaccuracies in the Laplace approximation not being able to accurately capture the first and second moments of the densities being approximated28. More specifically, the Laplace approximation may misrepresent first and second moments as shown in28,58,86, for example in the case of skewed densities or densities in which the curvature at the mode that approximates the second moment does not represent the mass in the tails. In short, Laplace estimates the mean and covariance based on a single point in the latent space, while PCF's deterministic sampling uses multiple points from the latent space to help improve accuracy. A trade-off for this sampling is that PCF incurs additional runtime, which scales on the order of the number of sampled points, in this case d². Both PCF and PPF are causal and readily fit with the Rauch-Tung-Striebel (RTS) smoother3,28,58 to give non-causal estimates. This is important because non-causal smoothed estimates are necessary for unsupervised learning. PCF was derived in a way to also be extendable to other observation modalities provided that the likelihood mean and covariance E[nt|xt] and V[nt|xt] are known and evaluable. Further, for future extensions to multimodal discrete-continuous data1,6,25,63–65, PCF enables an information form78 that we derive in Appendix section 3.5.3.
3.2.4 Unsupervised Learning with Expectation-Maximization
For unsupervised learning, we embed PCF within an expectation-maximization (EM) framework28,58,70,87. The goal and cost function with EM is to learn system parameters θ̂ which maximize the log-likelihood for a set of given training observations n1:T:

$$
\hat{\theta} = \underset{\theta}{\operatorname{argmax}} \; \log f(n_{1:T};\, \theta)
\tag{3.14}
$$
EM is an iterative framework that alternates between an expectation step (E-step) and a maximization step (M-step), with theoretical guarantees to increase the log-likelihood in each iteration assuming no approximations. Assuming we are on iteration k − 1, the E-step involves using the current estimate of the model parameters θ^(k−1) to find expected values of the latent state given all observations (e.g., Ex[xt x_{t−1}^T | n1:T]). The M-step involves using the expected value terms to update the model parameters and obtain their k'th estimate by maximizing the expected complete data likelihood:

$$
\theta^{(k)} = \underset{\theta^*}{\operatorname{argmax}} \; \mathbb{E}_x\!\left[\log f(x_{0:T}, n_{1:T};\, \theta^*) \mid n_{1:T};\, \theta^{(k-1)}\right]
\tag{3.15}
$$
This can be shown to increase the overall log-likelihood of (3.14). A combination of analytical and numerical methods can be used to solve this maximization. We provide all equations in Appendix section 3.5.4. For example, the dynamics matrix A* that maximizes its contribution to (3.15) can be found as:

$$
A^* = \left(\sum_{t=1}^{T} \mathbb{E}_x[x_t x_{t-1}^T \mid n_{1:T}]\right)
\left(\sum_{t=1}^{T} \mathbb{E}_x[x_{t-1} x_{t-1}^T \mid n_{1:T}]\right)^{-1}
\tag{3.16}
$$

For some other parameters, numerical methods must be used to maximize their contribution to (3.15).
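As a sketch (ours, with hypothetical variable names), the update (3.16) amounts to a linear solve on the summed smoothed moments supplied by the E-step:

```python
import numpy as np

def m_step_dynamics(Exx_cross, Exx_prev):
    """M-step update (3.16) for the dynamics matrix A.

    Exx_cross: iterable of E[x_t x_{t-1}^T | n_{1:T}] for t = 1..T
    Exx_prev:  iterable of E[x_{t-1} x_{t-1}^T | n_{1:T}] for t = 1..T
    """
    S_cross = np.sum(list(Exx_cross), axis=0)
    S_prev = np.sum(list(Exx_prev), axis=0)
    # A* = S_cross @ inv(S_prev), computed via a linear solve for numerical stability
    return np.linalg.solve(S_prev.T, S_cross.T).T
```

Solving the transposed system rather than forming the explicit inverse is the standard way to evaluate a right-multiplied inverse.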
We call EM a framework because different inference methods can be used in the E-step
to yield the necessary terms for the M-step. Thus, both PCF and PPF24–26 can be used, and
once combined with the RTS smoother, give the complete E-step for EM. We use PCF-EM
to indicate an EM learning method which uses PCF, and PPF-EM to similarly indicate one using PPF. Comparison of PCF-EM and PPF-EM then shows the benefit of the novel
PCF for unsupervised learning as the filters are the only distinct elements within the EM
learning framework.
An alternative to this filtering-smoothing (forward-backward) approach is a global approach as detailed in16,18,27. Here, a Laplace approximation is also used but is applied over all available times to f(x1:T|n1:T; θ) to yield an approximate Gaussian distribution for it. Necessary terms for the M-step can then be extracted from the mean and covariance of this Gaussian, or alternatively through samples drawn from the Gaussian, as done in4 and here for our comparisons. Thus, this global Laplace method is non-causal and makes up the entirety of the E-step in these methods. We use Laplace-EM (LEM) to indicate an EM method with global Laplace as the E-step.
3.2.5 Extending to Switching Dynamical Systems
We now extend the developed PCF to switching dynamical systems as overviewed by Figure
3.1d.
3.2.5.1 Model for switching dynamical systems
We start with the model for a switching dynamical system with Poisson observations. The key difference with the stationary model presented in equations (3.2)–(3.4) is the addition of what we call a regime state st, which is a first-order Markov chain with M possible regimes that dictates the parameters to use throughout the remainder of the model. Explicitly, we write this as:

$$
\begin{aligned}
P(s_t^{(j)} \mid s_{t-1}^{(i)}) &= \Phi_{j,i} \\
x_t &= A(s_t)x_{t-1} + w_t, \quad w_t \sim \mathcal{N}(0, Q(s_t)) \\
P(n_t \mid x_t, s_t) &= \prod_{i=1}^{C} \frac{\left(\lambda_i(x_t, s_t)\Delta\right)^{n_t^i} \exp\!\left(-\lambda_i(x_t, s_t)\Delta\right)}{n_t^i!} \\
\lambda_i(x_t, s_t)\Delta &= \exp\!\left(\alpha_i(s_t) + \beta_i(s_t)^T x_t\right)
\end{aligned}
\tag{3.17}
$$
where Φ is the regime transition matrix, s_t^(j) and s_{t−1}^(i) respectively denote st = j and st−1 = i, and the initial distribution of the regime state s1 is denoted as π. The dynamics matrix A(st) can take on one of M possibilities {A^(1), . . . , A^(M)}, where A(s_t^(j)) = A^(j). The same notation holds for the remaining model parameters. We also assume that the spiking activity is conditionally independent given both the latent state and the regime state.
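To make the switching generative model concrete, (3.17) can be simulated with a short Python sketch. This is our own illustrative code, not the dissertation's, and the parameter values are arbitrary:

```python
import numpy as np

def simulate_switching_plds(Phi, pi0, As, Qs, alphas, betas, T, seed=0):
    """Simulate (3.17): Phi[j, i] = P(s_t = j | s_{t-1} = i); parameters are regime-indexed."""
    rng = np.random.default_rng(seed)
    M, d, C = len(As), As[0].shape[0], len(alphas[0])
    s = np.zeros(T, dtype=int)
    x = np.zeros((T, d))
    n = np.zeros((T, C), dtype=int)
    s[0] = rng.choice(M, p=pi0)  # initial regime distribution pi
    for t in range(T):
        if t > 0:
            s[t] = rng.choice(M, p=Phi[:, s[t - 1]])  # column s_{t-1} holds P(s_t = . | s_{t-1})
        j = s[t]
        prev = x[t - 1] if t > 0 else np.zeros(d)
        x[t] = As[j] @ prev + rng.multivariate_normal(np.zeros(d), Qs[j])
        n[t] = rng.poisson(np.exp(alphas[j] + betas[j] @ x[t]))
    return s, x, n
```

Simulations like this, with known regime sequences, are what make it possible to quantify regime decoding accuracy when validating learning methods.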
3.2.5.2 Overview of switching filter and switching smoother
Our switching filter1 for multiscale observations used the Laplace approximation. Here, we show how, instead of Laplace's method, we can use PCF to enable accurate inference for switching dynamical systems with Poisson observations.
The goal with causal filtering for switching systems is to calculate both the expected value of the latent state E[xt|n1:t] and the probability of the regime state P(st|n1:t) given the observations up to the present time n1:t. As one of the primary steps of the switching filter, it can be shown that after finding the mean and covariance of an intermediate distribution f(xt−1|n1:t−1, s_t^(j)), a stationary filter using parameters associated with regime j is needed to incorporate the new observation nt to find the mean and covariance of f(xt|n1:t, s_t^(j)), denoted as x̂_{t|t}^(j) and Λ̂_{t|t}^(j). In1, to get the stationary filter we use the Laplace approximation and thus a PPF for Poisson observations. Here, we show that instead we can embed the new PCF. This is repeated for each regime j ∈ [1, M], resulting in a bank of M PCFs being used, with one for each regime.
In1, we also derived a switching smoother to yield non-causal estimates of the latent and regime states. This smoother only requires the outputs of the switching filter and thus can directly be applied as presented in1. Compared with prior switching smoothers46,55, the new smoother was shown to be more robust and generalizable. For the purpose of unsupervised learning, additional expected value terms not directly calculated by the smoother are needed, like E[xt x_{t−1}^T | n1:T, s_t^(j)]. For these terms, we run additional RTS smoothers from the f(xt|n1:T, s_t^(j)) densities to directly yield f(xt−1|n1:T, s_t^(j)), with equations in Appendix section 3.5.5.
3.2.5.3 Unsupervised learning with switch EM framework
We now present the unsupervised learning method for switching dynamical systems with
Poisson observations. Similar to stationary systems, we use an EM framework with the goal
to learn system parameters θ̂ that maximize the log-likelihood of a given set of training observations n1:T:

$$
\hat{\theta} = \underset{\theta}{\operatorname{argmax}} \; \log f(n_{1:T};\, \theta)
\tag{3.18}
$$
Here, only Poisson observations are available with both latent states and regime states being
unobserved. Thus, this is a method to learn system parameters and subsequently identify
regimes in data that maximize the log-likelihood. The E-step involves finding expected value
65
terms within the expected complete data likelihood using the current estimate of model
parameters. We define the following notation:
$$E[\,\cdot\, \mid n_{1:T}; \theta^{(k-1)}] \triangleq \langle\,\cdot\,\rangle, \qquad E[\,\cdot\, \mid n_{1:T}, s_t^{(j)}; \theta^{(k-1)}] \triangleq \langle\,\cdot\,\rangle^{(j)} \tag{3.19}$$
The expected complete data likelihood is then:
$$
\begin{aligned}
E_{x,s}&\left[\log f(x_{0:T}, s_{1:T}, n_{1:T}; \theta^{*}) \mid n_{1:T}; \theta^{(k-1)}\right] \\
&= \sum_{j=1}^{M} P(s_1^{(j)} \mid n_{1:T}) \log \pi_j
 \;+\; \sum_{t=1}^{T} \sum_{i,j=1}^{M} P(s_t^{(j)}, s_{t-1}^{(i)} \mid n_{1:T}) \log \Phi_{j,i} \\
&\quad + \langle \log N(x_0; \mu_0, \Lambda_0) \rangle
 \;+\; \sum_{t=1}^{T} \sum_{j=1}^{M} P(s_t^{(j)} \mid n_{1:T}) \, \langle \log N(x_t; A^{(j)} x_{t-1}, Q^{(j)}) \rangle^{(j)} \\
&\quad + \sum_{t=1}^{T} \sum_{j=1}^{M} P(s_t^{(j)} \mid n_{1:T}) \sum_{i=1}^{C} \langle \log \mathrm{Poisson}(n_t^i; \exp(\alpha_i^{(j)} + \beta_i^{(j)T} x_t)) \rangle^{(j)}
\end{aligned} \tag{3.20}
$$
The M-step then finds the parameters that maximize (3.20):
$$\theta^{(k)} = \operatorname*{argmax}_{\theta^{*}} \; E_{x,s}\left[\log f(x_{0:T}, s_{1:T}, n_{1:T}; \theta^{*}) \mid n_{1:T}; \theta^{(k-1)}\right] \tag{3.21}$$
For example, the dynamics matrix $A^{(j)}$ for regime $j$ that maximizes its contribution to (3.20) is:
$$
A^{(j)*} = \left( \sum_{t=1}^{T} P(s_t^{(j)} \mid n_{1:T}) \, E_x[x_t x_{t-1}^T \mid n_{1:T}, s_t^{(j)}] \right) \left( \sum_{t=1}^{T} P(s_t^{(j)} \mid n_{1:T}) \, E_x[x_{t-1} x_{t-1}^T \mid n_{1:T}, s_t^{(j)}] \right)^{-1} \tag{3.22}
$$
We provide all equations in Appendix section 3.5.4.
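As a concrete illustration, the update in (3.22) is a regime-weighted least-squares solve over the smoothed second-moment terms supplied by the E-step. A minimal NumPy sketch follows; the function and variable names are our own illustrative choices, not from an accompanying codebase:

```python
import numpy as np

def m_step_dynamics(w, Exx_prev, Eprev_prev):
    """M-step update for the regime-j dynamics matrix A^(j) per Eq. (3.22).

    w          : (T,) regime weights P(s_t^(j) | n_{1:T})
    Exx_prev   : (T, d, d) smoothed cross-moments E[x_t x_{t-1}^T | n_{1:T}, s_t^(j)]
    Eprev_prev : (T, d, d) smoothed second moments E[x_{t-1} x_{t-1}^T | n_{1:T}, s_t^(j)]
    """
    # Weighted sums over time of the cross- and auto-moment terms.
    num = np.einsum('t,tij->ij', w, Exx_prev)
    den = np.einsum('t,tij->ij', w, Eprev_prev)
    # Closed-form weighted least-squares solution of (3.22).
    return num @ np.linalg.inv(den)
```

In the noiseless, exactly-smoothed limit where the cross-moments equal $A^{(j)}$ times the auto-moments, this update recovers the true dynamics matrix exactly, which is a useful sanity check for an implementation.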
The main challenge with unsupervised learning for switching dynamical systems is that finding the expected value terms for the E-step exactly is intractable, because the number of regime combinations grows exponentially with time. To address this, we instead find approximate estimates of the expected value terms by running a switching filter with the PCF embedded, followed by a switching smoother, as shown in the previous section. We refer to this switch EM framework with PCF as sPCF-EM. While not done in prior work, a PPF can also be embedded in the switch EM framework, which we refer to as sPPF-EM. Since the only difference between sPCF-EM and sPPF-EM is the novel PCF, we implement sPPF-EM here and use these comparisons to show the critical need for PCF in unsupervised learning of switching dynamical systems.
To learn switching dynamical systems with Poisson observations, prior studies use a variational EM framework [4, 71, 72] instead of using switching filters and smoothers within EM, as we do here. Variational EM is closely related to EM in that it is an unsupervised learning method that finds model parameters with alternating expectation and maximization steps. However, the cost function that variational EM maximizes is a lower bound on the log-likelihood of the observations rather than the log-likelihood itself. This is achieved by taking expectations with respect to a simpler distribution through a mean-field variational approximation [72]. Prior work has developed a method called variational Laplace EM (vLEM) [4] that can be applied to the switching dynamical systems of this work, where the simpler distribution assumes the regime and latent states are independent of each other. This allows global Laplace to be used in its expectation step to find expected values of the latent state. While sPCF-EM and vLEM differ in their approach to finding expected values, both estimate parameters of the same model, which is what we focus our comparisons on.
3.2.6 Numerical Simulation Framework
We use extensive numerical simulations to validate the PCF-based EM frameworks. We
simulate spiking activity from both stationary and switching linear dynamical systems with
Poisson observations at various data lengths for training. For stationary systems, we learn
models using PCF-EM, PPF-EM, and LEM. For switching systems, we learn models using
sPCF-EM, sPPF-EM, and vLEM. We also separately simulate testing data for evaluating
learned model parameters.
3.2.6.1 Numerical simulation settings
For stationary systems, we randomly form 60 general systems with a time bin $\Delta$ of 2 ms and sweep the training sample size across 10k, 31.6k, 100k, and 316k samples, equivalent to 20 seconds at the shortest and 632 seconds at the longest. We separately simulate testing data of 10k samples per system, on which all learned models for that system are evaluated.

For the dynamics equation parameters, we set the dimension of $x_t$, $\dim(x_t) \triangleq d$, to 8 based on common values used in prior works [1, 12, 80] and randomly form the dynamics matrix $A$ and noise covariance matrix $Q$. Using the procedure described in [1, 25], we assume the eigenvalues of $A$ are stable and comprised of $d/2$ complex-conjugate pairs $\{r_i e^{\pm j\theta_i}\}_{i \in [1, d/2]}$, with $r_i$ dictating the signal decay time with half-life $t_{1/2} = \frac{-\Delta \ln 2}{\ln r_i}$ and $\theta_i$ dictating the oscillation frequency, given by $\frac{\theta_i}{2\pi\Delta}$. We randomly select eigenvalues with decay half-lives between [13, 277] ms (i.e., a range of [0.9, 0.995] for $r_i$) and frequencies between [0.8, 5] Hz (i.e., a range of [0.010, 0.063] for $\theta_i$) based on values seen in prior literature [31, 80]. We also randomly select the eigenvalues of $Q$ to be between 0.01 and 0.04. We simulate latent states using (3.2). For the observation equation parameters, we assume the number of neurons to be 60 and randomly choose a base firing rate between 3 and 5 Hz and a maximum firing rate between 50 and 70 Hz, as in [1]. We then simulate observations based on (3.3).
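To make this construction concrete, the sketch below (our own illustrative code, with simplified parameter choices) builds a stable dynamics matrix from sampled complex-conjugate eigenvalue pairs, placed as $2 \times 2$ rotation-scaling blocks whose eigenvalues are exactly $r_i e^{\pm j\theta_i}$, and then simulates latent states and Poisson counts. The $\beta$ loadings are drawn arbitrarily rather than calibrated to a maximum firing rate, and we take the per-bin Poisson mean to be $\exp(\alpha_i + \beta_i^T x_t)\Delta$ with the rate in Hz; whether $\Delta$ is absorbed into $\alpha$ is a modeling convention:

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, T, dt = 8, 60, 5000, 0.002   # latent dim, neurons, samples, 2 ms bin

# d/2 stable complex-conjugate eigenvalue pairs r_i * exp(+/- j*theta_i),
# realized as 2x2 rotation-scaling blocks on the diagonal of A.
r = rng.uniform(0.9, 0.995, d // 2)
theta = rng.uniform(0.010, 0.063, d // 2)
A = np.zeros((d, d))
for i, (ri, th) in enumerate(zip(r, theta)):
    A[2*i:2*i+2, 2*i:2*i+2] = ri * np.array([[np.cos(th), -np.sin(th)],
                                             [np.sin(th),  np.cos(th)]])

half_life_ms = -dt * np.log(2) / np.log(r) * 1e3   # t_1/2 = -dt*ln2/ln(r_i)
freq_hz = theta / (2 * np.pi * dt)                 # theta_i / (2*pi*dt)

Q = np.diag(rng.uniform(0.01, 0.04, d))            # state noise covariance

# Observation model: rate lambda_i(x) = exp(alpha_i + beta_i^T x) in Hz.
alpha = np.log(rng.uniform(3, 5, C))               # log base firing rates
beta = rng.normal(0.0, 0.3, (C, d))                # illustrative loadings

x = np.zeros((T, d))
n = np.zeros((T, C), dtype=int)
Lq = np.linalg.cholesky(Q)
for t in range(1, T):
    x[t] = A @ x[t-1] + Lq @ rng.standard_normal(d)       # dynamics, cf. (3.2)
    n[t] = rng.poisson(np.exp(alpha + beta @ x[t]) * dt)  # counts, cf. (3.3)
```

With $r_i \in [0.9, 0.995]$ and $\Delta = 2$ ms, the half-life formula reproduces the stated [13, 277] ms range, and $\theta_i \in [0.010, 0.063]$ reproduces the [0.8, 5] Hz frequency range.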
For switching systems, we also randomly form 60 general systems with the same time bin and sweep the same training sample sizes as for the stationary systems. We separately simulate testing data of 30k samples in length. For the regime state settings, we set the number of regimes $M$ to 3 and the dwell time to 2 seconds for each regime. The dwell time is the average time spent within the same regime and can be calculated for regime $j$ by $t_{\text{dwell}} = \frac{\Delta}{1 - \Phi_{j,j}}$, where $\Phi_{j,j}$ comes from (3.17). We also assume the probability of switching from regime $j$ to $i$ is equal for all $i \neq j$. For the dynamics equation parameters, we randomly form $Q^{(j)}$ and the eigenvalues of $A^{(j)}$ for each regime $j$. Following the procedure of [1], we also randomly select the eigenvectors of $A^{(j)}$ with a transient control value of 2.4 such that transient effects in the latent state signal are bounded. For the observation equation parameters, we randomly select them for each regime $j$ following the same procedure as for stationary systems. We then simulate the data based on (3.17).
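The regime chain implied by these settings can be simulated directly. The sketch below uses our own helper names; note that its rows index the regime being left, so `Phi[i, j]` corresponds to the text's $\Phi_{j,i}$ under this convention:

```python
import numpy as np

def transition_matrix(M, dwell_sec, dt):
    """Regime transition matrix with self-transition probability set by the
    dwell time t_dwell = dt / (1 - Phi_jj), i.e. Phi_jj = 1 - dt / t_dwell,
    and equal probability of switching to each of the other M - 1 regimes."""
    phi_stay = 1.0 - dt / dwell_sec
    Phi = np.full((M, M), (1.0 - phi_stay) / (M - 1))
    np.fill_diagonal(Phi, phi_stay)
    return Phi

def simulate_regimes(Phi, T, rng):
    """Sample a regime sequence s_1:T from the Markov chain Phi."""
    M = Phi.shape[0]
    s = np.zeros(T, dtype=int)
    s[0] = rng.integers(M)
    for t in range(1, T):
        s[t] = rng.choice(M, p=Phi[s[t - 1]])
    return s
```

For example, $M = 3$, a 2-second dwell time, and a 10 ms bin give $\Phi_{j,j} = 0.995$, so the chain stays in each regime for roughly 200 bins on average.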
3.2.6.2 Learning and Performance metrics
After simulating data, we learn system parameters with all methods. We run each method for 300 EM iterations to give it sufficient time to converge. We initialize parameters randomly for each system, except for $A$, which is set to $0.9 I$. For switching systems, we also set the initial $\Phi_{j,j}$'s to correspond to dwell times of 4 seconds. For fair comparisons among learning methods, we run each for the same number of iterations and provide the same initial parameters per system.
Once parameters are learned, we evaluate learning performance using separately generated test data. As we want this evaluation process to be blind to which method generated the learned parameters, we focus on decoding metrics from the same PPF and switching-PPF decoders [1] for all learning methods on test data. This means that once parameters are learned with any learning method, we use these already-learned parameters to construct the associated PPF or switching PPF and run these filters on the test data. This choice also allows us to show that the benefits of PCF-based learning translate to benefits in real-time decoding applications, even when Laplace-based decoders are used after model learning.
The first decoding metric that we calculate is the neural self-prediction metric, which is the accuracy of one-step-ahead prediction of spiking activity $n_t$ from past activity $n_{1:t-1}$ through the probability $P(n_t \mid n_{1:t-1})$. We use the probability $P(n_t^i \geq 1 \mid n_{1:t-1})$ per neuron to classify whether spikes were observed in time bin $t$ or not and obtain an area under the curve (AUC) measure. We then compute the average AUC over all neurons and normalize it to yield a metric that we call predictive power, $PP = 2 \cdot \mathrm{AUC} - 1$, such that 0 is chance and 1 is perfect prediction. To calculate the above probability, we derive the following for stationary systems:
$$
\begin{aligned}
P(n_t^i \geq 1 \mid n_{1:t-1}) &= 1 - P(n_t^i = 0 \mid n_{1:t-1}) \\
&= 1 - \int P(n_t^i = 0 \mid x_t, n_{1:t-1}) \, f(x_t \mid n_{1:t-1}) \, dx_t \\
&= 1 - \int P(n_t^i = 0 \mid x_t) \, f(x_t \mid n_{1:t-1}) \, dx_t \\
&= 1 - \int \exp(-\lambda_i(x_t)\Delta) \, f(x_t \mid n_{1:t-1}) \, dx_t \\
&= 1 - E_x[\exp(-\lambda_i(x_t)\Delta) \mid n_{1:t-1}]
\end{aligned} \tag{3.23}
$$
where we marginalize over $x_t$ in the second line and use conditional independence in the third line. While there is no closed-form solution for the expectation in the fifth line, we can use the numerical integration tool from Methods section 3.2.1 to obtain an accurate approximation, as the expectation is with respect to the Gaussian $x_t \mid n_{1:t-1}$. For switching systems, we first marginalize over the regime state:
$$P(n_t^i \geq 1 \mid n_{1:t-1}) = \sum_{j=1}^{M} P(n_t^i \geq 1 \mid n_{1:t-1}, s_t^{(j)}) \, P(s_t^{(j)} \mid n_{1:t-1}) \tag{3.24}$$
$$P(n_t^i \geq 1 \mid n_{1:t-1}, s_t^{(j)}) = 1 - E_x[\exp(-\lambda_i(x_t, s_t^{(j)})\Delta) \mid n_{1:t-1}, s_t^{(j)}] \tag{3.25}$$
The expectation in (3.25) is then approximated with (3.1). In simulations where the ground-truth parameters are known, we further calculate the predictive power using the true parameters, $PP_{\text{true}}$, to yield a normalized predictive power metric $PP / PP_{\text{true}}$, where 0 is chance and 1 is true-parameter performance. This indicates whether the learned parameters can capture the data as well as the true parameters can.
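For intuition, note that because $\lambda_i$ depends on $x_t$ only through the scalar $z = \alpha_i + \beta_i^T x_t$, which is itself Gaussian under $x_t \mid n_{1:t-1}$, the expectation in (3.23) reduces to a one-dimensional integral. The sketch below approximates it with Gauss-Hermite quadrature; this is an illustrative stand-in for the chapter's cubature-based numerical integration tool, and the function names are our own:

```python
import numpy as np

def p_spike(alpha_i, beta_i, mu, Sigma, dt, n_pts=30):
    """P(n_t^i >= 1 | n_{1:t-1}) = 1 - E[exp(-lambda_i(x_t) * dt)] per Eq. (3.23),
    with x_t | n_{1:t-1} ~ N(mu, Sigma) and lambda_i(x) = exp(alpha_i + beta_i^T x).
    z = alpha_i + beta_i^T x is Gaussian, so the d-dimensional expectation
    collapses to a 1-D Gauss-Hermite quadrature."""
    m = alpha_i + beta_i @ mu              # mean of z
    v = beta_i @ Sigma @ beta_i            # variance of z
    nodes, weights = np.polynomial.hermite.hermgauss(n_pts)
    z = m + np.sqrt(2.0 * v) * nodes
    e_zero = (weights / np.sqrt(np.pi)) @ np.exp(-dt * np.exp(z))
    return 1.0 - e_zero

def predictive_power(auc):
    """PP = 2*AUC - 1, so 0 is chance and 1 is perfect prediction."""
    return 2.0 * auc - 1.0
```

As a sanity check, when the posterior covariance shrinks to zero the quadrature collapses to the deterministic value $1 - \exp(-\lambda_i(\mu)\Delta)$.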
For switching systems in simulation, we also evaluate the decoded regime state $\hat{s}_t$, taken as the regime with the highest probability, $\hat{s}_t = \operatorname*{argmax}_j P(s_t^{(j)} \mid n_{1:t})$. We compute the regime accuracy, Acc, as the proportion of estimated regime states that match the true regime states:
$$\mathrm{Acc}(s_{1:T}, \hat{s}_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} [s_t = \hat{s}_t] \tag{3.26}$$
where $[s_t = \hat{s}_t]$ is 1 if the values match and 0 otherwise. As the learning method is unsupervised, the indices of the learned regimes may be a permutation of the true regimes (e.g., the 1st learned regime is the 2nd true regime), so we calculate the accuracy for all permutations of $1:M$ and choose the maximum. We further find the accuracy using the true parameters to calculate a normalized accuracy. Because chance-level accuracy is $1/M$ for these simulated systems, we calculate the normalized accuracy as $(\mathrm{Acc} - 1/M)/(\mathrm{Acc}_{\text{true}} - 1/M)$, such that 0 is chance and 1 is true-parameter performance.
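The permutation-matched accuracy can be computed directly; a small sketch with our own helper names, feasible by brute force since the number of regimes $M$ is small:

```python
import numpy as np
from itertools import permutations

def regime_accuracy(s_true, s_hat, M):
    """Eq. (3.26) with label matching: score every relabeling of the learned
    regime indices 0..M-1 and keep the best, since unsupervised learning can
    recover the regimes only up to a permutation."""
    best = 0.0
    for perm in permutations(range(M)):
        relabeled = np.asarray(perm)[s_hat]       # apply the relabeling
        best = max(best, float(np.mean(relabeled == s_true)))
    return best

def normalized_accuracy(acc, acc_true, M):
    """(Acc - 1/M) / (Acc_true - 1/M): 0 is chance, 1 is true-parameter level."""
    return (acc - 1.0 / M) / (acc_true - 1.0 / M)
```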
To evaluate decoded latent states in simulation, additional steps are required to first learn a similarity transform, because the latent states are unobserved. This means that there exists an infinite number of equivalent state-space models related through a reversible similarity transform [32], preventing direct comparisons of the true latent states with the decoded latent states. Thus, following the procedure in [32], we first separately generate an additional $q = 1000d$ samples, with $d$ being the dimension of the latent states, purely to find such a similarity transform. We then use both the true and the learned model parameters to find the predicted latent state time series $\hat{x}^{\text{true}}_{t|t-1}$ and $\hat{x}^{\text{learn}}_{t|t-1}$, respectively. We then find the similarity transform that minimizes the mean-squared error between the predicted time series:
$$\hat{L} = \operatorname*{argmin}_{L} \left( \sum_{t=1}^{T} \left\| L \hat{x}^{\text{learn}}_{t|t-1} - \hat{x}^{\text{true}}_{t|t-1} \right\|_2^2 \right) = \hat{X}^{\text{true}} \hat{X}^{\text{learn}\dagger} \tag{3.27}$$
where $\hat{X}^{\text{true}}$ and $\hat{X}^{\text{learn}}$ are matrices whose columns are $\hat{x}^{\text{true}}_{t|t-1}$ and $\hat{x}^{\text{learn}}_{t|t-1}$, respectively. We then apply the similarity transform to the decoded latent states from the test set and evaluate decoding accuracy using the correlation coefficient (CC) between the decoded and true latent states ($\hat{L}\hat{x}_{t|t}$ and $x_t$) for each dimension, averaged over dimensions. We further find the average CC using the true model parameters, $CC_{\text{true}}$, and calculate a normalized CC metric $CC / CC_{\text{true}}$, such that 0 is chance and 1 is true-parameter performance. This metric is found for both stationary and switching systems, with the predicted time series for switching systems given by $\hat{x}_{t|t-1} = \sum_{j=1}^{M} \hat{x}^{(j)}_{t|t-1} P(s_t^{(j)} \mid n_{1:t-1})$, with the individual terms presented in [1].
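In code, the least-squares solution of (3.27) and the averaged CC can be sketched as follows (illustrative names; the predicted time series are stacked as the columns of $d \times T$ matrices):

```python
import numpy as np

def fit_similarity_transform(X_learn, X_true):
    """L-hat of Eq. (3.27): the L minimizing sum_t |L x_t^learn - x_t^true|_2^2
    has the closed form L = X_true @ pinv(X_learn), with the predicted series
    stacked as columns of the (d, T) matrices X_learn and X_true."""
    return X_true @ np.linalg.pinv(X_learn)

def avg_corrcoef(X_a, X_b):
    """Average per-dimension Pearson correlation between two (d, T) series."""
    return float(np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(X_a, X_b)]))
```

When the two series really are related by an invertible linear map and the learned series has full row rank, this recovers that map exactly.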
For stationary systems, we further use the above similarity transform to directly evaluate the learned model parameters with respect to the true parameters. We apply the similarity transform to the learned parameters through a change of variables and quantify the error for each model parameter $\Psi$ (for example, $Q$) using the normalized matrix norm:
$$e_{\Psi} = \frac{\left\| \Psi^{\text{learn}} - \Psi^{\text{true}} \right\|_F}{\left\| \Psi^{\text{true}} \right\|_F} \tag{3.28}$$
where $\| \cdot \|_F$ denotes the Frobenius norm; further details can be found in [32]. We also quantify the normalized eigenvalue error:
$$e_{\text{eig}} = \frac{\sum_{i=1}^{d} \left| \mathrm{eig}_i^{\text{learn}} - \mathrm{eig}_i^{\text{true}} \right|}{\sum_{i=1}^{d} \left| \mathrm{eig}_i^{\text{true}} \right|} \tag{3.29}$$
where $|\cdot|$ for a complex number $a + bi$ denotes the complex magnitude $\sqrt{a^2 + b^2}$ and $\mathrm{eig}_i$ denotes the $i$th eigenvalue of the dynamics matrix $A$. We match the learned eigenvalues to the true eigenvalues of $A$ such that the normalized error is minimal. For switching systems, the learned regimes may not correspond to the true regimes. For example, with insufficient data, a method may learn a model that uses only one regime to describe all the data, which could lead to misleading metrics if error is incorporated from misidentified regimes. Thus, we focus on decoding metrics that can show how well data is explained by a model even if regimes are not perfectly learned.
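These error metrics can be sketched as follows (our own helper names; brute-force eigenvalue matching is feasible at $d = 8$, and a Hungarian assignment could replace it for larger $d$):

```python
import numpy as np
from itertools import permutations

def param_error(P_learn, P_true):
    """Normalized Frobenius-norm error of Eq. (3.28)."""
    return np.linalg.norm(P_learn - P_true) / np.linalg.norm(P_true)

def eig_error(A_learn, A_true):
    """Normalized eigenvalue error of Eq. (3.29), matching learned to true
    eigenvalues (here by brute-force permutation) so the error is minimal."""
    e_l = np.linalg.eigvals(A_learn)
    e_t = np.linalg.eigvals(A_true)
    best = min(np.abs(e_l[list(p)] - e_t).sum()
               for p in permutations(range(len(e_t))))
    return best / np.abs(e_t).sum()
```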
3.2.7 Experimental Data
We further validate the developed methods using publicly available experimental data from the Sabes Lab [88], recorded from the primary motor cortex (M1) of a monkey (monkey I) performing a continuous series of reaches in a point-to-point task, illustrated in Figure 3.2. The task was structured such that circular targets were presented to the monkey in an 8-by-8 square grid in a virtual reality environment [88, 89]. The monkey was trained to make 2D reaches for targets via a cursor controlled by the fingertip of its left arm, with target acquisition requiring the monkey to hold on the target within a set acceptance zone for 450 ms. Upon target acquisition, a new target would be randomly chosen and presented. Further details can be found in [88, 89]. We used a publicly available dataset from the Sabes Lab [88, 89], for which all animal procedures were performed in accordance with the U.S. National Research Council's Guide for the Care and Use of Laboratory Animals and were approved by the UCSF Institutional Animal Care and Use Committee.
3.2.7.1 Data preprocessing
Neural activity was acquired from a 96-channel silicon microelectrode array (Blackrock Microsystems, Salt Lake City, UT). We randomly selected a subset of 30 channels for each recording day and used the multiunit spiking activity per channel as the neural signal, comprised of the number of spikes per $\Delta = 10$ ms time bin. We analyzed data from the first five sessions in the month of 2016/09 (sessions 20160915 01 to 20160927 06) due to their consistent data lengths (38k, 45k, 36k, 39k, and 42k samples, respectively).
3.2.7.2 Learning procedure
We analyzed each session individually with a five-fold cross-validation scheme. Both stationary and switching learning methods were applied to each of the training sets, comprised of four folds, to yield learned model parameters. All learning methods applied were unsupervised and used only the spiking activity to generate the parameter estimates. To focus on switches in latent state dynamics, we set the observation equation parameters to be the same across regimes. Decoded training-set latent states were then linearly projected onto the 2D reach velocity, which served as the behavior signal, to learn a linear mapping from latent states to behavior. Based on Results section 3.3.2, which found that multiple initializations could improve consistency of results in simulation, 3 different parameter initializations were randomly generated such that each learning method was applied 3 times on a training set with these initial parameters. The final learned parameters for a given method were then chosen from the 3 based on whichever had the highest training-set neural self-prediction.
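This selection step can be sketched as a simple loop; the callables here are hypothetical stand-ins for a full EM fit and for the training-set neural self-prediction score:

```python
import numpy as np

def select_best_initialization(init_params, fit_fn, score_fn):
    """Fit the model from each initialization and keep the fit with the
    highest training-set score (e.g., neural self-prediction).
    fit_fn and score_fn are hypothetical callables standing in for the
    learning method and the evaluation metric."""
    fits = [fit_fn(p) for p in init_params]
    scores = [score_fn(f) for f in fits]
    return fits[int(np.argmax(scores))]
```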
3.2.7.3 Evaluation pipeline
We then evaluate the learned parameters on the held-out test-set data using decoding metrics similar to Methods section 3.2.6.2. The procedure uses the learned parameters with either the PPF or the switching PPF, regardless of learning method. We first find the predictive power of the parameters based on the neural self-prediction metric. We then evaluate behavior decoding using the average correlation coefficient metric by mapping the decoded latent states to 2D reach velocity using the projection learned on the training set. We finally interpret the decoded regime states, with results found in Results section 3.3.4.
3.3 Results
We show, in both numerical simulations and in neural spiking activity recorded from primary motor cortex during behavior, that unsupervised EM learning methods embedded with the developed Poisson Cubature Filter (PCF) successfully learn system parameters and outperform prior methods for both stationary and switching systems. We further show that PCF-based switch learning better identifies regimes in both simulated and experimental data and is more sample/data efficient. We use the Wilcoxon signed-rank test for all statistical comparisons and control the false discovery rate using the Benjamini-Hochberg procedure [59].
Figure 3.3: PCF learning in simulated stationary systems
[Figure panels (a)-(i); legend: PCF-EM, PPF-EM, Laplace-EM; simulation settings: $x_t$ dimension $d = 8$, time bin $\Delta = 2$ ms, $N = 60$ neurons, max firing rate 50-70 Hz.]
PCF leads to improved unsupervised learning capabilities in simulated stationary systems. (a) Neural self-prediction (predictive power, or $PP$) on held-out test data using learned parameters, normalized by that using ground-truth parameters according to $PP / PP_{\text{true}}$. Comparisons per training sample size are made between different learning methods: PCF-EM, PPF-EM, and Laplace-EM. Bars indicate the median metric, box edges indicate the 25th and 75th percentiles, and whiskers indicate minimum and maximum values excluding outliers that exceed 1.5 times the interquartile distance. Asterisks indicate statistical significance with ***$p \leq 0.0005$. Trend lines connect at the median. (b) Same as (a) but un-normalized. (c) Same as (a) but showing latent state decoding (correlation coefficient, or $CC$) normalized by the CC using ground-truth parameters according to $CC / CC_{\text{true}}$. Performance with learned parameters reaches 98% of that with true parameters. (d) Same as (c) but un-normalized. (e-h) Normalized error of learned parameters with respect to true parameters over increasing training sample size. (i) Normalized error of the learned eigenvalues of $A$. For each panel and statistical test, $n = 60$.
3.3.1 PCF yields high accuracy parameter learning in stationary simulations

We first validate in simulations of stationary systems that PCF-EM can successfully learn accurate system parameters. We find that the latent state decoding and neural self-prediction metrics reach 98.1% and 97.4%, respectively, of those obtained using ground-truth system parameters (Figure 3.3a,c). Figures 3.3a and 3.3c further demonstrate that these decoding metrics converge toward the true-parameter performance as more training samples are provided, indicating that the learned parameters approach the ground-truth values. Also, the learned parameter error decreases with increasing training data size, as shown in Figure 3.3e-i, again indicating successful parameter learning with PCF-EM (Methods section 3.2.6.2). Finally, we find that PCF-EM significantly outperforms the prior learning methods LEM and PPF-EM, as seen throughout Figure 3.3. In terms of learned parameter error, PCF-EM yields a 79.7% decrease in eigenvalue error compared to PPF-EM, as shown in Figure 3.3i. In terms of decoding metrics, PCF-EM yields a 9.16% increase in latent state correlation coefficient (CC) over LEM at the largest training sample size, as shown in the rightmost cluster of Figure 3.3c. We emphasize that all learning methods use the same EM framework but differ in the inference method used for the E-step, as described in Methods section 3.2.4; also, the evaluation of learned parameters uses the same PPF-based pipeline described in section 3.2.6.2. As such, all improvement can be attributed to the use of the new PCF for learning.
3.3.2 PCF yields high accuracy parameter learning in switching simulations

We then find that PCF enables accurate parameter estimation for switching dynamical systems through a switch EM learning framework (sPCF-EM). Similar to Results section 3.3.1, as more training samples are provided during learning, the evaluation decoding metrics trend toward those obtained from filters using ground-truth parameters, as seen in Figure 3.4a, 3.4d, and 3.4g. Indeed, the neural self-prediction and regime state decoding metrics reach 90% of the true-parameter performance, while latent state decoding reaches 84%, with more training samples expected to further increase these values. Thus, the ability of the parameters learned by sPCF-EM to explain data and infer unobserved states approaches that of the ground-truth parameters, indicating successful parameter learning.
We also find that PCF is critical to enabling a successful switch EM learning framework for Poisson observations, which we show by implementing and comparing against a switch EM that instead uses a PPF (sPPF-EM). Note that a successful switch EM, even with a PPF, has not been demonstrated before, and we implement this here. We find that using the PCF instead of the PPF in the switch EM yields at least a 32% increase in all normalized decoding metrics (neural self-prediction, regime decoding, and latent state decoding) at the largest training sample size. This improvement is attributed to PCF because both sPCF-EM and sPPF-EM use the same switch EM framework, with the only difference being whether a PPF or a PCF is embedded in the switching filter, as described in Methods section 3.2.5.2. Further, the local Laplace approximation of the PPF, when embedded in a switching filter in sPPF-EM, can even result in diverging performance, as seen in Figure 3.4c. We verify that this is not due to overfitting, as decoding metrics even on the training data diverge, as seen in Figure 3.4k. Thus, replacing the Laplace approximation with the more accurate deterministic sampling of PCF in the switching filter [1] is key to enabling a successful switch EM framework for unsupervised learning.
We also find that sPCF-EM outperforms the prior variational Laplace-EM (vLEM) method [4], which embeds global Laplace in a variational EM framework [71, 72]; indeed, sPCF-EM can more than double the normalized regime decoding accuracy for certain data lengths compared with vLEM. As seen in Figures 3.4a and 3.4g, sPCF-EM learns parameters that yield significantly higher neural self-prediction and latent state decoding compared to vLEM across all training sample sizes. Critically, we also find in Figure 3.4d that sPCF-EM is more data/sample efficient than vLEM, which is important for neural datasets, where training samples are often limited. Specifically, sPCF-EM reaches 80% of normalized regime decoding accuracy with only 100k samples, compared to 316k samples for vLEM (see the third vs. fourth clusters of Figure 3.4d). Also, at 100k samples, the normalized regime decoding accuracy with vLEM does not even reach 20% (Figure 3.4d, third cluster), suggesting that in some cases regimes that could otherwise have been uncovered with sPCF-EM would be missed by vLEM. Overall, these results show that sPCF-EM is especially important for identifying regimes under limited data-length constraints, as can be the case when working with experimental brain data. These results collectively suggest that using PCF over global Laplace can improve learning, as was also seen in stationary systems. Beyond switch EM, an interesting future direction would be to explore how PCF could be incorporated into variational EM frameworks to potentially enable higher accuracy and more data efficiency (see Discussion 3.4).
We also find that the variability in performance can be further reduced by trying more initializations in sPCF-EM. Initializations can be important to help EM avoid getting stuck in local maxima or saddle points [13, 58]. We thus explore increasing the number of initializations for the 100k training sample size. We find that as we increase the number of initializations and choose the one with the best neural self-prediction within the training set, we can reduce the spread of the regime decoding accuracy such that even its 25th percentile exceeds 80% of the true-parameter performance using 3 initializations, as seen in Figure 3.4j. We repeat this procedure for vLEM with the same initializations that were used for sPCF-EM and find that while its regime decoding accuracy also improves with the number of initializations, its median performance saturates at 35% of the true-parameter performance.
[Figure panels (a)-(k); legend: Switching PCF-EM (sPCF-EM), Switching PPF-EM (sPPF-EM), Variational Laplace-EM (vLEM); simulation settings: $x_t$ dimension $d = 8$, $M = 3$ regimes, dwell time 2 s, $N = 60$ neurons.]
Figure 3.4: PCF learning in simulated switching systems
PCF leads to improvements in unsupervised learning in simulated switching dynamical systems with Poisson observations. Bar, box, whisker, outlier, and asterisk conventions are the same as in Figure 3.3. (a) Neural self-prediction on held-out test data using learned parameters, normalized by that using ground-truth parameters, over an increasing number of training samples. Comparisons are made between different learning methods: sPCF-EM, sPPF-EM, and vLEM. (b) Same as (a) but un-normalized. (c) Normalized neural self-prediction over EM iterations at the largest training sample size of 316k. (d) Regime state decoding on test data using learned parameters, normalized by that using true parameters according to $(\mathrm{Accuracy} - 1/M)/(\mathrm{Accuracy}_{\text{true}} - 1/M)$, where $M$ is the number of regimes, such that 0 is chance and 1 is true-parameter performance. (e) Same as (d) but un-normalized. (f) Normalized accuracy over EM iterations. (g-i) Same as (a-c) but for latent state decoding with learned parameters. (j) Normalized regime state decoding accuracy using parameters trained with 100k samples as a function of available initializations. After learning each initialization to completion (300 EM iterations), the parameters associated with the initialization with the highest training-set neural self-prediction are chosen. (k) Neural self-prediction on the training set at 316k samples over EM iterations. While sPCF-EM and vLEM show consistent growth or convergence, as expected of EM frameworks, sPPF-EM diverges, showing the critical need for PCF to enable a switch EM framework. For each panel and statistical test, $n = 60$.
3.3.3 PCF learning enables better prediction and movement decoding in motor cortical data

We apply sPCF-EM to publicly available population spiking activity recorded from the motor cortical areas of a monkey during a continuous 2D point-to-point reach task (see Figure 3.2 and Methods section 3.2.7). First, we find that a switching dynamical system learned with sPCF-EM explains the data better than a stationary dynamical system (Figure 3.5). To first fix the latent state dimension, we learn stationary systems over a range of latent dimensions. We find that the cross-validated median neural self-prediction reaches
[Figure panels (a)-(e); legend: PCF-EM, PPF-EM, Laplace-EM (stationary) and sPCF-EM, sPPF-EM, vLEM (switching).]
Figure 3.5: PCF learning in experimental data
PCF improves the quality of learned parameters in experimental data for both stationary and switching system models and demonstrates the increased explanatory power of switching systems over stationary systems. Plotting conventions (bar, box, whisker, and outliers) are the same as in Figure 3.3. (a) Left: neural self-prediction on held-out test data from parameters learned with the methods listed in the legend. The first cluster is for stationary methods, while the second cluster is for switching methods with the number of regimes $M = 2$. Note that all metrics within a cluster are obtained from the same filter. Right: pairwise differences between stationary methods and their respective switching methods. Only switching systems learned with PCF show significant improvements in neural self-prediction over stationary systems. (b) Decoding of behavior (2D reach velocity) on held-out test data across methods. (c) Pairwise differences in test-set neural self-prediction between sPCF-EM and either sPPF-EM or vLEM over iterations. Higher values indicate the degree to which sPCF-EM outperforms sPPF-EM and vLEM. (d) Same as (c) but with pairwise differences in velocity decoding. (e) Median runtime of the E-step while sweeping the latent state dimension (left) and the number of regimes (right). For statistical tests, $n = 25$, *$p \leq 0.05$, **$p \leq 0.005$, ***$p \leq 0.0005$.
99% of the peak value at a latent dimension of 10, which we fix for subsequent analyses.
We then compare decoding metrics from learned stationary systems to learned switching
systems with 2 regimes in a five-fold cross-validation scheme for each of the 5 experimental
sessions (see Methods section 3.2.7). As further increasing the number of regimes does not
yield significant improvements to decoding metrics, we focus on the 2-regime case. We find
that switching systems learned with sPCF-EM not only yield significantly higher neural
self-prediction compared with stationary systems, but also yield a significant 8% increase in
behavior decoding as seen in Figure 3.5a-b.
Second, similar to our simulation results, we also find that switching systems learned with
sPCF-EM outperform switching systems learned with sPPF-EM and vLEM in terms of both
neural self-prediction and behavior decoding (Figure 3.5a-b). In particular, switching systems learned with sPCF-EM yield a 12% increase in velocity decoding CC over those learned
with vLEM as seen in the right half of Figure 3.5b. In addition to these switching cases,
in the stationary case, PCF-EM yields modest but significant increases in decoding metrics
over PPF-EM and LEM. These improvements in explanatory power with PCF-based methods require higher computation time as seen in Figure 3.5e, giving rise to a speed-accuracy
tradeoff; however, PCF-based methods could admit more efficient implementations through parallelization (see Discussion 3.4), which was not the focus of this work. In
exchange for this tradeoff, in both simulation and experimental data, unsupervised EM
learning with PCF yields higher quality parameters in both stationary and switching cases
in terms of neural self-prediction and behavior decoding compared to prior Laplace-based
methods. Further, sPCF-EM reveals behavior-relevant neural regimes that can otherwise be
missed by these prior Laplace-based methods as we show next.
Figure 3.6: PCF uncovers regimes in experimental data
PCF enables uncovering interpretable behavior-related neural regimes, completely unsupervised with respect to behavior. Bar, box, whisker, outlier, and asterisk conventions are the
same as in Figure 3.5. (a) Regime interpretation of sPCF-EM for a single representative fold.
Uncovered regimes correspond with reach magnitude. Left panel shows single-trial regime
probabilities and reach velocity magnitudes from held-out test fold time locked to target
onset with values indicated by their respective color bars. Middle panel overlays median
regime probability (red) over median reach magnitude (black). Right panel shows a scatter
of average x,y velocity during the period of expected movement per trial (200 to 600 ms post
target onset) over the entire session. (b) Same analyses as in (a) from sPCF-EM but for a
representative fold with regimes corresponding with reach direction. Right panel shows clear
separation between labelled regimes with AUC of 0.971. Regimes separate to discriminate
a top-right direction vs. a bottom-left direction, with the most discriminant linear boundary overlaid in magenta. (c) Same analyses as in (a,b) but for a representative fold using
vLEM. (d) Scatter of magnitude metrics against direction metrics across all analyzed folds
for sPCF-EM and vLEM. (e) Left panel shows the magnitude metric across folds. Right
panel shows the direction metric across folds. (f) Regime separation boundaries as in the
right panel of (b) for all folds using sPCF-EM with high direction metric (AUC ≥ 0.8). For
statistical tests, n=25.
3.3.4 PCF enables discovering interpretable task-relevant neural
population regimes in motor cortical data
We then find that despite being unsupervised with respect to behavior, sPCF-EM uncovers regimes in neural population activity that correspond to interpretable periods in the
reaching behavior. As the task is continuous and self-paced, we first visualize single-trial
decoded regime probabilities by time-locking the probabilities and reach velocity magnitudes
[Figure 3.7, panels (a)-(d): decoded regime probability traces from each switching method (sPCF-EM, sPPF-EM, vLEM), the corresponding magnitude of the reach velocity trace, a raster of observed spikes over 30 multiunit channels, and ground truth vs. decoded velocity traces, all for a segment of session 20160927_04 fold 5 (time 4978-4980 sec), with target onsets marked.]
Figure 3.7: Visualization of decoded regimes learned by sPCF-EM
(a) Example decoded regime probabilities using parameters learned from each switching
method: sPCF-EM, sPPF-EM, and vLEM. In these analyses, all methods have a total of
two regimes. Magenta lines indicate target onset times. White patches indicate that regime
1 is decoded and colored patches indicate that regime 2 is decoded. sPPF-EM (green)
fails to identify a second regime for this fold and thus its decoded probability of regime
2 remains largely at 0. (b) Magnitude of reach velocity over the same segment as in (a).
Blue patches that correspond to decoded regime 2 from sPCF-EM largely correspond to
increases in reach velocity magnitude as first explored in Figure 3.6a, showing that sPCF-EM can identify magnitude-related regimes. In comparison, orange patches that correspond
to decoded regime 2 from vLEM are not magnitude-related. (c) Raster plot of multiunit
activity over the same segment as (a). (d) Ground truth and decoded reach velocity traces
for x and y reach velocity. Blue patches show the decoded regime 2 from sPCF-EM. Bottom
panel reproduces the decoded regimes from vLEM for ease of visualization and is identical
to the bottom panel of (a).
to target-onset event labels per test fold (Figure 3.6a-c). Through this, we find that several
of the analyzed folds exhibit a regime pattern that corresponds to reach velocity magnitude.
Empirically, one regime is dominant while the monkey is holding on the current target and
until it reacts to the onset of the next target (hold regime). A second regime then has its
probability sharply rise at the onset of this next target (movement regime). A representative
fold is shown in Figure 3.6a with additional visualizations of a continuous segment from
that fold shown in Figure 3.7. Since our method is unsupervised with respect to task or
behavioral variables and just looks at neural data, we define interpretability of a decoded
regime as the regime being task-relevant, i.e., interpretable in the context of the task or
behavior. We thus interpret a given regime as being task-relevant when the switches in
that regime correspond to changes in task-relevant behavioral variables, in this case reach
magnitude. We use what we call a magnitude metric to aid in interpretation by quantifying
how well a learned switching model can capture switches in reach velocity magnitude. We
find this metric for each model by finding the cross-validated median decoded regime probability at target onset where movement is minimal and at 400ms post target onset where
movement is expected, and then taking the difference. For example, in Figure 3.6a we find
that the decoded probability is highly different between these two periods, indicating the
ease with which the learned switching model could distinguish the two periods. In contrast,
the regimes found with the prior vLEM did not distinguish movement vs. hold periods (Figure 3.6c). Across all models learned with sPCF-EM vs. vLEM, we find that sPCF-EM is
significantly more reliable in uncovering regimes with higher magnitude distinguishability as
seen in Figure 3.6e. We also find that PCF is key to enabling the switch EM framework to
uncover these magnitude regimes more reliably; this is because the magnitude metrics from
sPCF-EM are significantly larger than those from sPPF-EM, which we also implemented in
this work (Figure 3.6e) (note sPPF-EM has not been shown in prior work).
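To make the computation of this magnitude metric concrete, here is a minimal numpy sketch. The function name and the per-trial probability inputs are illustrative assumptions, not the original implementation; the original additionally takes the median across cross-validation folds.

```python
import numpy as np

def magnitude_metric(prob_at_onset, prob_at_400ms):
    """Magnitude metric described above: the difference between the median
    decoded movement-regime probability at 400 ms post target onset (where
    movement is expected) and at target onset (where movement is minimal).
    Inputs are per-trial decoded regime probabilities at the two time points."""
    return np.median(prob_at_400ms) - np.median(prob_at_onset)
```

A model whose decoded regime sharply separates hold from movement periods yields a metric near 1; a model whose regime probabilities ignore movement yields a metric near 0.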
Interestingly, for folds where the uncovered regimes did not clearly correlate with reach
velocity magnitude, sPCF-EM instead uncovered regimes that better correspond to reach
direction. To show this, we decode the regimes in the entire session using the learned
parameters per training set and focus on the 200ms to 600ms movement period post target
onset where reaches are expected. This window is chosen because the median increase in
velocity magnitude is centered on this window as exemplified by the gray traces in the central
panels of Figure 3.6a-c. We then find the average x,y velocity in each trial in that period. We
also separately label each trial by the regime with the highest average decoded probability
in that time period which we emphasize was found unsupervised with respect to behavior.
We then provide the scatter plot for the average x,y velocity and label each point by the
decoded regime. As shown in the rightmost panel of Figure 3.6b, we find that the decoded
regimes can be clearly differentiated by average x,y velocity. Here, reaches in the bottom-left
direction were largely identified as one regime (bottom-left regime) compared to reaches in
the top-right direction (top-right regime).
We use what we call a direction metric to quantify how well a learned switching model
can capture switches in reach direction. This metric is based on the differentiability of the
decoded regimes by average x,y velocity. We find this metric by calculating the mean AUC
from an LDA classifier that predicts the decoded regime from average x,y velocity over five
folds. Here, an AUC of 0.5 indicates chance while 1 indicates perfect linear differentiability.
We find the median AUC over the 25 models to be 0.92 with 10 of the 25 reaching above 0.95
AUC as shown in Figure 3.6d and Figure 3.6f. Further, for models with AUCs above 0.8, we
find that the regimes do not discriminate arbitrary directions; rather, they consistently
discriminate a top-right vs. bottom-left pattern as shown in Figure 3.6f. This result may
suggest a consistent regime pattern in neural population dynamics that across all sessions
distinguishes reaches towards the body versus away from the body.
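The direction metric above can be sketched in a few lines. The original fits an LDA classifier and averages the AUC over five folds; the pure-numpy single-fold sketch below (with hypothetical inputs and an illustrative function name) shows the core computation: project average velocities onto the two-class LDA direction and score separability as the rank-based AUC.

```python
import numpy as np

def direction_metric_fold(X, y):
    """One fold of the direction metric: fit a two-class LDA direction from
    average (x, y) velocities X (n_trials, 2) and regime labels y in {0, 1},
    then score separability as the AUC of the projected scores.
    An AUC of 0.5 indicates chance; 1.0 indicates perfect linear separability."""
    X0, X1 = X[y == 0], X[y == 1]
    # Pooled within-class covariance and the LDA discriminant direction
    Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
    w = np.linalg.solve(Sw / (len(X) - 2), X1.mean(0) - X0.mean(0))
    s0, s1 = X0 @ w, X1 @ w
    # AUC = probability that a class-1 score exceeds a class-0 score
    return (s1[:, None] > s0[None, :]).mean()
```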
Finally, we find that models from prior vLEM do not capture switches in either direction
or magnitude as seen in Figure 3.6c-f. This result can be due to the sample efficiency of
sPCF-EM compared with vLEM, which we also saw in simulations (Figure 3.4d). While
vLEM can be successful in decoding regimes in simulations (Results section 3.3.2), it needed
substantially more training samples to do as well as sPCF-EM as seen in Figure 3.4d. Together,
these results suggest that sPCF-EM can be successful but vLEM can have difficulty in regime
identification for the average number of samples per session in these experimental datasets
(Methods section 3.2.7.1), similar to what we found in simulation. However, we emphasize
that with sufficient data, vLEM is still a powerful tool and also offers good computational
efficiency. Future work could investigate an integrated framework which can collectively
leverage the individual benefits of PCF-based learning and variational methods.
3.4 Discussion
With the goal to enable accurate unsupervised learning of switching or stationary dynamical
systems from Poisson observations, we developed a novel deterministic sampling filter for
Poisson observations called the Poisson Cubature Filter (PCF) to be used for EM learning.
PCF includes novel derivations to enable a minimum mean square error (MMSE) estimation approach using Poisson observations. It uses numerical integration with a
spherical-radial cubature rule to generate deterministic samples for estimating the necessary
MMSE terms. We demonstrated how PCF can enable EM frameworks for accurate unsupervised learning of stationary (PCF-EM) and switching (sPCF-EM) dynamical systems using
only Poisson spiking activity. We found that PCF enables more accurate learning of both
stationary and switching dynamical systems compared with prior methods, thus improving the neural self-prediction, behavior decoding, and regime decoding in both simulations
and motor cortical data from the primate brain during reaching movements. Also, (s)PCF-EM is more data-efficient than prior learning methods, which is important for experimental
datasets. Finally, sPCF-EM enabled discovering regimes in the motor cortical population
activity that were interpretable and behavior-relevant despite being learned fully unsupervised with respect to behavior. These interpretable neural population regimes were missed
by prior learning methods.
We first validated the methods using numerical simulations. In stationary systems, we
showed that parameters are successfully learned by PCF-EM and approach ground truth
model parameters. We also showed that PCF-EM significantly outperforms prior methods
that use the Laplace approximation. For switching systems, parameters are again successfully
learned and the gain from sPCF-EM over prior methods is even greater compared with the
stationary case. To show that successful learning depends on the PCF, we also implemented
an EM framework with the PPF and showed that it significantly underperforms the PCF-based EM. Moreover, sPCF-EM outperforms the prior vLEM method, and importantly can
better identify regimes at lower training sample sizes, i.e., sPCF-EM is more data-efficient.
Numerical simulations gave us additional tools to probe the developed methods as true
latent states, regime states, and model parameters are known in simulations. Prior works
have similarly used simulated data to develop new methods for biotechnologies and neural
engineering1,4,6,12,19,25,32,52,64,90–100.
The unsupervised discovery of regimes enabled by PCF-based EM can serve as a tool to
facilitate basic neuroscience investigations into neural population dynamics. For example,
we found that there exist regime switches in motor cortical dynamics that may correspond
to hold vs. movement and to movements toward vs. away from the body. These regimes
were found fully unsupervised with respect to behavior yet were behavior-relevant and interpretable. Also, these regimes were more reliably found by PCF-based EM and missed by
the prior vLEM method, likely due to PCF-based EM’s data efficiency. Thus PCF-based
EM provides a tool for basic neuroscience investigations.
The improvements to switch learning that PCF enables were driven by a transition from
Laplace-based approximations to deterministic sampling-based approximations. This
was evident from substantial improvements over the same EM framework when it used PCF
instead of the widely adopted PPF, which relies on the Laplace approximation for Poisson observations.
While global Laplace methods could outperform the local approach of PPF, PCF-based
methods were still largely more accurate in learning. We hypothesize that this improvement
was due to a general property of the Laplace approximation: because it focuses on a single
point, it cannot always accurately capture first and second moments, whereas deterministic
sampling methods use multiple points to approximate second-moment terms. However, a
topic for future work would be to theoretically characterize the degree of inaccuracy in second
moment terms and how much this impacts learning.
As discussed in Methods section 3.2.2, deterministic sampling-based non-linear filters
like the unscented Kalman filter (UKF) and cubature Kalman filter (CKF) are widely
used58,75,77,82,83,101,102 but cannot be directly applied to Poisson observations, thus requiring
a novel filter to be developed from a fundamental MMSE approach to properly incorporate Poisson observations which are nonconjugate with the Gaussian latent state. While we
performed the derivation for Poisson observations, the derivation is general with the only
requirement being that the likelihood mean and covariance of the observation modality given
the latent state are known and can be evaluated. Future work could then explore extending
this derivation to other observation modalities that do not fit traditional non-linear filters,
such as Bernoulli observations, and develop deterministic sampling-based methods for
these modalities. For the Bernoulli modality, recent work has also explored using the Polya-Gamma augmentation to enable conditionally conjugate inference98,103. Thus, future work
on deterministic sampling-based methods for Bernoulli observations can compare its learning
performance with that from methods that use the Polya-Gamma augmentation.
Our results in both simulated and experimental data show how PCF-based learning
methods can yield high accuracy parameter estimates. However, there are multiple future
directions for extending these methods. First, a known general problem with EM-based learning methods is that they can converge to local instead of global maxima13,58,87.
For learning methods for switching systems, this can manifest as regimes not being properly identified especially with insufficient data as seen for vLEM in our experimental results
and for all methods in our simulation results with the shortest training sample size. While
we showed and used a simple process of randomly initializing multiple times to improve
consistency, future work could investigate more principled approaches to initialization, including but not limited to deterministic annealing13,58,104. Second, while we found that the
PCF-based methods were more accurate than prior methods, there was a trade-off in terms
of computational efficiency with the prior vLEM method performing significantly faster.
Nevertheless, our use of a spherical-radial cubature rule in PCF instead of Gauss-Hermite
quadrature rule does enable polynomial scaling instead of exponential scaling with latent
state dimension as described in Methods section 3.2.1. Further, at lower state dimensions,
PCF’s runtime was comparable to that of PPF. As the PCF method involves independent
evaluations of the same few functions over many deterministically sampled points before taking weighted sums, implementations that can more efficiently parallelize these evaluations
could see meaningful reductions in runtime in the future. Third, while PCF was embedded
in a switch EM framework here, future work could also explore using PCF in a variational
learning framework, which may have potential benefits such as improved computational efficiency.
Finally, our results show how replacing the Laplace approximation in a causal filter with
deterministic sampling can lead to improvements in learning, especially for switching systems. However, causal filters using the Laplace approximation are still valuable tools. First,
we used PCF just for learning. Once model parameters are learned, we can use the faster
filters based on Laplace approximation for decoding purposes with the learned parameters.
For stationary systems, the PPF, which uses the Laplace approximation, is known to be quite efficient and has even been implemented in real-time BMIs for millisecond-by-millisecond control and
feedback17,35,63,80,99,105,106. For switching systems, our work in1 showed that, once parameters
are known, switching filters with the Laplace approximation can successfully perform causal decoding. Indeed,
all decoding metrics in this work, once parameters are learned, are computed with filters that use the Laplace approximation1,17,20. Thus, Laplace approximation-based filters serve as valuable companion
methods to the developed PCF-based learning methods of this work, especially in the context
of real-time decoding applications once parameters are learned.
3.5 Appendix
3.5.1 Example cubature point set
We present an example point set from a 5th-degree spherical-radial cubature rule that would
be used in (3.1). For Gaussian random variables X with dimension d = 3:
$$\xi_1 = [0,\ 0,\ 0]^T$$

$$\xi_{2:2d+1} = \sqrt{3+2}\,\left\{[1,0,0]^T,\ [-1,0,0]^T,\ \ldots,\ [0,0,1]^T,\ [0,0,-1]^T\right\}$$

$$\xi_{2d+2:2d^2+1} = \frac{\sqrt{3+2}}{\sqrt{2}}\,\left\{[1,1,0]^T,\ [1,-1,0]^T,\ [-1,1,0]^T,\ [-1,-1,0]^T,\ [1,0,1]^T,\ \ldots,\ [0,-1,1]^T,\ [0,-1,-1]^T\right\} \quad (3.30)$$
To give some further context and background on our choice of a spherical-radial cubature
rule, the two rules that have been used in prior non-linear filters are the Gauss-Hermite
quadrature rule75 and the spherical-radial cubature rule as used in76,77. Briefly, the Gauss-Hermite quadrature approximates 1-dimensional Gaussians with a fixed number of points m;
this results in a tensor product across all dimensions of a d-dimensional Gaussian, which leads
to a total of m^d points and weights, thus scaling exponentially with latent state dimension75.
For better computational complexity, the spherical-radial rule first uses a change of variable
such that the function is integrated over hyperspheres along a radial axis, where the points
are now chosen from the surface of the spheres76,77. Our choice for PCF to use the 5th-degree
spherical-radial cubature rule76 means that we only need 2d² + 1 points, thus allowing for
polynomial scaling and avoiding the exponential scaling of the Gauss-Hermite quadrature
rule.
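As a concrete illustration, the point locations in (3.30) can be generated programmatically. The sketch below infers the general-d pattern (radius √(d+2) for axis points and √((d+2)/2) for pairwise points) from the d = 3 example above; this generalization, and the function name, are our assumptions. The weights are omitted since, as noted in 3.5.3, they depend on d and include negative values.

```python
import numpy as np

def cubature_points_5th(d):
    """Generate the 2*d**2 + 1 deterministic sample points of a 5th-degree
    spherical-radial cubature rule for a standard Gaussian in R^d, following
    the pattern of (3.30). Weights are omitted in this sketch."""
    points = [np.zeros(d)]                      # xi_1: the origin
    r1 = np.sqrt(d + 2.0)                       # axis points at radius sqrt(d+2)
    for i in range(d):
        e = np.zeros(d)
        e[i] = 1.0
        points += [r1 * e, -r1 * e]             # xi_{2:2d+1}
    r2 = np.sqrt((d + 2.0) / 2.0)               # pairwise points, two nonzero entries
    for i in range(d):
        for j in range(i + 1, d):
            for si in (1.0, -1.0):
                for sj in (1.0, -1.0):
                    p = np.zeros(d)
                    p[i], p[j] = si, sj
                    points.append(r2 * p)       # xi_{2d+2:2d^2+1}
    return np.stack(points)

pts = cubature_points_5th(3)
print(pts.shape)  # (19, 3): 2*3**2 + 1 = 19 points, polynomial in d
```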
3.5.2 Derivation of cross-covariance term in PCF
We derive the cross-covariance term (3.12) used in PCF as follows:
$$\Lambda_{xn} = E_{x,n}\left[(x_t - \hat{x}_{t|t-1})(n_t - \hat{n}_{t|t-1})^T \,\middle|\, n_{1:t-1}\right] = E_{x,n}[x_t n_t^T | n_{1:t-1}] - \hat{x}_{t|t-1}\hat{n}_{t|t-1}^T \quad (3.31)$$

Where the first term can be expanded as:

$$
\begin{aligned}
E_{x,n}[x_t n_t^T | n_{1:t-1}] &= \int_{x_t} \sum_{n_t} x_t n_t^T f(x_t, n_t | n_{1:t-1})\, dx_t \\
&= \int_{x_t} x_t \left[\sum_{n_t} n_t^T P(n_t | x_t, n_{1:t-1})\right] f(x_t | n_{1:t-1})\, dx_t \\
&= \int_{x_t} x_t \left[\sum_{n_t} n_t^T P(n_t | x_t)\right] f(x_t | n_{1:t-1})\, dx_t \\
&= \int_{x_t} x_t\, E[n_t | x_t]^T f(x_t | n_{1:t-1})\, dx_t \\
&= E_x\left[x_t E[n_t | x_t]^T \,\middle|\, n_{1:t-1}\right]
\end{aligned} \quad (3.32)
$$
3.5.3 Implementation details and the information form of PCF
Our choice of a 5th degree spherical-radial cubature rule is motivated by its comparable
accuracy and improved efficiency relative to a 3rd degree Gauss-Hermite quadrature rule as
shown in76. However, one side effect of this is that the weights of the 5th degree cubature
rule include negative values, which leads to the possibility of instability76. Thus, we include
in our implementation a check that the final update covariance $\hat{\Lambda}_{t|t}$ is positive semi-definite,
and we use PPF in the rare instances where it is not due to instability.
We also present an equivalent information form of the Poisson Cubature Filter (PCF)
that is used for the results of this work based on the statistical linear error propagation
method78,107,108. The information form is mathematically equivalent and can be readily
combined with information filters of separate observation modalities3.
To start, after the prediction step to find $\hat{x}_{t|t-1}$ and $\hat{\Lambda}_{t|t-1}$, we use numerical integration
to approximate the cross-covariance $\Lambda_{xn}$ as in (3.13). We then make the approximation that
this cross-covariance is equal to the following:

$$\Lambda_{xn} = E_{x,n}\left[(x_t - \hat{x}_{t|t-1})(n_t - \hat{n}_{t|t-1})^T \,\middle|\, n_{1:t-1}\right] \approx \hat{\Lambda}_{t|t-1}\tilde{C}^T \quad (3.33)$$

Where $\tilde{C}$ comes from an approximately linear likelihood mean of the observation:

$$E[n_t | x_t] \approx \tilde{C} x_t \quad (3.34)$$

Thus, a $\tilde{C}$ that would yield the same cross-covariance found from numerical integration is
as follows:

$$\tilde{C} = \left(\hat{\Lambda}_{t|t-1}^{-1} \Lambda_{xn}\right)^T \quad (3.35)$$

The covariance $\Lambda_{nn}$, using (3.34), then becomes:

$$\Lambda_{nn} = E_x[V[n_t|x_t] \,|\, n_{1:t-1}] + V_x[E[n_t|x_t] \,|\, n_{1:t-1}] = E_x[q(x_t)|n_{1:t-1}] + \tilde{C}\hat{\Lambda}_{t|t-1}\tilde{C}^T \triangleq \tilde{R} + \tilde{C}\hat{\Lambda}_{t|t-1}\tilde{C}^T \quad (3.36)$$

Where $\tilde{R}$ indicates $E_x[q(x_t)|n_{1:t-1}]$, which is found with numerical integration. We can then
apply the matrix inversion lemma58 to yield the information form of the covariance update:

$$\hat{\Lambda}_{t|t} = \hat{\Lambda}_{t|t-1} - \Lambda_{xn}\Lambda_{nn}^{-1}\Lambda_{xn}^T = \left(\hat{\Lambda}_{t|t-1}^{-1} + \tilde{C}^T\tilde{R}^{-1}\tilde{C}\right)^{-1} \quad (3.37)$$

The mean update is then:

$$\hat{x}_{t|t} = \hat{x}_{t|t-1} + \hat{\Lambda}_{t|t}\tilde{C}^T\tilde{R}^{-1}(n_t - \hat{n}_{t|t-1}) \quad (3.38)$$
3.5.4 M-step equations
We present here the equations for the M-step for switching systems. As stationary systems
can be considered a special case of switching systems with the number of regimes M = 1,
the same equations can be used with $\langle s_t^{(j)} \rangle$ set to 1 and all $(j)$ notation ignored to get the
stationary version.
$$
\begin{aligned}
&\langle \cdot \rangle \triangleq E[\cdot \,|\, n_{1:T}], \qquad \langle \cdot \rangle^{(j)} \triangleq E[\cdot \,|\, n_{1:T}, s_t^{(j)}], \qquad \langle s_t^{(j)} \rangle = P(s_t^{(j)} \,|\, n_{1:T}) \\
&A^{(j)*} = \left(\sum_{t=1}^{T} \langle s_t^{(j)} \rangle \langle x_t x_{t-1}^T \rangle^{(j)}\right) \left(\sum_{t=1}^{T} \langle s_t^{(j)} \rangle \langle x_{t-1} x_{t-1}^T \rangle^{(j)}\right)^{-1} \\
&Q^{(j)*} = \left(\sum_{t=1}^{T} \langle s_t^{(j)} \rangle \left(\langle x_t x_t^T \rangle^{(j)} - A^{(j)*}\langle x_{t-1} x_t^T \rangle^{(j)} - \langle x_t x_{t-1}^T \rangle^{(j)} A^{(j)*T} + A^{(j)*}\langle x_{t-1} x_{t-1}^T \rangle^{(j)} A^{(j)*T}\right)\right)\left(\sum_{t=1}^{T} \langle s_t^{(j)} \rangle\right)^{-1} \\
&\mu_0^* = \langle x_0 \rangle, \qquad \Lambda_0^* = \langle x_0 x_0^T \rangle - \langle x_0 \rangle \langle x_0 \rangle^T \\
&\Phi_{j,i}^* = \left(\sum_{t=2}^{T} \langle s_{t-1}^{(i)} s_t^{(j)} \rangle\right) \left(\sum_{t=1}^{T-1} \langle s_t^{(i)} \rangle\right)^{-1}, \qquad \pi_j^* = \langle s_1^{(j)} \rangle
\end{aligned} \quad (3.39)
$$
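The closed-form dynamics updates in (3.39) are weighted sums of posterior second moments followed by a matrix solve. A minimal numpy sketch for a single regime j (names illustrative, moments assumed precomputed by the smoother of 3.5.5) is:

```python
import numpy as np

def mstep_dynamics(w, Exx, Exx1, Ex1x1):
    """Closed-form M-step for one regime's dynamics per (3.39).
    w: (T,) regime weights <s_t^{(j)}>; Exx: (T, d, d) <x_t x_t^T>^{(j)};
    Exx1: (T, d, d) <x_t x_{t-1}^T>^{(j)}; Ex1x1: (T, d, d) <x_{t-1} x_{t-1}^T>^{(j)}."""
    Sxx1 = np.einsum('t,tij->ij', w, Exx1)
    Sx1x1 = np.einsum('t,tij->ij', w, Ex1x1)
    A = Sxx1 @ np.linalg.inv(Sx1x1)                          # A^{(j)*}
    Sxx = np.einsum('t,tij->ij', w, Exx)
    # Q^{(j)*}: weighted residual second moment, normalized by the total weight
    Q = (Sxx - A @ Sxx1.T - Sxx1 @ A.T + A @ Sx1x1 @ A.T) / w.sum()
    return A, Q
```

On a noiseless linear trajectory the update recovers the true transition matrix exactly, with Q approaching zero.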
For the observation equation parameters, we use numerical trust-region methods109 as implemented in MATLAB.
$$
z_i^{(j)} \triangleq \begin{bmatrix} \alpha_i^{(j)} \\ \beta_i^{(j)} \end{bmatrix}
$$

$$
\begin{aligned}
z_i^{(j)} &= \underset{[\alpha_i^{(j)};\, \beta_i^{(j)}]}{\operatorname{argmax}} \sum_{t=1}^{T} \langle s_t^{(j)} \rangle \left\langle n_t^i(\alpha_i + \beta_i^T x_t) - \exp(\alpha_i + \beta_i^T x_t) \right\rangle^{(j)} \\
&= \underset{[\alpha_i^{(j)};\, \beta_i^{(j)}]}{\operatorname{argmax}} \sum_{t=1}^{T} \langle s_t^{(j)} \rangle \left( n_t^i\left(\alpha_i + \beta_i^T \hat{x}_{t|T}^{(j)}\right) - \exp\left(\alpha_i + \beta_i^T \hat{x}_{t|T}^{(j)} + \frac{1}{2}\beta_i^T \hat{\Lambda}_{t|T}^{(j)} \beta_i\right) \right)
\end{aligned} \quad (3.40)
$$
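The objective in (3.40) is concave in [α_i; β_i], so any second-order method will do. The original uses MATLAB trust-region methods; the sketch below instead uses plain Newton ascent on the same expected log-likelihood, with illustrative names, to show the gradient and Hessian structure.

```python
import numpy as np

def mstep_observation(w, n_i, xhats, Lams, iters=25):
    """Maximize the expected Poisson log-likelihood in (3.40) for one neuron i
    and one regime j via Newton ascent (a simplified stand-in for the
    trust-region solver). w: (T,) weights <s_t^{(j)}>; n_i: (T,) spike counts;
    xhats: (T, d) smoothed means; Lams: (T, d, d) smoothed covariances.
    Returns (alpha_i, beta_i)."""
    T, d = xhats.shape
    z = np.zeros(d + 1)                              # z = [alpha; beta]
    for _ in range(iters):
        alpha, beta = z[0], z[1:]
        quad = np.einsum('tij,i,j->t', Lams, beta, beta)
        lam = np.exp(alpha + xhats @ beta + 0.5 * quad)   # expected rates
        m = xhats + Lams @ beta                      # d/dbeta of the exponent, per t
        g = np.empty(d + 1)                          # gradient of the objective
        g[0] = w @ (n_i - lam)
        g[1:] = (w * n_i) @ xhats - (w * lam) @ m
        u = np.hstack([np.ones((T, 1)), m])          # (T, d+1) stacked [1; m_t]
        H = -(u * (w * lam)[:, None]).T @ u          # Hessian (negative definite)
        H[1:, 1:] -= np.einsum('t,tij->ij', w * lam, Lams)
        z = z - np.linalg.solve(H, g)                # Newton step
    return z[0], z[1:]
```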
3.5.5 Additional equations for switching smoother for M-step
terms
We use additional RTS smoothers to calculate additional terms necessary for the M-step.
Given the mean and covariance $\hat{x}_{t|T}^{(j)}$ and $\hat{\Lambda}_{t|T}^{(j)}$ of the density $f(x_t \,|\, n_{1:T}, s_t^{(j)})$, we apply the RTS
smoother:

$$
\begin{aligned}
\hat{x}_{t-1|T}^{(-,j)} &= \hat{x}_{t-1|t-1}^{(-,j)} + J_{t-1}\left(\hat{x}_{t|T}^{(j)} - \hat{x}_{t|t-1}^{(j)}\right) \\
\hat{\Lambda}_{t-1|T}^{(-,j)} &= \hat{\Lambda}_{t-1|t-1}^{(-,j)} + J_{t-1}\left(\hat{\Lambda}_{t|T}^{(j)} - \hat{\Lambda}_{t|t-1}^{(j)}\right) J_{t-1}^T \\
\hat{\Lambda}_{t,t-1|T}^{(-,j)} &= \hat{\Lambda}_{t|T}^{(j)} J_{t-1}^T \\
J_{t-1} &= \hat{\Lambda}_{t-1|t-1}^{(-,j)} A^{(j)T} \hat{\Lambda}_{t|t-1}^{(j)-1}
\end{aligned} \quad (3.41)
$$
where $\hat{x}_{t-1|T}^{(-,j)}$ and $\hat{\Lambda}_{t-1|T}^{(-,j)}$ are the mean and covariance of $f(x_{t-1} \,|\, n_{1:T}, s_t^{(j)})$ and $\hat{\Lambda}_{t,t-1|T}^{(-,j)}$ is the
cross-covariance between $x_t$ and $x_{t-1}$ given $n_{1:T}$ and $s_t^{(j)}$. The remaining necessary terms are
found in the switching filter of1 with the same notation. The necessary terms for the M-step
are then as follows:

$$
\begin{aligned}
E[x_t x_{t-1}^T \,|\, n_{1:T}, s_t^{(j)}] &= \hat{\Lambda}_{t,t-1|T}^{(-,j)} + \hat{x}_{t|T}^{(j)} \hat{x}_{t-1|T}^{(-,j)T} \\
E[x_{t-1} x_{t-1}^T \,|\, n_{1:T}, s_t^{(j)}] &= \hat{\Lambda}_{t-1|T}^{(-,j)} + \hat{x}_{t-1|T}^{(-,j)} \hat{x}_{t-1|T}^{(-,j)T}
\end{aligned} \quad (3.42)
$$
Chapter 4
Conclusion
This dissertation covers the work in1 and2 detailing how we developed new models, inference
methods, and unsupervised learning methods for switching dynamical systems with non-Gaussian observations. From Chapter 2, we developed the switching multiscale dynamical
system (SMDS) model that generalizes switching linear dynamical system (SLDS) models to
allow for multiscale observations. We also designed a real-time decoding algorithm for the
SMDS model termed the switching multiscale filter (SMF), which simultaneously estimates
regime and brain states from spike-field activity while maintaining a tractable computational
cost. SMF can also be used solely with spiking activity as observations. Finally, we designed
a fixed-interval smoothing algorithm termed the switching multiscale smoother (SMS) which
we showed to be more reliable compared to prior switching smoothers. From Chapter 3, we
developed a novel deterministic sampling filter for Poisson observations called the Poisson
Cubature Filter (PCF). We demonstrated how PCF can enable expectation-maximization
(EM) frameworks for accurate unsupervised learning of both stationary (PCF-EM) and
switching (sPCF-EM) dynamical systems using only Poisson spiking activity. These works
were validated with both extensive numerical simulations and experimental neural data from
monkeys performing various tasks.
Combined, these works go hand-in-hand and provide a complete framework to go from
neural observations with little or no labels to a switching dynamical model that captures potentially interesting regimes to real-time decoding applications where both the behavior and
the uncovered regimes can be causally decoded from new neural observations. Without our
learning work, future applications of our inference methods would need to use Laplace-based
learning methods, which, as we showed, can be unreliable in uncovering regimes unsupervised.
Similarly, without our inference work, we would not have known how to embed PCF into
a switch EM framework in our learning work, and we would have also needed to settle for
a less reliable switching smoother which could have resulted in less reliable learning. Thus,
the respective works are critical for each other’s future success.
Such methods lend themselves to multiple potential applications. One such application is
with brain machine interfaces (BMIs). As neurotechnologies are developed for long-term use,
inference algorithms must develop in parallel to become more appropriate for naturalistic
multi-regime scenarios where there could exist multiple tasks with different strategies and/or
complex tasks with multiple discrete stages/regimes. Regime-detection methods like SMF
are a potential answer to solving these challenges. In addition, higher level cognitive states
such as a subject’s level of stress, engagement, or attention can induce multiple regimes in the
task by impacting the strategy and thus the dynamics of behavior and neural activity40–42.
Indeed, not only can the temporal dynamics of brain states change, how they are encoded
in neural observations can also switch due to external or internal cues43. In these situations,
unsupervised methods like sPCF-EM are potential tools to learn different regimes that could
correlate with different mental states using just neural activity and without mental state
labels for supervised training. The presented framework could also become a valuable tool in
estimating mental states such as mood that are represented across distributed brain regions,
such that they can be used as feedback to personalize therapies for mental disorders such as
deep brain stimulation7,49,50,53.
With that said, there are future directions of research that could further elevate the developed methods. First, the work in Chapter 2 covers modelling and inference methods for not
only spiking activity but also multiscale observations, while the work in Chapter 3 focuses
on learning with spiking activity in the form of Poisson observations. A future direction
of work would be to extend the PCF learning to multiscale observations as well. Thankfully, PCF was developed with an MMSE approach and thus has an equivalent information
form. This is helpful as the information form of different filters for different observation
modalities can be combined in a straightforward manner3, which just leaves embedding the
combined filter into an EM learning framework. Second, while our model allows for non-linearity by enabling regime switching, it assumes that the dynamics within each regime are
linear. Using linear brain state dynamics per regime can yield benefits like interpretability, computational efficiency in both learning and inference, data efficiency, and extensions
to real-time applications3,7,12,18,29–31,33–35,49,50,53,79. Future work can explore extending the
model and the associated learning methods to those with non-linear dynamics per regime
to further increase explanatory power, though doing so may lead to a tradeoff in terms of
interpretability or computation and data efficiency. Such an extension can be pursued for
example using recurrent neural networks12,110
.
In summary, we have developed methods for non-Gaussian observations that can achieve
accurate unsupervised learning of stationary and switching dynamical system models as well
as methods for efficient causal and non-causal estimation of behavior and regime states using
learned model parameters with both basic science and neurotechnology applications.
References
1. Song, C. Y., Hsieh, H.-L., Pesaran, B. & Shanechi, M. M. Modeling and inference
methods for switching regime-dependent dynamical systems with multiscale neural
observations. Journal of Neural Engineering 19, 066019. doi:10.1088/1741-2552/ac9b94 (2022).
2. Song, C. Y. & Shanechi, M. M. Unsupervised learning of stationary and switching
dynamical system models from Poisson observations. Journal of Neural Engineering
(2023).
3. Bar-Shalom, Y., Li, X. R. & Kirubarajan, T. Estimation with applications to tracking
and navigation: theory algorithms and software doi:10.1002/0471221279 (John Wiley
& Sons, 2001).
4. Zoltowski, D., Pillow, J. & Linderman, S. A general recurrent state space framework
for modeling neural dynamics during decision-making in Proceedings of the 37th International Conference on Machine Learning (eds III, H. D. & Singh, A.) 119 (PMLR,
2020), 11680–11691.
5. Pavlovic, V., Rehg, J. M. & MacCormick, J. Learning Switching Linear Models of
Human Motion in Advances in Neural Information Processing Systems (NIPS) (2000).
6. Hsieh, H.-L., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale modeling and
decoding algorithms for spike-field activity. Journal of Neural Engineering 16, 016018.
doi:10.1088/1741-2552/aaeb1a (2018).
7. Shanechi, M. M. Brain–machine interfaces from motor to mood. Nature neuroscience
22, 1554–1564. doi:10.1038/s41593-019-0488-y (2019).
8. Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M.
Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nature Communications 12, 1–19 (2021).
9. Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high performing brain–machine interface driven by low-frequency local field potentials alone and together with spikes. J. Neural Eng. 12, 036009. doi:10.1016/j.jneumeth.2007.10.001
(2015).
10. Pesaran, B., Vinck, M., Einevoll, G. T., Sirota, A., Fries, P., Siegel, M., et al. Investigating large-scale brain dynamics using field potential recordings: analysis and
interpretation. Nature neuroscience 21, 903–919. doi:10.1038/s41593-018-0171-8
(2018).
11. Eden, U. T., Frank, L. M. & Tao, L. in Dynamic neuroscience 29–52 (Springer, 2018).
doi:10.1007/978-3-319-71976-4_2.
12. Sani, O. G., Pesaran, B. & Shanechi, M. M. Where is all the nonlinearity: flexible
nonlinear modeling of behaviorally relevant neural dynamics using recurrent neural
networks. bioRxiv. doi:10.1101/2021.09.03.458628 (2021).
13. Murphy, K. P. Switching Kalman Filters. Dynamical Systems (1998).
14. Wu, W., Black, M. J., Mumford, D., Gao, Y., Bienenstock, E. & Donoghue, J. P.
Modeling and decoding motor cortical activity using a switching Kalman filter. IEEE
transactions on biomedical engineering 51, 933–942 (2004).
15. Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation through neural
population dynamics. Annual Review of Neuroscience 43, 249–275 (2020).
16. Macke, J. H., Buesing, L., Cunningham, J. P., Yu, B. M., Shenoy, K. V. & Sahani, M.
Empirical models of spiking in neural populations in Advances in Neural Information
Processing Systems 24 (Curran Associates, Inc., 2011).
17. Shanechi, M. M., Orsborn, A. L. & Carmena, J. M. Robust brain-machine interface
design using optimal feedback control modeling and adaptive point process filtering.
PLoS Comput. Biol. 12, e1004730. doi:10.1371/journal.pcbi.1004730 (2016).
18. Macke, J. H., Buesing, L. & Sahani, M. in Advanced State Space Methods for Neural
and Clinical Data (ed Chen, Z.) 137–159 (Cambridge University Press, Cambridge,
2015). doi:10.1017/CBO9781139941433.007.
19. Citi, L., Ba, D., Brown, E. N. & Barbieri, R. Likelihood methods for point processes
with refractoriness. Neural Computation 26, 237–263. doi:10.1162/NECO_a_00548
(2014).
20. Eden, U. T., Frank, L. M., Barbieri, R., Solo, V. & Brown, E. N. Dynamic Analysis of
Neural Encoding by Point Process Adaptive Filtering. Neural Computation 16, 971–998. doi:10.1162/089976604773135069 (2004).
21. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P. & Brown, E. N. A Point
Process Framework for Relating Neural Spiking Activity to Spiking History, Neural
Ensemble, and Extrinsic Covariate Effects. Journal of Neurophysiology 93, 1074–1089.
doi:10.1152/jn.00697.2004 (2005).
22. Coleman, T. P. & Sarma, S. S. A Computationally Efficient Method for Nonparametric
Modeling of Neural Spiking Activity with Point Processes. Neural Computation 22,
2002–2030. doi:10.1162/NECO_a_00001-Coleman (2010).
23. Chiang, C.-H., Won, S. M., Orsborn, A. L., Yu, K. J., Trumpis, M., Bent, B., et al. Development of a neural interface for high-definition, long-term recording in rodents and
nonhuman primates. Science Translational Medicine 12. doi:10.1126/scitranslmed.aay4682 (2020).
24. Smith, A. C. & Brown, E. N. Estimating a State-Space Model from Point Process
Observations. Neural Computation 15, 965–991. doi:10.1162/089976603765202622
(2003).
25. Abbaspourazad, H., Hsieh, H. & Shanechi, M. M. A Multiscale Dynamical Modeling
and Identification Framework for Spike-Field Activity. IEEE Transactions on Neural
Systems and Rehabilitation Engineering 27, 1128–1138. doi:10.1109/TNSRE.2019.2913218 (2019).
26. Lawhern, V., Wu, W., Hatsopoulos, N. & Paninski, L. Population decoding of motor
cortical activity using a generalized linear model with hidden states. Journal of Neuroscience Methods 189, 267–280. doi:10.1016/j.jneumeth.2010.03.024 (2010).
27. Paninski, L., Ahmadian, Y., Ferreira, D. G., Koyama, S., Rahnama Rad, K., Vidne, M.,
et al. A new look at state-space models for neural data. Journal of Computational
Neuroscience 29, 107–126. doi:10.1007/s10827-009-0179-x (2010).
28. Bishop, C. M. & Nasrabadi, N. M. Pattern recognition and machine learning 4
(Springer, 2006).
29. Shanechi, M. M. Brain–Machine Interface Control Algorithms. IEEE Trans. Neural
Syst. Rehabil. Eng. 25, 1725–1734 (2017).
30. Kao, J. C., Stavisky, S. D., Sussillo, D., Nuyujukian, P. & Shenoy, K. V. Information
systems opportunities in brain–machine interface decoders. Proceedings of the IEEE
102, 666–682 (2014).
31. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian,
P., Ryu, S. I., et al. Neural population dynamics during reaching. Nature 487, 51–56.
doi:10.1038/nature11129 (2012).
32. Sani, O. G., Abbaspourazad, H., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Modeling
behaviorally relevant neural dynamics enabled by preferential subspace identification.
Nature Neuroscience 24, 140–149 (2021).
33. Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Byron, M. Y., Fan,
J. M., et al. A high-performance neural prosthesis enabled by control algorithm design.
Nature neuroscience 15, 1752 (2012).
34. Orsborn, A. L., Moorman, H. G., Overduin, S. A., Shanechi, M. M., Dimitrov, D. F.
& Carmena, J. M. Closed-loop decoder adaptation shapes neural plasticity for skillful
neuroprosthetic control. Neuron 82, 1380–1393 (2014).
35. Shanechi, M. M., Orsborn, A. L., Moorman, H. G., Gowda, S., Dangi, S. & Carmena, J. M. Rapid control and feedback rates enhance neuroprosthetic control. Nat.
Commun. 8, 13825. doi:10.1038/ncomms13825 (2017).
36. Hsieh, H.-L. & Shanechi, M. M. Optimizing the learning rate for adaptive estimation
of neural encoding models. PLOS Computational Biology 14, 1–34. doi:10.1371/journal.pcbi.1006168 (2018).
37. Yang, Y., Ahmadipour, P. & Shanechi, M. M. Adaptive latent state modeling of brain
network dynamics with real-time learning rate optimization. Journal of Neural Engineering 18, 036013. doi:10.1088/1741-2552/abcefd (2021).
38. Ahmadipour, P., Yang, Y., Chang, E. F. & Shanechi, M. M. Adaptive tracking of
human ECoG network dynamics. Journal of Neural Engineering 18, 016011. doi:10.1088/1741-2552/abae42 (2020).
39. Yang, Y., Lee, J. T., Guidera, J. A., Vlasov, K. Y., Pei, J., Brown, E. N., et al.
Developing a personalized closed-loop controller of medically-induced coma in a rodent
model. Journal of Neural Engineering 16, 036022 (2019).
40. Schwabe, L. & Wolf, O. T. Stress-induced modulation of instrumental behavior: From
goal-directed to habitual control of action. Behavioural Brain Research 219, 321–328.
doi:10.1016/j.bbr.2010.12.038 (2011).
41. Hennig, J. A., Oby, E. R., Golub, M. D., Bahureksa, L. A., Sadtler, P. T., Quick,
K. M., et al. Learning is shaped by abrupt changes in neural engagement. Nature
Neuroscience 24, 727–736 (2021).
42. Ashwood, Z. C., Roy, N. A., Stone, I. R., Urai, A. E., Churchland, A. K., Pouget, A.,
et al. Mice alternate between discrete strategies during perceptual decision-making.
Nature Neuroscience 25, 201–212 (2022).
43. Low, I. I., Williams, A. H., Campbell, M. G., Linderman, S. W. & Giocomo, L. M.
Dynamic and reversible remapping of network representations in an unchanging environment. Neuron 109, 2967–2980 (2021).
44. Massé, B., Ba, S. & Horaud, R. Tracking gaze and visual focus of attention of people
involved in social interaction. IEEE transactions on pattern analysis and machine
intelligence 40, 2711–2724 (2017).
45. Blom, H. A. & Bar-Shalom, Y. The interacting multiple model algorithm for systems
with Markovian switching coefficients. IEEE transactions on Automatic Control 33,
780–783 (1988).
46. Kim, C.-J. Dynamic linear models with Markov-switching. Journal of Econometrics
60, 1–22 (1994).
47. Perel, S., Sadtler, P. T., Oby, E. R., Ryu, S. I., Tyler-Kabara, E. C., Batista, A. P., et
al. Single-unit activity, threshold crossings, and local field potentials in motor cortex
differentially encode reach kinematics. Journal of neurophysiology 114, 1500–1512.
doi:10.1152/jn.00293.2014 (2015).
48. Wang, W., Collinger, J. L., Degenhart, A. D., Tyler-Kabara, E. C., Schwartz, A. B.,
Moran, D. W., et al. An electrocorticographic brain interface in an individual with
tetraplegia. PLoS One 8, e55344 (2013).
49. Sani, O. G., Yang, Y., Lee, M. B., Dawes, H. E., Chang, E. F. & Shanechi, M. M.
Mood variations decoded from multi-site intracranial human brain activity. Nature
biotechnology 36, 954–961. doi:10.1038/nbt.4200 (2018).
50. Yang, Y., Qiao, S., Sani, O. G., Sedillo, J. I., Ferrentino, B., Pesaran, B., et al. Modelling and prediction of the dynamic responses of large-scale brain networks during
direct electrical stimulation. Nature Biomedical Engineering 5, 324–345 (2021).
51. Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D
reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit
activity, and local field potentials. Journal of Neurophysiology 107, 1337–1355. doi:10.1152/jn.00781.2011 (2012).
52. Yang, Y., Connolly, A. T. & Shanechi, M. M. A control-theoretic system identification
framework and a real-time closed-loop clinical simulation testbed for electrical brain
stimulation. J. Neural Eng. 15, 066007. doi:10.1088/1741-2552/aad1a8 (2018).
53. Yang, Y., Sani, O. G., Chang, E. F. & Shanechi, M. M. Dynamic network modeling
and dimensionality reduction for human ECoG activity. J. Neural Eng. 16, 056014
(2019).
54. Frühwirth-Schnatter, S. Finite mixture and Markov switching models (Springer Science
& Business Media, 2006).
55. Barber, D. Expectation correction for smoothed inference in switching linear dynamical systems. Journal of Machine Learning Research 7 (2006).
56. Pawitan, Y. In all likelihood: statistical modelling and inference using likelihood (Oxford University Press, 2001).
57. Markowitz, D. A., Curtis, C. E. & Pesaran, B. Multiple component networks support
working memory in prefrontal cortex. Proceedings of the National Academy of Sciences
112, 11084–11089 (2015).
58. Murphy, K. P. Machine learning: a probabilistic perspective (MIT press, 2012).
59. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the Royal statistical society: series
B (Methodological) 57, 289–300 (1995).
60. Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high-performance neural
prosthesis incorporating discrete state selection with hidden Markov models. IEEE
Transactions on Biomedical Engineering 64, 935–945 (2016).
61. Yu, B. M., Kemere, C., Santhanam, G., Afshar, A., Ryu, S. I., Meng, T. H., et al.
Mixture of trajectory models for neural decoding of goal-directed movements. Journal
of neurophysiology 97, 3763–3780 (2007).
62. Einevoll, G. T., Kayser, C., Logothetis, N. K. & Panzeri, S. Modelling and analysis
of local field potentials for studying the function of cortical circuits. Nature Reviews
Neuroscience 14, 770–785. doi:10.1038/nrn3599 (2013).
63. Bighamian, R., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Sparse model-based
estimation of functional dependence in high-dimensional field and spike multiscale
networks. Journal of Neural Engineering 16, 056022. doi:10.1088/1741-2552/ab225b
(2019).
64. Wang, C. & Shanechi, M. M. Estimating Multiscale Direct Causality Graphs in Neural
Spike-Field Networks. IEEE Trans. Neural Syst. Rehabil. Eng. 27, 857–866 (2019).
65. Wang, C., Pesaran, B. & Shanechi, M. M. Modeling multiscale causal interactions
between spiking and field potential signals during behavior. Journal of Neural Engineering 19, 026001. doi:10.1088/1741-2552/ac4e1c (2022).
66. Naufel, S., Glaser, J. I., Kording, K. P., Perreault, E. J. & Miller, L. E. A muscle-activity-dependent gain between motor cortex and EMG. Journal of Neurophysiology 121, 61–73. doi:10.1152/jn.00329.2018 (2019).
67. Losey, D. M., Hennig, J. A., Oby, E. R., Golub, M. D., Sadtler, P. T., Quick, K. M.,
et al. Learning alters neural activity to simultaneously support memory and action. bioRxiv. doi:10.1101/2022.07.05.498856 (2022).
68. Waiblinger, C., McDonnell, M. E., Reedy, A. R., Borden, P. Y. & Stanley, G. B. Emerging experience-dependent dynamics in primary somatosensory cortex reflect behavioral
adaptation. Nature Communications 13, 1–15. doi:10.1038/s41467-022-28193-z (2022).
69. Vahidi, P., Sani, O. G. & Shanechi, M. M. Modeling and dissociation of intrinsic and input-driven neural population dynamics underlying behavior. bioRxiv. doi:10.1101/2023.03.14.532554 (2023).
70. Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B
(Methodological). doi:10.1111/j.2517-6161.1977.tb01600.x (1977).
71. Wang, C. & Blei, D. M. Variational Inference in Nonconjugate Models. J. Mach.
Learn. Res. 14, 1005–1031 (2013).
72. Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational Inference: A Review for
Statisticians. Journal of the American Statistical Association 112, 859–877. doi:10.1080/01621459.2017.1285773 (2017).
73. Saul, L. K., Jaakkola, T. & Jordan, M. I. Mean Field Theory for Sigmoid Belief
Networks. Journal of Artificial Intelligence Research 4, 61–76. doi:10.1613/jair.251 (1996).
74. Kalman, R. E. A New Approach to Linear Filtering and Prediction Problems. Journal
of Basic Engineering 82, 35–45. doi:10.1115/1.3662552 (1960).
75. Arasaratnam, I., Haykin, S. & Elliott, R. J. Discrete-time nonlinear filtering algorithms
using Gauss–Hermite quadrature. Proceedings of the IEEE 95, 953–977. doi:10.1109/JPROC.2007.894705 (2007).
76. Jia, B., Xin, M. & Cheng, Y. High-degree cubature Kalman filter. Automatica 49, 510–518. doi:10.1016/j.automatica.2012.11.014 (2013).
77. Arasaratnam, I. & Haykin, S. Cubature Kalman filters. IEEE Transactions on Automatic Control 54, 1254–1269. doi:10.1109/TAC.2009.2019800 (2009).
78. Bharani Chandra, K. P., Gu, D. W. & Postlethwaite, I. Cubature information filter and
its applications. Proceedings of the American Control Conference, 3609–3614. doi:10.1109/acc.2011.5990913 (2011).
79. Kao, J. C., Nuyujukian, P., Ryu, S. I., Churchland, M. M., Cunningham, J. P. &
Shenoy, K. V. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications 6, 1–12 (2015).
80. Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M.
Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. Nature Communications 12. doi:10.1038/s41467-020-20197-x (2021).
81. Julier, S., Uhlmann, J. & Durrant-Whyte, H. F. A new method for the nonlinear
transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control 45, 477–482. doi:10.1109/TAC.2002.800742 (2000).
82. Menegaz, H. M., Ishihara, J. Y., Borges, G. A. & Vargas, A. N. A Systematization of
the Unscented Kalman Filter Theory. IEEE Transactions on Automatic Control 60, 2583–2598. doi:10.1109/TAC.2015.2404511 (2015).
83. Julier, S., Uhlmann, J. & Durrant-Whyte, H. A new approach for filtering nonlinear
systems in Proceedings of 1995 American Control Conference - ACC’95 3 (1995),
1628–1632 vol.3. doi:10.1109/ACC.1995.529783.
84. Krishnan, V. Probability and random processes (John Wiley & Sons, Inc., 2005).
85. Blitzstein, J. K. & Hwang, J. Introduction to Probability (CRC Press, 2019).
86. Kuss, M. & Rasmussen, C. E. Assessing Approximate Inference for Binary Gaussian
Process Classification. Journal of Machine Learning Research 6, 1679–1704 (2005).
87. Agrusa, A. S., Kunkel, D. C. & Coleman, T. P. Robust Regression and Optimal Transport Methods to Predict Gastrointestinal Disease Etiology From High Resolution EGG
and Symptom Severity. IEEE Transactions on Biomedical Engineering 69, 3313–3325. doi:10.1109/TBME.2022.3167338 (2022).
88. O’Doherty, J. E., Cardoso, M. M. B., Makin, J. G. & Sabes, P. N. Nonhuman Primate Reaching with Multichannel Sensorimotor Cortex Electrophysiology. doi:10.5281/zenodo.3854034 (2020).
89. Makin, J. G., O’Doherty, J. E., Cardoso, M. M. B. & Sabes, P. N. Superior arm-movement decoding from cortex with a new, unsupervised-learning algorithm. Journal of Neural Engineering 15, 026010. doi:10.1088/1741-2552/aa9e95 (2018).
90. Bolus, M., Willats, A., Whitmire, C., Rozell, C. & Stanley, G. Design strategies for dynamic closed-loop optogenetic neurocontrol in vivo. J. Neural Eng. 15, 026011 (2018).
91. Greco, A., Valenza, G., Lanata, A., Scilingo, E. P. & Citi, L. cvxEDA: A convex optimization approach to electrodermal activity processing. IEEE Trans. Biomed. Eng.
63, 797–804 (2015).
92. Charles, A. S., Yap, H. L. & Rozell, C. J. Short-term memory capacity in networks
via the restricted isometry property. Neural Comput. 26, 1198–1235 (2014).
93. Cunningham, J. P., Nuyujukian, P., Gilja, V., Chestek, C. A., Ryu, S. I. & Shenoy,
K. V. A closed-loop human simulator for investigating the role of feedback control in
brain-machine interfaces. J. Neurophysiol. 105, 1932–1949 (2010).
94. Sadras, N., Pesaran, B. & Shanechi, M. M. A point-process matched filter for event
detection and decoding from population spike trains. J. Neural Eng. 16, 066016.
doi:10.1088/1741-2552/ab3dbc (2019).
95. Abbaspourazad, H., Erturk, E., Pesaran, B. & Shanechi, M. M. Dynamical flexible inference of nonlinear latent structures in neural population activity. bioRxiv. doi:10.1101/2023.03.13.532479 (2023).
96. Johnsen, K. A., Cruzado, N. A., Willats, A. A. & Rozell, C. J. Cleo: A testbed for
bridging model and experiment by simulating closed-loop stimulation, electrode recording, and optogenetics. bioRxiv. doi:10.1101/2023.01.27.525963 (2023).
97. Allegra, A. B., Gharibans, A. A., Schamberg, G. E., Kunkel, D. C. & Coleman, T. P.
Bayesian inverse methods for spatiotemporal characterization of gastric electrical activity from cutaneous multi-electrode recordings. PLOS ONE 14, e0220315. doi:10.1371/journal.pone.0220315 (2019).
98. Linderman, S., Johnson, M., Miller, A., Adams, R., Blei, D. & Paninski, L. Bayesian
Learning and Inference in Recurrent Switching Linear Dynamical Systems in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (eds
Singh, A. & Zhu, J.) 54 (PMLR, Fort Lauderdale, FL, USA, 2017), 914–922.
99. Hsieh, H.-L. & Shanechi, M. M. Optimizing the learning rate for adaptive estimation
of neural encoding models. PLOS Computational Biology 14, 1–34. doi:10.1371/journal.pcbi.1006168 (2018).
100. Yang, Y., Ahmadipour, P. & Shanechi, M. M. Adaptive latent state modeling of brain
network dynamics with real-time learning rate optimization. Journal of Neural Engineering 18, 036013. doi:10.1088/1741-2552/abcefd (2021).
101. Li, Z., O’Doherty, J. E., Hanson, T. L., Lebedev, M. A., Henriquez, C. S. & Nicolelis,
M. A. L. Unscented Kalman Filter for Brain-Machine Interfaces. PLOS ONE 4, e6243. doi:10.1371/journal.pone.0006243 (2009).
102. Li, S., Li, J. & Li, Z. An Improved Unscented Kalman Filter Based Decoder for Cortical
Brain-Machine Interfaces. Frontiers in Neuroscience 10 (2016).
103. Polson, N. G., Scott, J. G. & Windle, J. Bayesian Inference for Logistic Models Using
Pólya–Gamma Latent Variables. Journal of the American Statistical Association 108, 1339–1349. doi:10.1080/01621459.2013.829001 (2013).
104. Ueda, N. & Nakano, R. Deterministic annealing variant of the EM algorithm. Advances in Neural Information Processing Systems, 545–552 (1995).
105. Shanechi, M. M., Hu, R. C. & Williams, Z. M. A cortical–spinal prosthesis for targeted
limb movement in paralysed primate avatars. Nature Communications 5, 3237. doi:10.1038/ncomms4237 (2014).
106. Shanechi, M. M., Hu, R. C., Powers, M., Wornell, G. W., Brown, E. N. & Williams,
Z. M. Neural population partitioning and a concurrent brain-machine interface for
sequential motor function. Nature Neuroscience 15, 1715–1722. doi:10.1038/nn.3250 (2012).
107. Sibley, G. & Sukhatme, G. The iterated sigma point Kalman filter with applications
to long range stereo. Proceedings of Robotics: Science and Systems (2006).
108. Kim, Y. S., Lee, J. H., Do, H. M., Kim, B. K., Tanikawa, T., Ohba, K., et al. Unscented
information filtering method for reducing multiple sensor registration error in 2008
IEEE International Conference on Multisensor Fusion and Integration for Intelligent
Systems (2008), 326–331. doi:10.1109/MFI.2008.4648086.
109. Byrd, R. H., Schnabel, R. B. & Shultz, G. A. A trust region algorithm for nonlinearly
constrained optimization. SIAM Journal on Numerical Analysis 24, 1152–1170 (1987).
110. Sip, V., Petkoski, S., Hashemi, M., Dickscheid, T., Amunts, K. & Jirsa, V. Parameter
inference on brain network models with unknown node dynamics and spatial heterogeneity. bioRxiv. doi:10.1101/2021.09.01.458521 (2022).
Abstract
Switching dynamical system models are powerful tools for investigating neural population dynamics and for applications in brain-machine interfaces, as they can model switches between distinct regimes, e.g., due to changes in task or lapses in engagement. However, realizing neurotechnologies that can use such models requires overcoming challenges on multiple fronts, from modeling and inference to learning system parameters.
First, the distributions that best capture the fast and discrete nature of spiking activity from individual neurons differ significantly from those that capture slower large-scale population activity measured with continuous local field potential signals. To address challenges of incorporating not only discrete but also multi-modal observations into switching systems, we develop a novel Switching Multiscale Dynamical System model and the associated inference methods. We perform validation in numerical simulations and prefrontal spike-field data and show that the methods can successfully combine the spiking and field potential observations to simultaneously track the regime and behavior states accurately.
Next, focusing on discrete Poisson observations, experimental data often lack reliable trial or regime labels, necessitating accurate unsupervised learning methods to find model parameters. While prior learning methods exist, they use the Laplace approximation for inference, which can lead to inaccurate learning. To address this, we derive a novel filter called the Poisson Cubature Filter and embed it in unsupervised learning frameworks. We perform validation in numerical simulations and motor spiking data and show that the developed methods are more accurate and data-efficient than prior methods and can uncover behavior-relevant regimes completely unsupervised.