Dynamical Representation Learning for Multiscale Brain Activity
by
Hamidreza Abbaspourazad
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2022
Copyright 2022 Hamidreza Abbaspourazad
I dedicate this thesis to my mom, dad and sister,
for their unconditional love and support.
Acknowledgements
I would like to thank my advisor, Professor Maryam Shanechi, for believing in me from day one and for her
guidance and support throughout my PhD. I joined the Shanechi lab as an undergraduate student who did not
have any research experience. Maryam introduced me to the research world, particularly statistical machine
learning, computational neuroscience and brain-machine interfaces. Her vision in approaching research
problems taught me how to critically tackle new problems, evaluate both the problem and the solution, and
look for potential ways to improve the solution. Maryam has always been open to my new ideas, and shown
incredible dedication and support to bring them to fruition. I will be forever grateful for the impact she has
had on me both as a person and as a scientist.
I would like to thank my dissertation committee members, Professors Antonio Ortega, Francisco Valero-Cuevas
and Daniel Lidar, for their valuable feedback and support. Also, I thank Professor Krishna Nayak
as a member of my PhD qualification committee. The committee members have been very kind to give me
feedback about my research outcomes on various occasions.
I would like to thank all of my friends for their support during my PhD experience. I thank all my friends
and colleagues in the Shanechi lab for the valuable lessons I learned from them in various aspects. I also
sincerely thank Omid Sani, Han-Lin Hsieh, Eray Erturk, Yusuf Umut Ciftci and Lucine Oganecian for their
help in several finished and ongoing projects. I would like to also thank Bijan Pesaran and his team, for their
critical contributions in data collection for various projects.
Last but not least, I would like to thank my mom and my dad for the countless sacrifices they have made
to shape me into who I am today, and my sister for always being there for me. I could not have asked for a
more supportive and loving family, they mean the world to me and I would not have achieved what I have if
it was not for them.
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Studying low-dimensional latent factors of multiscale neural population activity . . . . . . . 2
1.2 Developing nonlinear latent dynamic models of neural population activity . . . . . . . . . . 4
Chapter 2: Multiscale dynamical model and multiscale expectation-maximization (EM) algorithm . . 6
2.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Multiscale dynamical model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Multiscale EM algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.2.1 Expectation step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2.2 Maximization step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Validation on numerical simulations . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1.1 Multiscale EM accurately identifies the multiscale dynamical model for
spike-field activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1.2 Multiscale EM enables accurate decoding of behavior . . . . . . . . . . . 15
2.2.1.3 Multiscale EM identifies a hidden state that carries information from both
scales of activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 Validation on the 3D naturalistic reach-and-grasp task . . . . . . . . . . . . . . . . 16
2.2.2.1 3D naturalistic reach-and-grasp task . . . . . . . . . . . . . . . . . . . . 16
2.2.2.2 Multiscale EM algorithm improved behavior decoding accuracy in the
naturalistic 3D reach-and-grasp task . . . . . . . . . . . . . . . . . . . . 17
2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 3: Studying multiscale low-dimensional dynamics during naturalistic reach-and-grasp move-
ments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.1 Spiking and LFP network activity exhibited several principal modes . . . . . . . . . 24
3.2.2 The predictive mode was multiscale . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.3 Illustration of multiscale predictive mode and its dynamical characteristics . . . . . 25
3.2.4 The multiscale predictive mode was not replicated in behavior dynamics . . . . . . . 28
3.2.5 The predictive mode was present in both low- and high-frequency LFP bands . . . . 29
3.2.6 Both the decay and frequency of the multiscale predictive mode explained natural-
istic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Chapter 4: Dynamical flexible inference of nonlinear latent structures in neural population activity . 36
4.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1.1 DFINE generative model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1.2 DFINE inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.3 Learning the DFINE model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.4 Extending DFINE to supervised DFINE . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.5 DFINE learning details and hyperparameters . . . . . . . . . . . . . . . . . . . . . 42
4.1.6 Comparison with benchmark methods such as linear dynamical models and sequen-
tial autoencoders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1.7 Topological data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.1 DFINE successfully learns the dynamics on diverse nonlinear manifolds and enables
flexible inference in simulated datasets . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.2 DFINE extracts single-trial latent factors that accurately predict neural activity and
behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.3 DFINE more robustly extracts the manifold structure in single-trials . . . . . . . . . 46
4.2.4 DFINE can also be extended to enable supervision and improve behavior prediction
accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.5 DFINE can perform flexible inference with missing observations . . . . . . . . . . . 49
4.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Chapter 5: Training artificial recurrent neural networks that recapitulate the latent manifold structure
in the neural population activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.1 Details of RNN training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.2 Canonical correlation analysis between the RNN and neural activity . . . . . . . . 57
5.1.3 Topological data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.4 Partitioning the space into the loop coordinate and appending dimensions . . . . . . 57
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 CCA reveals a striking similarity between the RNN and neural activity . . . . . . . 58
5.2.2 RNN activity exhibited a ring-like manifold structure . . . . . . . . . . . . . . . . . 59
5.2.3 The loop coordinate explains the course of movements in the x-y direction while
other appending dimensions explain the target variability in the z direction . . . . . 60
5.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 6: Conclusion and ongoing work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.1 Saccade task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
A.2 Naturalistic 3D reach-and-grasp task . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.3 2D random-target reaching task . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
A.4 2D grid reaching task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
List of Figures
2.1 Multiscale dynamical model and multiscale EM algorithm. . . . . . . . . . . . . . . . . . . 9
2.2 Goodness-of-fit of the multiscale EM algorithm. . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Decoding results with the multiscale EM algorithm. . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Multiscale EM algorithm’s decoded trajectories. . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Multiscale EM algorithm fuses information across scales. . . . . . . . . . . . . . . . . . . . 18
3.1 Modal analysis framework for the neural population activity. . . . . . . . . . . . . . . . . . 23
3.2 Eigenvalue-dimension diagram of spiking and LFP network activity . . . . . . . . . . . . . 26
3.3 Multiscale predictive mode is shared across scales . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Illustration of multiscale predictive mode’s dynamical characteristics . . . . . . . . . . . . . 28
3.5 Control modal analysis of behavior variables . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 Eigenvalue-dimension diagram for low- and high-frequency bands of LFP . . . . . . . . . . 34
3.7 Importance of both decay and frequency of the multiscale predictive mode . . . . . . . . . . 35
4.1 DFINE method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Validation of DFINE on numerical simulations . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Validation of DFINE on various datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Example latent factor trajectories for the motor datasets . . . . . . . . . . . . . . . . . . . . 48
4.5 Supervised DFINE improves the behavior prediction accuracy . . . . . . . . . . . . . . . . 50
4.6 DFINE enables flexible inference in the presence of missing observations . . . . . . . . . . 52
4.7 DFINE outperforms SAEs in the presence of missing observations . . . . . . . . . . . . . . 53
5.1 Training task RNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 RNN activity are similar to the recorded neural activity. . . . . . . . . . . . . . . . . . . . . 59
5.3 The role of loop coordinate and the manifold’s appending dimensions in the control of move-
ments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.1 Self-supervised contrastive learning for building neural decoders when few labels are available 64
6.2 Experimental datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Abstract
Brain activity exhibits various spatiotemporal dynamical patterns emerging from the coordinated action of
neuronal populations in time and across many brain regions. As such, we can view the brain as a complex
dynamical system with high-dimensional observations in the form of neural population activity, which can
be described in terms of lower-dimensional latent factors. Therefore, developing models and algorithms that
extract latent factors of the neural population activity allows us to better understand the brain mechanisms
during tasks and to build more accurate brain-machine interfaces. Such models should characterize multiple
scales of neural population activity while enabling accurate and flexible inference of latent factors for basic
science and neurotechnology applications. The proposed research involves multiple advancements includ-
ing a dynamic latent model of multiscale neural population activity along with its learning and inference
algorithms, a novel discovery about the similarities and differences in low-dimensional neural dynamics
across the small-scale and large-scale brain networks, a novel nonlinear dynamic model using deep learning
techniques to incorporate nonlinearities while enabling flexible inference properties such as causal or
non-causal inference in the presence of missing observations, and a validation of the low-dimensional manifold
structures emerging in neural population activity using artificial neural networks. Overall, the outcomes of
this research will advance future neuroscientific studies of small-scale and large-scale neural population ac-
tivity and provide general tools to extract low-dimensional latent factors from high-dimensional time-series.
Chapter 1
Introduction
The ability to move is an essential part of our everyday lives. Populations of cortical neurons underlie the
control of movement, but the way they do so has remained controversial to date. The very first studies
of movement encoding in the brain investigated how force [1] or movement direction [2] is encoded in
single neurons and later led to promising results showing encoding of movement velocity, direction and
speed in single neurons’ activity [3, 4]. This body of research paved the way for brain-machine interface
(BMI) research and successful clinical trials of BMIs aiming to restore movement to paralyzed patients [5–
8]. These early works assumed that each neuron directly encodes movement parameters. However, several
recent studies have shown evidence to the contrary such as heterogeneous representation of movement pa-
rameters in the neurons across a neuronal population and variations in modulation in a single neuron across
different trials [9–11]. Such conflicting evidence regarding the information encoding in single neurons, ad-
vancements in recording high-dimensional neural activity and powerful statistical tools have led scientists to
study high-dimensional neural activity and extract explanatory low-dimensional latent factors summarizing
the temporal information in the high-dimensional neural population activity [12–16]. The studies of neural
population activity strongly suggest that the encoding of information happens at the neuronal population
level rather than in single neurons [17–29], thus highlighting the importance of studying interpretable low-
dimensional latent factors in high-dimensional neural activity. These low-dimensional latent factors are also
called low-dimensional dynamics in the neuroscience literature. In this work, we develop machine learning
tools to extract the low-dimensional latent factors of neural population activity and investigate them from
various aspects.
1.1 Studying low-dimensional latent factors of multiscale neural population
activity
Prior works on low-dimensional dynamic modeling of neural population activity have mostly focused on a
single scale of neural activity, typically spiking activity [19–29]. However, it is well-known that behavioral
information is encoded across multiple spatiotemporal scales of brain activity, ranging from the spiking activity
of individual neurons or local neuronal networks to larger-scale neuronal networks whose activity is reflected in
local field potentials (LFP) [30–40]. Therefore, it is essential to investigate the low-dimensional dynamics
across small- and large-scale brain networks, i.e., across spiking and LFP network activity. Here, we refer to
the high-dimensional spiking activity as spiking network activity and to the high-dimensional LFP activity as
LFP network activity. One of the major goals in this work is to build dynamical models for combined spiking
and LFP network activity, extract low-dimensional dynamics, and study their characteristics. Modeling
spiking and LFP network activity jointly is challenging because spiking and LFP activity measure different
biological processes at different spatial and temporal scales, and with different statistical profiles [40–42].
Spiking activity measures all-or-none action potential events that ultimately drive muscle contraction, whereas
LFP activity is continuously-varying and measures sub-threshold synaptic and dendritic activity that shapes
when cells generate action potentials [40–42]. We build models that simultaneously characterize spiking
and LFP activity, in addition to comparing the low-dimensional dynamics from each scale.
We develop a multiscale dynamical model for the combined spiking and LFP network activity (i.e., multiscale
activity), inspired by linear dynamical models (LDM), and a machine learning framework to learn its model
parameters. The multiscale dynamical model allows us to characterize spiking and LFP activity in a unified
dynamical model and enables extracting the low-dimensional dynamics from multiscale activity. We then
study the dynamical characteristics of the low-dimensional dynamics in spiking network activity and LFP
network activity during a naturalistic coordinated 3D reach-and-grasp task performed by non-human primates.
In fact, it is essential to study low-dimensional dynamics in small- and large-scale brain networks as the
following questions have remained elusive to date: 1) what are the dynamical characteristics of the
low-dimensional neural dynamics within spiking and LFP network activity during naturalistic movements,
2) are the low-dimensional dynamics across these scales shared or distinct, and 3) what are the characteristics
of behaviorally relevant low-dimensional dynamics and are they shared between spiking and LFP network
activity? We exploit LDMs to characterize neural activity and later extend them to nonlinear versions. LDMs
have several properties that make them powerful tools to model discrete [43] or continuous observation
time-series [44]: 1) they are generative models for the observations, and therefore they can be used for
observation prediction into the future, missing data imputation, and generating new observations; 2) the state
in an LDM can be in the form of an observable state, e.g., movement parameters in motor tasks, or a latent
state (which is the case for most of the results in this work); in the case of a latent state, the dimension of the
latent state can be significantly lower than the observation dimension, allowing the information to be
summarized in the state, so LDMs can be used as a dimensionality reduction tool; and 3) last but not least,
given the nice properties of Kalman filtering, LDMs facilitate fast and exact calculation of the state posterior
distribution given a series of observations. This property comes in useful for variational inference using the
expectation-maximization (EM) algorithm, where the exact posterior of the latent factors given the observations
can be analytically computed via Kalman filtering. To jointly characterize spiking and LFP network activity,
we use a latent LDM formulation where the brain state is latent, LFP observations are formulated as Gaussian
observations of the latent state [32, 33, 35, 45–48], and binary spiking activity is modeled by a point process
likelihood function where the firing rate is a log-linear function of the state [33, 43, 49–54]. We develop a
novel expectation-maximization (EM) algorithm, the multiscale EM algorithm, to learn the model parameters
in the presence of the latent state [55]. Through extensive numerical simulations, we show that the multiscale
EM algorithm can correctly learn the model parameters, help decoding of an observable behavior parameter,
and increase the decoding accuracy compared to single-scale EM [55].
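To make the Kalman filtering property noted above concrete, the following is a minimal sketch of the Kalman filter recursion that computes the exact causal state posterior for a latent LDM with Gaussian observations; the NumPy implementation, variable names and toy dimensions are illustrative assumptions and not the multiscale decoder derived later in this work.
\begin{verbatim}
import numpy as np

def kalman_filter(y, A, W, C, R, x0, P0):
    """Causal posterior mean/covariance of the latent state in an LDM:
    x_t = A x_{t-1} + w_t,  y_t = C x_t + r_t."""
    T, n_x = y.shape[0], A.shape[0]
    x_filt = np.zeros((T, n_x))
    P_filt = np.zeros((T, n_x, n_x))
    x_prev, P_prev = x0, P0
    for t in range(T):
        # predict one step ahead with the state dynamics
        x_pred = A @ x_prev
        P_pred = A @ P_prev @ A.T + W
        # update with the Gaussian observation y_t
        S = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S)          # Kalman gain
        x_prev = x_pred + K @ (y[t] - C @ x_pred)
        P_prev = (np.eye(n_x) - K @ C) @ P_pred
        x_filt[t], P_filt[t] = x_prev, P_prev
    return x_filt, P_filt

# toy usage with assumed dimensions (2 latent states, 3 observed channels)
rng = np.random.default_rng(0)
A, W = 0.95 * np.eye(2), 0.01 * np.eye(2)
C, R = rng.standard_normal((3, 2)), 0.1 * np.eye(3)
y = rng.standard_normal((100, 3))
x_filt, _ = kalman_filter(y, A, W, C, R, np.zeros(2), np.eye(2))
\end{verbatim}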
After validating the multiscale EM algorithm, we apply it to a 3D naturalistic reach-and-grasp task performed
by non-human primates. We first show that the multiscale EM algorithm increases behavior prediction
accuracy compared to the single-scale EM algorithm, meaning that the multiscale EM fuses the information
from both scales in the latent state. We then study whether the low-dimensional dynamics in spiking network
activity and LFP network activity have similar or different dynamical characteristics and how behavior is
represented in these low-dimensional states [56]. For this purpose, we learn single-scale and multiscale
dynamical models from spiking and LFP network activity. To obtain the unique dynamical characteristics in
the latent state, we partition the latent state into dynamical modes with unique decay-frequency characteristics
(a minimal sketch of this decomposition follows at the end of this section). We then find the principal modes
of the network dynamics, i.e., modes that are persistently identified in the LDM regardless of the state
dimension. We observed that spiking and LFP network activity each have several principal modes, one of
which dominated the behavior prediction accuracy. Interestingly, this predictive mode was shared between
spiking and LFP network activity, and between experimental sessions and monkeys. This implied that a
preserved multiscale low-dimensional motor cortical state generates naturalistic reach-and-grasp movements.
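As a rough illustration of this decay-frequency mode partition, the sketch below eigendecomposes an assumed state transition matrix A and reads each mode's decay and oscillation frequency off the eigenvalue magnitude and angle; the time-step and example dynamics are illustrative assumptions rather than values used in our analyses.
\begin{verbatim}
import numpy as np

def modes_from_transition_matrix(A, dt):
    """Partition the dynamics of a learned transition matrix A into modes.
    Each eigenvalue lam gives one mode: |lam| is its per-step decay and
    angle(lam)/(2*pi*dt) is its oscillation frequency in Hz."""
    eigvals = np.linalg.eigvals(A)
    decay = np.abs(eigvals)
    freq_hz = np.abs(np.angle(eigvals)) / (2 * np.pi * dt)
    return eigvals, decay, freq_hz

# illustrative example: 10 ms steps and a decaying ~1.5 Hz rotation
dt = 0.01
theta = 2 * np.pi * 1.5 * dt
A = 0.98 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
_, decay, freq = modes_from_transition_matrix(A, dt)
print(decay, freq)   # decay ~0.98 and frequency ~1.5 Hz for the conjugate pair
\end{verbatim}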
1.2 Developing nonlinear latent dynamic models of neural population activity
As we discussed earlier, neural population activity can be summarized with low-dimensional dynamics. We
refer to this low-dimensional latent structure as the low-dimensional manifold structure that neural population
activity lives on. Most prior methods for modeling neural population activity have used linear methods to
extract such low-dimensional manifold structure [13, 19, 25, 26, 43, 56–61]. However, several recent works
have shown that incorporating nonlinearities could lead to a better understanding of the low-dimensional
manifold structure [29, 62–64]. Here, we use deep learning techniques to incorporate nonlinearities into our
modeling frameworks. We develop a dynamic generative model of neural population activity that is not only
accurate in characterizing neural population activity with potential nonlinearities, but also enables flexible
inference of latent factors so that they can be seamlessly extracted in basic science and neurotechnology
applications. We also use artificial neural networks to investigate whether observing a ring-like manifold
structure is a realistic phenomenon in naturalistic dynamical systems.
In the first approach, we leverage neural networks to build a dynamic generative model of neural population
activity that incorporates nonlinearities while enabling flexible inference of latent factors. First, to be
accurate, such a model should (i) capture potential nonlinearities in neural population activity and (ii) have a
data-efficient architecture for generalizable data-driven training. Second, to enable flexible inference, such
a model should (iii) be capable of both causal/real-time and non-causal inference simultaneously and (iv)
allow for inference in the presence of missing neural and/or behavioral measurements. If achieved, flexible
inference will allow the same trained model both to operate causally in real-time to enable neurotechnology
and to leverage the entire length of data non-causally for more accurate inference in scientific investigations
when real-time operation is not needed. It will also make the model robust to missing or noisy measure-
ments. Current neural network models of neural population activity are in the form of generative models
[29, 65–69], whether static [69] or dynamic [29, 65–68], or in the form of predictive models or decoders
[70–73]. However, while these models can capture nonlinearity, they do not meet all the flexible inference
properties outlined above. Because the inference for these models is not solvable analytically unlike LDMs,
they need to empirically train an inference or recognition network simultaneously with their generative net-
work, usually requiring the entire length of data over a trial. Thus, their inference depends on the specific
network architecture trained and is not flexible to satisfy all the above properties. Indeed, in prior generative
models, including sequential autoencoders [29] (SAEs) or LDMs with nonlinear embeddings [66], inference
is non-causal, and real-time and/or recursive inference is not possible [29, 65, 66, 68], or the inference in the
presence of missing observations is not directly addressed [29, 66, 68]. Zero-imputation techniques were
used to address missing observations with SAEs [74], which is known to yield sub-optimal performance
because the imputation changes the inherent values of missing observations during inference [75–77]. Simi-
larly to these generative models, predictive dynamical models of neural activity that use forward RNNs [64,
70–72] also do not enable the flexible inference properties above. While these models can perform causal
inference, they do not allow for non-causal inference to leverage all data, and they do not address inference
with missing observations as RNNs are structured to take inputs at every time-step.
In the second approach, we use neural networks as artificial task models. As artificial task models, neu-
ral networks are trained to perform the same behavioral tasks as the brain, and thus corroborate hypotheses
about neural population activity via examining the activity of their artificial units [22, 28, 78–83] or to in-
vestigate multi-regional brain interactions [84]. Given their inter-connected and recurrent network structure,
recurrent neural networks (RNNs) are often used in neuroscience studies as a model of the brain network to
investigate brain mechanisms during a continuous task [22, 28, 78–81, 83]. To do this, RNNs are trained to
perform a certain task (thus called task RNNs) without knowledge of the neural activity, and the similarity
between their artificial units' activity and the recorded neural activity is assessed. The trained task RNNs are used to
justify a hypothesis/observation in the recorded neural activity. By applying a manifold learning algorithm
on the 3D naturalistic reach-and-grasp task, we observed that the underlying manifold of neural population
activity was a distorted/nonlinear ring-like manifold structure. We trained RNNs to perform the same task
(independent of observed neural activity), by receiving task instructions and replicating the behavior output. We
showed that the ring-like manifold structure also emerged in the RNN’s artificial units, in spite of RNN training
being fully independent of the recorded neural activity. Similar to the neural population
activity, we observed that the location on the ring-like manifold contained information regarding forward-
backward movements, while the target information was encoded in a direction orthogonal to the ring-like
manifold. Our results suggested that the ring-like manifold structure was a realistic phenomenon in natural
dynamical systems.
Chapter 2
Multiscale dynamical model and multiscale expectation-maximization (EM)
algorithm
In this chapter, we develop the multiscale dynamical model for simultaneously modeling combined spiking-field
activity, and we derive its corresponding learning algorithm, the multiscale expectation-maximization (EM)
algorithm. As stated in chapter 1, we are interested in building dynamical models of neural population
activity given the significant insight that low-dimensional neural dynamics can provide. While recent
studies of neural population activity have only modeled a single scale of neural activity [13, 25, 47, 57,
85–88], population-level information and behavior are encoded across multiple spatiotemporal brain scales,
from individual spiking neurons to large-scale neural populations whose activity can be measured with field
signals such as LFP or ECoG. For example, field-spike interactions are present in short-term memories,
reaching movements, replay events, and coordinated saccade-reach movements [31, 89–93]. In addition to
the existence of such large-scale interactions, several studies have shown that combined spiking and LFP
activity can improve behavior prediction accuracy in representational encoding models (a direct model from
behavior variables to neural activity) [32–35]. However, studying multiscale neural population activity and
building multiscale dynamical models remain largely unexplored. We are thus motivated
to build a dynamical model of multiscale neural population activity, which allows us to simultaneously
characterize spiking and field activity in a single dynamical model and enables extraction of unified latent
factors from the combined spiking and field activity.
This simultaneous modeling is challenging because of the fundamental differences in spiking and field
activity [40–42]. Spikes are binary-valued—where the 1 or 0 binary value at each time-step indicates the
presence or absence of a spike at that time-step, respectively—and occur at a fast millisecond time-scale.
Fields are continuous-valued and have slower time-scales. Existing dimensionality reduction methods are
not directly applicable to mixed discrete-continuous spike-field activity. Thus new methods are needed
that can identify a dynamical model and latent factors from simultaneous observations of mixed discrete-
continuous spike-field signals at different time-scales.
Here, we introduce the multiscale dynamical model and obtain the derivations for its learning algorithm, the
multiscale EM algorithm. We validate the learning algorithm on numerical simulations. This chapter resulted
in a publication, with the help of my lab mate Han-Lin Hsieh, in IEEE Transactions on Neural Systems and
Rehabilitation Engineering (TNSRE) [55].
2.1 Methods
2.1.1 Multiscale dynamical model
We first build the multiscale dynamical model for the combined spiking and field activity (Fig. 2.1a). For
the rest of this chapter, we use the term hidden states to refer to the latent factors of the multiscale dynamical model.
To describe how hidden states evolve in time, we write the state dynamics equation as
\[
x_t = A x_{t-1} + w_t. \qquad (2.1)
\]
Here $x_t \in \mathbb{R}^{n_x \times 1}$ is the column vector of hidden states, where $n_x$ is the hidden state dimension and a hyperparameter to be picked. In addition, $w_t \in \mathbb{R}^{n_x \times 1}$ is a zero-mean white Gaussian noise with covariance matrix $W \in \mathbb{R}^{n_x \times n_x}$, and $A \in \mathbb{R}^{n_x \times n_x}$ is the state transition matrix. We then write the multiscale neural observation model using a joint probability distribution
model using a joint probability distribution
p(y
t
,N
1:n
c
t
|x
t
)= p(y
t
|x
t
)p(N
1:n
c
t
|x
t
), (2.2)
where y
t
∈R
n
y
× 1
is the column vector of field signals at time t and n
y
is the dimension of the field signals.
The total number of neurons are defined as n
c
and N
1:n
c
t
=[N
1
t
,...,N
n
c
t
]
′
with N
c
t
denoting the binary spike
event for neuron c at time t. To get the joint probability distribution in equation (2.2), we have assumed
that spikes and fields are conditionally-independent given the hidden state. Assuming a small time-step (i.e.,
bin-width)∆ that contains at most one spike, the binary spike event N
c
t
is 1 if neuron c has a spike at the tth
time-step and is 0 otherwise [43, 49–51, 53, 94, 95]. The field signal can for example be taken as the LFP
7
and ECoG log-power features in various frequency bands as these features are informative about behavior
[32, 35, 45, 46, 96, 97]. We build the multiscale observation model with a joint probability distribution of
spike and field observations as in equation (2.2). We then characterize different time-scales and statistical
profiles of these modalities by modeling them with different likelihood functions. Motivated by single-scale
works on field signals [35, 45–47, 96, 98], we describe the field signals with a linear Gaussian model
y
t
= Cx
t
+ r
t
, (2.3)
in which C∈R
n
y
× n
x
is the observation matrix and r
t
∈R
n
y
× 1
is white Gaussian noise with a covariance
matrix R∈R
n
y
× n
y
. Thus p(y
t
|x
t
) in equation (2.2) is a Gaussian density with mean Cx
t
and covariance R.
To write $p(N_t^{1:n_c} \mid x_t)$ in equation (2.2), we use a point process likelihood function to represent the 0-1 time-series of spike events for a given neuron, where the 0 and 1 at a given time-step represent the absence or presence of a spike at that time-step, respectively [43, 49–51, 53, 94, 95]. Assuming neurons are conditionally independent given the hidden states [43, 49–51], the point process likelihood function for spikes is given by
\[
p(N_t^{1:n_c} \mid x_t) = \prod_{c=1}^{n_c} \big(\lambda_c(x_t)\Delta\big)^{N_t^c} \exp\!\big(-\lambda_c(x_t)\Delta\big), \qquad (2.4)
\]
where $\lambda_c(x_t) = \exp(\beta_c + \alpha_c' x_t)$ is the firing rate of neuron $c$, $\cdot'$ is the transpose operator, and $[\beta_c; \alpha_c] \in \mathbb{R}^{(n_x+1)\times 1}$ are model parameters to be identified. Together, equations (2.1) and (2.2) form the multiscale dynamical model, which needs to be identified from spike-field activity in the presence of hidden states.
Given the proposed multiscale dynamical model, we solve two challenges in multiscale modeling: different statistical profiles and different time-scales. First, we address the challenge of modeling mixed discrete-continuous spike-field activity by using appropriate probability distributions in equations (2.3) and (2.4). Second, to account for the time-scale differences of spikes and fields, we model the field features as missing observations at the intermediate time-steps at which spikes are observed but new field features are not computed. This is achieved by setting the observation noise covariance to $R \to \infty$ at these intermediate time-steps.
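To make the generative model of equations (2.1)-(2.4) concrete, the sketch below simulates hidden states, field features and binary spikes from assumed parameter values; all dimensions and settings are illustrative, and field samples are emitted only every k-th time-step (missing in between) to mimic their slower time-scale.
\begin{verbatim}
import numpy as np

def simulate_multiscale(T, A, W, C, R, beta, alpha, dt, k, rng):
    """Simulate hidden states, field features and binary spikes from the
    multiscale model (equations 2.1-2.4). Field features are emitted only
    every k-th time-step; intermediate steps are NaN (missing)."""
    n_x, n_y, n_c = A.shape[0], C.shape[0], len(beta)
    x = np.zeros((T, n_x))
    y = np.full((T, n_y), np.nan)
    N = np.zeros((T, n_c), dtype=int)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(n_x), W)
        if (t + 1) % k == 0:                      # slower field time-scale
            y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(n_y), R)
        rate = np.exp(beta + alpha @ x[t])        # log-linear firing rates
        N[t] = rng.random(n_c) < np.clip(rate * dt, 0.0, 1.0)
    return x, y, N

# illustrative parameter values
rng = np.random.default_rng(1)
n_x, n_y, n_c = 2, 4, 10
A, W = 0.97 * np.eye(n_x), 0.02 * np.eye(n_x)
C, R = rng.standard_normal((n_y, n_x)), 0.1 * np.eye(n_y)
beta = np.log(20.0) * np.ones(n_c)                # ~20 Hz baseline rates
alpha = rng.standard_normal((n_c, n_x))
x, y, N = simulate_multiscale(2000, A, W, C, R, beta, alpha, dt=0.01, k=5, rng=rng)
\end{verbatim}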
Figure 2.1: Multiscale dynamical model and its corresponding learning pipeline, including the multiscale EM
algorithm. (a) The multiscale dynamical model follows a hidden Markov model, where the hidden states are the
Markov states and the multiscale observations are emissions of the Markov model. We assume the hidden
states also give rise to behavior variables. (b) The learning pipeline corresponding to the multiscale dynamical
model and downstream behavior decoding. We first learn the multiscale dynamical model with the
unsupervised multiscale EM algorithm. The multiscale EM algorithm is an iterative method consisting of an
expectation and a maximization step at each iteration. We then learn a downstream behavior regression model
(e.g., a linear projection matrix) from hidden states to observed behavior variables in the train set. In the test
set, we use the learned model parameters to decode hidden states and ultimately decode behavior variables.
2.1.2 Multiscale EM algorithm
To identify the parameters in equations (2.1) and (2.2), we derive a novel multiscale EM algorithm (Fig.
2.1b). The EM algorithm is an iterative approach to maximize the data likelihood in the presence of latent
factors [99]. Each iteration of the EM algorithm consists of an expectation and a maximization step. The
expectation step computes the posterior distribution of the latent factors from the observed data and the
current parameters, and the maximization step finds the optimal parameters that maximize the expected
data likelihood given the posterior distribution learned in the expectation step.
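At a high level, the learning loop alternates these two steps until convergence, as in the assumed skeleton below; the expectation-step, maximization-step and log-likelihood functions are placeholders for the derivations in the next subsections, and the stopping rule follows the relative log-likelihood criterion given at the end of this section.
\begin{verbatim}
def multiscale_em(observations, theta_init, e_step, m_step, log_likelihood,
                  tol=1e-5, max_iters=200):
    """Generic EM skeleton: alternate an expectation step (posterior moments
    of the hidden states from the multiscale smoother) and a maximization
    step (parameter updates), until the relative change in the data
    log-likelihood falls below tol."""
    theta, prev_ll = theta_init, None
    for _ in range(max_iters):
        moments = e_step(observations, theta)   # smoothed means / (cross-)covariances
        theta = m_step(observations, moments)   # updated model parameters
        ll = log_likelihood(observations, theta)
        if prev_ll is not None and abs(ll - prev_ll) / abs(prev_ll) < tol:
            break
        prev_ll = ll
    return theta
\end{verbatim}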
The goal of the EM algorithm in our context is to find the maximum-likelihood estimate of all parameters in the dynamical model, $\Theta = \{A, W, C, R, v_1, \Psi_1, \beta_c, \alpha_c \ \text{for}\ c = 1:n_c\}$, i.e., parameter values that maximize the likelihood of the observed spike-field activity. Here $v_1 \in \mathbb{R}^{n_x \times 1}$ and $\Psi_1 \in \mathbb{R}^{n_x \times n_x}$ are the initial estimate of the hidden state and its covariance. We denote the time-series of hidden states and observations by $\{X\} = \{x_t \mid t = 1:T\}$ and $\{H\} = \{y_t, N_t^c \mid t = 1:T,\ c = 1:n_c\}$, respectively, where $T$ is the total recording time. We take the time-step as the bin-width of the spikes $\Delta$. We assume that field observations are made every $k$ time-steps (i.e., at $t \in T_f = \{mk \le T \mid m \in \mathbb{N}\}$) given their slower time-scale ($\mathbb{N}$ is the set of positive integers). We first write the complete log-likelihood function for the hidden states and spike-field observations in our multiscale dynamical model as
\begin{align*}
L &= \log p(\{X\},\{H\};\Theta) = \log p(\{X\};\Theta) + \log p(\{H\} \mid \{X\};\Theta) \\
&= -\tfrac{1}{2}(x_1 - v_1)' \Psi_1^{-1} (x_1 - v_1) - \tfrac{1}{2}\log|\Psi_1| - \sum_{t=2}^{T} \tfrac{1}{2}(x_t - A x_{t-1})' W^{-1} (x_t - A x_{t-1}) - \tfrac{T-1}{2}\log|W| \\
&\quad - \sum_{t \in T_f} \tfrac{1}{2}(y_t - C x_t)' R^{-1} (y_t - C x_t) - \tfrac{T}{2k}\log|R| + \text{constant} \\
&\quad + \sum_{t=1}^{T}\sum_{c=1}^{n_c} \Big[ N_t^c \big(\log\Delta + \beta_c + \alpha_c' x_t\big) - \Delta\, e^{\beta_c + \alpha_c' x_t} \Big], \qquad (2.5)
\end{align*}
where $|\cdot|$ denotes the matrix determinant and the constant accounts for normalization terms of the Gaussian distributions and is not a function of the parameters. We can now derive the expectation and maximization steps.
2.1.2.1 Expectation step
At iteration $i$, the expectation step computes the expected value of the complete log-likelihood in equation (2.5) given all spike-field observations and using the parameter estimates from iteration $i-1$ as follows
\begin{align*}
S &= E\big[L \mid \{H\}; \hat\Theta^{(i-1)}\big] \\
&= -\tfrac{1}{2}\,\mathrm{tr}\Big[\Psi_1^{-1}\big(P_1 - v_1 \hat x_1' - \hat x_1 v_1' + v_1 v_1'\big)\Big] - \tfrac{1}{2}\log|\Psi_1| \\
&\quad - \tfrac{1}{2}\,\mathrm{tr}\Big[W^{-1}\sum_{t=2}^{T}\big(P_t - A P_{t,t-1}' - P_{t,t-1} A' + A P_{t-1} A'\big)\Big] - \tfrac{T-1}{2}\log|W| \\
&\quad - \tfrac{1}{2}\,\mathrm{tr}\Big[R^{-1}\sum_{t \in T_f}\big(y_t y_t' - C \hat x_t y_t' - y_t \hat x_t' C' + C P_t C'\big)\Big] - \tfrac{T}{2k}\log|R| + \text{constant} \\
&\quad + \sum_{t=1}^{T}\sum_{c=1}^{n_c}\Big[N_t^c\big(\log\Delta + \beta_c + \alpha_c' \hat x_t\big) - \Delta\, e^{\beta_c + \alpha_c' \hat x_t + \frac{1}{2}\alpha_c' \hat\Lambda_t \alpha_c}\Big], \qquad (2.6)
\end{align*}
where
\[
\hat x_t = E\big[x_t \mid \{H\}; \hat\Theta^{(i-1)}\big], \qquad \hat\Lambda_t = E\big[(x_t - \hat x_t)(x_t - \hat x_t)' \mid \{H\}; \hat\Theta^{(i-1)}\big], \qquad (2.7)
\]
\[
P_t = E\big[x_t x_t' \mid \{H\}; \hat\Theta^{(i-1)}\big], \qquad P_{t,t-1} = E\big[x_t x_{t-1}' \mid \{H\}; \hat\Theta^{(i-1)}\big]. \qquad (2.8)
\]
The expectation step needs to compute the above expectation terms, which will be used in the maximization
step to solve a multivariate high-dimensional optimization problem that maximizes S with respect to model
parameters. Note that the expectations above are computed at each iteration.
To find these expectations, we design a multiscale Bayesian smoother that consists of a multiscale filter [33] and the corresponding fixed-interval smoother algorithm for spike-field activity. Unlike the Kalman filter, which applies to continuous-valued linear observations, the multiscale filter can estimate the state from a mixture of binary-valued nonlinear spike observations and continuous-valued linear field observations [33]. At any given time $t$, the multiscale filter finds the minimum mean-square error (MMSE) estimate of the hidden state given spike-field observations up to that time. The MMSE estimate is given by the mean of the posterior density $p(x_t \mid y_{1:t}, N^{1:n_c}_{1:t})$, denoted by $x_{t|t}$. We denote the covariance of the posterior density by $\Lambda_{t|t}$ and the prediction density mean and covariance by $x_{t|t-1}$ and $\Lambda_{t|t-1}$, respectively. The multiscale decoder recursions are derived using a Laplace Gaussian approximation to the posterior as in our prior work on representational models [33] and the full derivations can be found in our manuscript [55].
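To convey the flavor of one such filtering update (not the exact recursions of [33, 55]), the sketch below combines the Gaussian prediction, the Gaussian field likelihood and the point-process spike likelihood via a few Newton steps on the log-posterior, returning the Laplace-approximated posterior mean and covariance; all shapes, names and parameter values are assumptions.
\begin{verbatim}
import numpy as np

def multiscale_laplace_update(x_pred, P_pred, y, N, C, R, beta, alpha, dt, iters=10):
    """One filtering update: maximize the log-posterior of x_t given the Gaussian
    prediction, the Gaussian field observation y (or None if missing) and the
    binary spike vector N, using Newton steps; return the mode and the inverse
    Hessian as the Laplace-approximated posterior mean and covariance."""
    P_pred_inv = np.linalg.inv(P_pred)
    R_inv = np.linalg.inv(R) if y is not None else None
    x = x_pred.copy()
    for _ in range(iters):
        rate = np.exp(beta + alpha @ x)                   # per-neuron firing rates
        grad = P_pred_inv @ (x_pred - x) + alpha.T @ (N - rate * dt)
        hess = P_pred_inv + (alpha.T * (rate * dt)) @ alpha
        if y is not None:                                 # add the Gaussian field term
            grad += C.T @ R_inv @ (y - C @ x)
            hess += C.T @ R_inv @ C
        x = x + np.linalg.solve(hess, grad)               # Newton step
    return x, np.linalg.inv(hess)

# toy usage with assumed dimensions
rng = np.random.default_rng(2)
n_x, n_y, n_c = 2, 3, 8
C, R = rng.standard_normal((n_y, n_x)), 0.1 * np.eye(n_y)
beta, alpha = np.log(15.0) * np.ones(n_c), rng.standard_normal((n_c, n_x))
x_post, P_post = multiscale_laplace_update(
    np.zeros(n_x), 0.1 * np.eye(n_x), rng.standard_normal(n_y),
    rng.integers(0, 2, n_c), C, R, beta, alpha, dt=0.01)
\end{verbatim}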
2.1.2.2 Maximization step
The maximization step in each iteration needs to solve for the parameter values that maximize the expected log-likelihood $S$ in equation (2.6), which is evaluated from equations (2.7)–(2.8) in the expectation step. The maximization step is an optimization process where we find the optimal parameters that maximize $S$ by setting its partial derivative with respect to all parameters to 0 ($\partial S / \partial \Theta = 0$). Optimal parameters for $A$, $W$, $C$, $R$, $v_1$ and $\Psi_1$ have closed-form solutions [44] as
\begin{align*}
A &= \Big(\sum_{t=2}^{T} P_{t,t-1}\Big)\Big(\sum_{t=2}^{T} P_{t-1}\Big)^{-1}, & W &= \frac{1}{T-1}\sum_{t=2}^{T}\big[P_t - A P_{t,t-1}' - P_{t,t-1} A' + A P_{t-1} A'\big], \\
v_1 &= \hat x_1, & \Psi_1 &= P_1 - \hat x_1 \hat x_1', \\
C &= \Big(\sum_{t \in T_f} y_t \hat x_t'\Big)\Big(\sum_{t \in T_f} P_t\Big)^{-1}, & R &= \frac{k}{T}\sum_{t \in T_f}\big[y_t y_t' - C \hat x_t y_t' - y_t \hat x_t' C' + C P_t C'\big]. \qquad (2.9)
\end{align*}
Unlike the above, the optimal parameters for the spike model need to be solved for using numerical methods. Defining $\gamma_c = [\beta_c, \alpha_c']'$, for neuron $c$ we need to maximize the cost function $F = E\big[\log p(\{N^{1:n_c}_{1:T}\} \mid \{X\};\Theta) \mid \{H\}; \hat\Theta^{(i-1)}\big]$. Computing $F$ and $\nabla F$ with respect to $\gamma_c$ at each iteration, we can find the numerical solution of $\partial S / \partial \gamma_c = 0$ for each $c$ by using trust-region methods [100]. Note that the optimal parameters are computed at each iteration.
In the EM, we initialize $\Theta$ randomly. We set the convergence condition to terminate the EM algorithm as $(DL^{(i)} - DL^{(i-1)})/DL^{(i-1)} < 10^{-5}$, where $DL^{(i)} = \log p(\{H\} \mid \{X\}; \hat\Theta^{(i)})$ is the data log-likelihood at iteration $i$.
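As a concrete illustration of the closed-form portion of the maximization step in equation (2.9), the sketch below computes the updates for A, W, C, R, v_1 and Psi_1 from smoothed posterior moments; the moment arrays and the set of field-observation times are assumed to be provided by the expectation step, and the function name is illustrative.
\begin{verbatim}
import numpy as np

def m_step_closed_form(x_hat, P, P_cross, y, field_times, k):
    """Closed-form M-step updates of equation (2.9).
    x_hat[t]   : smoothed mean E[x_t | H],                  shape (T, n_x)
    P[t]       : E[x_t x_t' | H],                           shape (T, n_x, n_x)
    P_cross[t] : E[x_t x_{t-1}' | H] (valid for t >= 1),    shape (T, n_x, n_x)
    y[t]       : field features, used only at field_times,  shape (T, n_y)"""
    T = x_hat.shape[0]
    A = np.sum(P_cross[1:], axis=0) @ np.linalg.inv(np.sum(P[:-1], axis=0))
    W = np.mean(P[1:] - A @ P_cross[1:].transpose(0, 2, 1)
                - P_cross[1:] @ A.T + A @ P[:-1] @ A.T, axis=0)
    v1 = x_hat[0]
    Psi1 = P[0] - np.outer(x_hat[0], x_hat[0])
    Syx = sum(np.outer(y[t], x_hat[t]) for t in field_times)
    Sxx = sum(P[t] for t in field_times)
    C = Syx @ np.linalg.inv(Sxx)
    R = (k / T) * sum(np.outer(y[t], y[t]) - C @ np.outer(x_hat[t], y[t])
                      - np.outer(y[t], x_hat[t]) @ C.T + C @ P[t] @ C.T
                      for t in field_times)
    return A, W, C, R, v1, Psi1
\end{verbatim}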
2.2 Results
2.2.1 Validation on numerical simulations
We first validated the multiscale EM algorithm on numerical simulations. To this end, we performed extensive
Monte Carlo simulations in which we generated 50 randomly simulated sessions. To incorporate a behavioral
task, we simulated a grid reaching task with random targets at each iteration. The details of the simulated
sessions can be found in our manuscript [55] and are omitted here for compactness of exposition. We use
various metrics to evaluate the multiscale EM algorithm: model goodness-of-fit and decoding.
First, we evaluate how well the learned multiscale model can describe the spike-field activity in the test
set within cross-validation, i.e., we evaluate the model goodness-of-fit. Due to similarity transforms, there
are infinite state-space choices that can explain the observations properly. Thus, to validate the EM algo-
rithm, we look at the measures that remain unchanged by different similarity transforms. As our goodness-
of-fit measure for field features, we use the one-step-ahead prediction error [101], which is the error in
predicting the value of the field features in the next time-step given all spike-field observations up to the cur-
rent time-step. We take the normalized root mean-square error (NRMSE) as our one-step-ahead prediction
error measure. NRMSE for the jth field feature is computed by dividing its root mean-square error by its
standard deviation
\[
\mathrm{NRMSE}(j) = \frac{\sqrt{\sum_{t \in T_f}\big(y_t^j - \hat y_{t|t-1}^j\big)^2}}{\sqrt{\mathrm{var}\big(\{y_t^j \mid t \in T_f\}\big)}}, \qquad (2.10)
\]
where $\hat y_{t|t-1}$ is the one-step-ahead prediction of $y_t$. For each feature, we compute the relative NRMSE to assess goodness-of-fit, which is defined as
\[
\mathrm{NRMSE}_{\mathrm{rel}} = \big(\mathrm{NRMSE}_{\mathrm{est}} - \mathrm{NRMSE}_{\mathrm{true}}\big)/\mathrm{NRMSE}_{\mathrm{true}}. \qquad (2.11)
\]
Here $\mathrm{NRMSE}_{\mathrm{est}}$ is the NRMSE using the model estimated with EM and $\mathrm{NRMSE}_{\mathrm{true}}$ is that for the true model.
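These two goodness-of-fit measures can be computed in a few lines, as in the sketch below; the array shapes and function names are assumptions.
\begin{verbatim}
import numpy as np

def nrmse(y_true, y_pred):
    """Per-feature NRMSE of equation (2.10): root summed squared one-step-ahead
    prediction error, normalized by the feature's standard deviation.
    y_true and y_pred have shape (num_field_times, n_y)."""
    return np.sqrt(np.sum((y_true - y_pred) ** 2, axis=0)) / np.std(y_true, axis=0)

def relative_nrmse(nrmse_est, nrmse_true):
    """Relative NRMSE of the learned vs. the true model, equation (2.11)."""
    return (nrmse_est - nrmse_true) / nrmse_true
\end{verbatim}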
As our second evaluation of the algorithm, we assess how well the kinematic states can be decoded from
the estimated hidden states. We build the decoder based on ordinary linear least squares (OLS) and then
evaluate decoding accuracy using the $\mathrm{NRMSE}_{\mathrm{kin}}$ between the true and decoded trajectory as
\[
\mathrm{NRMSE}_{\mathrm{kin}}(i) = \frac{\sqrt{\sum_{t=1}^{T}\big(z_t^i - \hat z_t^i\big)^2}}{\sqrt{\mathrm{var}\big(\{z_{1:T}^i\}\big)}}, \qquad (2.12)
\]
where $i$ denotes the $i$th component of the kinematic state $z_t$ and $\hat z_t$ is the decoded kinematics [55].
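A minimal sketch of this decoding evaluation is given below, with assumed array shapes and function names: an ordinary least-squares projection from hidden states to kinematics is fit on training data, applied to test data, and scored with the NRMSE of equation (2.12).
\begin{verbatim}
import numpy as np

def fit_ols_decoder(x_train, z_train):
    """Least-squares projection (with intercept) from hidden states (T, n_x)
    to kinematic variables (T, n_z)."""
    X = np.hstack([x_train, np.ones((x_train.shape[0], 1))])
    coeffs, *_ = np.linalg.lstsq(X, z_train, rcond=None)
    return coeffs

def decode_kinematics(x, coeffs):
    X = np.hstack([x, np.ones((x.shape[0], 1))])
    return X @ coeffs

def nrmse_kin(z_true, z_dec):
    """Per-component kinematic decoding NRMSE of equation (2.12)."""
    return np.sqrt(np.sum((z_true - z_dec) ** 2, axis=0)) / np.std(z_true, axis=0)
\end{verbatim}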
2.2.1.1 Multiscale EM accurately identifies the multiscale dynamical model for spike-field activity
We first assess the goodness-of-fit of the identified multiscale dynamical model in describing spike-field
activity by using the one-step-ahead prediction error for fields and the KS plot for spikes. Fig. 2.2a shows that
the one-step-ahead prediction error for a sample field feature using the identified model converges to that for
the true model as a function of the multiscale EM iterations. Across all simulated sessions and field features,
$\mathrm{NRMSE}_{\mathrm{rel}}$ in equation (2.11) decreased from 44.59% ± 23.39% to 0.75% ± 0.57% (mean ± s.d.), showing
that the identified multiscale dynamical model described the field dynamics with less than 1% relative difference
compared to the true model on average (Fig. 2.2b). Fig. 2.2c shows the KS plot for the spiking activity
of a sample neuron for the identified and initial models (the model EM starts with). Unlike the initial model,
the KS plot for the identified model is inside the confidence bounds. Across all sessions, 97.47% ± 3.34%
of neurons had their KS plot within the 95% confidence bounds once the multiscale dynamical model was
identified (Fig. 2.2d). This indicates that the identified multiscale dynamical model described the spike
dynamics almost as well as the true model. Together, these results show that the multiscale EM accurately
identifies the multiscale dynamical model from spike-field data.

Figure 2.2: Multiscale EM learns the model parameters correctly for spiking and field activity. (a) An example
simulated field channel's one-step-ahead prediction error for the true and learned models with the multiscale
EM algorithm. The one-step-ahead prediction error was quantified by the root-mean-square error normalized
by the standard deviation of the ground-truth signal, termed NRMSE. (b) The box plot of the relative NRMSE
of the learned model vs. the true model. The one-step-ahead prediction error of the learned model converges
to that of the true model for all channels and simulated sessions; the relative NRMSE of the learned model vs.
the true model converges to below a 1% difference. (c) KS plots for the initialized and learned models for an
example spiking channel. The KS statistic of the learned model is inside the 95% confidence bounds of the
ideal KS plot, unlike that of the initialized model. The ideal KS plot follows the identity line. (d) Across all
simulated sessions and spiking channels, 97% of spiking channels had KS plots inside the confidence bounds.
Figure 2.3: The multiscale dynamical model allows decoding of external behavior variables and improves
behavior decoding accuracy compared to single-scale dynamical models. Example true and decoded behavior
trajectories are shown for (a) x position vs. time, (b) x velocity vs. time, and (c) position on the grid. (d) To
show that multiscale EM improves the information in the hidden state throughout learning, we demonstrate
that behavior decoding accuracy improves over the learning iterations. (e) The multiscale dynamical model and
its learning algorithm improve behavior decoding accuracy for position variables compared to single-scale
dynamical models, demonstrating that the multiscale dynamical model fuses information across the two
observation modalities. (f) Similar to (e), for velocity variables.
2.2.1.2 Multiscale EM enables accurate decoding of behavior
As another validation of multiscale EM, we assessed how well the identified multiscale dynamical model
predicted behavior. Fig. 2.3a-c show the decoded and true trajectories in 4 test trials of a sample simulated
session. As the decoding accuracy, we measured the average cross-validated $\mathrm{NRMSE}_{\mathrm{kin}}$ in equation (2.12)
across the 4 kinematic components (position and velocity in the two coordinates) in the test trials. The
decoding accuracy improved as a function of EM iterations (Fig. 2.3d): across all Monte Carlo simulated
sessions, the average $\mathrm{NRMSE}_{\mathrm{kin}}$ decreased from 0.98 ± 0.01 for the initial model to 0.28 ± 0.06 for the
identified model. This result again shows that multiscale EM improves the multiscale dynamical model at each
iteration until convergence. Note that EM is only run on training trials while decoding is only performed on
test trials.
2.2.1.3 Multiscale EM identifies a hidden state that carries information from both scales of activity
We next examined whether multiscale EM indeed identifies a hidden state that is multiscale, i.e., carries
information about both spike and field dynamics. To do so, we also identified two single-scale dynamical
models, one for spikes alone and one for fields alone. To identify the single-scale models, we used special
cases of the multiscale EM with terms included only for either spikes or fields. We also re-identified the
projection matrix to behavior for single-scale hidden states separately. We then compared the decoding
accuracy with the hidden states identified from spike-field activity using the multiscale EM vs. the hidden
states identified from spikes or fields alone.
Fig. 2.3e-f show the average $\mathrm{NRMSE}_{\mathrm{kin}}$ for position and velocity over all Monte Carlo simulated sessions
using spike-field activity, fields alone and spikes alone. The average $\mathrm{NRMSE}_{\mathrm{kin}}$ for spike-field activity
was significantly lower than both spikes alone and fields alone. Compared with hidden states from spike-field
activity, the average $\mathrm{NRMSE}_{\mathrm{kin}}$ for the states from field activity was 39% higher (increase from 0.28
to 0.39, $P < 10^{-57}$, one-sided paired t-test). Also, compared with hidden states from spike-field activity, the
average $\mathrm{NRMSE}_{\mathrm{kin}}$ for the hidden states from spike activity was 23% higher (increase from 0.28 to 0.35,
$P < 10^{-42}$, one-sided paired t-test). These results indicate that the hidden state identified by the multiscale
EM contains information from both spikes and fields, i.e., that it is indeed multiscale.
2.2.2 Validation on the 3D naturalistic reach-and-grasp task
We also validated the multiscale EM algorithm on combined spiking-LFP recordings during a 3D naturalistic
reach-and-grasp task (more details on the task can be found in A.2). We showed that the multiscale EM
algorithm improves behavior decoding accuracy by fusing information from multiple scales of brain activity.
2.2.2.1 3D naturalistic reach-and-grasp task
In this task, two subjects (Monkey J and Monkey C) performed naturalistic reach and grasp movements in
a $50 \times 50 \times 50\ \mathrm{cm}^3$ workspace and continuously in time [56, 102]. Monkeys naturalistically reached for an
object positioned on a wand, grasped the object, then released the object and returned their hand to a resting
position. An experimenter continuously moved the wand to random locations spanning a large spatial area
in front of the monkey. Also, critically, the task allowed each monkey to choose how to reach-and-grasp
the object, lacked instructions needed to isolate reach-and-grasp movement components and, in fact, lacked
overt movement instructions in general. There was no go cue, no time-limit was enforced instructing how
fast to reach, grasp or return, and no targets appeared suddenly as the wand was always visible and generally
moving. We used motion capture technology using retroreflective markers to track 27 (Monkey J) or 25
(Monkey C) joint angles on the right shoulder, elbow, wrist, and fingers at each point in time. Marker
trajectories were tracked using infrared and near-infrared cameras. Arm and finger joint angles were then
obtained from the marker trajectories by solving for the inverse kinematics using an anatomically-correct
non-human primate (NHP) musculoskeletal model. Consistent with the expression of a naturalistic behavior,
each animal could develop their own behavioral strategy for performing the task.
Spiking and LFP activity were simultaneously recorded from motor cortical areas across multiple days
from two Rhesus macaque monkeys (Monkey J and Monkey C), and we analyzed recordings in the hemisphere
contralateral to the arm and hand movement. Recordings covered primary motor cortex (M1), dorsal pre-
motor cortex (PMd), ventral premotor cortex (PMv) and prefrontal cortex (PFC) for Monkey J and PMd
and PMv for Monkey C. For the analyses in the following sections in this chapter, we show the results for
Monkey J.
Figure 2.4: Multiscale EM algorithm improves decoded behavior trajectories. Example actual and decoded
trajectories of shoulder elevation and wrist flexion using 5 spiking channels and 5 spiking + 25 LFP channels.
2.2.2.2 Multiscale EM algorithm improved behavior decoding accuracy in the naturalistic 3D reach-
and-grasp task
For tractability of analyses, we first sort the spike and LFP channels based on their individual decoding
performance (mean correlation coefficient (CC) across the 7 joints). We randomly choose 5 channels from the
pool of the best 40 spike channels. We then randomly select LFP channels from the pool of 40 LFP channels
and add them one by one to the set of selected spike channels. The LFP pool is chosen such that individual
LFP channels in the pool have similar decoding performance compared with the individual spike channels in
the spike pool. This selection allows us to evaluate the addition of information from one scale to the other
without having one scale dominate the decoding results. This procedure is repeated 20 times to obtain an
average estimate of the decoding performance.
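The sketch below illustrates this channel-selection procedure under simplifying assumptions: channels are ranked by their single-channel decoding scores, 5 spike channels are drawn at random from the best-40 pool, and LFP channels are added one by one over 20 repeats. The performance-matching of the LFP pool to the spike pool is simplified here, and all function and variable names are illustrative assumptions.
\begin{verbatim}
import numpy as np

def channel_selection_runs(spike_scores, lfp_scores, rng, n_repeats=20,
                           pool_size=40, n_spikes=5, lfp_steps=(15, 25)):
    """Repeat the channel-selection procedure: draw 5 spike channels at random
    from the 40 best spike channels, then add LFP channels one by one from a
    40-channel LFP pool. Scores are the single-channel decoding accuracies
    (e.g., mean CC over the 7 joints) used to rank the pools."""
    spike_pool = np.argsort(spike_scores)[::-1][:pool_size]
    lfp_pool = np.argsort(lfp_scores)[::-1][:pool_size]
    runs = []
    for _ in range(n_repeats):
        spikes = rng.choice(spike_pool, size=n_spikes, replace=False)
        lfp_order = rng.permutation(lfp_pool)            # LFPs added one by one
        runs.append({"spikes": spikes,
                     "lfp_sets": {n: lfp_order[:n] for n in lfp_steps}})
    return runs

rng = np.random.default_rng(3)
runs = channel_selection_runs(rng.random(100), rng.random(100), rng)
\end{verbatim}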
Figure 2.5: Multiscale EM improves behavior decoding accuracies across all joint angles on the arm,
throughout the process of adding spikes to LFPs and LFPs to spikes. (a) Behavior decoding accuracy
of the arm joint angles using 5 spikes, 5 spikes + 15 LFPs and 5 spikes + 25 LFPs. (b) Behavior decoding
accuracy of the arm joint angles using 5 LFPs, 5 LFPs + 15 spikes and 5 LFPs + 25 spikes.
2.3 Discussion
Here, we developed the multiscale dynamical model, which simultaneously characterizes discrete spiking
activity and continuous field potential activity. The state in the multiscale dynamical model is hidden/latent and
can be interpreted as a low-dimensional representation of the combined spiking and field activity. To learn
the multiscale dynamical model parameters, we developed the multiscale EM algorithm, which is an iterative
approach to maximize the data likelihood given that the state is hidden. We can interpret the multiscale EM
algorithm as a variational inference framework [103] for learning the model parameters. The major distinction
is that we can analytically obtain the dynamical decoder (inference network) in our framework; therefore, the
familiar Kullback–Leibler (KL) divergence term in variational inference learning [104] does not appear in our
derivations because it equals zero [105]. However, it is worth noting that since the multiscale decoder is based
on a Laplace approximation [33], it is not an exact posterior estimate and could occasionally lead to suboptimal
results. The decoder used in the multiscale EM algorithm is based on the multiscale decoder [33], a recursive
Bayesian decoder similar to the Kalman filter.
on the multiscale decoder [33], a recursive Bayesian decoder similar to the Kalman filter.
After learning the model parameters, the multiscale decoder can serve as a dynamic dimensionality
reduction tool that summarizes the high-dimensional combined spiking and field activity into the hidden
state. This is unlike prior dimensionality reduction methods in neuroscience, which have largely focused
on single-scale spike or field activity [13]. For example, to extract low-dimensional neural states from
continuous data, prior studies have used static approaches such as principal component analysis (PCA) [23]
and factor analysis (FA) [20], which disregard the temporal dynamics of the neural observations. Dynamic
dimensionality reduction approaches such as Gaussian process factor analysis (GPFA) [24] or linear and
point process state-space models have also been developed for single-scale continuous [44, 47, 57, 60] or
single-scale binary data [25, 43, 87]. However, these dynamical techniques are again not directly applicable
to combined binary-continuous spiking and field activity with different time-scales.
Chapter 3
Studying multiscale low-dimensional dynamics during naturalistic
reach-and-grasp movements
In this chapter, we investigate the low-dimensional dynamics across small-scale and large-scale brain net-
works during naturalistic reach-and-grasp movements. As stated in chapter 1, studies of low-dimensional
motor cortical population dynamics have largely examined a single scale of population activity [17–21, 23–
26, 28, 29, 106], mostly the spiking network activity. Such works have revealed significant insight about
the low-dimensional neural dynamics during certain motor tasks, for example, prominent rotatory dynam-
ics during motor movements [18, 19, 21, 23, 26, 28, 29, 106]. Additionally, prior models of low-dimensional
dynamics in single-scale activity have mainly investigated them within behavioral paradigms that
instruct isolated trial-based reach movements [19–22, 24, 25, 28, 29], or in a few cases isolated trial-based
reach-to-grasp movements typically to fixed locations and within relatively limited spatial coordinates [23,
26, 106]. However, in our natural behaviors, reach and grasp movements evolve continuously in time rather
than in an isolated manner and are coordinated. The low-dimensional neural population dynamics of con-
tinuous reach-and-grasp movements in naturalistic experimental setups are largely under-explored even for
spiking activity. Further, neural control of arm and hand movements may reflect the integrated action of brain
networks [40, 107, 108]; as a result, this neural control is distributed across multiple spatial and temporal
scales of neural activity, spanning not only neuronal spiking activity but also larger-scale neural popula-
tion activity reflected in local LFPs [17, 18, 31–36, 38, 108–110]. Therefore, we are motivated to study
the low-dimensional dynamics of spiking and LFP population activity during naturalistic reach-and-grasp
movements for the first time.
It is important to highlight that the low-dimensional dynamics in spiking and LFP activity are not
trivially similar or dissimilar. First of all, spiking and LFP activity reflect different biological processes
in the brain, and they have different statistical profiles and time scales. Spiking activity measures binary
(present or absent) action potential events that ultimately drive muscle contraction, while LFP activity
is continuously varying and measures sub-threshold synaptic and dendritic activity shaped by cells' ac-
tion potentials [40–42]. Second, while LFPs are known to encode motor actions, it is not yet clear whether
they encode similar or different aspects of movements. In fact, single-scale LFP activity has been directly
related to movement parameters such as position, velocity or direction [32, 34, 38, 39, 111]. Studies look-
ing at spiking and LFP activity together have generally focused on quantifying the amount of task-related
information at each measurement scale using decoding analyses [17, 31–36, 38, 39, 109, 110]. Some of
these studies found similar and comparable amount of task-related information in spiking and LFP activity
[34, 35, 110], while other studies suggested that each scale may reflect non-redundant aspects of an ongoing
behavior [31, 36, 38, 39, 109]. Not only did these studies not investigate the low-dimensional dynamics,
they also reached different conclusions about whether behavior is similarly or differentially encoded in spiking
and LFP activity, which further makes it unclear how spiking and LFP dynamics would relate to each other
or to reach-and-grasp behavior. Specifically, it is not clear whether low-dimensional dynamics of spiking
and LFP network activity have similar or different temporal characteristics, i.e. whether they are shared
or distinct. Further, it is not known how low-dimensional dynamics across the different scales similarly or
differentially predict reach-and-grasp behavior and encode movement parameters.
Here, we propose that multiscale, low-dimensional motor cortical state dynamics reflect the neural con-
trol of naturalistic reach-and-grasp behavior due to the integrated action of large-scale brain networks. We
study the single-scale and multiscale low-dimensional dynamics from single-scale and multiscale EM algo-
rithms and we partition these dynamics into different modes. Each mode captures a unique dynamical char-
acteristic of network activity and is specified by a pair of time-decay and frequency in the temporal neural
response to excitations. By analyzing the modes of the network dynamics, we can assess the dynamical pat-
terns present in each scale of brain activity. We find that spiking and LFP network activity exhibited several
distinct principal modes in their dynamics among which one mode was dominant in predicting behavior in
each case. Notably, this predictive mode was shared across scales with a similar dynamical decay-frequency
characteristic, was again learned as a unified mode for spike-LFP activity in combination, was shared across
experimental sessions and monkeys, yet did not simply replicate the modes in behavior trajectories. This
chapter resulted in a publication in Nature Communications [56].
3.1 Methods
For the analyses in this chapter, we used the same machine learning framework introduced in Chapter 2
to learn the dynamical models: single-scale and multiscale EM algorithms for single-scale and multiscale
network activity, respectively (Fig. 3.1a). Since we were interested in the dynamical characteristics of the
population dynamics, we also introduced a modal analysis framework for the learned dynamical models. The
dynamical models had a latent state-space formulation, where the hidden states summarize the dynamics of
neural population activity. In fact, in the latent state space model, it is known that the dynamical character-
istics are determined by the state transition matrix A and are reflected in the hidden states which ultimately
form the population dynamics (Fig. 3.1b). We refer to each real eigenvalue or each pair of complex conju-
gate eigenvalues of A as a mode. Each mode corresponds to a unique decay-frequency pair present within
neural dynamics; each pair corresponds to one component of the neural response to excitations and indicates
how fast this response decays in time and with what frequency it rings over time [112]. We thus study the
modes of the learned dynamical models to study the dynamical characteristics of population activity.
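For reference, the mapping from a discrete-time eigenvalue of A to its decay and frequency follows the standard relations sketched below; the sampling interval dt is an assumed parameter (the bin width of the model), and the example numbers are only illustrative.

import numpy as np

def eig_to_decay_frequency(lam, dt):
    """Convert one eigenvalue of the state transition matrix A (discrete time,
    sampling interval dt in seconds) to a (decay, frequency) pair."""
    magnitude = np.abs(lam)
    # Time constant (s) of the mode's exponential envelope; non-decaying if |lam| >= 1.
    decay = np.inf if magnitude >= 1 else -dt / np.log(magnitude)
    # Oscillation rate (Hz) of the mode; 0 for real positive eigenvalues.
    frequency = np.abs(np.angle(lam)) / (2 * np.pi * dt)
    return decay, frequency

# Example with an assumed 50 ms time step: this eigenvalue pair corresponds to
# roughly a 2 s decay and a 0.1 Hz frequency.
print(eig_to_decay_frequency(0.9753 + 0.0307j, dt=0.05))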
To get the robust modes of the neural population activity, we used the fact that a principal mode of net-
work dynamics should have consistent decay-frequency characteristics regardless of latent state dimension.
Investigating the consistency of learned modes is inspired by prior studies where excitation frequencies of
physical structures are of interest [113]. We therefore compared the learned modes when increasing the
dimension of the latent state from 1 to 25. We found the validated principal modes of network dynamics as
those modes that were consistently learned at all dimensions greater than some value (Fig. 3.1c). We plot
the modes within a 3D eigenvalue-dimension diagram whose x-y plane represents the real and imaginary
components of the modes and whose z-axis represents the state dimension (Fig. 3.1c). For modes with
complex conjugate eigenvalue pairs, we only show the eigenvalue with the positive imaginary component
for simplicity of illustration. The distance between any two modes is defined as the norm of the difference in
their corresponding eigenvalues, which is the Euclidean distance in the x-y plane of eigenvalue-dimension
diagram. Since a principal mode should have consistent decay-frequency characteristics regardless of di-
mension, the mode should appear at a similar location on the x-y plane for any dimension, thus forming a
vertical cluster. We thus use K-means clustering to group the vertical clusters that persist through dimen-
sions based on their location on the x-y plane (Fig. 3.1c). To do this, we project all modes on the x-y plane
and we find the clusters of modes. A vertical cluster is defined as one for which the distance of the members
from the cluster centroid, on the 2D x-y plane, is significantly smaller than half of the chance-level distance.
With this criterion, members of a cluster lie within a circle whose diameter is smaller than the chance level. We
compute the chance level as the mean distance of two eigenvalues placed randomly and uniformly on the
eigenvalue plane within the range of the observed modes in our data [56].
Figure 3.1: Modal analysis framework for neural population activity. (a) We first learn the single-scale or
multiscale dynamical model from the neural activity in the training set. (b) After learning the dynamical
model, we find the modes, represented as the eigenvalues of the state transition matrix. (c) To get the principal
modes, for the same neural activity, we learn dynamical models of various latent state dimensions. For each
dimension, we indicate the location of the modes corresponding to their real and imaginary values on a plane
parallel to the x-y plane and intersecting the z axis at that dimension. We finally cluster the modes using
K-means clustering to find the vertical clusters, each corresponding to a different location on the x-y plane
and thus different decay-frequency characteristics. (d) In the test set, we use the learned dynamical model
to estimate the modes and states. We then use the estimated state to predict behavior and to predict neural
activity one step ahead into the future, and we separate the contribution of each mode in these predictions.
By comparing the contribution of each mode with the true behavior and neural activity, we get each mode's
behavior prediction accuracy and one-step-ahead prediction accuracy of neural activity [56].
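The mode-extraction and clustering procedure above can be sketched as follows; learn_ldm is an assumed helper that fits a dynamical model of a given latent dimension and returns its state transition matrix, and the sketch omits the chance-level criterion used to accept vertical clusters.

import numpy as np
from sklearn.cluster import KMeans

def principal_mode_clusters(neural_data, learn_ldm, max_dim=25, n_clusters=5):
    """Pool eigenvalues of A across latent state dimensions 1..max_dim and cluster
    them on the complex (real, imaginary) plane to find vertical clusters."""
    points = []
    for dim in range(1, max_dim + 1):
        A = learn_ldm(neural_data, dim)      # assumed model-fitting helper (e.g., EM)
        for lam in np.linalg.eigvals(A):
            if lam.imag >= 0:                # keep one eigenvalue per conjugate pair
                points.append([lam.real, lam.imag, dim])
    points = np.asarray(points)
    # Cluster on the x-y (eigenvalue) plane only; the z axis (dimension) is ignored,
    # so modes that persist across dimensions form vertical columns.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points[:, :2])
    return points, labels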
For each principal mode, in addition to the decay-frequency pair, we quantify how well the mode predicts
behavior and neural activity (see the methods and supplementary material in [56]). Similar to prior work,
we model the behavior, defined as joint angle trajectories or 3D end-point hand kinematics (positions and
velocities in 3D physical space), as linear projections of the latent state [47, 57, 114]. After learning the
dynamical model and estimating the latent states based on neural data alone in the training set, we fit the
projection matrix in the training set to relate the already-estimated modes and states to behavior. In the test
set, we first estimate the latent state using the learned dynamical model with a multiscale filter. We then
use the fitted projection matrix to project the estimated latent states and predict the joint angles (Fig. 3.1d).
Prediction accuracy is measured by Pearson's correlation coefficient between the actual and decoded
trajectories in the test set. Finally, we find the prediction accuracy of each mode separately using a linear
transformation of the latent states (Fig. 3.1d). We also assess the contribution of each mode to predicting
the neural activity (Fig. 3.1d).
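A minimal sketch of this behavior-prediction step is shown below, assuming the latent states have already been estimated in the training and test sets; X_train and X_test are time-by-state matrices, Z_train and Z_test are the corresponding behavior trajectories, the signals are assumed to be mean-removed so that no intercept is needed, and the projection is assumed to be fit by least squares.

import numpy as np

def fit_and_score_projection(X_train, Z_train, X_test, Z_test):
    """Fit behavior ~ latent states by least squares on the training set, then
    report Pearson's CC per behavior dimension on the test set."""
    L, *_ = np.linalg.lstsq(X_train, Z_train, rcond=None)   # projection matrix
    Z_pred = X_test @ L                                      # decoded behavior
    return np.array([np.corrcoef(Z_pred[:, j], Z_test[:, j])[0, 1]
                     for j in range(Z_test.shape[1])])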
3.2 Results
3.2.1 Spiking and LFP network activity exhibited several principal modes
We first asked whether there existed clusters of principal modes in the spiking and LFP network activity,
and if so what dynamical characteristics they had (Fig. 3.2). It has to be noted that it was not necessarily
clear that there existed strong principal modes in the neural population activity. Fig. 3.2a-c shows the
eigenvalue-dimension diagram and its top view in both the eigenvalue plane and decay-frequency domain
(for interpretability) for spiking and LFP activity in one experimental session. Interestingly, we found that
clear vertical clusters of principal modes were formed across dimensions for both spiking and LFP activity,
suggesting the existence of consistent dynamical characteristics that persist regardless of state dimension.
This observation was consistent in all experimental sessions [56]. To quantify this, we showed that the
member distance from centroid in each principal mode cluster was small (Fig. 3.2d).
We then quantified how predictive of behavior each principal mode cluster was by predicting the joint
angles of the arm during movement. In each individual session and in both spiking and LFP network activity,
there was a principal mode cluster with a dominant prediction accuracy, which was significantly higher than
the prediction accuracy of every other cluster (Fig. 3.2e-g). We term this dominant mode the predictive
mode. In our work, we also observed that this predictive mode was dominant in predicting other behavior
variables (such as arm kinematics in the 3D space) and was also dominant in describing neural dynamics
across sessions [56].
3.2.2 The predictive mode was multiscale
Although distinct modes were present in spiking and LFP network activity consistent with different biolog-
ical processes (Fig. 3.2), the predictive mode was shared between spiking and LFP activity (Fig. 3.3). This
was the case even though the spiking and LFP activity we analyzed were recorded from non-overlapping chan-
nels, had different time-scales and statistical distributions in the dynamical models, and had their models
learned with different algorithms (see [56] for extensive discussion on this). Two sets of analyses supported
this conclusion. First, we learned a multiscale dynamical model for the combined spike-LFP network ac-
tivity and found that a single principal mode cluster was again learned at a very similar location (Fig. 3.3b;
blue dots were at similar locations as yellow and brown dots); further, this mode was again dominant in
predicting behavior, i.e., was also the predictive mode for combined spike-LFP activity (Fig. 3.3a). Second,
we found that the distance between the predictive mode in the spiking network activity and that in the LFP
network activity was small (Fig. 3.3b; yellow and brown dots were at similar locations).
3.2.3 Illustration of multiscale predictive mode and its dynamical characteristics
We show sample trajectories of the two latent states associated with just the predictive mode in Fig. 3.4a
along with reach and return periods. To have a better understanding of the dynamical characteristics of the
multiscale predictive mode, we show its vector field along with a sample trajectory on the vector field from
an initial condition (Fig. 3.4b). To do this, we construct a 2-dimensional state-space equation x_{t+1} = A x_t
whose A is fully specified by the complex-conjugate eigenvalues corresponding to the multiscale predictive
mode. We then plot the vector field for this state-space model [115] and simulate a sample state trajectory
for a fixed time interval (5 s) in response to an impulse excitation (initial condition).
Figure 3.2: Both spiking and LFP network activity exhibited several principal modes in their dynamics,
where one of them was dominant in predicting behavior. (a) Eigenvalue-dimension diagram for spiking
(top) and LFP (bottom) network activity in one sample session. The vertical clusters are principal modes
each shown with a different color. Modes that cannot be categorized in clusters are shown in grey. (b) The
top view of eigenvalue-dimension diagram in (a) only shown for dimensions higher than 10. (c) The top view
of eigenvalue-dimension diagram in (b) in the interpretable decay-frequency domain. NP and ND represent
non-periodic or non-decaying dynamics, respectively. (d) Principal modes that form vertical clusters exist
in every session. Each data point (grey dot) represents the mean distance of the members in one vertical
mode cluster in one session from its centroid in the x-y plane. Data points are shown for all clusters in all
sessions. (e-f) In both spiking and LFP activity, one mode is dominantly predictive of behavior. Joint angle
prediction accuracy statistics are shown for the members of the mode clusters in spiking and LFP network
activity in the sample session shown in (a-c). Colors represent the colored clusters in (a-c). For spiking
activity and LFP activity, the yellow and brown clusters were the predictive modes, respectively. (g) Across
both subjects, among all principal modes in spiking and LFP activity, the prediction accuracy of the
predictive mode is significantly better than that of the second-best principal mode.
Figure 3.3: The predictive mode is multiscale, i.e., is shared across scales of brain activity. (a) Eigenvalue-
dimension diagram for combined spike-LFP activity. Figure convention is the same as Fig. 3.2. Combined
spike-LFP network activity also exhibited several principal modes among which again one (cyan) was dom-
inantly predictive of behavior. (b) The predictive modes across sessions in the two monkeys. Each dot
represents the centroid of the predictive mode cluster in one experimental session for spiking (yellow), LFP
(brown) and combined spike-LFP (cyan) network activity.
The multiscale predictive state dynamics exhibited a ∼0.2 Hz frequency component, which quantifies the tendency of the state vector
to rotate around the fixed point, and a ∼1 s time decay, which quantifies the tendency of the state
vector to converge to the fixed point [56]. For LDMs, the fixed point is the origin of the state space. Please refer to
the supplementary material in [56] for how the vector field of the dynamics changes with the
frequency and decay characteristics of the multiscale predictive mode.
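The vector field and trajectory in Fig. 3.4b-c can be reproduced from the decay-frequency pair alone; below is a minimal sketch that builds a 2 x 2 rotation-plus-decay transition matrix for a 1 s decay and 0.2 Hz frequency (the time step dt is an assumed parameter) and simulates the 5 s impulse response.

import numpy as np

def mode_transition_matrix(decay_s, freq_hz, dt=0.05):
    """2x2 state transition matrix whose complex-conjugate eigenvalues have the
    requested decay (s) and frequency (Hz) at sampling interval dt (s)."""
    r = np.exp(-dt / decay_s)            # eigenvalue magnitude sets the decay
    w = 2 * np.pi * freq_hz * dt         # eigenvalue angle per step sets the frequency
    return r * np.array([[np.cos(w), -np.sin(w)],
                         [np.sin(w),  np.cos(w)]])

A = mode_transition_matrix(decay_s=1.0, freq_hz=0.2, dt=0.05)
x = np.array([8.0, 0.0])                 # impulse excitation (initial condition)
trajectory = [x.copy()]
for _ in range(int(5.0 / 0.05)):         # simulate x_{t+1} = A x_t for 5 s
    x = A @ x
    trajectory.append(x.copy())
trajectory = np.array(trajectory)        # spirals toward the fixed point at the origin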
Figure 3.4: Illustration of the multiscale predictive mode and its dynamical characteristics. (a) The predic-
tive mode’s two latent states are shown for both spiking network activity (left) and LFP network activity
(right) in a sample experimental session. We extracted the reach and return periods from the temporal pro-
files of movement kinematics. This is done by extracting stationary periods when velocity and acceleration
are below certain thresholds and subsequently identifying movement initiation given increased velocity af-
ter stationary periods. The reach period (green) is defined as the period starting when the subjects initiate
the movement towards the target until they reach it. Similarly, the return period (red) is defined as the period
starting when they initiate the backward movement towards the resting position until they reach it. (b)
The vector field corresponding to the multiscale predictive mode exhibits rotations. State space is shown in
arbitrary units (a.u.). (c) A sample state trajectory (shown by a cyan trajectory) from initial point (shown by
a cyan dot) also shows rotations.
3.2.4 The multiscale predictive mode was not replicated in behavior dynamics
An important concern is whether the multiscale predictive mode is shared across scales simply because it repli-
cates a direct representation of the behavior modes. To address this concern, we performed two control
analyses. First, we performed the same modal analysis on behavior signals in two ways: 1) direct dynamical
model, 2) latent dynamical model. In the direct dynamical model, we extracted the eigenvalues of the matrix
that directly describes how the joint angles z_t evolve in time, i.e., the eigenvalues of z_t = A_kin z_{t−1}, where A_kin
is the matrix fitted based on ordinary least squares from z_{t−1} to z_t. In the latent dynamical model,
we treat the joint angles as the observation of an LDM and repeat the same modal analysis procedure as
in neural activity. Overall, we found that the behavior modes exhibited a wider range of frequencies and
larger decays compared to the multiscale predictive mode in neural activity (Fig. 3.5a-c). The mean decay
and frequency of the multiscale predictive mode were 1.1± 0.49 s and 0.17± 0.03 Hz, respectively. In
comparison, for the modes in joint angle trajectories, the decay was significantly larger than the multiscale
predictive mode decay and ranged around 4–10 s (Fig. 3.5c). Also, for these joint angle modes, the range
of frequency was 0–0.49 Hz, which was significantly wider than the range of multiscale predictive mode
frequencies (Fig. 3.5c).
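As a concrete illustration of the direct dynamical model of behavior, the sketch below fits A_kin by ordinary least squares and returns its eigenvalues, which can then be converted to decay-frequency pairs exactly as for the neural modes; Z is an assumed time-by-joint matrix of joint-angle trajectories.

import numpy as np

def behavior_mode_eigenvalues(Z):
    """Fit the direct behavior dynamics z_t = A_kin z_{t-1} by ordinary least squares
    and return the eigenvalues of A_kin."""
    Z_past, Z_future = Z[:-1], Z[1:]                        # time-by-joint matrices
    X, *_ = np.linalg.lstsq(Z_past, Z_future, rcond=None)   # Z_future ~ Z_past @ X
    A_kin = X.T                                             # so that z_t ~ A_kin z_{t-1}
    return np.linalg.eigvals(A_kin)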
Second, we computed the power spectral density (PSD) of the trajectories of behavior measurements and
calculated the peak frequencies. These results further supported that behavior frequency ranges were wider
than the multiscale predictive mode frequencies (Fig. 3.5d). The peak frequencies of the PSD for the joint
angles and end-point hand kinematics were distributed over a wide range of 0–0.4 Hz, consistent with
the behavior modal analysis above. The same results held for the end-point hand kinematics [56]. In our
manuscript, we also performed another control analysis where we simulated a dynamical model with some
behavior modes directly represented in the neural observations. We showed that in that case, unlike what
we observed in the real data, the neural modes would replicate the behavior modes. These control analyses
confirm that the multiscale predictive mode was not shared across scales and sessions simply because it
directly represented the behavior modes.
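The PSD-based control described earlier in this section can be sketched with a standard Welch estimate as below; the sampling rate fs and the joint-angle input are assumptions for illustration, not the exact settings used in the analysis.

import numpy as np
from scipy.signal import welch

def psd_peak_frequency(joint_angle, fs=20.0):
    """Return the frequency (Hz) at which the Welch power spectral density of one
    joint-angle trajectory peaks; fs is the behavior sampling rate in Hz."""
    freqs, psd = welch(joint_angle, fs=fs, nperseg=min(len(joint_angle), 1024))
    return freqs[np.argmax(psd)]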
3.2.5 The predictive mode was present in both low- and high-frequency LFP bands
One concern about the observation of a shared multiscale predictive mode was that it could simply be due to the
known shared information between the high-frequency (gamma) band of LFP activity and spiking activity [36,
116]. Therefore, we asked whether the predictive mode was present in the low-frequency bands of LFP
activity as well.
To address this question, we performed the modal analysis on the low-frequency (theta + alpha + beta
1 + beta 2) and high-frequency (gamma 1 + gamma 2 + gamma 3) bands of LFP activity separately. We
found that the multiscale predictive mode was again present in both (Fig. 3.6). Finally, since the multiscale
predictive mode was also shared across frequency bands of LFP activity, we further validated the hypothesis
that the behavior prediction accuracy should be explained by the distance between the multiscale predic-
tive mode and the closest estimated mode when using different combinations of these frequency bands.
We estimated the principal modes from various combinations of frequency bands of LFP activity (theta +
alpha, beta and gamma in addition to their pairwise combinations). We found that the distance of the clos-
est estimated principal mode to the multiscale predictive mode was significantly correlated with behavior
prediction accuracy (Fig. 3.6c).
3.2.6 Both the decay and frequency of the multiscale predictive mode explained naturalistic
behavior
We obtained multiple convergent lines of evidence revealing that the decay and frequency of the principal modes
explain their behavior prediction accuracy. First, the multiscale predictive modes had significantly larger
decays compared to every other complex conjugate principal mode (Fig. 3.7a), indicating that accurate pre-
dictions of naturalistic behavior required integrating information over larger time intervals on the order of
1 s (Discussion). Second, we fitted a multivariate linear regression (MVLR) model that expressed the behavior
prediction accuracy of principal modes as a linear function of their decay and frequency deviation from the
multiscale predictive mode [56]. We found that both decay and frequency deviation had significantly nega-
tive coefficients in the MVLR (Fig. 3.7b), indicating that the further the decay or frequency of a principal
mode was from the multiscale predictive mode, the lower its behavior prediction accuracy was. Third, we
perturbed the decay and frequency of the multiscale predictive mode and recomputed its behavior predic-
tion accuracy. To do this, we first fixed the frequency of the multiscale predictive mode and perturbed its
decay from 0.1–10 s while fixing all the other principal modes at their learned values. We then relearned
the state-space model (all parameters except the state transition matrix that is dictated by the modes) for all
experimental sessions from neural data. We repeated this analysis by similarly fixing the decay and per-
turbing the frequency from 0.02 Hz to 3 Hz for the multiscale predictive mode. For simplicity, we fixed the
dimension of the latent state at 15. We found that the behavior prediction accuracy of the perturbed mode
decreased significantly when going outside the vicinity of the multiscale predictive mode (Fig. 3.7c-d).
Further, a perturbation that decreased the decay value (faster decays) was more detrimental than one that in-
creased the decay value (slower decays). Decreasing the decay value 10 times led to a decrease of 0.33 in CC
of prediction accuracy (from 0.4 to 0.07) compared to a decrease of 0.05 (from 0.4 to 0.35) when increasing
the decay 10 times (Fig. 3.7c). Therefore, prediction accuracy decreased more when the perturbation led to
smaller decays (fast-decaying) compared to larger decays (slow-decaying), again indicating that integrating
information over larger time intervals of at least 1 s is important in behavior prediction. Taken together,
these convergent results suggest that both the decay and frequency temporal characteristics of the multiscale
predictive mode are important in explaining how motor cortical state dynamics control naturalistic behavior.
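The perturbation analysis relies on rebuilding the predictive mode's eigenvalue pair from a chosen decay and frequency; a minimal sketch of that inverse mapping is given below, with dt an assumed time step and example perturbation grids within the ranges reported above.

import numpy as np

def decay_frequency_to_eig(decay_s, freq_hz, dt=0.05):
    """Return the complex-conjugate eigenvalue pair with the requested decay (s)
    and frequency (Hz) at sampling interval dt (s)."""
    lam = np.exp(-dt / decay_s) * np.exp(1j * 2 * np.pi * freq_hz * dt)
    return lam, np.conj(lam)

# Perturbation grids (decay with frequency fixed, and frequency with decay fixed).
perturbed_decays = [0.1, 0.2, 0.5, 1, 2, 4, 6, 10]              # seconds
perturbed_freqs = [0.02, 0.05, 0.1, 0.17, 0.3, 0.5, 1, 1.5, 3]  # Hz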
3.3 Discussion
Here, we developed a modal analysis framework to study the dynamical characteristics of the low-dimensional
dynamics in the neural population activity, and investigated those in the spiking and LFP activity during a
naturalistic reach-and-grasp task. Our modal analysis framework is based on linear (or generalized linear)
dynamical models and quantifies the dynamical characteristics via the modes of the LDM. The idea of ex-
tracting the principal modes of the population activity from the eigenvalue-dimension diagram allows us to
systematically separate the principal modes of the population activity from those that are randomly identi-
fied at various state dimensions. Partitioning the hidden state of the LDM into modes allows us to quantify the
contribution of the modes to any linear or generalized linear prediction. In fact, as shown in our work [56],
any linear projection can be summarized as a sum of contributions of the modes. We used this to quantify the
contribution of modes to behavior prediction or one-step-ahead LFP prediction. This partitioning can also
be extended to generalized linear models, and we leveraged this to quantify the contribution of modes to the
spiking one-step-ahead prediction.
After applying our dynamical model learning and modal analysis framework to simultaneously recorded
spiking and LFP activity during naturalistic movements, we made several interesting observations. The first
observation was that there existed several principal modes in the population activity. This observation is not
necessarily expected in any high-dimensional time-series. To show this, we randomly shuffled the observed
neural activity in time and showed that there did not exist any clear principal mode via the same modal
analysis procedure [56]. The second interesting observation was that in both spiking and LFP population
activity, there was a mode that was dominant in predicting behavior variables and neural activity. More
interestingly, this mode was shared between both scales, i.e., had similar eigenvalues and therefore similar
decay and frequency characteristics. To show that this observation is more fundamental than being simply
due to shared aspects of the recordings, we performed several control tests. We first showed that this shared
predictive mode is not simply a representation of the behavior modes. To do this, we repeated the same
modal analysis on the behavior variables and showed that the behavior modes are in fact different from
the multiscale predictive mode. It has to be noted that we expect the predictive mode to be close to the
behavior modes to an extent, but not exactly matched. This is in line with a large body of work that reported
observations against representational modeling [9–11]. We believe behavior is encoded in the neural
activity but not necessarily directly. Another control analysis was to make sure that the predictive mode
was not shared simply because of the known relation between high-frequency bands
of LFP activity and spiking activity. Interestingly, we showed that this mode was present in both low-
frequency and high-frequency bands of LFP activity. Also, we made sure to choose non-
overlapping electrodes for the pools of spiking activity and LFP activity. Since the inter-electrode distance
(1.5 mm) in our recording array was far larger than the distance known for LFP-spike correlations, we
believe our results were not confounded by such correlations in the pool of electrodes we chose [56]. In
our manuscript, we thoroughly discussed the frequency of the multiscale predictive mode in the context of
prior works [56]. Overall, we propose that multiscale, low-dimensional motor cortical state dynamics reflect
the neural control of naturalistic reach-and-grasp behavior due to the integrated action of large-scale brain
networks.
Figure 3.5: The multiscale predictive mode was not simply due to direct representation or replication of
behavior modes in neural activity. (a) The modes of behavior dynamics (pooled across all sessions) were
different from the multiscale predictive mode in neural activity. This process is done with the direct dy-
namical model as explained in Section 3.2.4. The yellow-brown stars represent the average of multiscale
predictive modes (pooled across all sessions). (b) Similar to (a), but with the latent dynamical model ex-
plained in 3.2.4. These modes were very similar to direct behavior modes found in (a), yet again different
from the multiscale predictive mode in neural activity. Behavior mode’s color indicates the correlation co-
efficient (CC) between its one-step-ahead prediction of joint angles with their true values as shown in the
color bar (shows their prevalence in behavior dynamics). (c) The decay (left) and frequency (right) for the
behavior modes vs. the multiscale predictive mode in neural activity. The yellow-brown dot represents the
mean location of the multiscale predictive modes across all sessions. The red and green lines show the con-
tours on which either frequency or decay are constant. The decay of the behavior modes was significantly
larger than that of the multiscale predictive mode, and their frequency had a wider range. (d) The peak frequency of joint
angle PSDs across all experimental sessions as shown in the figure by the different colors. Consistent with
our behavior modal analysis in (a-c), the PSD peak frequency of behavior had a wider range compared with
the multiscale predictive modes.
[Figure 3.6 panels (a-c); panel (c) correlations between distance to the multiscale predictive mode and joints prediction accuracy: R = -0.71, P = 1.5 x 10^-7 and R = -0.92, P = 1.8 x 10^-10 for the two monkeys.]
Figure 3.6: The multiscale predictive mode was present in both low-frequency and high-frequency bands of
LFP activity. (a) For a sample experimental session (top: Monkey J, bottom: Monkey C), the top-view of the
eigenvalue-dimension diagram is shown for low-frequency (theta + alpha + beta; left) and high-frequency
(gamma; right) bands of LFP activity. The members of the closest estimated principal mode cluster to the
yellow dot are shown in orange. (b) In both monkeys and for both low-frequency and high-frequency bands
of LFP activity, the mean distance of the closest estimated principal mode cluster to the multiscale predictive
mode was significantly smaller than chance-level. The distance in each session is calculated between the
centroid of the orange cluster and the yellow dot in (a). (c) The distance of the estimated principal mode
to the multiscale predictive mode is correlated with the behavior prediction accuracy for different frequency
band combinations of LFP activity (theta + alpha, beta, gamma in addition to their pairwise combinations);
For each monkey, each dot represents one frequency band combination for one session.
[Figure 3.7 panels (a-c); panel (b) MVLR statistics: P_d = 2.6 x 10^-10, P_f = 7.8 x 10^-5, R^2 = 0.66.]
Figure 3.7: The multiscale predictive mode had a larger decay than other modes and both its decay and fre-
quency explained naturalistic behavior prediction. (a) Multiscale predictive modes had significantly larger
decays than other complex conjugate modes in spiking and LFP activity. Each grey dot shows the decay of
one principal mode across subjects and experimental sessions. (b) Multivariate linear regression (MVLR)
relating the principal mode’s prediction accuracy to its decay and frequency deviation. The deviations were
computed as the mode’s decay and frequency absolute difference from those of the multiscale predictive
mode. Both decay and frequency had significantly negative coefficients, with p-values reported by P_d and P_f,
respectively. R^2 shows the R-squared of the fitted MVLR model. Each green dot represents one complex-
conjugate principal mode across subjects and experimental sessions. The black line shows the MVLR fitted
line. (c) Perturbing decay (left) and frequency (right) of the multiscale predictive modes in the learned state-
space model shows that both decay and frequency components explained naturalistic behavior prediction
accuracy. Each dot represents the mean prediction accuracy of one perturbed mode across experimental
sessions and monkeys. The x axis is shown in log scale and represents the value of perturbed decay and
frequency. Asterisks show whether the prediction of the perturbed mode is significantly different from that
of the unperturbed multiscale predictive mode represented by a vertical dashed line. Decays around 1–2 s
and mode frequencies around 0.17–0.3 Hz yielded the best prediction accuracy.
Chapter 4
Dynamical flexible inference of nonlinear latent structures in neural
population activity
Neural population activity exhibits spatiotemporal dynamical patterns during our daily functions [17, 19, 20,
26, 29, 56, 58, 62, 78, 80, 117–121]. Such dynamical patterns are summarized within lower-dimensional
latent structures, uncovered by inferring the latent factors of high-dimensional neural activity. Extracting
such latent structures reveals significant insight about the brain mechanisms during tasks, which cannot be
revealed by directly looking at the high-dimensional neural activity [6, 10, 13, 15–17, 19, 20, 26, 29, 56, 58,
62, 78, 80, 117–122]. These latent factors can also decode behavior to enable enhanced neurotechnology
and brain-machine interfaces (BMIs) [6, 15]. As such, a critical objective is to develop latent factor models
that not only are accurate in characterizing neural population activity with potential nonlinearities, but also
enable the flexible inference of latent factors so that these factors can be seamlessly extracted in basic science
and neurotechnology applications. Despite much progress, this objective of simultaneously enabling both
accurate nonlinear modeling and flexible inference has remained elusive.
Overall, we are interested in developing a model that, first, (i) captures potential nonlinearities in neural
population activity and (ii) has a data-efficient architecture for generalizable data-driven training; and second,
enables flexible inference, i.e., (iii) is capable of both causal/real-time and non-causal inference and (iv) allows
for inference in the presence of missing neural measurements. If achieved, flexible inference will allow the
same trained model not only to operate causally and
recursively in real-time to enable neurotechnology, but also to leverage the entire length of data non-causally
for more accurate inference in scientific investigations. Such inference will also make the model robust to
missing or noisy measurements. Current dynamical models of neural population activity do not satisfy all
the above four properties simultaneously.
In general, most prior latent models of neural population activity are either linear or generalized linear,
mostly in the form of LDMs [13, 19, 25, 43, 56–58, 61, 123]. LDMs are widely used given that they are
easy to train and interpret and allow flexible inference with Kalman filtering [124]. One major limitation
of LDMs is that they cannot capture potential nonlinearities in neural population activity. On the other
hand, recent studies have leveraged the richness of deep learning to develop generative dynamical models
of neural population activity [29, 65–68, 119]. While these models can capture nonlinearity, they do not
meet all the flexible inference properties outlined above. This is because the inference for these models is
not solvable analytically, unlike Kalman filtering for LDMs; they thus need to empirically train an inference
or recognition network simultaneously with their generative network, usually requiring the entire length of
data over a trial. Thus, their inference depends on how the specific inference network is structured and is
not flexible to satisfy all the above properties. For example, in prior generative models, including sequential
autoencoders (SAEs) [29] or LDMs with nonlinear embeddings [66], inference is non-causal, and real-time
and/or recursive inference is not directly addressed [29, 65–68, 119]. Further, in these models, inference in
the presence of missing observations is not directly addressed [29, 66–68, 119] and using zeros instead of
missing observations [74, 75] can yield sub-optimal performance by changing the inherent values of missing
observations during inference [76, 77]. Similarly to these generative models, predictive dynamical models of
neural activity that use forward recurrent neural networks (RNNs) [64, 71, 72] also do not enable the flexible
inference properties above. While these models can perform causal inference, they do not allow for non-
causal inference to leverage all data, and they do not directly address inference with missing observations
similar to above generative models.
Here, we develop a neural network model that encompasses both flexible inference and accurate non-
linear description of neural population activity. To achieve this, we build a network architecture consisting
of two sets of latent factors rather than one. One set termed dynamic factors captures how neural popula-
tion activity evolves over a low-dimensional nonlinear manifold, and the other set termed manifold factors
characterizes how this manifold is embedded in the high-dimensional neural activity space. By separating
these two sets of factors, we can capture the nonlinearity in the manifold factors while keeping the dynam-
ics on the manifold linear, thus enabling flexible inference by exploiting the Kalman filter on the nonlinear
manifold (see Methods and Discussion). We term this method Dynamical Flexible Inference for Nonlinear
Embeddings (DFINE). We validated DFINE in nonlinear simulations and then compared it to benchmark
linear LDM and nonlinear SAE methods across diverse behavioral tasks, brain regions and neural signal
types. This chapter resulted in a manuscript with the help of my lab mate, Eray Erturk, and is currently
under revision at the journal Nature Biomedical Engineering.
4.1 Methods
We develop a neural network architecture that allows for accurate nonlinear description similar to neural
networks, but in a manner that also enables flexible inference similar to LDM. To do so, we introduce
two sets of latent factors to describe the n_y × 1 dimensional neural population activity y_t ∈ R^{n_y}: dynamic latent
factors x_t ∈ R^{n_x × 1} and manifold latent factors a_t ∈ R^{n_a × 1}. This is based on the key idea that the dynamical
latent domain is separated from the nonlinear latent domain by adding a middle manifold layer a between
the dynamic latent factors x and neural observations y (Fig. 4.1a). This separation plays a key role in
enabling the flexible inference properties of the network to optimally perform: 1) recursive causal inference
(filtering), 2) non-causal inference with all observations (smoothing), and 3) inference in the presence of
missing observations. Specifically, the separation enables flexible inference using a Kalman filter for the
bottom dynamic part of the model in Fig. 4.1a that infers the dynamic factors x from the manifold factors a
(Fig. 4.1b). We now describe the network architecture consisting of the dynamic and manifold factors.
4.1.1 DFINE generative model
We write the state evolution equation similar to equation (2.1):

x_t = A x_{t−1} + w_t,     (4.1)

where w_t ∈ R^{n_x × 1} is a zero-mean Gaussian noise with covariance matrix W ∈ R^{n_x × n_x} and A ∈ R^{n_x × n_x} is the
state transition matrix. The manifold latent factor a_t is related to the dynamic latent factor x_t as the emission
of an LDM, similar to (2.3):

a_t = C x_t + r_t,     (4.2)

where C ∈ R^{n_a × n_x} is the emission matrix and r_t ∈ R^{n_a × 1} is a zero-mean Gaussian noise with covariance
matrix R ∈ R^{n_a × n_a}. Equations (4.1) and (4.2) together form an LDM with a_t being the Gaussian noisy
observations. We denote the parameter set of this LDM by Ψ = {A, W, C, R, µ_0, Λ_0}, where µ_0 and Λ_0 are
the initial estimate and covariance of the dynamic latent factors, respectively.
Figure 4.1: DFINE graphical model and its flexible inference mechanism. (a) The DFINE generative graph-
ical model with two sets of latent factors: the dynamic and manifold latent factors, x_t and a_t, respectively.
The relationship between a_t and y_t is modeled with an autoencoder with MLP encoder and decoder net-
works, where the manifold latent factor is the bottleneck representation. The dashed line from y_t to a_t is
only used for inference and is not part of the generative model. The dynamic and manifold latent factors
together form an LDM with a_t being noisy observations of x_t, which constitute the LDM states. The tem-
poral evolution of x_t is described with a linear dynamic equation. All model parameters are learned
jointly in a single optimization by minimizing the prediction error of future neural observations from their
past. In the unsupervised version, after training the DFINE model, we use a mapper MLP network to learn
the mapping between the manifold latent factors a_t and behavior variables z_t. We also extend to supervised
DFINE, where the mapper MLP network is simultaneously trained with all other model parameters in an
optimization that now minimizes both neural and behavior prediction errors. (b) The inference procedure
with DFINE. We first get a noisy estimate of a_t using the nonlinear manifold embedding at every
time point. With the aid of the dynamic equation, we use Kalman filtering to infer the dynamic latent factors
x_{t|k} and refine our estimate of the manifold latent factors a_{t|k}, with subscript t|k denoting inference at time
t from all neural observations up to time k, y_{1:k}. In addition to real-time filtering (t|t), which is displayed,
DFINE can also perform smoothing or filtering in the presence of missing observations. The inference method is
the same for unsupervised and supervised DFINE and is done purely based on neural observations as shown
here (only model training and thus the weights are different).
Second, to model nonlinearities, we use autoencoders to learn the mapping between neural observations
y_t and manifold latent factors a_t. In general, autoencoders are static generative models made of two parts:
the encoder that maps the observations to a bottleneck representation and the decoder that maps this bottleneck
representation back to the observations. Here, the autoencoder observations are the neural observations and the
autoencoder bottleneck representation is given by the manifold latent factors. We model the decoder part as
a nonlinear mapping f_θ(·) from manifold latent factors to neural observations:

y_t = f_θ(a_t) + v_t,     (4.3)

where θ are the decoder parameters and v_t ∈ R^{n_y × 1} is a white Gaussian noise with covariance matrix
V ∈ R^{n_y × n_y}. We model nonlinear mappings with MLPs as they are universal approximators of any nonlinear
function under mild conditions [125]. Equations (4.1)-(4.3) together form the generative model (Fig. 4.1a).
For inference (next section), we also need the mapping from y_t to a_t, which we characterize as:

a_t = f_φ(y_t),     (4.4)

where f_φ(·) represents the encoder in the autoencoder structure and is parameterized by another MLP
(Fig. 4.1a).
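To make the generative model in equations (4.1)-(4.4) concrete, the sketch below simulates data from it using a fixed random one-hidden-layer network in place of the decoder MLP; all dimensions, noise levels and network sizes here are arbitrary illustrative assumptions rather than values used in the actual experiments.

import numpy as np

rng = np.random.default_rng(0)
n_x, n_a, n_y, T = 2, 3, 30, 500

# LDM part (equations 4.1-4.2): x_t = A x_{t-1} + w_t,  a_t = C x_t + r_t
A = 0.98 * np.array([[np.cos(0.06), -np.sin(0.06)],
                     [np.sin(0.06),  np.cos(0.06)]])
C = rng.standard_normal((n_a, n_x))
W, R = 0.01 * np.eye(n_x), 0.01 * np.eye(n_a)

# Decoder f_theta (equation 4.3): a fixed random MLP-like nonlinearity for illustration.
W1, W2 = rng.standard_normal((16, n_a)), rng.standard_normal((n_y, 16))
f_theta = lambda a: W2 @ np.tanh(W1 @ a)

x = np.zeros(n_x)
Y = np.zeros((T, n_y))
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(n_x), W)   # state evolution (4.1)
    a = C @ x + rng.multivariate_normal(np.zeros(n_a), R)   # manifold factor (4.2)
    Y[t] = f_theta(a) + 0.1 * rng.standard_normal(n_y)      # neural observation (4.3)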
4.1.2 DFINE inference
Using the model in equations (4.1)-(4.4), we need to infer both the manifold and dynamic latent factors
from neural observations y_{1:T}, where T is the total number of time steps of the observations. We use the
subscript t|k to denote the inferred latent factors at time t given y_{1:k}. Therefore, t|t denotes filtering (causal)
inference given y_{1:t} and t|T denotes smoothing (non-causal) inference given y_{1:T}. As an intermediate step
called nonlinear manifold embedding, we first directly but statically get an initial estimate of a_t based on
y_t from equation (4.4) as â_t = f_φ(y_t) to provide the noisy observations of the dynamical model, i.e., â_t
in Fig. 4.1b. Having obtained â_t, we can now use the dynamical part of the model in equations (4.1) and
(4.2) to infer x_{t|t} with Kalman filtering from â_{1:t}, and infer x_{t|T} with Kalman smoothing [126] from â_{1:T}.
We can then infer the manifold latent factor as a_{t|t} = C x_{t|t} and a_{t|T} = C x_{t|T} based on equation (4.2). Note
that a_{t|t} is inferred not only based on the neural observations but also based on the learned dynamical model
using the Kalman filter, and thus this inference aggregates information over time dynamically. The inference
above has the following major advantages compared with prior generative neural network models that train
a non-causal inference network and use all observations in a trial. First, we can tractably infer latent factors
recursively in real-time, i.e., use the current inferred dynamic latent factors x_{t|t} to calculate the next step's
inferred value a_{t+1|t+1}. Second, we can handle missing observations by doing only forward prediction with
the Kalman predictor at time-steps when observations are missing, without any need to impute 0 values for
these missing observations as done previously [55, 127]. Third, we can perform both causal filtering and
non-causal smoothing inference with the same learned model.
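A minimal sketch of this two-step inference (nonlinear manifold embedding followed by Kalman filtering) is shown below; the encoder f_phi and the LDM parameters A, C, W, R, x0, P0 are assumed to come from a trained model, missing time steps (rows of NaNs) are handled by applying only the Kalman prediction step, and this is a simplified illustration rather than the exact DFINE implementation.

import numpy as np

def dfine_filter(Y, f_phi, A, C, W, R, x0, P0):
    """Causal DFINE-style inference: a_hat_t = f_phi(y_t), then Kalman filtering of
    a_hat to obtain x_{t|t} and a_{t|t} = C x_{t|t}."""
    x, P = x0, P0
    X_filt, A_filt = [], []
    for y in Y:
        x, P = A @ x, A @ P @ A.T + W                 # predict
        if not np.any(np.isnan(y)):                   # skip update if observation is missing
            a_hat = f_phi(y)                          # nonlinear manifold embedding
            S = C @ P @ C.T + R
            K = P @ C.T @ np.linalg.inv(S)            # Kalman gain
            x = x + K @ (a_hat - C @ x)               # update
            P = P - K @ C @ P
        X_filt.append(x)
        A_filt.append(C @ x)
    return np.array(X_filt), np.array(A_filt)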
4.1.3 Learning the DFINE model
The DFINE model contains stochastic noise variables, making it different from regular deep learning techniques
where no uncertainties exist, such as typical RNNs or CNNs. When stochastic noise variables exist
[29, 66, 128], the objective function is typically taken as the evidence lower bound (ELBO) of the data likeli-
hood. But optimizing the ELBO can be challenging [129–132], e.g., some works manipulate the KL divergence
weight in the ELBO during training to avoid falling into local minima. The direct goal of dynamical mod-
eling is to predict neural and/or behavior dynamics. We thus instead define our cost as the k-step-ahead
prediction error in predicting neural observations k time-steps into the future, i.e., the error between y_{t+k}
and its prediction from y_{1:t}, denoted by y_{t+k|t}. What allows us to use this cost is that our model enables
efficient inference/prediction to compute the k-step-ahead prediction error (Fig. 4.1b) in the presence of
noise; this is because we can run all forward predictions with a single run of Kalman filtering. Thus, our
cost L is a function of all parameters as:

L(ψ, θ, φ) = Σ_{k=1}^{K} Σ_{t=1}^{T−k} e(y_{t+k|t}, y_{t+k}) + λ_reg L_2(θ, φ),     (4.5)

where K denotes the maximum horizon for future-step-ahead prediction and e(·,·) denotes the error
measure. T is the length of the time-series, i.e., the length of each batch in the mini-batch gradient descent
[133]. L_2(θ, φ) is an L_2 penalty on the autoencoder parameters θ, φ to prevent overfitting, with regularization
hyperparameter λ_reg. In practice, we use the root mean-squared error (RMSE) for the error measure e(·,·).
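The cost in equation (4.5) can be sketched as follows, assuming the filtered states x_{t|t} from a single forward Kalman pass are available; y_{t+k|t} is obtained by propagating x_{t|t} forward k steps with A and decoding through C and f_theta, and the regularization term is omitted for brevity. This is a simplified sketch of the objective, not the training code itself.

import numpy as np

def k_step_ahead_cost(Y, X_filt, A, C, f_theta, K=2):
    """Accumulate RMSE between y_{t+k} and its prediction y_{t+k|t} for k = 1..K."""
    T = len(Y)
    total = 0.0
    for k in range(1, K + 1):
        A_k = np.linalg.matrix_power(A, k)            # k-step state propagation
        preds = np.array([f_theta(C @ (A_k @ X_filt[t])) for t in range(T - k)])
        total += np.sqrt(np.mean((preds - Y[k:]) ** 2))
    return total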
4.1.4 Extending DFINE to supervised DFINE
DFINE model, and thus its latent factors, are optimized to be predictive of neural population activity. To
address situations in which neural dynamics of a specific behavior are of interest, we develop a new learning
algorithm for DFINE that aims to extract latent factors that are more predictive of that behavior. We call
41
the resulting algorithm supervised DFINE. In particular, when continuous behavior variables z
t
∈R
n
z
× 1
are
available during training, we use them to supervise the training and learn latent factors that are predictive of
not only neural observations but also behavior variables. This is done by adding an auxiliary neural network,
termed mapper MLP network, to the DFINE graphical model during training only (Fig. 4.1a); this mapper
network maps the manifold latent factor a
t
to behavior variables z
t
during training (the link from a to z in
Fig. 4.1a) and is written as z
t
= f
γ
(a
t
)+ q
t
, where q
t
∈R
n
z
× 1
is a white Gaussian noise. Now to motivate
the network to learn latent factors that are predictive of both behavior and neural observations, we add the
behavior prediction error to the cost function in equation (4.5) as follows:
L(ψ,θ,φ,γ)=
K
∑
k=1
T− k
∑
t=1
e(y
t+k|t
,y
t+k
)+λ
beh
T
∑
t=1
e( f
γ
(a
t|T
),z
t
)+λ
reg
L
2
(θ,φ,γ), (4.6)
where λ_{beh} is a hyperparameter. Larger values of λ_{beh} put more emphasis on behavior prediction vs. neural prediction and vice versa. Note that the parameters of the auxiliary MLP (γ) are also added to the regularization term in the cost function. We emphasize that after training of supervised DFINE is completed, the mapper MLP is discarded and not used for inference. The inference of latent factors remains identical to that in unsupervised DFINE, and is done purely based on neural observations and independent of behavior variables (Fig. 4.1b). The only difference is that supervised DFINE's learned model has different parameter values given that its learning algorithm is distinct as described above.
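As a small illustration of equation (4.6), the sketch below adds the behavior term to an already-computed unsupervised cost; a_smooth (the smoothed manifold factors a_{t|T}) and mapper (the auxiliary f_γ MLP) are hypothetical names used only for this example.

```python
import numpy as np

def supervised_cost(neural_cost, a_smooth, z, mapper, lam_beh=100.0):
    """Augment the unsupervised k-step-ahead cost with the behavior term of equation (4.6).
    `mapper` is the auxiliary MLP f_gamma, used only during training and discarded afterwards."""
    z_pred = mapper(a_smooth)                      # f_gamma(a_{t|T})
    beh_err = np.sqrt(np.mean((z - z_pred) ** 2))  # RMSE between true and predicted behavior
    return neural_cost + lam_beh * beh_err         # larger lam_beh emphasizes behavior prediction
```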
4.1.5 DFINE learning details and hyperparameters
Given a set of observations, we learn the model parameters by minimizing the cost function in (4.5). We used MLP architectures containing 3 hidden layers, each with 32 units, for f_θ(·) and f_φ(·) in the decoder and encoder parts of the model, respectively. The activation function for the units was tanh(·). We used backpropagation with mini-batch gradient descent implemented by the ADAM optimizer [134] to learn the model parameters, and we continued the training for 300 epochs. We used an initial learning rate of 0.02 for the ADAM optimizer. We set the maximum horizon for future-step-ahead prediction K such that the future predictions cover at least 100 ms into the future; therefore, we set K = 2-4 to optimize the cost function across various datasets (note our time-step is 50 ms). We used the regularization parameter λ_{reg} = 10 to prevent overfitting. λ_{beh} was set to 100 across all supervised DFINE models to emphasize improved behavior prediction accuracy.
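The sketch below illustrates these architecture and optimizer settings in PyTorch; the library choice and the observation/latent dimensions ny and na are placeholders for illustration, not the exact implementation.

```python
import torch
import torch.nn as nn

ny, na = 30, 16   # observation and manifold latent dimensions (placeholder values)

def mlp(in_dim, out_dim, hidden=32, n_layers=3):
    """3 hidden layers of 32 tanh units each, as described above."""
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden), nn.Tanh()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

encoder = mlp(ny, na)   # f_phi: neural observations -> manifold latent factors
decoder = mlp(na, ny)   # f_theta: manifold latent factors -> neural observations

# ADAM optimizer with initial learning rate 0.02; training runs for 300 epochs.
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.02)
```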
4.1.6 Comparison with benchmark methods such as linear dynamical models and sequential
autoencoders
We validate DFINE on numerical simulations and several datasets spanning various brain regions and neural modalities. We compare the results on the neural datasets with 2 benchmark methods: 1) linear dynamical models (LDM) and 2) sequential autoencoders (SAEs). We use the architecture in latent factor analysis of dynamical systems (LFADS) for the SAEs [29]. The details of these benchmark methods are very similar to our prior work [135].
4.1.7 Topological data analysis
To quantify how robustly models identify the latent manifold structure in single-trials, we applied topolog-
ical data analysis (TDA) on smoothed latent factors. TDA uses persistent homology [136] to find multi-
dimensional holes (e.g. 1D hole is a ring, 2D hole is a 2D void) in data by growing the radius of ε-balls
around datapoints, which are connected when they touch. TDA finds the manifold type by counting the
number of persistent multi-dimensional holes in the data-manifold. For example, a torus has one 2D hole
and two 1D holes [136]. We run TDA on smoothed single-trial latent factors of the learned models in the
cross-validation test set and assess the most persistent hole’s birth and length. The most persistent hole is
the hole that lasts the longest. The birth happens at the ε value at which the hole appears (smaller values
correspond to earlier births) and the length is the ε interval for which the hole lasts. To assess robustness
in single-trials, we ask for which model TDA finds holes that are born earlier and last longer, i.e., are more
persistent: the sooner a hole is born (at shorter radius) and the longer it lasts (at longer radius), the more
prevalent/robust it is in the latent factors extracted. To take into account scaling differences in the latent
space of each method, we z-score single-trial latent factors in each dimension before running TDA. To visu-
alize the TDA results, for the most persistent holes, we plot their lengths vs. births. On this plot, the optimal
frontier is the top left area of the plot indicating earlier births and longer lengths. To aggregate the results in
each test cross-validation fold, we average the birth and length values of TDA for single-trial latent factors
in that test fold.
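A minimal sketch of this analysis using the ripser persistent-homology package (one possible implementation, not necessarily the one used here) is shown below; it z-scores the single-trial latent factors and returns the birth and length of the most persistent 1D hole.

```python
import numpy as np
from ripser import ripser  # persistent homology library (assumed available)

def most_persistent_1d_hole(latent):
    """Return (birth, length) of the most persistent 1D hole in the latent factors."""
    z = (latent - latent.mean(axis=0)) / latent.std(axis=0)  # z-score each dimension
    dgm_1d = ripser(z, maxdim=1)['dgms'][1]                  # (birth, death) pairs of 1D holes
    if dgm_1d.size == 0:
        return np.nan, 0.0                                   # no 1D hole found
    lengths = dgm_1d[:, 1] - dgm_1d[:, 0]
    i = int(np.argmax(lengths))                              # the hole that lasts the longest
    return dgm_1d[i, 0], lengths[i]
```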
4.2 Results
4.2.1 DFINE successfully learns the dynamics on diverse nonlinear manifolds and enables
flexible inference in simulated datasets
We first demonstrated that the DFINE model learns dynamical trajectories on various nonlinear manifolds
and also enables flexible inference. Given the plausibility of ring-like, spiral-like and toroidal structures in
neural population activity in prior studies [62, 80, 121, 137], we simulated trajectories on Swiss roll, Torus
and ring-like manifolds as a proof of concept (Fig. 4.2). We synthesized 30 different simulated sessions
(each with 250 trials) with randomly selected noise values and with the manifolds uniformly chosen from
the above possibilities. We demonstrate a sample underlying trajectory and its noisy realization in Fig. 4.2a
and Fig. 4.2b, respectively.
We showed that DFINE can correctly infer the trajectories on the manifolds, even from noisy observa-
tions (Figs. 4.2a-c) and even in the presence of missing observations (Fig. 4.2f). The inferred trajectory
from the DFINE model (Fig. 4.2c) matches with the true underlying trajectory (Fig. 4.2a). To show that
across various trials and simulated manifolds, the DFINE model correctly learns the dynamics, we showed
that the learned model’s one-step-ahead prediction error converged to that of the true model (Fig. 4.2d). We
next demonstrated that the DFINE model enables flexible inference, i.e., filtering and smoothing inference with or without missing samples (Figs. 4.2e-f). To show this, we randomly dropped neural observation datapoints from each trial to achieve a desired observed datapoint ratio, which is defined as the ratio of
the datapoints that are maintained/not-dropped to the total number of datapoints. DFINE predictions even
for ratios as low as 0.2 were similar to when all datapoints were retained, showing that DFINE could use
the learned dynamics to compensate for missing observations (Fig. 4.2e). Further, even for ratios as low
as 0.05, DFINE still performed better than the chance level of 1 (Fig. 4.2e). Moreover, smoothing significantly
improved the inference of trajectories because it could also leverage the future neural observations, and
this improvement due to smoothing was more prominent in the lower observed datapoint ratios (Fig. 4.2e).
Indeed, when observation samples were dropped, filtering could fail to produce good predictions for low
ratios while smoothing led to successful predictions even for ratios as low as 0.1 (Fig. 4.2f). Overall, the
simulation analysis showed that DFINE can learn the dynamics on diverse nonlinear manifolds, perform
flexible inference both causally (filtering) and non-causally (smoothing), and succeed even in the presence
of missing observations.
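The following sketch shows one way this missing-observation condition could be simulated; the exact dropping procedure is an assumption. A fixed fraction of time-steps is kept and the rest are marked missing so that the filter/smoother can skip its update step at those times.

```python
import numpy as np

def drop_observations(y, observed_ratio, seed=0):
    """Keep round(observed_ratio * T) randomly chosen time-steps and mark the rest as NaN."""
    rng = np.random.default_rng(seed)
    T = y.shape[0]
    keep_idx = rng.choice(T, size=int(round(observed_ratio * T)), replace=False)
    keep = np.zeros(T, dtype=bool)
    keep[keep_idx] = True
    y_missing = y.astype(float).copy()
    y_missing[~keep] = np.nan          # missing time-steps: inference skips the update here
    return y_missing, keep
```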
Figure 4.2: DFINE successfully infers trajectories on nonlinear manifolds and enables inference in the
presence of missing samples. (a) A sample simulated trajectory is shown for an example manifold, Swiss
roll. The gray points are samples from the underlying manifold. Color evolution on the trajectory indicates
time evolution into a trial as shown in the color bar. (b) Noisy observations of the trajectory are shown with
the same color convention as in (a) and on top of the gray true trajectory. (c) After learning the DFINE model,
inferred trajectory with smoothing is shown, which is essentially on top of the true trajectory represented
with gray dots and masking it. (d) The learned models’ one-step-ahead prediction error converges to that of
the true models. The plot shows the mean of one-step-ahead prediction error for the learned and true models
across all simulated sessions, cross-validation folds and trials. (e) Neural reconstruction error between the
inferred and true trajectories is shown for smoothing and filtering across various observed datapoint ratios.
(f) An example trajectory with missing observations and its smoothing and filtering inference for observed
datapoint ratio of 0.1. Colored dots in the left panel are observed datapoints and the gray dots are shown to
visualize the true underlying trajectory.
4.2.2 DFINE extracts single-trial latent factors that accurately predict neural activity and
behavior
We then applied DFINE to four independent datasets, three motor datasets and one saccade dataset (Appendix A), to show that it not only allows for flexible inference but also for accurate neural and behavior prediction (we show three of these datasets here, but the results also held for the saccade task). We next quantified the ability to capture inter-condition variability by computing the single-trial neural and behavior prediction accuracies of 16-dimensional (16D) latent factors for each method. We picked this dimension because it
accuracies of 16-dimensional (16D) latent factors for each method. We picked this dimension because it
was sufficient for the performance of all methods to converge across all datasets. Note that during a given
trial, SAE does not predict the neural observations into the future from its past because it needs to use all
neural observations until the end of a trial to get the initial condition and then to extract the factor trajectories
at every time-step from this initial condition. Thus, while we computed the one-step-ahead neural predic-
tion error for DFINE and LDM using only past neural observations, we gave SAE the advantage of doing
neural reconstruction with smoothing based on both past and future neural observations instead. Despite
this advantage given to SAE, DFINE was better at neural prediction not only compared with LDM but also
compared with SAE across all datasets (Figs. 4.3a, 4.3d and 4.3g). In addition to its better neural prediction,
DFINE also had higher behavior prediction accuracy compared to LDM and SAE in all tasks (Figs. 4.3b,
4.3e and 4.3h). These results demonstrate that in addition to enabling flexible inference, DFINE’s single-
trial latent factors were more predictive of both behavior and neural activity compared to the benchmark
linear and nonlinear methods.
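For concreteness, a sketch of the accuracy measure reported here, the Pearson correlation coefficient (CC) between true and predicted signals, is given below; averaging the per-dimension CCs is our assumption about the exact aggregation.

```python
import numpy as np

def prediction_accuracy_cc(true, pred):
    """Mean Pearson CC between true and predicted time-series, computed per dimension."""
    ccs = [np.corrcoef(true[:, d], pred[:, d])[0, 1] for d in range(true.shape[1])]
    return float(np.mean(ccs))
```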
4.2.3 DFINE more robustly extracts the manifold structure in single-trials
Consistent with its better behavior and neural prediction, DFINE also more robustly captured the nonlinear
manifold structure in single-trial data compared with both LDM and SAE. First, visualization of DFINE
(and LDM/SAE) revealed a ring-like manifold structure during the movement periods in the motor tasks
(Fig. 4.4). We quantify the quality of the ring-like manifold structures using Topological Data Analysis
(TDA). TDA uses persistent homology to quantify whether there exist holes in data, and if so how many
[136]. TDA finds multi-dimensional holes (e.g., 1D hole is a ring and 2D hole is a 2D void) in data by
growing the radius ofε-balls around datapoints. If holes exist, a model that finds holes which are born earlier
and last longer – i.e., are more persistent – is more robust in revealing the manifold structure in single-trials.
Figure 4.3: DFINE outperformed benchmarks in terms of neural prediction, behavior prediction, and robust
extraction of the manifold structure in various datasets. (a) Neural prediction accuracy, (b) behavior predic-
tion accuracy, and (c) TDA analysis, for the 3D naturalistic reach-and-grasp task. Similar results held for
the random-target reaching task (d-f), and 2D grid reaching task (g-i).
Consistent with observing a ring in low-dimensional visualizations above, TDA revealed a persistent 1D
hole in low-dimensional latent factors, which we then analyzed. Compared with benchmarks, in DFINE,
the birth of the TDA’s most persistent 1D hole was significantly earlier and its length was significantly larger
during movement periods for all motor tasks (Figs. 4.3c, 4.3f, 4.3i). These results show that the ring-like
manifold structure was more robustly captured with DFINE than both LDM and SAE. We show the 3D
latent factor trajectories from DFINE and other methods in Fig. 4.4.
Figure 4.4: Example latent factor trajectories for the motor datasets. The condition-average latent factor
trajectories are shown for all methods in the (a) 3D naturalistic reach-and-grasp task, (b) 2D random-target
reaching task, and (c) 2D grid reaching task. We observed a ring-like manifold structure during movement
periods.
4.2.4 DFINE can also be extended to enable supervision and improve behavior prediction
accuracy
DFINE is trained unsupervised with respect to behavior and optimizes neural prediction. To al-
low for considering continuous behavior measurements when available during training, we next developed
supervised DFINE to train a model with latent factors that are optimized for both neural and behavior pre-
dictions (Fig. 4.1a). To validate the supervised DFINE method, we applied it to the motor datasets in
which continuous behavior measurements were available during training. We found that supervised DFINE
successfully improves the behavior prediction accuracy even though its model and inference architectures
are identical to those in the unsupervised DFINE, and even though its inference only takes the same neural
observations as input (Fig. 4.5). The idea is that during training, supervised DFINE encourages the network
to be more predictive of behavior parameters. So even with the same model architecture, supervised DFINE
is expected to outperform unsupervised DFINE in behavior prediction accuracy. Indeed, the latent factors
of supervised DFINE better distinguished different task conditions across all motor datasets compared to
those of unsupervised DFINE (Figs. 4.5a, 4.5d, 4.5g). Consistent with this observation, supervised DFINE
significantly improved the behavior prediction accuracy compared to unsupervised DFINE across all motor
datasets (Figs. 4.5b, 4.5e, 4.5h). These improvements were 13.6± 2.7% in the 3D naturalistic reach-and-
grasp task, 22.3± 1.8% in the 2D random-target reaching task, and 24.5± 2.5% for the 2D grid reaching
task. Also, as expected, the neural prediction accuracy of supervised DFINE was significantly lower than
unsupervised DFINE across all datasets (Figs. 4.5c, 4.5f, 4.5i); this is because supervised DFINE’s la-
tent factors are optimized for both behavior and neural prediction while those of unsupervised DFINE are
optimized purely for neural prediction.

Figure 4.5: Supervised DFINE extracts latent factors that are more behavior predictive, at the expense of lower neural prediction accuracy. (a) Examples of condition-average latent factor trajectories for the unsupervised (left) and supervised (right) DFINE are shown for the 3D reach-and-grasp task. Each color represents one condition (i.e., movement to left or right). Supervised DFINE better separates different conditions. (b) Supervised DFINE improved the prediction of behavior, i.e., continuous position and velocity, in the 3D reach-and-grasp task. (c) Supervised DFINE's improved behavior prediction accuracy came at the expense of lower neural prediction accuracy compared to unsupervised DFINE. Similar results held for the 2D random-target reaching task (d-f) and the 2D grid reaching task (g-i), where each condition is a direction angle interval/sector (for example, all movements whose direction angle is between 0-45 degrees regardless of where they start/end).
4.2.5 DFINE can perform flexible inference with missing observations
We next investigated whether DFINE can perform inference even in the presence of missing neural observa-
tions (Fig. 4.6). To do so, we uniformly dropped neural observations throughout the recordings (Methods)
and performed inference in two ways: 1) filtering inference (causal) that only uses the past and present
available neural observations, 2) smoothing inference (non-causal) that uses all available neural observa-
tions. We found that behavior prediction accuracies of DFINE remained relatively unchanged even when
dropping up to 40% of observations (0.6 ratio), and remained well above the chance-level of 0 even when
dropping 80% of observations (lowest 0.2 ratio) (Figs. 4.6c-e). Also, behavior prediction accuracy with
smoothing inference was significantly better than that with filtering inference across all observed datapoint
ratios (Figs. 4.6c-e). Figs. 4.6a and 4.6b visually demonstrate that the smoothing inference yields more ac-
curate reconstruction of the low-dimensional latent trajectories as it leverages both past and future available
observations. We also compared DFINE with SAE in terms of inference in the presence of missing observa-
tions. Because SAE’s decoder network is modeled with an RNN, it is structured to take neural observation
inputs at every time-step. To handle missing observations for SAE at inference, we imputed them with
zeros in the test set as previously done [74, 75], extracted the latent factors, and computed the associated
behavior prediction accuracy. DFINE outperformed SAE and, interestingly, this improvement grew larger
at lower observed datapoint ratios (Fig. 4.7). Indeed, DFINE’s degradation in performance due to missing
observations was much lower than that of SAE (Fig. 4.7). These analyses show that DFINE can flexibly
compensate for missing observations and perform both causal and non-causal inference with missing data.
These analyses also show that DFINE’s non-causal inference can leverage future data for more accurate
prediction when real-time processing is not needed.
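A minimal sketch of the zero-imputation used for the SAE baseline above, versus DFINE's native handling of missing samples, could look like the following, assuming missing time-steps are marked with NaN as in the earlier sketch.

```python
import numpy as np

def impute_missing_with_zeros(y_missing):
    """Zero-impute missing time-steps before feeding them to the SAE encoder, as in prior
    work; DFINE instead skips the Kalman update at missing time-steps and relies on its
    learned dynamics to compensate."""
    y_imputed = y_missing.copy()
    y_imputed[np.isnan(y_imputed)] = 0.0
    return y_imputed
```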
4.3 Discussion
Here, we proposed a neural network model of neural population activity that enables flexible inference of
latent factors while capturing the potential nonlinearities. Our model can also serve as a dimensionality reduction technique for neural population activity. Many studies have examined low-dimensional dynamics of neural population activity by using dimensionality reduction methods [13, 15, 16, 138]. Earlier studies mostly used linear or generalized linear dimensionality reduction methods, including factor analysis (FA) [139], principal component analysis (PCA) [26, 78] or its variations [19, 140], and generalized linear dynamical models (LDM) [25, 47, 56, 57]. Later models incorporated nonlinearities via various techniques such as topological identification methods [62, 121], extensions of classical manifold identification techniques such as Isomap [63], and neural networks [29, 65, 66, 68]. Among these techniques, neural network models have gained more attention given their generalizability and their flexibility in new extensions. For example, recent works extended prior neural network models to contain more behaviorally relevant information through changes to the network architecture and to the numerical optimization [64, 119]. DFINE provides a dynamic generative model of neural data and also serves as a dimensionality reduction technique. Compared to prior neural network models of neural population activity, DFINE has a major difference in that it provides the new capability for flexible inference, in addition to being accurate in neural, behavior, and manifold prediction.
DFINE has flexible inference in that it simultaneously enables recursive real-time/causal inference, non-
causal inference, and inference in the presence of missing observations. Such flexible inference is critical to
enable both real-time neurotechnology and accurate testing of scientific hypotheses, but is not achieved by
prior neural network models of population activity.

Figure 4.6: DFINE can perform both causal and non-causal inference with missing observations and do so more accurately through non-causal inference. (a-b) Examples of condition-average (a) and single-trial (b) latent factor trajectories for filtering and smoothing inference with missing observations in the 3D reach-and-grasp task. Both DFINE filtering and smoothing captured the low-dimensional structure in single-trials, with smoothing doing so more accurately than filtering. (c-e) Behavior prediction accuracies of filtering and smoothing inferences are shown across various observed datapoint ratios, for 3 different datasets.

Figure 4.7: DFINE outperforms SAEs in the presence of missing observations and this improvement grows with more missing samples. (a) DFINE and SAE's behavior prediction accuracy across various observed datapoint ratios in the 3D naturalistic reach-and-grasp task. Given models trained on fully observed neural observations, we inferred latent factors in the test set that had missing observations. SAE did so by imputing missing observations in the test set to zero as done previously [74, 75], whereas DFINE did so through its new flexible inference method. We then used the inferred factors in the test set to predict behavior variables. This process was done at 0.3 and 0.6 observed datapoint ratios. For both models, we show the behavior prediction accuracy of the smoothed latent factors. (b) The percentage drop in the behavior prediction accuracy of DFINE and SAE as we vary the observed datapoint ratio from 1 to 0.3 and from 1 to 0.6. The percentage drop in behavior prediction accuracy of DFINE is significantly lower than that of SAE, showing that DFINE can better compensate for missing observations. Similar results held for the 2D random target reaching task (c, d), and for the 2D grid reaching task (e, f).
Dynamical modeling of sequential data has great importance for other domains such as video processing
[128, 141, 142] or natural language processing [143, 144], with great progress made to date. However, neural
data introduce several distinct challenges that DFINE’s modeling architecture was specifically designed to
handle. First, to consider the noisy nature of neural activity, we model and learn stochastic noise variables
in DFINE. This allows us to model uncertainties and to fit the noise values to any specific dataset across
diverse tasks, brain regions, and neural data modalities (e.g., spiking or LFP). In contrast, because video
or text observations are deterministically observed or are much less noisy, in applications of videos and
text, the stochastic noise variables are not necessarily included in the modeling or are tuned manually as
hyperparameters [128, 141–143]. In addition, instead of the common method of optimizing the evidence
lower bound (ELBO) of data likelihood when noise variables exist [29, 66, 128], we optimize the prediction
accuracy of data into the future because optimizing ELBO could be challenging [129–132] and a major
goal of a dynamic model is future prediction; indeed, we observed instabilities when using ELBO as our
optimization cost. Many works that optimize ELBO, instead of optimizing the actual theoretical ELBO cost function, tweak its components for more stability, for example by modifying the KL divergence weight in the cost function [129–132]. In turn, our future-prediction loss did not require such tuning, which improved the robustness of learning. Finally, unlike these video/text applications, we also developed a
supervised learning algorithm for DFINE to allow for modeling two different but related sets of sequential
data (neural activity and behavior here) and motivate the latent factors to be predictive of not just one but
both time-series/sequences. In addition to modeling of neural/behavior data in neuroscience, this supervised
learning algorithm could show promise for future applications of video/text data where another measured
time-series/sequence is also of interest, e.g., inferring latent factors of videos regarding movements [145].
Chapter 5
Training artificial recurrent neural networks that recapitulate the latent
manifold structure in the neural population activity
Prior studies of neural population activity have shown convergent evidence on the existence of meaningful
low-dimensional structure in the neural population activity [6, 10, 13, 15–17, 19, 20, 26, 29, 35, 56, 58, 62,
78, 80, 117–122]. Notably, by looking at the low-dimensional dynamics of the neural population activity
during movements, consistent evidence about the existence of rotational dynamics has been shown [17–
19, 24, 25, 56, 78–80, 118, 137, 146]. These visual investigations have been further validated by recent
works that have presented evidence about the existence of ring-like and toroidal latent manifold structure
in the neural population activity using topological methods [62, 121]. As our results in Chapter 4 suggest, we also observed a ring-like manifold structure in the low-dimensional latent factors of the neural population activity during movements. This observation was cross-validated by my labmate Han-Lin Hsieh's direct topological analysis of the neural population activity in the 3D naturalistic reach-and-grasp dataset. Despite all this evidence, one question that remains elusive is whether the existence of a ring-like manifold structure in the neural population activity during movements is a realistic phenomenon in natural dynamical systems.
Here, we use a specific form of artificial neural networks, recurrent neural networks (RNNs), as a generalized model of natural dynamical systems. RNNs have artificial units and propagate dynamical information via a hidden state. In general, the use of artificial neural networks has been growing in studies of neural population activity. There are two main ways to leverage neural networks: 1) using them as models of neural data within unsupervised [29, 64, 65, 68, 69] or supervised learning methods [70–72], e.g., the DFINE model introduced in Chapter 4; and 2) using them to model the networks of neurons in the brain during specific tasks [28, 78, 80–82]. Here, we use the second approach to mimic brain mechanisms in generating movements.
Here, we train RNNs on data from the naturalistic 3D reach-and-grasp task performed by non-human primates (Appendix A.2). The RNNs are trained to generate the subjects' hand kinematics only from the task instructions/targets and independent of the recorded neural activity. After training the RNNs, we compare their artificial units' activity with the recorded neural activity to assess similarities. We show that the ring-like manifold structure observed in the recorded neural activity was also present in the artificial units' activity, supporting the hypothesis that such an observation is a realistic phenomenon in natural dynamical systems. Additionally, we show that the roles of the ring-like manifold's loop coordinate and of other complementary coordinates, called appending dimensions, were similar in the RNNs and the brain. These results provide further evidence for the view of the brain as a dynamical system [147] and further emphasize the importance of studying rotational dynamics and ring-like manifolds in the neural population activity.
5.1 Methods
We train RNNs on the 3D naturalistic reach-and-grasp data sampled at 100Hz, where the subjects perform
3D hand movements to diverse locations in front of them (Appendix A.2). To do this, we assume that
the subjects perform the task given some external task instructions or internal task targets. These task
instructions/targets are the input to the brain and thus similarly the input to RNNs (Fig. 5.1a). Similar to
how the brain is ultimately in charge of generating motor outputs given the task instructions/targets, RNNs
are trained to generate the behavior variables, i.e., the 3D hand kinematics (Fig. 5.1a). We take the behavior variables to be the position component of the 3D hand kinematics, and to form the task instructions/targets, we extract the locations that the subjects reached or returned to based on the velocity data. We use step signals (zero-order hold) to generate the target signal from the extracted locations (Fig. 5.1a).
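A small sketch of how such a zero-order-hold target signal could be built is given below; reach_times and reach_targets (the onset samples and corresponding 3D locations extracted from the velocity data) are assumed to be precomputed and are placeholder names.

```python
import numpy as np

def build_target_signal(reach_times, reach_targets, T):
    """Step (zero-order-hold) signal that holds each 3D target from its onset until the next."""
    targets = np.zeros((T, 3))
    for i, t0 in enumerate(reach_times):
        t1 = reach_times[i + 1] if i + 1 < len(reach_times) else T
        targets[t0:t1] = reach_targets[i]
    return targets
```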
5.1.1 Details of RNN training
We train RNNs in the form of long short-term memory (LSTM) networks, with 50 artificial units as the size of the hidden state. The RNNs are trained in a sequence-to-sequence fashion, where the input sequence is the 3D task instructions and the output sequence is the 3D kinematics data. Batch-based gradient descent implemented with the ADAM optimizer is used to train the network with the objective of minimizing the mean-squared error (MSE) of the output reconstruction. Each batch contained 128 4-second-long data segments of the continuous data. Each data segment was randomly drawn from the continuous data by uniformly choosing the start point and extracting the 4-second-long data after it. To enforce positiveness of the firing rates, we took the absolute value of the RNN hidden state as the RNN activity. The RNN activity ultimately generated the output, i.e., the kinematics, through a linear projection. We applied regularization weights of 10^{-3} to the RNN recurrence weight matrix, the output projection matrix, and the norm of the RNN's hidden state to avoid unrealistic firing rates. We used an initial learning rate of 0.01 and TensorFlow 1's default momentum parameters for the ADAM optimizer. Prior to training, the behavior signal was z-scored to remove any scaling imbalances across the 3D Cartesian coordinates. We trained the networks for 2000 iterations for all available experimental sessions in two subjects, Monkeys J and C.
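The sketch below summarizes this setup. The original networks were trained in TensorFlow 1, so this PyTorch version is only an illustration; the additional 10^{-3} regularization terms on the recurrent weights, output projection, and hidden-state norm would be added to the loss and are noted in the comments.

```python
import torch
import torch.nn as nn

class TaskRNN(nn.Module):
    """LSTM with 50 hidden units; the absolute value of the hidden state serves as the
    non-negative 'firing rate' and is linearly projected to the 3D kinematics."""
    def __init__(self, n_in=3, n_hidden=50, n_out=3):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, n_out)

    def forward(self, targets):            # targets: (batch, T, 3) zero-order-hold inputs
        h, _ = self.lstm(targets)
        rates = torch.abs(h)               # enforce positiveness of the RNN activity
        return self.readout(rates), rates  # predicted kinematics and RNN activity

model = TaskRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                     # output (kinematics) reconstruction error
# L2 penalties (weight 1e-3) on the recurrent weights, readout weights, and the hidden-state
# norm would be added to the loss to avoid unrealistic firing rates.
```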
5.1.2 Canonical correlation analysis between the RNN and neural activity
After training the RNN networks, we compare the RNN activity and the recorded neural activity. To do so, we compute the correlation coefficients between paired directions in a shared subspace using canonical correlation analysis (CCA). CCA obtains the paired directions in the RNN and neural subspaces that are maximally correlated, and we calculate the correlation coefficient between each pair of directions. To get the chance level of the CCA correlation coefficients, we repeat the CCA with an untrained, randomly initialized RNN that still takes the task instructions/targets as input. In addition to correlation coefficients, CCA provides us with the directions of the RNN and neural activity in the same subspace, which can be further investigated.
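A compact sketch of this comparison, using scikit-learn's CCA as one possible implementation (not necessarily the one used here), is shown below; rnn_activity and neural_activity are time-by-units matrices, and the chance level would be obtained by passing the activity of an untrained RNN instead.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(rnn_activity, neural_activity, n_components=3):
    """Correlation coefficient of each maximally correlated paired direction found by CCA."""
    cca = CCA(n_components=n_components, max_iter=1000)
    u, v = cca.fit_transform(rnn_activity, neural_activity)   # paired canonical directions
    return np.array([np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(n_components)])
```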
5.1.3 Topological data analysis
As explained in Section 4.1.7, we use topological data analysis (TDA) to validate the existence of the ring-like manifold structure in the artificial units' subspace.
5.1.4 Partitioning the space into the loop coordinate and appending dimensions
After validating that there exists a ring-like manifold in the RNN activity subspace (assumed to lie in R^{n_y}), we describe each point in the Cartesian space with two new sets of coordinates: 1) the loop coordinate θ_l ∈ R^1, specifying the location of any point projected onto the ring-like manifold, and 2) the appending dimensions φ_{app} ∈ R^{n_y − 1}, capturing the rest of the information as a vector from the projected point on the ring-like manifold to the point itself. To do this, we leveraged a framework for loop fitting in high-dimensional data, where the loop is fitted using spline fitting and the loop coordinate and appending dimensions are extracted based on geometrical properties. This framework was developed by my labmate, Han-Lin Hsieh, and we skip the details as they are beyond the scope of this thesis.
Figure 5.1: Training task RNNs. (a) Given the task instructions/targets, we train RNNs to generate the
behavior variables, i.e., hand kinematics. (b) After training the RNNs, we compare the RNN activity to the
recorded neural activity.
5.2 Results
5.2.1 CCA reveals a striking similarity between the RNN and neural activity
We performed CCA between the RNN activity and the neural activity and observed a striking similarity between the two. The CCA correlation coefficients between the RNN and neural activity were significantly higher than chance level, indicating meaningful similarities between the two subspaces. This result validated that the high CCA values between the RNN and neural activity were not simply due to similar task instructions/targets, since the chance-level networks were randomly initialized RNNs that received the task instructions/targets as input (Fig. 5.2a). This result held for the top 3 CCA directions and for all 50 CCA directions (Fig. 5.2a). Fig. 5.2b shows the first 3 paired directions found by CCA for an example experimental session. In addition to significant CCA correlation coefficients, the pair-wise CCA directions in the RNN subspace and the neural subspace looked very similar to each other, indicating the existence of a similar manifold structure in both subspaces (Fig. 5.2b).
Figure 5.2: RNN’s artificial units’ activity were similar to that of the recorded neural activity. (a) CCA
correlation coefficient (CC) between the RNN and neural activity was significantly higher than chance-
level CC. The results were consistent for both subjects when averaging the correlation coefficients of the
top 3 CCA directions (left panels) or all 50 CCA directions (right panels). Each dot represents the mean
CCA CC value for an experimental session and a cross-validation fold across pair-wise directions. (b) The
first 3 CCA directions for both recorded and artificial activity exhibited a very similar ring-like manifold
structure. Each trajectory is the condition-average trajectory colored based on the temporal evolution. Each
condition represents two different movement types determined by partitioning the reached targets into 2
clusters. (c) Similar to the recorded neural activity, we observed a ring-like manifold structure in the RNN
activity, determined by TDA. Each bar represents the birth and the death of one of the TDA's 1-dimensional holes.
The existence of a single persistent 1-dimensional hole represents a ring-like manifold structure.
5.2.2 RNN activity exhibited a ring-like manifold structure
As shown in Fig. 5.2b, similar to the neural activity, the RNN activity also visually exhibited a ring-like manifold structure. To validate this observation, we performed TDA on the 50-dimensional RNN activity. Fig. 5.2c shows the TDA results for an example experimental session, where a clear persistent 1D hole was observed. A persistent 1D hole in the absence of a persistent 2D hole (not shown here) indicates a ring-like manifold structure [136]. These results suggest that the emergence of a ring-like manifold structure during movements is a realistic phenomenon in natural dynamical systems.
5.2.3 The loop coordinate explains the course of movements in the x-y direction while other
appending dimensions explain the target variability in the z direction
After validating that there existed a ring-like manifold in the RNN activity, we questioned whether and how this ring-like manifold was related to the task. Following the process explained in Section 5.1.4, we divided the R^{n_y} space into the loop coordinate and the appending dimensions. We then asked whether and how the loop coordinate and appending dimensions are related to the movement. To do this, we performed linear regression from these coordinates to the behavior and calculated the resulting behavior prediction accuracy. Interestingly, we observed that in both the recorded neural activity (not shown here) and the RNN activity, the loop coordinate explained the variations in the x-y directions of the movement (Fig. 5.3). The x-y directions of the movement contained most of the variation of the forward-backward movements. In addition, the loop coordinate minimally predicted the z direction of the kinematics, which was the direction that had the most variability in the targets. Unlike the loop coordinate, the appending dimensions were informative of the z direction, i.e., the targets, and not the x-y directions (Fig. 5.3). These results not only validated that the RNNs had a similar manifold structure to that in the brain, but also showed that they encoded the movement information similarly to the brain.
Figure 5.3: For both subjects, the loop coordinate contained most of the information about the x-y direction
of the movement while the appending dimensions contained most of the information about the z direction.
Each dot represents the behavior prediction accuracy for an experimental session and a cross-validation fold.
This is in line with Fig. 5.2b, where the loop (dims 1 and 2) contains the temporal evolution of the movement while the target information is encoded in a quasi-orthogonal dimension (dim 3) captured by the appending dimensions.
5.3 Discussion
Here, we trained artificial RNNs to perform a naturalistic 3D reach-and-grasp task. We showed that the trained RNNs exhibited a similar ring-like manifold structure to that of the recorded neural population activity. The roles of the loop coordinate and the appending dimensions of the ring-like manifolds were also preserved across the RNN and neural activity subspaces. These results suggest that the existence of such a ring-like manifold structure is a realistic phenomenon in natural dynamical systems. The idea of using artificial neural networks to test the plausibility of a certain phenomenon in the brain is an emerging technique in computational neuroscience. Convolutional neural networks have been shown to accurately replicate the brain's ventral visual system [82], and recurrent neural networks have been extensively used for motor tasks [28, 78, 80, 81]. Even though some of these works validated rotational characteristics of the low-dimensional dynamics in the artificial units' activity [28, 78], they have not investigated the manifold structure as we do here. In addition, we made an interesting observation about the roles of various components of the subspace in the control of the movement. We showed that the loop coordinate of the ring-like manifold structure maintained information about the temporal evolution of the movement (forward-backward movements in the x-y directions). The remaining information in the subspace, summarized by the appending dimensions, contained information about the target variability. This observation was made in both the recorded neural and the RNN activity. It is worth noting that the artificial RNNs were trained completely independently of the recorded neural activity. Therefore, these compatible observations are not simply due to sharing the neural activity, but emphasize the prevalence of such manifold structures in dynamical systems, whether artificial or the brain, in generating behavior.
Chapter 6
Conclusion and ongoing work
In this thesis, we have discussed various methods to extract low-dimensional latent factors of the neural population activity. We discussed the multiscale EM algorithm, which extracts the low-dimensional latent factors from combined spiking-field activity (Chapter 2). We also introduced DFINE, a novel neural network model of neural population activity, which enables flexible inference of the latent factors in the presence of missing observations while modeling and incorporating nonlinearities (Chapter 4). It is worth noting that methods such as multiscale EM and DFINE dynamically extract the low-dimensional latent factors by taking into account the temporal correlations given their auto-regressive structure. As we discussed in this thesis, extracting the latent factors can lead to new scientific discoveries (Chapters 3 and 5) and to better neurotechnology by improving performance (Chapter 4). Given the significant evidence of recent years, it is imperative to build models that characterize the neural population activity while considering its spatiotemporal correlations, and statistical machine learning tools can be used to improve such models.
As the later chapters of this thesis suggest, the recent advancements in deep learning can be significantly
useful to build better models of neural population activity. Many techniques developed for real-world problems can be extended to applications in computational neuroscience and brain-machine interfaces. One example of such techniques is contrastive learning. In general, the contrastive learning framework learns a model given observations
by optimizing specific aspects in the representation space. This is unlike predictive modeling that is heavily
used in many of the generative/supervised models, which directly optimizes a cost function in the observa-
tion space. For example, variational auto-encoders [104], normalizing flows [148], generative adversarial networks [149] and many other well-known methods in deep learning follow predictive modeling, as they are optimized to better reconstruct the data. However, there is usually a great deal of redundant and useless information in the observations, e.g., noise, which can give an edge to contrastive learning approaches [150]. A popular direction for contrastive learning approaches is their combination with self-supervised learning for
applications where few human-annotated labels are available [151–154]. These works leverage data aug-
mentations and contrastive learning to learn powerful encoders of observations in an unsupervised manner
and independent of the labels. The encoders are optimized in the representation space and are thus optimized via contrastive learning. A clear application of such methods in computational neuroscience is to datasets where few labels are available, for example, decoding mood variations from human participants [47]. In such datasets, even though the subjects' neural activity is continuously recorded, mood reports are only recorded infrequently given the challenges of mood evaluation. Such mood reports are filled out by participants 1-2 times a day, making it challenging to build meaningful models from neural activity to mood reports. The mood reports can also be highly noisy given the subjective nature of mood, and participants may make mistakes while filling out the reports. This makes building powerful decoders directly from
neural activity to mood reports a difficult learning problem. Self-supervised contrastive learning frame-
works could be significantly helpful in this application, because they do not rely on human-annotated labels
to learn meaningful representations of the data [151–154]. Therefore, we introduce a self-supervised learn-
ing framework to build mood decoders from data where infrequent mood labels are available (Fig. 6.1). In
summary, we first learn powerful encoders from the neural population activity (Fig. 6.1a), which is done
by self-supervision via augmentations (Fig. 6.1b) and contrastive learning (Fig. 6.1c). We then use the learned encoder to learn a mood classifier on mood reports at the time-points where they are available (Fig. 6.1d). Since the contrastive self-supervised encoder is learned purely from observations, it does not suffer from few/wrong labels and can indeed capture meaningful information directly from the data, as shown in applications of computer vision [151–154]. Another direction for contrastive learning could be mixing ideas such as contrastive predictive coding [150] with architectures such as DFINE. Since DFINE allows for future-step-ahead prediction, it can be seamlessly used in the contrastive predictive coding learning framework, where the dynamic latent factor is the context representation and the manifold latent factor is the representation after the encoder. Unlike the original work [150], where the future-step-ahead prediction is learned separately for each number of steps ahead k, DFINE can infer it directly from the model parameters.

Figure 6.1: Self-supervised deep learning framework for high-dimensional neural activity and infrequent mood reports. (a) The encoder network extracts low-dimensional representations from augmented versions of neural data. (b) We propose a stochastic cascade of various augmentations to provide a wider set of augmentations. (c) The contrastive loss helps the encoder network to learn representations that are close for augmented versions of the same segment of neural data (attract them), and far for augmentations of two different segments (repel them). (d) The learned representations are used to train a mood classifier using the available mood-labelled neural data segments.
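As one concrete instance of the contrastive objective sketched in Fig. 6.1c, the snippet below implements an NT-Xent-style loss of the kind used in SimCLR-like self-supervised frameworks; the exact loss in the proposed framework may differ, so this is a sketch rather than the definitive implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) representations of two augmented views of the same neural segments.
    Views of the same segment are attracted; views of different segments in the batch are repelled."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)           # 2n x d unit-norm representations
    sim = z @ z.t() / temperature                                 # pairwise similarity scores
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                    # exclude self-similarity
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])   # index of each positive pair
    return F.cross_entropy(sim, targets)
```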
Overall, recent techniques in deep learning could be really useful to study the brain and improve neu-
rotechnology. Here, I list various methods that have not been thoroughly used in the computational neuro-
science field yet, and have merits for exploration and use to study the brain. Normalizing flows are interest-
ing generative models [148, 155] that are based on the idea that any data distribution can be approximated
by a nonlinear transformation of a Gaussian distribution; the encoder is formed by invertible operations, making it easy to derive the decoder from the encoder [148, 155]. The downside of normalizing flows, in my opinion, is their inability to perform dimensionality reduction, as the dimension of the latent factors is usually the same as the dimension of the observations. Another recent advancement that has gained popularity is training generative models based on diffusion models and score-based generative models [156]. In score matching, instead of modeling the data density directly, the score function is modeled. The score function is the gradient of the data density with respect to the observations. The advantage is that score-based modeling removes the difficulty of intractable normalizing constants in typical log-likelihood learning. After learning the score-based model, the generative process can be carried out with a sampling technique such as Langevin dynamics [156]. These ideas are used in state-of-the-art deep learning methods for text-to-image generation [157], but again the issue with diffusion models is that the latent dimension is similar to the data dimension. The contrastive learning framework used in state-of-the-art models such as contrastive language-image pre-training (CLIP) can also be very useful in some applications of neuroscience with multi-modal data, such as simultaneously modeling spiking and LFP activity [158]. In addition to these frameworks, I personally believe graph neural network learning is highly under-appreciated in the computational neuroscience field. I believe graph neural networks can play a key role not only in modeling the data, but also in inferring biophysiological information when modeling the brain as a network, e.g., inter-neuronal/inter-regional interactions during tasks.
References
1. Evarts, E. V. Relation of pyramidal tract activity to force exerted during voluntary movement. Journal of Neurophysiology 31, 14–27. doi:10.1152/jn.1968.31.1.14 (1968).
2. Tanji, J. & Evarts, E. V. Anticipatory activity of motor cortex neurons in relation to direction of an intended movement. Journal of Neurophysiology 39, 1062–1068. doi:10.1152/jn.1976.39.5.1062 (1976).
3. Georgopoulos, A. P., Kalaska, J. F., Caminiti, R. & Massey, J. T. On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. J. Neurosci. 2(11), 1527–1537 (1982).
4. Moran, D. W. & Schwartz, A. B. Motor cortical representation of speed and direction during reaching. J. Neurophysiol. 82, 2676–2692. doi:10.1152/jn.1999.82.5.2676 (1999).
5. Shanechi, M. M. Brain-Machine Interface Control Algorithms. IEEE Trans. Neural Syst. Rehabil. Eng. 25, 1725–1734 (2017).
6. Shanechi, M. M. Brain-machine interfaces from motor to mood. Nature Neuroscience 22, 1554–1564. doi:10.1038/s41593-019-0488-y (2019).
7. Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Yu, B. M., Fan, J. M., et al. A high-performance neural prosthesis enabled by control algorithm design. Nature Neuroscience 15, 1752–1757. doi:10.1038/nn.3265 (2012).
8. Hochberg, L. R., Serruya, M. D., Friehs, G. M., Mukand, J. A., Saleh, M., Caplan, A. H., et al. Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 442, 164–171. doi:10.1038/nature04970 (2006).
9. Scott, S. H. Inconvenient Truths about neural processing in primary motor cortex. Journal of Physiology 586, 1217–1224. doi:10.1113/jphysiol.2007.146068 (2008).
10. Churchland, M. M. & Shenoy, K. V. Temporal Complexity and Heterogeneity of Single-Neuron Activity in Premotor and Motor Cortex. Journal of Neurophysiology 97, 4235–4257. doi:10.1152/jn.00095.2007 (2007).
11. Fetz, E. E. Are movement parameters recognizably coded in the activity of single neurons? Behavioral and Brain Sciences 15, 679–690 (1992).
12. Gao, P. & Ganguli, S. On simplicity and complexity in the brave new world of large-scale neuroscience. Current Opinion in Neurobiology 32, 148–155. doi:10.1016/j.conb.2015.04.003 (2015).
13. Cunningham, J. P. & Yu, B. M. Dimensionality reduction for large-scale neural recordings. Nature Neuroscience 17, 1500–1509. doi:10.1038/nn.3776 (2014).
14. Gallego, J. A., Perich, M. G., Miller, L. E. & Solla, S. A. Neural Manifolds for the Control of Movement. Neuron 94, 978–984. doi:10.1016/j.neuron.2017.05.025 (2017).
15. Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation Through Neural Population Dynamics. Annual Review of Neuroscience 43, 249–275. doi:10.1146/annurev-neuro-092619-094115 (2020).
16. Pandarinath, C., Ames, K. C., Russo, A. A., Farshchian, A., Miller, L. E., Dyer, E. L., et al. Latent Factors and Dynamics in Motor Cortex and Their Application to Brain–Machine Interfaces. The Journal of Neuroscience 38, 9390–9401. doi:10.1523/jneurosci.1669-18.2018 (2018).
17. Susilaradeya, D., Xu, W., Hall, T. M., Galán, F., Alter, K. & Jackson, A. Extrinsic and intrinsic dynamics in movement intermittency. eLife 8. doi:10.7554/eLife.40145.001 (2019).
18. Hall, T. M., DeCarvalho, F. & Jackson, A. A Common Structure Underlies Low-Frequency Cortical Dynamics in Movement, Sleep, and Sedation. Neuron 83, 1185–1199. doi:10.1016/j.neuron.2014.07.022 (2014).
19. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., et al. Neural Population Dynamics During Reaching. Nature 487, 51–56 (2012).
20. Sadtler, P. T., Quick, K. M., Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C., et al. Neural constraints on learning. Nature 512, 423–426. doi:10.1038/nature13665 (2014).
21. Golub, M. D., Sadtler, P. T., Oby, E. R., Quick, K. M., Ryu, S. I., Tyler-Kabara, E. C., et al. Learning by neural reassociation. Nature Neuroscience 21, 607–616. doi:10.1038/s41593-018-0095-3 (2018).
22. Michaels, J. A., Dann, B. & Scherberger, H. Neural Population Dynamics during Reaching Are Better Explained by a Dynamical System than Representational Tuning. PLoS Computational Biology 12, 1–22. doi:10.1371/journal.pcbi.1005175 (2016).
23. Vaidya, M., Kording, K., Saleh, M., Takahashi, K. & Hatsopoulos, N. G. Neural coordination during reach-to-grasp. Journal of Neurophysiology 114, 1827–1836. doi:10.1152/jn.00349.2015 (2015).
24. Yu, B. M., Cunningham, J. P., Santhanam, G., Ryu, S. I., Shenoy, K. V. & Sahani, M. Gaussian-Process Factor Analysis for Low-Dimensional Single-Trial Analysis of Neural Population Activity. Journal of Neurophysiology 102, 614–635. doi:10.1152/jn.90941.2008 (2009).
25. Macke, J. H., Buesing, L., Cunningham, J. P., Yu, B. M., Shenoy, K. V. & Sahani, M. Empirical models of spiking in neuronal populations. Advances in Neural Information Processing Systems (NIPS) 24, 1–9 (2011).
26. Gallego, J. A., Perich, M. G., Naufel, S. N., Ethier, C., Solla, S. A. & Miller, L. E. Cortical population activity within a preserved neural manifold underlies multiple motor behaviors. Nature Communications 9, 1–13. doi:10.1038/s41467-018-06560-z (2018).
27. Gallego, J. A., Perich, M. G., Chowdhury, R. H., Solla, S. A. & Miller, L. E. Long-term stability of cortical population dynamics underlying consistent behavior. Nature Neuroscience 23, 260–270. doi:10.1038/s41593-019-0555-4 (2020).
28. Sussillo, D., Churchland, M. M., Kaufman, M. T. & Shenoy, K. V . A neural network that finds a
naturalistic solution for the production of muscle activity. Nature Neuroscience 18. ISBN: 0018-9294
Publisher: Nature Publishing Group eprint: NIHMS150003, 1025–1033. doi:10.1038/nn.4042
(2015).
29. Pandarinath, C., O’shea, D. J., Collins, J., Jozefowicz, R., Stavisky, S. D., Kao, J. C., et al. Inferring
single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15. ISBN:
1087-0792 Publisher: Springer US eprint: 152884, 805–815. doi:10.1038/s41592-018-0109-9
(2018).
30. Flint, R. D., Lindberg, E. W., Jordan, L. R., Miller, L. E. & Slutzky, M. W. Accurate decoding of
reaching movements from field potentials in the absence of spikes. J. Neural Eng. 9, 46006 (2012).
31. Perel, S., Sadtler, P. T., Oby, E. R., Ryu, S. I., Tyler-Kabara, E. C., Batista, A. P., et al. Single-unit
activity, threshold crossings, and local field potentials in motor cortex differentially encode reach
kinematics. Journal of Neurophysiology 114. ISBN: 1412268400 Publisher: American Physiological
Society Bethesda, MD, 1500–1512. doi:10.1152/jn.00293.2014 (2015).
32. Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D reach and grasp
from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field po-
tentials. Journal of Neurophysiology 107. ISBN: 1522-1598 (Electronic) 0022-3077
(Linking) Publisher: American Physiological Society Bethesda, MD, 1337–1355. doi:10.1152/
jn.00781.2011 (2012).
33. Hsieh, H.-L., Wong, Y . T., Pesaran, B. & Shanechi, M. M. Multiscale modeling and decoding algo-
rithms for spike-field activity. Journal of neural engineering 16. Publisher: IOP Publishing, 016018
(2018).
34. Mehring, C., Rickert, J., Vaadia, E., de Oliveira, S. C., Aertsen, A. & Rotter, S. Inference of hand
movements from local field potentials in monkey motor cortex. Nature Neuroscience 6. ISBN: 1097-
6256 (Print) 1097-6256 (Linking) Publisher: Nature Publishing Group, 1253–1254.
doi:10.1038/nn1158 (2003).
35. Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V . A high performing brain-
machine interface driven by low-frequency local field potentials alone and together with spikes.
Journal of neural engineering 12. ISBN: 1741-2552 Publisher: IOP Publishing, 036009. doi:10.
1088/1741-2560/12/3/036009 (2015).
36. Belitski, A., Panzeri, S., Magri, C., Logothetis, N. K. & Kayser, C. Sensory information in local field
potentials and spikes from visual and auditory cortices: time scales and frequency bands. Journal of
computational neuroscience 29, 533–45. doi:10.1007/s10827-010-0230-y (2010).
37. Berens, P., Keliris, G. A., Ecker, A. S., Logothetis, N. K. & Tolias, A. S. Comparing the feature se-
lectivity of the gamma-band of the local field potential and the underlying spiking activity in primate
visual cortex. Frontiers in Systems Neuroscience 2. Publisher: Frontiers, 2. doi:10.3389/neuro.
06.002.2008 (2008).
38. Pesaran, B., Pezaris, J. S., Sahani, M., Mitra, P. P. & Andersen, R. A. Temporal structure in neuronal
activity during working memory in macaque parietal cortex. Nature Neuroscience 5. ISBN: 1097-
6256 eprint: 0309034, 805–811. doi:10.1038/nn890 (2002).
39. Scherberger, H., Jarvis, M. R. & Andersen, R. A. Cortical local field potential encodes movement
intentions in the posterior parietal cortex. Neuron 46. ISBN: 0896-6273 (Print) 0896-
6273 (Linking) Publisher: Cell Press eprint: 1706.02451, 347–354. doi:10.1016/j.neuron.2005.
03.004 (2005).
40. Pesaran, B., Vinck, M., Einevoll, G. T., Sirota, A., Fries, P., Siegel, M., et al. Investigating large-scale
brain dynamics using field potential recordings: analysis and interpretation. Nature Neuroscience.
ISBN: 1546-1726 Publisher: Springer US, 1–17. doi:10.1038/s41593-018-0171-8 (2018).
41. Einevoll, G. T., Kayser, C., Logothetis, N. K. & Panzeri, S. Modelling and analysis of local field
potentials for studying the function of cortical circuits. Nature Reviews Neuroscience 14. ISBN: 1471-
0048 (Electronic) 1471-003X (Linking) Publisher: Nature Publishing Group eprint:
NIHMS150003, 770–785. doi:10.1038/nrn3599 (2013).
42. Buzsáki, G., Anastassiou, C. A. & Koch, C. The origin of extracellular fields and currents - EEG, ECoG, LFP and spikes. Nature Reviews Neuroscience 13. Publisher: Nature Publishing Group, 407–420 (2012).
43. Smith, A. C. & Brown, E. N. Estimating a State-Space Model from Point Process Observations.
Neural Computation 15. ISBN: 0899-7667 (Print) Publisher: MIT Press, 965–991. doi:10.1162/
089976603765202622 (2003).
44. Ghahramani, Z. & Hinton, G. E. Parameter Estimation for Linear Dynamical Systems. Technical Re-
port 6. ISBN: 0018-9294 (Print) Publisher: Technical Report CRG-TR-96-2, University of Toronto,
Dept. of Computer Science, 1–6. doi:10.1080/00207177208932224 (1996).
45. Markowitz, D. A., Wong, Y . T., Gray, C. M. & Pesaran, B. Optimizing the decoding of movement
goals from local field potentials in macaque cortex. Journal of Neuroscience 31. Publisher: Society
for Neuroscience, 18412–18422. doi:10.1523/JNEUROSCI.4165-11.2011 (2011).
46. So, K., Dangi, S., Orsborn, A. L., Gastpar, M. C. & Carmena, J. M. Subject-specific modulation of
local field potential spectral power during brain–machine interface control in primates. en. Journal
of Neural Engineering 11. Publisher: IOP Publishing, 026002. doi:10.1088/1741-2560/11/2/
026002 (2014).
47. Sani, O. G., Yang, Y ., Lee, M. B., Dawes, H. E., Chang, E. F. & Shanechi, M. M. Mood variations
decoded from multi-site intracranial human brain activity. Nature Biotechnology 36, 954. doi:10.
1038/nbt.4200 (2018).
48. Yang, Y ., Connolly, A. T. & Shanechi, M. M. A control-theoretic system identification framework and
a real-time closed-loop clinical simulation testbed for electrical brain stimulation. Journal of neural
engineering 15. Publisher: IOP Publishing, 66007 (2018).
49. Shanechi, M. M., Orsborn, A. L. & Carmena, J. M. J. M. Robust Brain-Machine Interface Design
Using Optimal Feedback Control Modeling and Adaptive Point Process Filtering. PLoS Computa-
tional Biology 12. ISBN: 10.1371/journal.pcbi.1004730 Publisher: Public Library of Science, 1–29.
doi:10.1371/journal.pcbi.1004730 (2016).
50. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., Brown, E. N. & John, P. A Point Process
Framework for Relating Neural Spiking Activity to Spiking History, Neural Ensemble, and Extrinsic
Covariate Effects. Journal of neurophysiology 93. ISBN: 0022-3077 (Print) Publisher: Am Physio-
logical Soc eprint: NIHMS150003, 1074–1089. doi:10.1152/jn.00697.2004 (2005).
51. Eden, U. T., Frank, L. M., Barbieri, R., Solo, V . & Brown, E. N. Dynamic Analysis of Neural En-
coding by Point Process Adaptive Filtering. Neural Computation 16. ISBN: 0899-7667, 971–998.
doi:10.1162/089976604773135069 (2004).
52. Sadras, N., Pesaran, B. & Shanechi, M. M. A point-process matched filter for event detection and
decoding from population spike trains. Journal of Neural Engineering. doi:10.1088/1741-2552/
ab3dbc (2019).
53. Shanechi, M. M., Orsborn, A. L., Moorman, H. G., Gowda, S., Dangi, S. & Carmena, J. M. Rapid
control and feedback rates enhance neuroprosthetic control. Nature Communications 8. Publisher:
Nature Publishing Group, 1–10. doi:10.1038/ncomms13825 (2017).
54. Shanechi, M. M., Hu, R. C., Powers, M., Wornell, G. W., Brown, E. N. & Williams, Z. M. Neural
population partitioning and a concurrent brain-machine interface for sequential motor function. Nat.
Neurosci. 15. Publisher: Nature Publishing Group, 1715–1722. doi:10.1038/nn.3250 (2012).
55. Abbaspourazad, H., Hsieh, H.-L. & Shanechi, M. M. A multiscale dynamical modeling and identi-
fication framework for spike-field activity. IEEE Transactions on Neural Systems and Rehabilitation
Engineering 27. Publisher: IEEE, 1128–1138. doi:10.1109/tnsre.2019.2913218 (2019).
56. Abbaspourazad, H., Choudhury, M., Wong, Y . T., Pesaran, B. & Shanechi, M. M. Multiscale low-
dimensional motor cortical state dynamics predict naturalistic reach-and-grasp behavior. en. Nature
Communications 12. Number: 1 Publisher: Nature Publishing Group, 607. doi:10.1038/s41467-
020-20197-x (2021).
57. Kao, J. C., Nuyujukian, P., Ryu, S. I., Churchland, M. M., Cunningham, J. P. & Shenoy, K. V . Single-
trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Commu-
nications 6. ISBN: 20411723 (ISSN) Publisher: Nature Publishing Group, 1–12. doi:10.1038/
ncomms8759 (2015).
58. Sani, O. G., Abbaspourazad, H., Wong, Y . T., Pesaran, B. & Shanechi, M. M. Modeling behaviorally
relevant neural dynamics enabled by preferential subspace identification. Nature Neuroscience, 1–10.
doi:10.1038/s41593-020-00733-0 (2020).
59. Kao, J. C., Ryu, S. I. & Shenoy, K. V . Leveraging neural dynamics to extend functional lifetime
of brain-machine interfaces. Scientific Reports 7. ISBN: 4159801706 Publisher: Springer US, 1–16.
doi:10.1038/s41598-017-06029-x (2017).
60. Buesing, L., Macke, J. H. & Sahani, M. Learning stable, regularised latent models of neural pop-
ulation dynamics. Network: Computation in Neural Systems 23. ISBN: 9781627480031 Publisher:
Taylor & Francis eprint: 9605103, 24–47. doi:10.3109/0954898X.2012.677095 (2012).
61. Aghagolzadeh, M. & Truccolo, W. Inference and Decoding of Motor Cortex Low-Dimensional Dy-
namics via Latent State-Space Models. IEEE Transactions on Neural Systems and Rehabilitation
Engineering 24. ISBN: 1534-4320 VO - 24 Publisher: IEEE, 272–282. doi:10.1109/TNSRE.2015.
2470527 (2016).
62. Chaudhuri, R., Gerçek, B., Pandey, B., Peyrache, A. & Fiete, I. The intrinsic attractor manifold and
population dynamics of a canonical cognitive circuit across waking and sleep. Nature Neuroscience
22. Publisher: Springer Science and Business Media LLC, 1512–1520. doi:10.1038/s41593-019-
0460-x (2019).
63. Low, R. J., Lewallen, S., Aronov, D., Nevers, R. & Tank, D. W. Probing variability in a cognitive map
using manifold inference from neural dynamics. bioRxiv, 418939. doi:10.1101/418939 (2018).
64. Sani, O. G., Pesaran, B. & Shanechi, M. M. Where is all the nonlinearity: flexible nonlinear model-
ing of behaviorally relevant neural dynamics using recurrent neural networks. en. bioRxiv. Company:
Cold Spring Harbor Laboratory Distributor: Cold Spring Harbor Laboratory Label: Cold Spring Har-
bor Laboratory Section: New Results Type: article, 2021.09.03.458628. doi:10.1101/2021.09.03.
458628 (2021).
65. Ye, J. & Pandarinath, C. Representation learning for neural population activity with Neural Data
Transformers. en. bioRxiv. Publisher: Cold Spring Harbor Laboratory Section: New Results, 2021.01.16.426955.
doi:10.1101/2021.01.16.426955 (2021).
66. Gao, Y ., Archer, E. W., Paninski, L. & Cunningham, J. P. Linear dynamical neural population models
through nonlinear embeddings in Advances in Neural Information Processing Systems (2016), 163–
171.
67. She, Q. & Wu, A. Neural Dynamics Discovery via Gaussian Process Recurrent Neural Networks en.
in Uncertainty in Artificial Intelligence ISSN: 2640-3498 (PMLR, 2020), 454–464.
68. Kim, T. D., Luo, T. Z., Pillow, J. W. & Brody, C. Inferring Latent Dynamics Underlying Neural
Population Activity via Neural Differential Equations en. in International Conference on Machine
Learning ISSN: 2640-3498 (PMLR, 2021), 5551–5561.
69. Liu, R., Azabou, M., Dabagia, M., Lin, C.-H., Azar, M. G., Hengen, K. B., et al. Drop, Swap, and
Generate: A Self-Supervised Approach for Generating Neural Activity. en. bioRxiv. Company: Cold
Spring Harbor Laboratory Distributor: Cold Spring Harbor Laboratory Label: Cold Spring Harbor
Laboratory Section: New Results Type: article, 2021.07.21.453285. doi:10.1101/2021.07.21.
453285 (2021).
70. Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an en-
coder–decoder framework. Nature Neuroscience. Publisher: Nature Publishing Group, 1–8. doi:10.
1038/s41593-020-0608-8 (2020).
71. Willett, F. R., Avansino, D. T., Hochberg, L. R., Henderson, J. M. & Shenoy, K. V . High-performance
brain-to-text communication via handwriting. en. Nature 593. Number: 7858 Publisher: Nature Pub-
lishing Group, 249–254. doi:10.1038/s41586-021-03506-2 (2021).
72. Glaser, J. I., Benjamin, A. S., Chowdhury, R. H., Perich, M. G., Miller, L. E. & Kording, K. P.
Machine Learning for Neural Decoding. eNeuro 7, ENEURO.0506–19.2020. doi:10.1523/ENEURO.
0506-19.2020 (2020).
73. Kim, M.-K., Sohn, J.-W. & Kim, S.-P. Decoding Kinematic Information From Primary Motor Cortex
Ensemble Activities Using a Deep Canonical Correlation Analysis. Frontiers in Neuroscience 14,
1083. doi:10.3389/fnins.2020.509364 (2020).
74. Zhu, F., Sedler, A. R., Grier, H. A., Ahad, N., Davenport, M. A., Kaufman, M. T., et al. Deep inference
of latent dynamics with spatio-temporal super-resolution using selective backpropagation through
time. arXiv:2111.00070 [cs, q-bio]. arXiv: 2111.00070 (2021).
75. Lipton, Z. C., Kale, D. C. & Wetzel, R. Modeling Missing Data in Clinical Time Series with RNNs.
arXiv:1606.04130 [cs, stat]. arXiv: 1606.04130 (2016).
76. Che, Z., Purushotham, S., Cho, K., Sontag, D. & Liu, Y . Recurrent Neural Networks for Multivariate
Time Series with Missing Values. en. Scientific Reports 8. Number: 1 Publisher: Nature Publishing
Group, 6085. doi:10.1038/s41598-018-24271-9 (2018).
77. Ghazi, M. M., Nielsen, M., Pai, A., Cardoso, M. J., Modat, M., Ourselin, S., et al. Robust training of
recurrent neural networks to handle missing data for disease progression modeling. arXiv:1808.05500
[cs]. arXiv: 1808.05500 (2018).
78. Remington, E. D., Narain, D., Hosseini, E. A., Correspondence, J. & Jazayeri, M. Flexible Sensori-
motor Computations through Rapid Reconfiguration of Cortical Dynamics. Neuron 98, 1005–1019.
doi:10.1016/j.neuron.2018.05.020 (2018).
79. Russo, A. A., Bittner, S. R., Perkins, S. M., Seely, J. S., London, B. M., Lara, A. H., et al. Motor
Cortex Embeds Muscle-like Commands in an Untangled Population Response. Neuron 97. ISBN:
0896-6273 Publisher: Elsevier Inc., 953–966.e8. doi:10.1016/j.neuron.2018.01.004 (2018).
80. Russo, A. A., Khajeh, R., Bittner, S. R., Perkins, S. M., Cunningham, J. P., Abbott, L. F., et al.
Neural Trajectories in the Supplementary Motor Area and Motor Cortex Exhibit Distinct Geometries,
Compatible with Different Classes of Computation. en. Neuron 107, 745–758.e6. doi:10.1016/j.
neuron.2020.05.020 (2020).
74
81. Mante, V ., Sussillo, D., Shenoy, K. V . & Newsome, W. T. Context-dependent computation by recur-
rent dynamics in prefrontal cortex. Nature 503. ISBN: 1476-4687 (Electronic) 0028-
0836 (Linking) Publisher: Nature Publishing Group eprint: 15334406, 78–84. doi:10.1038/nature12742
(2013).
82. Bashivan, P., Kar, K. & DiCarlo, J. J. Neural population control via deep image synthesis. Science
364. Publisher: American Association for the Advancement of Science. doi:10.1126/science.
aav9436 (2019).
83. Michaels, J. A., Schaffelhofer, S., Agudelo-Toro, A. & Scherberger, H. A goal-driven modular neural
network predicts parietofrontal neural dynamics during grasping. en. Proceedings of the National
Academy of Sciences 117, 32124–32135. doi:10.1073/pnas.2005087117 (2020).
84. Perich, M. G., Arlt, C., Soares, S., Young, M. E., Mosher, C. P., Minxha, J., et al. Inferring brain-wide
interactions using data-constrained recurrent neural network models. en. bioRxiv. Company: Cold
Spring Harbor Laboratory Distributor: Cold Spring Harbor Laboratory Label: Cold Spring Harbor
Laboratory Section: New Results Type: article, 2020.12.18.423348. doi:10.1101/2020.12.18.
423348 (2020).
85. Wu, W., Kulkarni, J. E., Hatsopoulos, N. G. & Paninski, L. Neural decoding of hand motion using a
linear state-space model with hidden states. IEEE Transactions on Neural Systems and Rehabilitation
Engineering 17. ISBN: 6314442508 Publisher: IEEE eprint: 15334406, 370–378. doi:10.1109/
TNSRE.2009.2023307 (2009).
86. Zhuang, J., Truccolo, W., Vargas-Irwin, C. & Donoghue, J. P. Decoding 3-D reach and grasp kinemat-
ics from high-frequency local field potentials in primate primary motor cortex. IEEE Transactions
on Biomedical Engineering 57. ISBN: 1558-2531 (Electronic) 0018-9294 (Linking),
1774–1784. doi:10.1109/TBME.2010.2047015 (2010).
87. Lawhern, V ., Wu, W., Hatsopoulos, N. & Paninski, L. Population decoding of motor cortical activity
using a generalized linear model with hidden states. Journal of Neuroscience Methods 189. ISBN:
0165-0270 Publisher: Elsevier B.V . eprint: NIHMS150003, 267–280. doi:10.1016/j.jneumeth.
2010.03.024 (2010).
88. Mazor, O. & Laurent, G. Transient dynamics versus fixed points in odor representations by lo-
cust antennal lobe projection neurons. Neuron 48. ISBN: 0896-6273 Publisher: Elsevier, 661–673.
doi:10.1016/j.neuron.2005.09.032 (2005).
89. Siegel, M., Warden, M. R. & Miller, E. K. Phase-dependent neuronal coding of objects in short-
term memory. Proceedings of the National Academy of Sciences 106. ISBN: 0908193106 Publisher:
National Acad Sciences eprint: arXiv:1408.1149, 21341–21346. doi:10.1073/pnas.0908193106
(2009).
90. O’Keefe, J. & Burgess, N. Dual phase and rate coding in hippocampal place cells: Theoretical signifi-
cance and relationship to entorhinal grid cells. Hippocampus 15. ISBN: 1050-9631 (Print) Publisher:
Wiley Online Library, 853–866. doi:10.1002/hipo.20115 (2005).
91. Eden, U. T., Frank, L. M. & Tao, L. in Dynamic Neuroscience 29–52 (Springer, 2018).
92. Dean, H. L., Hagan, M. A. & Pesaran, B. Only Coherent Spiking in Posterior Parietal Cortex Co-
ordinates Looking and Reaching. Neuron 73. ISBN: 1097-4199 Publisher: Elsevier Inc., 829–841.
doi:10.1016/j.neuron.2011.12.035 (2012).
93. Hagan, M. A., Dean, H. L. & Pesaran, B. Spike-field activity in parietal area LIP during coordinated
reach and saccade movements. Journal of Neurophysiology 107. ISBN: 1522-1598; 0022-3077 Pub-
lisher: American Physiological Society Bethesda, MD, 1275–1290. doi:10.1152/jn.00867.2011
(2012).
94. Citi, L., Ba, D., Brown, E. N. & Barbieri, R. Likelihood methods for point processes with refractori-
ness. Neural computation 26. Publisher: MIT Press, 237–263 (2014).
95. Coleman, T. P. & Sarma, S. S. A computationally efficient method for nonparametric modeling of
neural spiking activity with point processes. Neural Computation 22. Publisher: MIT Press, 2002–
2030 (2010).
96. Pistohl, T., Ball, T., Schulze-Bonhage, A., Aertsen, A. & Mehring, C. Prediction of arm movement
trajectories from ECoG-recordings in humans. Journal of Neuroscience Methods 167. ISBN: 0165-
0270 (Print) 0165-0270 (Linking) Publisher: Elsevier, 105–114. doi:10.1016/j.
jneumeth.2007.10.001 (2008).
97. Wang, W., Collinger, J. L., Degenhart, A. D., Tyler-Kabara, E. C., Schwartz, A. B., Moran, D. W.,
et al. An Electrocorticographic Brain Interface in an Individual with Tetraplegia. PLoS ONE 8. ISBN:
1932-6203 (Electronic) 1932-6203 (Linking) Publisher: Public Library of Science, 1–
8. doi:10.1371/journal.pone.0055344 (2013).
98. Yang, Y ., Sani, O. G., Chang, E. F. & Shanechi, M. M. Dynamic network modeling and dimensional-
ity reduction for human ECoG activity. en. Journal of Neural Engineering 16, 056014. doi:10.1088/
1741-2552/ab2214 (2019).
99. Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning: Data Mining, Infer-
ence, and Prediction. Biometrics (2002).
100. Byrd, R. H., Schnabel, R. B. & Shultz, G. A. A trust region algorithm for nonlinearly constrained
optimization. SIAM Journal on Numerical Analysis 24. Publisher: SIAM, 1152–1170 (1987).
101. Ljung, L. System identification: theory for the user (Prentice Hall, Upper Saddle River, New Jersey,
1998).
102. Wong, Y . T., Putrino, D., Weiss, A. & Pesaran, B. Utilizing movement synergies to improve decoding
performance for a brain machine interface. Proceedings of the Annual International Conference of
the IEEE Engineering in Medicine and Biology Society, EMBS. ISBN: 9781457702167 Publisher:
IEEE eprint: NIHMS150003, 289–292. doi:10.1109/EMBC.2013.6609494 (2013).
103. Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational Inference: A Review for Statisticians.
eprint: 1601.00670. doi:10.1080/01621459.2017.1285773 (2016).
104. Kingma, D. P. & Welling, M. Auto-Encoding Variational Bayes. arXiv:1312.6114 [cs, stat]. arXiv:
1312.6114 (2014).
105. Beal, M. J. VARIATIONAL ALGORITHMS FOR APPROXIMATE BAYESIAN INFERENCE tech. rep.
Publication Title: Physics (1998).
106. Michaels, J. A., Dann, B., Intveld, R. W. & Scherberger, H. Predicting Reaction Time from the Neural
State Space of the Premotor and Parietal Grasping Network. Journal of Neuroscience 35. ISBN: 0270-
6474, 11415–11432. doi:10.1523/JNEUROSCI.1714-15.2015 (2015).
107. Bressler, S. L. & Menon, V . Large-scale brain networks in cognition: emerging methods and prin-
ciples 6. ISSN: 13646613 Publication Title: Trends in Cognitive Sciences. doi:10.1016/j.tics.
2010.04.004 (2010).
108. Schoffelen, J. M., Oostenveld, R. & Fries, P. Neuronal coherence as a mechanism of effective corti-
cospinal interaction. Science 308, 111–113. doi:10.1126/science.1107027 (2005).
109. Berens, P., Keliris, G. A., Ecker, A. S., Logothetis, N. K. & Tolias, A. S. Feature selectivity of the
gamma-band of the local field potential in primate primary visual cortex. frontiers in Neuroscience 2.
Publisher: Frontiers, 199–207. doi:10.3389/neuro.01.037.2008 (2008).
110. Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate
decoding of muscle activity. Journal of Neurophysiology 108. ISBN: 1522-1598 (Electronic) 0022-
3077 (Linking) eprint: /dx.doi.org/10.1038/ncomms5447, 18–24. doi:10.1152/jn.00832.2011
(2012).
111. Heldman, D. A., Wei Wang, Chan, S. S. & Moran, D. W. Local field potential spectral tuning in
motor cortex during reaching. IEEE Transactions on Neural Systems and Rehabilitation Engineering
14. Conference Name: IEEE Transactions on Neural Systems and Rehabilitation Engineering, 180–
183. doi:10.1109/TNSRE.2006.875549 (2006).
112. He, J. & Fu, Z.-f. Modal analysis ISSN: 03787753 Publication Title: Environmental Fluid Dynamics
eprint: arXiv:1011.1669v3. doi:10.1016/B978-0-12-088571-8.01001-9 (2013).
113. Peeters, B. & De Roeck, G. Reference-based stochastic subspace identification for output-only modal
analysis. Mechanical Systems and Signal Processing 13. ISBN: 0888-3270 Publisher: Elsevier, 855–
878. doi:10.1006/mssp.1999.1249 (1999).
114. Abbaspourazad, H., Wong, Y . T., Pesaran, B. & Shanechi, M. M. M. Identifying multiscale hidden
states to decode behavior in Engineering in Medicine and Biology Society (EMBC), 2018 39th An-
nual International Conference of the IEEE 2018-July. ISSN: 1557170X (IEEE, 2018), 3778–3781.
doi:10.1109/EMBC.2018.8513242.
115. Perko, L. Differential Equations and Dynamical Systems doi:10.1007/978-1-4613-0003-8
(Springer New York, New York, NY, 2001).
116. Ray, S., Crone, N. E., Niebur, E., Franaszczuk, P. J. & Hsiao, S. S. Neural correlates of high-gamma
oscillations (60-200 Hz) in macaque local field potentials and their potential implications in electro-
corticography. The Journal of neuroscience : the official journal of the Society for Neuroscience 28.
Publisher: Society for Neuroscience, 11526–36. doi:10.1523/JNEUROSCI.2848-08.2008 (2008).
117. Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C. B., Carandini, M. & Harris, K. D. Spontaneous
behaviors drive multidimensional, brainwide activity. doi:10.1126/science.aav7893.
118. Stavisky, S. D., Willett, F. R., Wilson, G. H., Murphy, B. A., Rezaii, P., Avansino, D. T., et al. Neural
ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8. Publisher:
eLife Sciences Publications Ltd. doi:10.7554/eLife.46015 (2019).
119. Hurwitz, C., Srivastava, A., Xu, K., Jude, J., Perich, M. G., Miller, L. E., et al. Targeted Neural
Dynamical Modeling. arXiv:2110.14853 [q-bio]. arXiv: 2110.14853 (2021).
120. Bondanelli, G., Deneux, T., Bathellier, B. & Ostojic, S. Network dynamics underlying OFF responses
in the auditory cortex. eLife 10 (eds Latham, P., Shinn-Cunningham, B. G. & Hennequin, G.) Pub-
lisher: eLife Sciences Publications, Ltd, e53151. doi:10.7554/eLife.53151 (2021).
121. Gardner, R. J., Hermansen, E., Pachitariu, M., Burak, Y ., Baas, N. A., Dunn, B. A., et al. Toroidal
topology of population activity in grid cells. en. Nature. Bandiera abtest: a Cc license type: cc by
Cg type: Nature Research Journals Primary atype: Research Publisher: Nature Publishing Group
Subject term: Network models;Neural circuits Subject term id: network-models;neural-circuit, 1–6.
doi:10.1038/s41586-021-04268-7 (2022).
122. Jazayeri, M. & Ostojic, S. Interpreting neural computations by examining intrinsic and embedding
dimensionality of neural activity. en. Current Opinion in Neurobiology. Computational Neuroscience
70, 113–120. doi:10.1016/j.conb.2021.08.002 (2021).
123. Buesing, L., Macke, J. H. & Sahani, M. Spectral learning of linear dynamics from generalised-linear
observations with application to neural population data. Advances in Neural Information Processing
Systems (NIPS). ISBN: 9781627480031, 1–9 (2012).
124. Kalman, R. E. A new approach to linear filtering and prediction problems. Journal of Fluids Engi-
neering 82. Publisher: American Society of Mechanical Engineers, 35–45 (1960).
125. Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approxi-
mators. en. Neural Networks 2, 359–366. doi:10.1016/0893-6080(89)90020-8 (1989).
126. De Jong, P. & MacKinnon, M. J. Covariances for Smoothed Estimates in State-space Models. Biometrika
75, 601–602 (1988).
127. Åström, K. J. Introduction to Stochastic Control Theory (Courier Corporation, 2012).
128. Fraccaro, M., Kamronn, S., Paquet, U. & Winther, O. A Disentangled Recognition and Nonlinear
Dynamics Model for Unsupervised Learning. arXiv:1710.05741 [cs, stat]. arXiv: 1710.05741 (2017).
129. Bowman, S. R., Vilnis, L., Vinyals, O., Dai, A. M., Jozefowicz, R. & Bengio, S. Generating Sentences
from a Continuous Space. arXiv:1511.06349 [cs]. arXiv: 1511.06349 (2016).
130. Dosovitskiy, A. & Brox, T. Generating Images with Perceptual Similarity Metrics based on Deep
Networks. arXiv:1602.02644 [cs]. arXiv: 1602.02644 (2016).
131. Razavi, A., Oord, A. v. d., Poole, B. & Vinyals, O. Preventing Posterior Collapse with delta-V AEs.
arXiv:1901.03416 [cs, stat]. arXiv: 1901.03416 (2019).
132. Zhao, Y . & Park, I. M. Variational Latent Gaussian Process for Recovering Single-Trial Dynam-
ics from Population Spike Trains. Neural Computation 29. arXiv: 1604.03053, 1293–1316. doi:10.
1162/NECO_a_00953 (2017).
133. Goodfellow, I., Bengio, Y . & Courville, A. Deep Learning (MIT Press, 2016).
134. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs]. arXiv:
1412.6980 (2017).
135. Sani, O. G., Abbaspourazad, H., Wong, Y . T., Pesaran, B. & Shanechi, M. M. Modeling behaviorally
relevant neural dynamics enabled by preferential subspace identification. en. Nature Neuroscience.
Publisher: Nature Publishing Group, 1–10. doi:10.1038/s41593-020-00733-0 (2020).
136. Carlsson, G. Topology and data. en. Bulletin of the American Mathematical Society 46, 255–308.
doi:10.1090/S0273-0979-09-01249-X (2009).
137. Saxena, S., Russo, A., Cunningham, J. & Churchland, M. M. Motor cortex activity across movement
speeds is predicted by network-level strategies for generating muscle activity. en. bioRxiv. Publisher:
Cold Spring Harbor Laboratory Section: New Results, 2021.02.01.429168. doi:10.1101/2021.02.
01.429168 (2021).
138. Hurwitz, C., Kudryashova, N., Onken, A. & Hennig, M. H. Building population models for large-
scale neural recordings: opportunities and pitfalls. arXiv:2102.01807 [cs, q-bio]. arXiv: 2102.01807
(2021).
139. Afshar, A., Santhanam, G., Yu, B. M., Ryu, S. I., Sahani, M. & Shenoy, K. V . Single-trial neural cor-
relates of arm movement preparation. Neuron 71. ISBN: 1097-4199 (Electronic) 0896-
6273 (Linking) Publisher: Elsevier Inc. eprint: NIHMS150003, 555–564. doi:10.1016/j.neuron.
2011.05.047 (2011).
140. Williams, A. H., Kim, T. H., Wang, F., Vyas, S., Ryu, S. I., Shenoy, K. V ., et al. Unsupervised Dis-
covery of Demixed, Low-Dimensional Neural Dynamics across Multiple Timescales through Tensor
Component Analysis. Neuron. Publisher: Elsevier Inc., 1–17. doi:10.1016/j.neuron.2018.05.
015 (2018).
141. Finn, C., Goodfellow, I. & Levine, S. Unsupervised Learning for Physical Interaction through Video
Prediction. arXiv:1605.07157 [cs]. arXiv: 1605.07157 (2016).
142. Oh, J., Guo, X., Lee, H., Lewis, R. & Singh, S. Action-Conditional Video Prediction using Deep
Networks in Atari Games. arXiv:1507.08750 [cs]. arXiv: 1507.08750 (2015).
143. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Trans-
formers for Language Understanding. arXiv:1810.04805 [cs]. arXiv: 1810.04805 (2019).
144. Lewis, M., Liu, Y ., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., et al. BART: Denoising
Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehen-
sion. arXiv:1910.13461 [cs, stat]. arXiv: 1910.13461 (2019).
145. Kidziński, Ł., Yang, B., Hicks, J. L., Rajagopal, A., Delp, S. L. & Schwartz, M. H. Deep neural
networks enable quantitative movement analysis using single-camera videos. en. Nature Communi-
cations 11. Number: 1 Publisher: Nature Publishing Group, 4054. doi:10.1038/s41467-020-
17807-z (2020).
146. Kalidindi, H. T., Cross, K. P., Lillicrap, T. P., Omrani, M., Falotico, E., Sabes, P. N., et al. Rotational
dynamics in motor cortex are consistent with a feedback controller. eLife 10 (eds Goldberg, J. H.
& Ivry, R. B.) Publisher: eLife Sciences Publications, Ltd, e67256. doi:10.7554/eLife.67256
(2021).
147. Shenoy, K. V ., Sahani, M. & Churchland, M. M. Cortical Control of Arm Movements: A Dynamical
Systems Perspective. Annual Review of Neuroscience 36. ISBN: 1545-4126 (Electronic) 0147-006X
(Linking), 337–359. doi:10.1146/annurev-neuro-062111-150509 (2013).
148. Rezende, D. J. & Mohamed, S. Variational Inference with Normalizing Flows tech. rep. eprint:
1505.05770v6 ().
149. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. Generative
Adversarial Networks arXiv:1406.2661 [cs, stat]. 2014. doi:10.48550/arXiv.1406.2661.
150. Oord, A. v. d., Li, Y . & Vinyals, O. Representation Learning with Contrastive Predictive Coding
arXiv:1807.03748 [cs, stat]. 2019. doi:10.48550/arXiv.1807.03748.
151. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A Simple Framework for Contrastive Learning of
Visual Representations tech. rep. arXiv:2002.05709. arXiv:2002.05709 [cs, stat] type: article (arXiv,
2020). doi:10.48550/arXiv.2002.05709.
152. Baevski, A., Hsu, W.-N., Xu, Q., Babu, A., Gu, J. & Auli, M. data2vec: A General Framework for
Self-supervised Learning in Speech, Vision and Language tech. rep. arXiv:2202.03555. arXiv:2202.03555
[cs] type: article (arXiv, 2022).
153. Chen, X., Fan, H., Girshick, R. & He, K. Improved Baselines with Momentum Contrastive Learning.
arXiv:2003.04297 [cs]. arXiv: 2003.04297 (2020).
154. Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., et al. Bootstrap your
own latent: A new approach to self-supervised Learning tech. rep. arXiv:2006.07733. arXiv:2006.07733
[cs, stat] type: article (arXiv, 2020). doi:10.48550/arXiv.2006.07733.
155. Kingma, D. P. & Dhariwal, P. Glow: Generative Flow with Invertible 1x1 Convolutions arXiv:1807.03039
[cs, stat]. 2018. doi:10.48550/arXiv.1807.03039.
156. Song, Y ., Durkan, C., Murray, I. & Ermon, S. Maximum Likelihood Training of Score-Based Diffusion
Models arXiv:2101.09258 [cs, stat]. 2021. doi:10.48550/arXiv.2101.09258.
157. Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. Hierarchical Text-Conditional Image
Generation with CLIP Latents arXiv:2204.06125 [cs]. 2022. doi:10.48550/arXiv.2204.06125.
158. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. Learning Transferable
Visual Models From Natural Language Supervision arXiv:2103.00020 [cs]. 2021. doi:10.48550/
arXiv.2103.00020.
159. Markowitz, D. A., Curtis, C. E. & Pesaran, B. Multiple component networks support working mem-
ory in prefrontal cortex. Proceedings of the National Academy of Sciences 112. ISBN: 10.1073/pnas.1504172112,
11084–11089. doi:10.1073/pnas.1504172112 (2015).
160. Lawlor, P. N., Perich, M. G., Miller, L. E. & Kording, K. P. Linear-nonlinear-time-warp-poisson
models of neural activity. Journal of Computational Neuroscience 45. Publisher: Springer New York
LLC, 173–191. doi:10.1007/s10827-018-0696-6 (2018).
161. Perich, M. G., Lawlor, P. N., Kording, K. P. & Miller, L. E. Extracellular neural recordings from
macaque primary and dorsal premotor motor cortex during a sequential reaching task. CRCNS.
doi:10.6080/K0FT8J72 (2018).
162. O’Doherty, J. E., Cardoso, M. M. B., Makin, J. G. & Sabes, P. N. Nonhuman Primate Reaching with
Multichannel Sensorimotor Cortex Electrophysiology. Zenodo. doi:10.5281/zenodo.3854034
(2020).
163. Makin, J. G., O’Doherty, J. E., Cardoso, M. M. B. & Sabes, P. N. Superior arm-movement decoding
from cortex with a new, unsupervised-learning algorithm. en. Journal of Neural Engineering 15.
Publisher: IOP Publishing, 026010. doi:10.1088/1741-2552/aa9e95 (2018).
Appendices
A Datasets
We used various datasets for the analyses in this thesis. Below, we describe the details of these datasets.
A.1 Saccade task
A macaque monkey (Monkey A) performed saccadic eye movements during a visually-guided oculomotor delayed response task [159] (Fig. 6.2a). Each experimental session (13 sessions in total) consisted of
several trials towards one of eight peripheral targets on a screen. All surgical and experimental procedures
were performed in compliance with the National Institutes of Health Guide for Care and Use of Laboratory
Animals and were approved by the New York University Institutional Animal Care and Use Committee
[159]. Trials began with the illumination of a central fixation square. The subject was trained to maintain
its eyes on the square for about 500-800ms. After this baseline period, a visual cue was flashed for 300ms
at one of the eight peripheral locations to indicate the target of the saccade (Target On event). After a delay,
the central fixation square was extinguished, indicating the Go command to start the saccade (Go event).
The subject was trained to perform the saccade (Saccade Start event) and maintain fixation on the target
for an additional 300ms. A fluid reward was then delivered. The visual stimuli were controlled via custom
LabVIEW (National Instruments) software. Eye position was tracked with an infrared optical eye tracking
system (ISCAN) and from these positions some of the task events such as Saccade Start were identified.
In this task, there are eight task conditions, each representing trials to one of the eight targets. During the
task, LFP signals were recorded from lateral PFC with a semi-chronic 32-microelectrode array microdrive
(SC32-1, Gray Matter Research). For the analyses in this thesis, raw LFP signals were low-pass filtered at
300 Hz and down-sampled to 20 Hz, yielding an LFP observation every 50ms.
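For concreteness, the following minimal Python sketch illustrates this preprocessing step. The filter design (a zero-phase Butterworth) and the function and variable names are illustrative assumptions, not the exact code used for the thesis analyses.

```python
# Minimal sketch of the LFP preprocessing described above: low-pass filter the
# raw LFP, then keep one sample every 50 ms (20 Hz). Filter type/order and the
# name `preprocess_lfp` are assumptions for illustration only.
import numpy as np
from scipy import signal

def preprocess_lfp(raw_lfp, fs_raw, fs_out=20.0, lowpass_hz=300.0):
    """Low-pass filter raw LFP and down-sample to `fs_out` Hz.

    raw_lfp : array of shape (n_samples, n_channels)
    fs_raw  : raw sampling rate in Hz (assumed well above 300 Hz)
    """
    # Zero-phase low-pass filter at 300 Hz (assumed Butterworth design).
    b, a = signal.butter(4, lowpass_hz / (fs_raw / 2.0), btype="low")
    filtered = signal.filtfilt(b, a, raw_lfp, axis=0)
    # Keep one sample every 1/fs_out seconds (i.e., every 50 ms for 20 Hz).
    step = int(round(fs_raw / fs_out))
    return filtered[::step]

# Example: 10 s of 32-channel LFP recorded at 1 kHz -> 200 samples at 20 Hz.
lfp = np.random.randn(10_000, 32)
lfp_20hz = preprocess_lfp(lfp, fs_raw=1000.0)
print(lfp_20hz.shape)  # (200, 32)
```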
A.2 Naturalistic 3D reach-and-grasp task
Two macaque monkeys (Monkey J and Monkey C) performed a naturalistic reach-and-grasp task in a
50×50×50 cm³ workspace for a liquid reward across seven experimental sessions [56] (Fig. 6.2b). All
surgical and experimental procedures were performed in compliance with the National Institutes of Health
Guide for Care and Use of Laboratory Animals and were approved by the New York University Institutional
Animal Care and Use Committee [56]. During the task, the subject naturalistically reached for an object
positioned on a wand, grasped it, released it and then returned its hand to a natural resting position [56].
The wand was continuously moved around by the experimenter within a diverse spatial area in front of the
subject [56]. The task was performed continuously in time without any instructions to isolate reach-and-
grasp movement components. A total of 23 retroreflective markers were attached on the subject’s right arm
and monitored using infrared and near-infrared motion capture cameras (Osprey Digital RealTime System,
Motion Analysis Corp., USA) at a sampling rate of 100 frames s⁻¹. We labeled 3D marker trajectories on
the arm and hand (Cortex, Motion Analysis Corp., USA). The behavior variables were taken as the arm
kinematics, i.e., the position and velocity of the wrist marker in the x, y and z directions, or the joint angles.
On each frame, motion capture camera data acquisition was synchronized to the neural recordings using
a synchronization trigger pulse. The task lacked pre-defined trial structure or pre-defined target locations.
Therefore, we identified the trial starts and ends from the velocity of the hand movement [56], where the
start of each trial was set to the start of the reach, and the end of each trial was set to the end of the return and hold periods of the movement (Fig. 6.2b). To show condition-average visualizations, we partitioned the
trials into two different conditions corresponding to left-ward or right-ward reaches along the horizontal axis
in front of the subject, respectively. The horizontal axis was chosen for this division because it explained
the largest variability in the reach locations. Simultaneous spiking and LFP activity was recorded from M1,
PMd, PMv and PFC in the left (contralateral) hemisphere with an array containing 137 microelectrodes
(large-scale micro-drive system, Gray Matter Research, USA).
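To make the trial segmentation and condition labeling described above concrete, a minimal sketch is given below; the speed threshold, the use of the wrist marker's x-axis for the left/right split, and the function names are illustrative assumptions rather than the thesis code.

```python
# Hedged sketch of velocity-based reach-start detection and left-/right-ward
# condition labeling for the naturalistic reach-and-grasp data described above.
import numpy as np

def find_reach_start(wrist_xyz, fs, speed_thresh=5.0):
    """Index of the first frame where wrist speed exceeds `speed_thresh`.

    wrist_xyz : (n_frames, 3) wrist marker positions, sampled at `fs` Hz.
    Returns 0 if the threshold is never crossed.
    """
    vel = np.gradient(wrist_xyz, 1.0 / fs, axis=0)  # finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)              # speed magnitude per frame
    return int(np.argmax(speed > speed_thresh))      # first threshold crossing

def label_reach_direction(wrist_xyz, start_idx, end_idx):
    """Label a trial 'right' or 'left' from its net horizontal displacement."""
    dx = wrist_xyz[end_idx, 0] - wrist_xyz[start_idx, 0]
    return "right" if dx >= 0 else "left"

# Example with synthetic marker data for a single trial (100 frames/s).
trial = np.cumsum(np.random.randn(300, 3) * 0.5, axis=0)
start = find_reach_start(trial, fs=100.0)
print(start, label_reach_direction(trial, start, end_idx=299))
```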
A.3 2D random-target reaching task
A macaque monkey (Monkey T) performed a 2D random-target reaching task with an on-screen cursor in a
total of three experimental sessions [160, 161] (Fig. 6.2c). All surgical and experimental procedures were
consistent with the guide for the care and use of laboratory animals and approved by the institutional animal
care and use committee of Northwestern University [160, 161]. The subject controlled the cursor using
a two-link planar manipulandum while seated in a primate chair. Hand movements were constrained to a
horizontal plane within a 20×20 cm² workspace. The task consisted of several sections in each of which
the subject performed 4 sequential reaches to random visual targets that appeared on the screen to receive
a liquid reward (Fig. 6.2c). Within each section, after reaching the target and holding for a short period,
the next target appeared in a pseudo-random location within a circular region (radius = 5-15 cm, angle =
360 degrees) centered on the current target. On average, the next target appeared approximately 200ms after
the subject reached the current target. The task naturally consisted of non-stereotyped reaches to different
locations on a 2D screen, unlike traditional center-out cursor control tasks with stereotyped conditions [45].
Here, 2D cursor position and velocity in x and y directions were used as behavior variables. To show
condition-average visualizations, we partitioned the reaches into 8 different conditions based on the direction angle between the start and end point of the cursor trajectory. The angle of movement specifies the 8 conditions, which correspond to movement angle intervals of 0-45, 45-90, 90-135, 135-180, 180-225, 225-270, 270-315, and 315-360 degrees, respectively. The subject was implanted with a 100-electrode array (Blackrock
Microsystems) in PMd. After spike sorting, two sets of units were excluded from analysis in the original
work [160]: 1) units with low firing rates (below 2 spikes/s), and 2) units that were highly correlated with other units. This left 46, 49, and 57 units across the three recording sessions.
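A minimal sketch of the 8-way direction binning used for these condition-average visualizations is given below; the helper name `direction_condition` is illustrative, not the code used for the analyses.

```python
# Each reach's movement angle (between start and end cursor positions) is
# assigned to one of eight 45-degree sectors, as described above.
import numpy as np

def direction_condition(start_xy, end_xy):
    """Return a condition index in {0, ..., 7} for a 2D reach.

    Condition k corresponds to movement angles in [45k, 45(k+1)) degrees.
    """
    dx, dy = np.asarray(end_xy, dtype=float) - np.asarray(start_xy, dtype=float)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0  # map angle to [0, 360)
    return int(angle // 45)

print(direction_condition((0.0, 0.0), (3.0, 4.0)))  # ~53 degrees -> condition 1
```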
A.4 2D grid reaching task
A macaque monkey (Monkey I) performed a 2D grid reaching task by controlling a cursor on a 2D surface
in a virtual reality environment [162, 163] (Fig. 6.2d). All animal procedures were performed in accordance
with the U.S. National Research Council’s Guide for the Care and Use of Laboratory Animals and were
approved by the UCSF Institutional Animal Care and Use Committee [162, 163]. Circular targets with 5mm
visual radius within an 8-by-8 square grid or an 8-by-17 rectangular grid were presented to the subject and
the cursor was controlled with the subject’s fingertip position. Fingertip position was monitored by a six-axis
electromagnetic position sensor (Polhemus Liberty, Colchester, VT) at 250 Hz and then non-causally low-
pass filtered to reject sensor noise (4th order Butterworth filter with 10 Hz cutoff). The subject was trained
to acquire the target by holding the cursor in the respective target-acceptance zone, a square of 7.5mm edge
length centered around each target, for 450ms. After acquiring the target, a new target was drawn from
the possible set of targets. In most sessions, targets were drawn with replacement, i.e., the last acquired target could be drawn again as the new target. However, the last acquired target was removed from the set in some
sessions. Even though there was no inter-trial interval between consecutive reaches, there existed a 200ms
“lockout interval” after target acquisition where no new target could be acquired. 2D cursor position and
velocity in x and y directions were used as behavior variables. To show condition-average visualizations,
we partitioned the reaches into 8 different conditions based on their direction angle as in the 2D random-
target reaching task. One 96-channel silicon microelectrode array (Blackrock Microsystems, Salt Lake City,
UT) was implanted into the subject’s right hemisphere (contralateral) M1. A total of seven sessions (20160622/01 to 20160921/01) were used in our manuscript [64]. We analyzed the pool of the top 30 neurons, ranked by their individual behavior prediction accuracies.
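For reference, a minimal sketch of the non-causal sensor-noise rejection applied to the fingertip position stream is given below. Whether the stated 4th order refers to the designed filter or to the effective forward-backward response is an assumption here, and the variable names are illustrative.

```python
# Non-causal (zero-phase) Butterworth low-pass filtering of the 250 Hz
# fingertip position stream with a 10 Hz cutoff, as described above.
import numpy as np
from scipy import signal

fs = 250.0        # position sensor sampling rate (Hz)
cutoff_hz = 10.0  # low-pass cutoff (Hz)

# 4th-order Butterworth low-pass design (assumed design order).
b, a = signal.butter(4, cutoff_hz / (fs / 2.0), btype="low")

# fingertip_xy: (n_samples, 2) raw 2D fingertip positions (synthetic here).
fingertip_xy = np.cumsum(np.random.randn(2500, 2) * 0.1, axis=0)

# filtfilt runs the filter forward and backward, i.e., non-causally (zero phase).
smoothed_xy = signal.filtfilt(b, a, fingertip_xy, axis=0)
```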
Figure 6.2: Experimental datasets are shown for the (a) saccade task, (b) 3D naturalistic reach-and-grasp
task, (c) 2D random-target reaching task, (d) 2D grid reaching task.
Abstract
Brain activity exhibits various spatiotemporal dynamical patterns emerging from the coordinated action of neuronal populations in time and across many brain regions. As such, we can view the brain as a complex dynamical system with high-dimensional observations in the form of neural population activity, which can be described in terms of lower-dimensional latent factors. Therefore, developing models and algorithms that extract latent factors of the neural population activity allows us to better understand the brain mechanisms during tasks and to build more accurate brain-machine interfaces. Such models should characterize multiple scales of neural population activity while enabling accurate and flexible inference of latent factors for basic science and neurotechnology applications. The proposed research involves multiple advancements, including a dynamic latent model of multiscale neural population activity along with its learning and inference algorithms, a novel discovery about the similarities and differences in low-dimensional neural dynamics across the small-scale and large-scale brain networks, a novel nonlinear dynamic model using deep learning techniques to incorporate nonlinearities while enabling flexible inference properties such as causal or non-causal inference in the presence of missing observations, and a validation of the low-dimensional manifold structures emerging in neural population activity using artificial neural networks. Overall, the outcomes of this research will advance future neuroscientific studies of small-scale and large-scale neural population activity and provide general tools to extract low-dimensional latent factors from high-dimensional time-series.