Geometric and Dynamical Modeling of Multiscale Neural Population Activity

by Han-Lin Hsieh

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2022

Copyright 2023 Han-Lin Hsieh

Dedication

I dedicate this thesis to Curiosity, an insatiable beast in my brain, and the meaning of my life.

Acknowledgements

"I finally finished it!" is the first sentence that came to mind when I finished this thesis. It has been quite a while since I decided to start my PhD program. At that point, I knew this decision would change me in many ways, but I did not realize the change would be so deep. As the mathematician Andrew Wiles describes it, doing research is stumbling and bumping around in a dark room until you find the light switch, and then repeating this process in the following rooms until you have figured out the whole mansion. This process is bittersweet, but I suppose the meaning of my life is torturing myself in front of the idol of truth to feed the flaming Curiosity in my brain. There are so many people I must thank for this pursuit.

I first sincerely and gratefully thank my advisor, Prof. Maryam M. Shanechi, for her guidance and support throughout my PhD journey. The freedom I felt and the support I received in her lab were wonderful. Even though the research project I proposed was a little crazy, Maryam still worked with me to refine the idea, resolve theoretical and technical challenges, and make it successful in the end. She let me do the research at my own pace, but I could always consult her whenever I faced any problems. Such solid support is the main reason I could see my research through to the end. It has been my pleasure to be Maryam's PhD student.

My committee members also spent their time and effort reviewing my work and giving me useful suggestions. I thank my committee member, Prof. Krishna S. Nayak. I got to know Krishna when I joined EE690 in my first PhD year, and that meeting was great. His kindness and his honest attitude toward his research really impressed and affected me. I thank my committee member, Prof. Francisco Valero-Cuevas. He kindly invited me to his lab to present my work, and he and his lab members gave me rich and useful feedback from viewpoints I had not considered before. These were useful suggestions for completing my research. I also thank Prof. Bijan Pesaran and his team. This dissertation could not have been done without the NHP data they collected.

I thank all my lab colleagues, especially Yuxiao Yang, Omid G. Sani, and Hamidreza Abbaspour (Salar). The discussions with all of you have always been pleasant and insightful. You are my good consultants, especially in programming. Attending conferences and hanging out with you has been a joy. I also want to thank our supportive administrative assistant, Ann-Hua Yu (Annie), for your responsive emails and reliable support even when I sent you last-minute requests. I really appreciate it.

I thank my landlord, Carmen Patlan, for being my family in Los Angeles. For the past 8 years, your house has been my second home abroad. I thank my friends, Szu-Yu Lee, Wei-Chun Hung, Fang-Yi Yu, and Nian-Ze Lee. The fun times with you are great and invaluable to me. Every time I faced tough challenges, chatting with you made me energetic again so that I could face the problems directly. It is great to have good friends like you.

I thank my girlfriend Lynn Lee. Being together with you is one of the best things in my life.
I could not have finished my PhD without your encouragement and comfort (maybe stimulation, too? XD). I am blessed to have a life partner like you who always stands firmly by my side. Finally, I would like to thank my parents, Chien-Cheng Hsieh and Feng-Yueh Hsieh. When I was a child, you always stood behind me. Even though you do not understand what I am doing, you know that it means a lot to me and you support me in pursuing it, which is all I need. You take good care of the family, so I can focus on my research and keep moving forward with this strong backing. I dedicate my deepest gratitude to your understanding and sacrifice.

Table of Contents

Dedication
Acknowledgements
List of Figures
Abstract

Chapter 1: Multiscale Modeling and Decoding Algorithm
1.1 Introduction
1.2 Methods
1.2.1 Multiscale encoding model
1.2.2 Multiscale filter (MSF)
1.2.3 The adaptive multiscale filter
1.2.4 Numerical simulations
1.2.5 NHP experimental setup and validation
1.2.6 Neural data processing
1.2.7 NHP decoding analyses
1.3 Results
1.3.1 Simulation validations
1.3.2 NHP spike-field decoding of naturalistic 3D reaches with MSF
1.3.3 MSF improves performance due to addition of information across scales rather than dominance of information in one scale
1.3.4 MSF improvement is greatest in the low-information regime
1.3.5 MSF improvement is similar regardless of the degree of overlap between spike and LFP channels
1.4 Discussion

Chapter 2: Learning Rate Calibration Algorithm for Adaptive PPF and KF
2.1 Introduction
2.2 Methods
2.2.1 The calibration algorithm for continuous neural signals
2.2.2 The calibration algorithm for discrete-valued spikes
2.2.3 Calibration algorithm for non-periodic state evolution
2.2.4 Numerical simulations
2.3 Results
2.3.1 The calibration algorithm computes the convergence time and the error covariance accurately with continuous signals
2.3.2 Use of the inverse function to compute the learning rate
2.3.3 Parameter adaptation profiles conform closely to the analytically computed error covariance and convergence time
2.3.4 The calibration algorithm generalizes to different state evolutions
2.3.5 The calibration algorithm for discrete spiking activity
2.3.6 The effect of learning rate on decoding
2.4 Discussion

Chapter 3: Geometric Dynamical Modeling
3.1 Introduction
3.2 Methods
3.2.1 Learning the manifold underlying the neural activity
3.2.2 Regressing neural activity onto the manifold: the Hausdorff distance
3.2.3 Paired EM algorithm: identifying the GDM model parameters
3.2.4 Real-time latent state estimation using a calibrated particle-Kalman filter
3.2.5 The geometric dimensionality reduction algorithm: linear projected regression
3.2.6 NHP experimental setup and validation
3.3 Results
3.3.1 Intrinsic geometric coordinate is a natural coordinate for neural population activity
3.3.2 Intrinsic geometric coordinate dominates the neural noise deviation
3.4 Discussion

Chapter 4: Conclusion and ongoing work

References

List of Figures

1.1 Adaptive MSF architecture.
1.2 MSF performs better than PPF and KF in closed-loop BMI simulations.
1.3 Adaptive MSF can learn all the spike-field multiscale model parameters simultaneously in real time.
1.4 Adaptive MSF performance converges to the performance of an MSF that knows the true parameters.
1.5 Sample true and decoded trajectories.
1.6 MSF improves decoding by combining information from spikes and LFPs.
1.7 MSF improvement when adding spike to LFP channels or vice versa is similar regardless of the degree of overlap between spike and LFP channels, and is greatest in the low-information regime.
2.1 Closed-loop neural system architecture.
2.2 Flowchart of the calibration algorithm.
2.3 The calibration algorithm accurately computes the steady-state error covariance and convergence time for continuous signals.
2.4 Adaptation profiles for model parameters with continuous signals.
2.5 The calibration algorithm generalizes to training datasets including non-periodic state trajectories.
2.6 The calibration algorithm accurately computes the steady-state error covariance for discrete spiking activity.
2.7 Adaptation profiles for model parameters with discrete spiking activity.
2.8 The effect of the calibration algorithm on closed-loop BMI decoding with continuous ECoG/LFP activity.
2.9 The effect of the calibration algorithm on closed-loop BMI decoding with discrete spiking activity.
3.1 GM describes the neural population activity more efficiently than LM.
3.2 Neural deviation is better described by the intrinsic geometric coordinate.

Abstract

Neural population dynamics in the motor cortex underlying naturalistic movement are nonlinear, multi-dimensional, and encoded across multiple spatiotemporal scales of neural population activity. To model and decode this complex biological system, we take two approaches. The first approach is an engineering one: we develop a new adaptive multiscale filter (MSF, Chapter 1) for decoding an NHP's intentions in real time by running at the millisecond time-scale of spikes while adding information from fields at their slower time-scales. This joint decoding idea is motivated by the fact that modern technology can simultaneously record neural signals at various scales, from the spiking of individual neurons to large neural populations measured with local field potentials (LFP). From an information-theoretic standpoint, multiscale decoding is always at least as informative as single-scale decoding, and this motivates us to develop the MSF. Furthermore, we extend it to an adaptive MSF that learns all spike-field multiscale model parameters simultaneously in real time at their different time-scales. Here, both learning speed and accuracy are controlled by a hyper-parameter, the learning rate. To obtain the best performance, we develop a novel analytical calibration algorithm for optimal selection of the learning rate (Chapter 2). The key is deriving explicit analytical functions that predict the effect of the learning rate on the estimation error and the convergence time. We validate the adaptive MSF with numerical simulations and with experiments on a non-human primate (NHP) whose spikes and LFPs were recorded from motor cortex during a naturalistic 3D reach-and-grasp task. Our simulations show that the adaptive MSF can add information across scales while accurately learning all model parameters in real time. The NHP data show that MSF outperforms single-scale decoding. This improvement is largest in the low-information regime and is similar regardless of the degree of overlap between spike and LFP channels. The second approach is a scientific one: we develop a new geometric dynamical model (GDM, Chapter 3) guided by the nonlinear manifold underlying the neural population activity from the motor cortex. It is inspired by a prominent feature, rotation, that has been observed in linear and nonlinear low-dimensional projections of neural population activity across many datasets. We hypothesize that these rotations arise because nonlinear multi-dimensional (multi-D) manifolds with holes underlie neural population activity. We therefore develop GDM, which learns a nonlinear multi-D manifold with a hole and fits an intrinsic dynamical model on top of this manifold.
We apply GDM to the NHP's neural firing rates, and the results show that the intrinsic geometric coordinate provides a more natural space in which to describe the dynamics of neural population activity than the extrinsic Euclidean coordinate. Overall, our adaptive MSF and GDM provide tools to study neural population activity from engineering and scientific angles, and they are tools for exploring neurotechnology and neuroscience in the future.

Chapter 1: Multiscale Modeling and Decoding Algorithm

1.1 Introduction

Behavior is encoded across multiple spatiotemporal scales of neural activity, from the spiking activity of individual neurons to larger population activity measured using local field potentials (LFP) and electrocorticogram (ECoG) [1-4]. Today we can concurrently record spikes and fields in real time to measure the brain at these spatiotemporal scales. This recording capability introduces the new computational challenge of modeling the encoding of behavior in all these scales simultaneously, and developing new decoding algorithms that can combine information about behavior across all these scales.

Multiscale modeling and decoding is challenging because of significant differences in statistical profiles and time-scales across scales of activity [5-7]. In particular, spiking activity is binary-valued with a millisecond time-scale [8-14], where the 1 or 0 binary value at each time-step represents the presence or absence of a spike at that time-step, respectively. In contrast, LFP/ECoG are continuous-valued with slower time-scales (e.g., milliseconds to tens of milliseconds) [5,6,15-23]. Thus a multiscale modeling and decoding framework needs to explicitly take such differences into account.

While the vast majority of motor brain-machine interfaces (BMI) to date have exclusively decoded the intended movement from a single scale of activity (either spikes, LFP, or ECoG [20,24-34]), recent BMI work has taken important steps toward using both spikes and fields by computing spike counts at the same time-scale as LFP [23,35,36] and by using a single Kalman filter (KF) to decode both signals [23,35]. These studies have indicated that using LFP in addition to spikes can improve BMI systems (see Discussion). However, these studies do not account for the differences in statistical profiles and time-scales of spikes and LFP, and they modeled both signals as Gaussian and at the slower time-scale of LFP. Recent work has indicated that precisely characterizing the spikes as a time-series of 1's and 0's at a fast millisecond time-scale by using a point process filter (PPF) can improve the speed of adaptive learning and the performance in closed-loop BMIs [12,37]. Thus, together, these recent bodies of work [12,23,35-37] further motivate the development of a mathematically rigorous multiscale computational framework that can simultaneously characterize the different statistical profiles and time-scales of spikes and fields, and combine information across these scales.

To develop a multiscale modeling and decoding framework, three elements need to be designed: a multiscale encoding model structure, a learning algorithm to fit the spike-field multiscale model parameters, and a decoder that estimates the underlying behavior (e.g., intended movement) by combining information from spikes and fields while running at their multiple time-scales. First, the encoding model structure should explicitly characterize the binary and continuous nature and the different time-scales of spikes and fields, respectively.
Second, the learning algorithm should be able to run in real time and to adaptively track changes in neural encoding over time due to learning, plasticity, or recording non-stationarity. Such adaptation is important because neural representations could change during closed-loop operation of a BMI. For example, neural representations of movement could differ when controlling the native arm or imagining movements compared with controlling a neuroprosthetic [38-43]. Finally, the decoding algorithm should combine information from binary spikes and continuous fields while running at the faster time-scale of spikes.

Here we develop a multiscale encoding model structure, a multiscale adaptive learning algorithm, and a multiscale filter (MSF) for decoding that together satisfy all of the above requirements. To address differences in statistical properties, we build a neural encoding model that represents the spikes as a point process and the field features as Gaussian processes. To model the differences in time-scales, we characterize the spikes at their fast millisecond time-scale while modeling the slower field activity as missing observations at intermediate time-steps when no new field sample is obtained. Using this encoding model, we then derive the MSF, which updates the decoded behavior at the fast millisecond time-scale of spikes while also adding information from fields at slower time-scales. Finally, to learn the spike-field multiscale model parameters in closed loop, we design an adaptive MSF. We show that MSF provides a unified decoding framework for multiscale neural activity and specializes to well-known optimal single-scale decoders (PPF for spikes and KF for field features) when only one scale of activity is available.

We test the multiscale framework using both closed-loop simulations and non-human primate (NHP) multiscale spike-LFP activity recorded during a naturalistic 3-dimensional (3D) reach-and-grasp task [44]. In simulations, we show that adaptive MSF can accurately learn the spike-field multiscale model parameters during closed-loop operation of the BMI. Moreover, the MSF can run at the fast time-scale of spikes while adding information from fields at their slower time-scale to improve decoding accuracy. We also validate the MSF for decoding the 7 joint angles of the NHP arm. First, we show that MSF can improve decoding accuracy by using both spikes and LFPs. Second, we demonstrate that this improvement is due to the addition of information from both spikes and LFPs rather than, simply, the dominance of information from the more strongly tuned signal. Third, the MSF improvement is similar even in the two extreme cases in which the spike- and LFP-recorded channels either completely overlap or have no overlap, showing the benefit of multiscale decoding regardless of which channels contain spikes or fields. Taken together, this multiscale modeling, adaptive learning, and decoding framework can serve as a tool to study neural representations across spatiotemporal scales and has the potential to enhance future closed-loop neurotechnologies such as motor BMIs. Finally, this whole chapter is a compact version of our published paper [45]; please refer to it for more details.

1.2 Methods

In this section, we present the multiscale encoding model and then derive the corresponding MSF and adaptive multiscale learning algorithm, i.e., the adaptive MSF. We also describe the closed-loop simulation setup and the experimental NHP data. Figure 1.1 shows the complete structure of the adaptive MSF.
1.2.1 Multiscale encoding model

We write a unified neural encoding model for multiscale spike-field activity. We first categorize the neural activity into binary (spikes) and continuous (LFP/ECoG) modalities. We then characterize the different statistical profiles and time-scales of these modalities using different likelihood functions.

First, spiking activity can be represented as a binary time-series that specifies the presence or absence of a spike at a given time with a 1 or 0, respectively. We denote the binary spike event of neuron c at time t by N_t^c and model the spike time-series for each neuron as a point process [8-10,13,14,37,46-51]. Given C total neurons, we denote the spiking activity of all neurons at time t by N_t^{1:C} = [N_t^1, ..., N_t^C]'. Assuming that neurons are conditionally independent given the encoded behavior [8-10,37,46,47,49-51], the point process likelihood model for all neurons is given as

p(N_t^{1:C} | x_t) = ∏_{c=1}^{C} (λ_c(x_t) Δ)^{N_t^c} exp(−λ_c(x_t) Δ),    (1.1)

where Δ is the time-bin, taken to be small enough to contain at most 1 spike, x_t represents the behavioral states (e.g., kinematics such as position or velocity), and

λ_c(x_t) = exp(β_c + α_c' x_t)    (1.2)

is the firing rate of neuron c at time t with spike model parameters β_c and α_c (these parameters have to be learned; see Section 1.2.3).

Figure 1.1: Adaptive MSF architecture. MSF models the differences in time-scales and statistical profiles of spikes and field features. MSF models the spikes as a point process at their fast millisecond time-scale and the field features as Gaussian processes with possibly slower time-scales. MSF can then decode the behavior at the fast millisecond time-scale of spikes. In a closed-loop BMI, this creates fast control and feedback rates for neuroprosthetic control. Moreover, MSF adds information from field features whenever new field observations are made, even if they are made at a slower time-scale. Finally, adaptive MSF can initially learn the spike-field multiscale model parameters during closed-loop BMI operation, adaptively, and at multiple time-scales. The learning rate for adaptation can be systematically chosen to satisfy a desired steady-state error and convergence time for the estimated parameters using a calibration algorithm. Once parameters converge, adaptation (consisting of the components outside the magenta dashed box) stops and the trained MSF is used to decode behavior from multiscale spike-field activity.

Second, previous work [15,18,19,21,23,35,52,53] has shown that various frequency bands of the power spectral density (PSD) of field signals are informative of the encoded behavior and can be modeled as a linear function of behavioral variables such as kinematics. Thus, as a good candidate feature to model, we pick the log-power features of field signals (e.g., LFP or ECoG). We characterize these log-power features using a linear Gaussian model [15,18,19,23,35,52,53]

y_t = C x̃_t + z_t ~ N(C x̃_t, Z),    (1.3)

where y_t includes all log-power features, C is a parameter matrix to be learned (see Section 1.2.3), x̃_t = [1, x_t']' with the 1 allowing us to model the baseline activity (i.e., bias) in each band, and z_t is zero-mean white Gaussian noise with covariance matrix Z.
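To make these two observation models concrete, the following minimal Python sketch (ours, not code from the thesis) evaluates the joint spike-field log-likelihood of (1.1)-(1.3) for a single time step, assuming spikes and fields are conditionally independent given x_t (formalized as (1.4) below); all variable names and shapes are illustrative.

```python
import numpy as np

def spike_field_loglik(x_t, N_t, beta, alpha, y_t, C, Z, dt=0.005):
    """Joint spike-field log-likelihood for one time step (illustrative sketch).

    x_t   : behavioral state vector
    N_t   : binary spike vector over neurons
    beta, alpha : point-process parameters; alpha has shape (n_neurons, dim_x)
    y_t   : field log-power feature vector
    C, Z  : field model parameters; C maps [1, x_t] to features, Z is the noise covariance
    dt    : spike time-bin (Delta) in seconds
    """
    # Point-process likelihood of (1.1) with firing rates from (1.2)
    lam = np.exp(beta + alpha @ x_t)                      # lambda_c(x_t) for all neurons
    ll_spikes = np.sum(N_t * np.log(lam * dt) - lam * dt)

    # Gaussian field likelihood of (1.3) with x_tilde = [1, x_t]
    x_tilde = np.concatenate(([1.0], x_t))
    resid = y_t - C @ x_tilde
    ll_fields = -0.5 * (resid @ np.linalg.solve(Z, resid)
                        + np.linalg.slogdet(2 * np.pi * Z)[1])

    # Conditional independence of spikes and fields given x_t
    return ll_spikes + ll_fields
```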
In (1.3), N(a, G) denotes a Gaussian distribution with mean a and covariance G. Finally, using (1.1) and (1.3), we can construct the unified multiscale encoding model as

p(y_t, N_t^{1:C} | x_t) = p(y_t | x_t) p(N_t^{1:C} | x_t),    (1.4)

where we have assumed that field features and spikes are conditionally independent given (i.e., conditioned on) the behavioral state x_t (e.g., kinematics). Here p(y_t | x_t) and p(N_t^{1:C} | x_t) are given in (1.3) and (1.1), respectively. Thus (1.4) provides the complete unified multiscale encoding model.

1.2.2 Multiscale filter (MSF)

Our goal is to derive the MSF as a recursive Bayesian filter (as is also the case for PPF and KF). To derive such a filter, we need a prior model for the behavioral states and a corresponding observation model for the multiscale neural recordings. The observation model is given by (1.4). For the prior model, similar to previous studies (e.g., see the review in [26]), we use a random walk

x_t = A x_{t-1} + w_t,    (1.5)

where x_t is the behavioral state to be decoded (e.g., kinematics), A is a parameter matrix, and w_t is zero-mean white Gaussian noise with covariance matrix W. The parameters of the prior model are typically fitted using behavioral data (e.g., [12,54,55]).

Now we can use (1.4) and (1.5) to derive a recursive Bayesian filter that combines information across scales. We denote the multiscale neural observation at time t by h_t = {y_t, N_t^{1:C}}. Hence h_{1:t} includes all observations up to time t. The goal of the MSF is to find the minimum mean-squared error (MMSE) estimate of the behavioral state x_t at time t given h_{1:t}, which is equal to the mean of the posterior density p(x_t | h_{1:t}).

MSF can be derived in two steps: prediction and update. The prediction density is defined as p(x_t | h_{1:t-1}). We denote the mean and the covariance matrix of the prediction density by x_{t|t-1} and U_{t|t-1}, respectively. Similarly, we denote the mean and the covariance of the posterior density p(x_t | h_{1:t}) by x_{t|t} and U_{t|t}, respectively. In the prediction step, we use (1.5) to compute the prediction mean and covariance at time t as

x_{t|t-1} = A x_{t-1|t-1}    (1.6)
U_{t|t-1} = A U_{t-1|t-1} A' + W.    (1.7)

We then derive the update step, which computes the posterior mean and covariance at time t using Bayes' rule. The posterior density p(x_t | h_{1:t}) = p(x_t | h_t, h_{1:t-1}) is given by

p(x_t | h_t, h_{1:t-1}) = p(h_t | x_t, h_{1:t-1}) p(x_t | h_{1:t-1}) / p(h_t | h_{1:t-1})
                        = p(y_t, N_t^{1:C} | x_t) p(x_t | h_{1:t-1}) / p(h_t | h_{1:t-1})
                        = p(y_t | x_t) p(N_t^{1:C} | x_t) p(x_t | h_{1:t-1}) / p(h_t | h_{1:t-1}).    (1.8)

We get the second equality above because p(h_t | x_t, h_{1:t-1}) = p(h_t | x_t) = p(y_t, N_t^{1:C} | x_t) according to the spike and field observation models (1.1)-(1.3); these models indicate that the distribution of the current spike or field observations is fully determined by the current behavioral state x_t and does not depend on past spike or field observations. We get the third equality from (1.4).

Computing the posterior over time from (1.8) is computationally expensive due to the nonlinear point process model of spikes in (1.1). We thus use a Laplace approximation to approximate the posterior (1.8) as a Gaussian distribution [9,56,57]. Taking the logarithm of both sides of (1.8), we get

L_u(x_t | h_{1:t}) = L_y(y_t | x_t) + L_N(N_t^{1:C} | x_t) + L_p(x_t | h_{1:t-1}) + constant,    (1.9)

where L_u is the logarithm of the posterior density p(x_t | h_{1:t}) and L_y, L_N, and L_p are the logarithms of the three distributions in the numerator of (1.8), in order.
The denominator p(h_t | h_{1:t-1}) in (1.8) is a constant with respect to x_t. Using the Laplace approximation [9,56,57], the covariance and mean of the posterior density p(x_t | h_{1:t}) can be approximated as

U_{t|t}^{-1} = −[ ∂²L_u / ∂x_t² ]_{x_{t|t-1}}    (1.10)
x_{t|t} = x_{t|t-1} + U_{t|t} [ ∂L_u / ∂x_t ]_{x_{t|t-1}},    (1.11)

where [·]_a denotes evaluation of the enclosed expression at the value a. Now, to solve for the posterior mean and covariance, we substitute L_u from (1.9) into (1.10) and (1.11). Further, we use the fact that, given the prior model in (1.5) and assuming that the posterior density p(x_{t-1} | h_{1:t-1}) is Gaussian, the prediction density p(x_t | h_{1:t-1}) is also Gaussian with mean x_{t|t-1} and covariance U_{t|t-1}, which are already computed in the prediction step from (1.6) and (1.7). We can thus obtain the update step as

U_{t|t}^{-1} = U_{t|t-1}^{-1} − [ ∂²L_y/∂x_t² + ∂²L_N/∂x_t² ]_{x_{t|t-1}}    (1.12)
x_{t|t} = x_{t|t-1} + U_{t|t} [ ∂L_y/∂x_t + ∂L_N/∂x_t ]_{x_{t|t-1}}.    (1.13)

Now we can use the specific forms of the observation models for fields in (1.3) and spikes in (1.1), and plug them into (1.12) and (1.13) to get

U_{t|t}^{-1} = U_{t|t-1}^{-1} + C' Z^{-1} C + [ ∑_{c=1}^{C} α_c α_c' λ_c(x_t) Δ ]_{x_{t|t-1}}    (1.14)
x_{t|t} = x_{t|t-1} + U_{t|t} [ C' Z^{-1} (y_t − C x̃_t) + ∑_{c=1}^{C} α_c (N_t^c − λ_c(x_t) Δ) ]_{x_{t|t-1}}.    (1.15)

The update step thus consists of (1.14) and (1.15). Taken together, the MSF is given by equations (1.6), (1.7), (1.14), and (1.15), which extract information simultaneously from spikes and field features.

Note that the update step in (1.14) and (1.15) assumes that both new spike observations N_t^{1:C} and new field observations y_t are available at time t. However, the time-scale at which field features are computed is typically slower than Δ, which is the time-scale at which spikes are observed (i.e., the bin width for the spikes). Thus, to take the different time-scales of spikes and fields into account, at the intermediate time-steps when a new field observation (i.e., data sample) is not available and only new spike observations are made, we set C = 0. In these intermediate time-steps, MSF specializes to a PPF [9,12,37]. So MSF runs at the fast millisecond time-scale of spikes (every Δ milliseconds) but can also add information from fields at their slower time-scale. Note also that in the special case in which we only have field observations, we can set α_c = 0 for all c, and the MSF specializes to a KF. Likewise, if only spikes are recorded, we can set C = 0 for all time-steps, and the MSF specializes to a PPF. Thus MSF is a generalized filter that can be used to decode information across various scales of activity or at each scale separately.
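To summarize the recursion, the following Python sketch (illustrative, not the thesis implementation) carries out one MSF time step from (1.6), (1.7), (1.14), and (1.15); passing y_t = None mimics an intermediate time-step with no new field sample, in which case the update reduces to the PPF terms. Variable names are ours.

```python
import numpy as np

def msf_step(x_prev, U_prev, N_t, y_t, A, W, beta, alpha, C_f, Z, dt=0.005):
    """One MSF prediction/update step (sketch).

    alpha : (n_neurons, dim_x) spike modulation vectors; beta : (n_neurons,) baselines
    C_f   : (n_features, 1 + dim_x) field parameter matrix acting on x_tilde = [1, x_t]
    Z     : (n_features, n_features) field noise covariance
    y_t   : field feature vector, or None when no new field sample is available
    """
    # Prediction step, (1.6)-(1.7)
    x_pred = A @ x_prev
    U_pred = A @ U_prev @ A.T + W

    # Point-process (spike) terms of (1.14)-(1.15), evaluated at the predicted state
    lam_dt = np.exp(beta + alpha @ x_pred) * dt            # lambda_c(x_pred) * Delta
    info = np.linalg.inv(U_pred) + alpha.T @ (alpha * lam_dt[:, None])
    score = alpha.T @ (N_t - lam_dt)

    # Gaussian (field) terms, added only when a new field sample arrives;
    # omitting them corresponds to the "C = 0" convention in the text
    if y_t is not None:
        x_tilde = np.concatenate(([1.0], x_pred))
        Cx = C_f[:, 1:]                # bias column of C_f does not depend on x_t
        info += Cx.T @ np.linalg.solve(Z, Cx)
        score += Cx.T @ np.linalg.solve(Z, y_t - C_f @ x_tilde)

    # Update step, (1.14)-(1.15)
    U_post = np.linalg.inv(info)
    x_post = x_pred + U_post @ score
    return x_post, U_post
```

In a real-time decoder, this step would be called every Δ (e.g., every 5-10 ms), with y_t supplied only on the slower field-feature grid.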
1.2.3 The adaptive multiscale filter

To construct the MSF, we first need to design an approach to learn the spike-field multiscale model parameters in (1.4). One approach is a maximum likelihood (ML) method that maximizes the likelihood function in (1.4) given the observed multiscale neural signals {y_t, N_t^{1:C}} and behavioral states x_t during a training session in which the behavioral states are measured, e.g., when subjects imagine movements or move their arms in a motor BMI problem. However, recent studies suggest the benefit of adaptive parameter learning during closed-loop BMI control [12,37,38,43,54,58-65]. For example, in motor BMIs, there may be changes in neural representations of movement between BMI control and manual control/imagined movements, thus necessitating an adaptive learning approach [12,37,38,43,54,58,60-65].

In such an adaptive approach, intended kinematics are inferred using various intention estimation methods [12,37,54,66,67], for example by assuming a straight-to-target velocity intention or by using an optimal feedback-control (OFC) model of the brain. Regardless of the method of intention estimation, an adaptive learning algorithm then needs to be devised to learn the spike-field multiscale model parameters in closed loop. Here, we first present an ML learning approach and then present a closed-loop adaptive learning algorithm, termed adaptive MSF, to learn these parameters.

In what follows, we assume that x_t is the intended kinematics, which is either observed in a training session or inferred in closed loop using an intention estimation technique. Note that, in general, x_t could be any behavioral state of interest depending on the application. Given the conditional independence of fields and spikes in (1.4), the mathematically optimal approach is to learn their model parameters in (1.1) and (1.3) separately given the intended kinematics x_t. For spikes, we define ϕ_c = [β_c, α_c']' as the vector consisting of all unknown spike model parameters for neuron c in (1.1). Since neurons are taken to be conditionally independent given the intended kinematics x_t [8-10,37,47,49-51], we can learn each ϕ_c separately. Thus, for conciseness, from now on in this section we drop the superscript/subscript c.

For ML, the optimal solution for C and Z of the field observation model in (1.3) is given by standard linear regression techniques [54,61]. Moreover, the optimal β and α for each neuron can be found by numerically maximizing the log-likelihood function in (1.1) using standard methods for generalized linear models [10,68].

To design the adaptive MSF, using (1.2), we denote the dependence of the firing rate on the model parameters as λ(t|ϕ) = exp(β + α'x_t) = exp(x̃_t' ϕ), with x̃_t = [1, x_t']'. Note that we changed the notation for the firing rate from λ(x_t) in (1.2) to λ(t|ϕ). This is because the latter notation is used within adaptive MSF to learn the spike model parameters ϕ and thus treats these parameters as a random unknown vector; it treats x_t (e.g., intended kinematics) as deterministic and known because it can be inferred using an intention estimation method during adaptive learning (figure 1.1). In contrast, the former notation λ(x_t) is used in the MSF to decode the behavioral state x_t (e.g., kinematics) after the parameters have been learned. Thus, in the former notation, the behavioral state x_t is a random unknown vector that needs to be estimated, whereas ϕ is deterministic and known (because it has already been learned).

From λ(t|ϕ) = exp(x̃_t' ϕ), we see that the roles of x̃_t and ϕ are interchangeable (i.e., symmetric) in (1.1); this means that we can assume knowledge of one and estimate the other, or vice versa, using the same PPF. Therefore, given the observed spikes N_t and the intended kinematics x_t, we can learn ϕ adaptively using a PPF with a prior model for ϕ. We construct this prior model as a random walk given by [37]

ϕ_t = ϕ_{t-1} + q_t,    (1.16)

where q_t is white Gaussian noise with covariance matrix Q. We denote the mean and covariance of the prediction density of ϕ by ϕ_{t|t-1} and Q_{t|t-1}, respectively, and the mean and covariance of its posterior density by ϕ_{t|t} and Q_{t|t}, respectively.
The PPF for ϕ for any neuron c is then given as [45]

ϕ_{t|t-1} = ϕ_{t-1|t-1}    (1.17)
Q_{t|t-1} = Q_{t-1|t-1} + Q    (1.18)
Q_{t|t}^{-1} = Q_{t|t-1}^{-1} + x̃_t x̃_t' λ(t|ϕ_{t|t-1}) Δ    (1.19)
ϕ_{t|t} = ϕ_{t|t-1} + Q_{t|t} x̃_t [N_t − λ(t|ϕ_{t|t-1}) Δ].    (1.20)

Note that the white noise q_t in (1.16) represents our uncertainty about the unknown spike model parameter ϕ rather than a biophysical noise. We take Q = q I_n, where I_n is the identity matrix of dimension n. Thus q dictates the learning rate during adaptation in this PPF [69], i.e., how fast the parameters are learned. Since adaptive MSF learns the spike model parameters in (1.1) separately from the field model parameters, we can select the learning rate q based on our prior work on optimal learning rate calibration for the PPF [69]. This is the topic of Chapter 2.

Now, to adaptively learn the unknown field model parameters in (1.3), we could for example use batch-based ML methods [15,65]. Here, to enable fast adaptation and systematic learning rate calibration [69], we instead use a Bayesian adaptive algorithm for these parameters (Chapter 2). This Bayesian algorithm can update the parameters at every time-step at which a new field feature is computed. Moreover, it enables the systematic selection of the learning rate to guarantee a desired steady-state parameter estimation error and convergence time (to the true parameter value) during adaptation [69]. Similar to the case of spiking activity, taking the covariance matrix Z in (1.3) as diagonal, we can learn each field feature's parameters separately given the intended kinematics x_t. Assuming B features in (1.3), we define C = [ψ^1, ..., ψ^B]' and Z = diag(Z^1, ..., Z^B), where ψ^b and Z^b are the field model parameters and the noise variance of feature b in (1.3), respectively. Similar to the case of spikes, we construct the prior model for ψ^b as a linear Gaussian random-walk model. Thus we obtain the following state-space model for the parameters ψ^b:

ψ_t^b = ψ_{t-1}^b + s_t^b    (1.21)
y_t^b = x̃_t' ψ_t^b + z_t^b,    (1.22)

where s_t^b is white Gaussian noise with covariance matrix S^b. Since we can learn each feature's parameters separately, in what follows we drop the superscript b. If the noise variance Z of z_t is known, then ψ can be learned using the KF corresponding to the state-space model in (1.21) and (1.22), with the knowledge of the intended kinematics x̃_t, which can be treated as a coefficient vector for the linear observation model (1.22). We denote the mean and covariance of the prediction density of ψ by ψ_{t|t-1} and S_{t|t-1}, respectively, and the mean and covariance of its posterior density by ψ_{t|t} and S_{t|t}, respectively. The KF recursions for ψ are then given as

ψ_{t|t-1} = ψ_{t-1|t-1}    (1.23)
S_{t|t-1} = S_{t-1|t-1} + S    (1.24)
S_{t|t}^{-1} = S_{t|t-1}^{-1} + x̃_t x̃_t' Z^{-1}    (1.25)
ψ_{t|t} = ψ_{t|t-1} + S_{t|t} x̃_t Z^{-1} (y_t − x̃_t' ψ_{t|t-1}).    (1.26)

Similar to the case of spikes, the covariance matrix S dictates the learning rate of this adaptive KF. We denote S = s I_n. Again, since the optimal way to learn the spike-field multiscale model parameters is to learn the field model parameters in (1.3) separately from the spike model parameters, we can select s using the learning rate calibration algorithm for the KF from our prior work [69] to satisfy a desired convergence time and steady-state parameter estimation error. This is the topic of Chapter 2.
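For concreteness, the following Python sketch (illustrative, not the thesis implementation) performs one adaptive update of a single neuron's spike parameters via (1.17)-(1.20) and one adaptive update of a single field feature's parameters via (1.23)-(1.26); the learning rates q and s are the quantities calibrated in Chapter 2, and all names are ours.

```python
import numpy as np

def adapt_spike_params(phi, Q_post, x_t, N_t, q, dt=0.005):
    """One adaptive-PPF step (1.17)-(1.20) for one neuron's phi = [beta, alpha']'.
    q is the learning rate (Q = q * I)."""
    x_tilde = np.concatenate(([1.0], x_t))
    Q_pred = Q_post + q * np.eye(len(phi))                       # (1.17)-(1.18)
    lam_dt = np.exp(x_tilde @ phi) * dt                          # lambda(t | phi_pred) * Delta
    Q_new = np.linalg.inv(np.linalg.inv(Q_pred)
                          + np.outer(x_tilde, x_tilde) * lam_dt)  # (1.19)
    phi_new = phi + Q_new @ x_tilde * (N_t - lam_dt)             # (1.20)
    return phi_new, Q_new

def adapt_field_params(psi, S_post, x_t, y_t, s, Z_var):
    """One adaptive-KF step (1.23)-(1.26) for one field feature's psi.
    s is the learning rate (S = s * I), Z_var the feature's noise variance."""
    x_tilde = np.concatenate(([1.0], x_t))
    S_pred = S_post + s * np.eye(len(psi))                       # (1.23)-(1.24)
    S_new = np.linalg.inv(np.linalg.inv(S_pred)
                          + np.outer(x_tilde, x_tilde) / Z_var)   # (1.25)
    psi_new = psi + S_new @ x_tilde * (y_t - x_tilde @ psi) / Z_var  # (1.26)
    return psi_new, S_new
```

During closed-loop adaptation, adapt_spike_params would run every spike bin and adapt_field_params only on time-steps with a new field feature, reflecting the two time-scales.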
Finally, we can also estimate the noise variance Z adaptively if needed. Denoting f_t = y_t − x̃_t' ψ_{t|t-1}, we can learn Z and ψ adaptively together by using the following equation after (1.23) and (1.24) and before (1.25) and (1.26):

Z_{t|t} = (1/(L−1)) ∑_{j=t−L+1}^{t} (f_j − f̄)² − (1/L) ∑_{j=t−L+1}^{t} x̃_j' S_{j|j-1} x̃_j,    (1.27)

where f̄ = (1/L) ∑_{j=t−L+1}^{t} f_j is the sample mean and L is the number of samples used in estimating Z. Here (1.27) is obtained using the covariance matching technique [70].

1.2.4 Numerical simulations

To validate the multiscale framework, we first use extensive Monte Carlo numerical simulations. We simulate the brain's strategy in closed-loop BMI control with an OFC model [12,37,50,69,71,72]. We assume that the subject performs a self-paced center-out-and-back reaching task, taking a cursor on a computer screen from the center to one of eight peripheral targets and then returning it to the center to initiate another trial [12]. The intended kinematics within this task are simulated using the OFC model. Given these intended kinematics, the spikes and fields are in turn simulated using the encoding model in (1.4). We briefly present the simulation setup here; details of the OFC simulations have been given in our prior work [37,50,69,72].

1.2.4.1 Simulating the intended kinematics using the OFC model of BMI

In the OFC model, the kinematics evolve as

x_t = A x_{t-1} + B u_t + w_t,    A = [ I_2  Δ I_2 ; 0  α I_2 ],    B = [ 0 ; I_2 ],    (1.28)

where Δ is the time-step, α is the damping ratio for the velocity, and u_t is the velocity control command decided by the brain. We set Δ = 5 ms and α = 0.98 according to our prior NHP data [12,37]. We also take x_t = [d_t', v_t']', where d_t and v_t are the 2D cursor position and velocity at time t.

In the OFC model, the brain has learned the kinematic model internally through experience [37,71,73,74]. The brain determines the control command by minimizing a cost function that quantifies the goal of the task. For the target-directed movements in the center-out-and-back reaching task, we use the following quadratic cost function [12,37,69,71,73]

J = ∑_{t=1}^{∞} ‖d_{t-1} − d*‖² + w_v ‖v_{t-1}‖² + w_u ‖u_t‖²,    (1.29)

where d* is the target location, and w_v and w_u are weights selected to match the manual movement profiles [37]. The three terms in the cost function enforce moving towards the target location, stopping at the target, and minimal effort [75], in order. Given the linear state model in (1.28) and the quadratic cost function in (1.29), the optimal control command u_t is given by the well-known infinite-horizon linear quadratic Gaussian (LQG) solution [37,76]

u_t = −L (x_{t-1} − x*),    (1.30)

where x* = [d*', 0']' and L is the feedback gain matrix calculated offline using the discrete algebraic Riccati equation (DARE) [76].

We can now simulate the kinematics from (1.28) and (1.30). Also, in the adaptive MSF and to perform intention estimation, we use the OFC model of BMI, which has been shown in prior work to lead to accurately trained decoders [37].
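As an illustration of how the feedback gain L in (1.30) can be obtained, the sketch below solves the DARE with SciPy for the kinematic model in (1.28) and the cost in (1.29); the cost weights w_v and w_u shown here are placeholders, not the values fitted to the NHP movement profiles in the thesis.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqg_feedback_gain(dt=0.005, damp=0.98, w_v=0.1, w_u=1e-4):
    """Infinite-horizon feedback gain L for the OFC model (1.28)-(1.30).
    w_v and w_u are illustrative placeholders for the cost weights in (1.29)."""
    I2, Z2 = np.eye(2), np.zeros((2, 2))
    A = np.block([[I2, dt * I2], [Z2, damp * I2]])      # kinematic model (1.28)
    B = np.vstack([Z2, I2])
    Q = np.block([[I2, Z2], [Z2, w_v * I2]])            # position and velocity costs in (1.29)
    R = w_u * I2                                        # effort cost
    P = solve_discrete_are(A, B, Q, R)                  # DARE solution
    L = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal feedback gain
    return L

# The intended command is then u_t = -L @ (x_prev - x_star), as in (1.30).
```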
1.2.4.2 Simulating the multiscale neural activity

We use the OFC-simulated intended kinematics to generate the multiscale spike-field neural signals using (1.3) and (1.1). We simulate 210 field features and 30 spike channels. We use a velocity-tuning model, assuming that neural signals encode velocity v_t [12,54,58,64,77-79]. Given this velocity-tuning model and given that x_t = [d_t', v_t']', each row of C for the field features can be written as ψ' = [ξ, 0, 0, η_x, η_y] = [ξ, 0', η'] (since there is no direct tuning to position). The bias ξ and the modulation depth ‖η‖ are selected randomly and uniformly from [−1, 1] and [5, 7], respectively. The preferred direction for each feature, i.e., the direction of the [η_x, η_y] vector, is randomly selected from [0, 2π]. The noise variance Z for each feature is randomly selected from [270.9, 331.1]. These ranges for the field model parameters were chosen such that the signal-to-noise ratio (SNR) of the simulated field features is close to the SNR of the field features in our experimental NHP data. Here, we define SNR = std(ψ' x̃_t)/√Z, where std(·) is the standard deviation, ψ is a row of the parameter matrix C corresponding to one feature in the field observation model (1.22), and Z is the noise variance of that field feature in (1.22). The average field feature SNR in our data and in our simulations (using these parameter settings) is 0.227 and 0.223, respectively. Similarly, for spikes and given the velocity-tuning model, we express the spike model parameters for each neuron as ϕ = [β, 0, 0, α_x, α_y]' = [β, 0', α_v']'. We select the baseline firing rate randomly from [3.3, 5] Hz and the maximum firing rate randomly from [29, 57] Hz. Again, the direction of α_v is selected randomly from [0, 2π].

1.2.4.3 Center-out-and-back task requirements

In the center-out-and-back task, each of the 8 targets has a 0.06 radius, and the targets are distributed on a circle of 0.3 radius. The subject starts a trial by holding the cursor at the center. One of the 8 targets is then selected randomly. The subject needs to move the cursor from the center to the target within at most 1 second and hold within the target for 300 ms without interruption (so the reach-and-hold can take at most 1.3 seconds). The subject then needs to move the cursor back to the center with the same reach-and-hold requirements. A trial is considered successful if the reach-and-hold requirements are met in both the center-out and the back movements. Since the simulations are self-paced, the subject can finish each trial in less than the maximal duration (2.6 seconds). We simulate 400 trials with each non-adaptive filter (KF, PPF, and MSF) and 1000 trials with adaptive MSF.

1.2.4.4 Performance measures

As our decoding measures, we use the success rate, which evaluates the number of trials completed per minute, and percent correct, which gives the percentage of trials correctly completed. For the adaptive MSF, we also examine the adaptation profiles of the spike-field multiscale model parameters over time to assess whether their estimates converge to their true values.

1.2.5 NHP experimental setup and validation

We also test the MSF for decoding NHP spike-LFP activity recorded from motor cortical areas during a naturalistic 3D reach task towards targets placed at various locations in the 3D environment [44]. We decoded the 7 arm joint angular trajectories during the reaching movements using either spiking activity with the PPF, LFP activity with the KF, or multiscale spike-LFP activity with the MSF. The 7 joints are shoulder elevation, elevation angle, shoulder rotation, elbow flexion, pro-supination, wrist flexion, and wrist deviation. As our performance metric, we used the correlation coefficient between the true and decoded angular trajectories.

An electrode array with 137 electrodes (large-scale microdrive system, Gray Matter Research, USA) covering parts of dorsal premotor cortex (PMd), ventral premotor cortex (PMv), and primary motor cortex (M1) was used to record spike-field activity.
Raw neural signals were recorded with a 30 kHz sampling rate in real time as the subject performed the reaching task. Reflective markers were placed on the subject's right arm and monitored using infrared and near-infrared cameras at a sampling rate of 100 frames/sec. We obtained the 7 joint angular trajectories from the marker trajectories using an NHP musculoskeletal model [80] and inverse kinematics.

1.2.6 Neural data processing

We extracted the spikes by passing the raw neural signals through a band-pass filter (0.3-6.6 kHz) and finding the threshold-crossing events using a threshold of 3.5 standard deviations below the mean filtered signal. To obtain the LFP signals, we low-pass filtered the raw neural signals with a 400 Hz cut-off frequency and then down-sampled the signal to 1 kHz. From each LFP channel, we extracted the log-power in seven frequency bands: theta (4-8 Hz), alpha (8-12 Hz), beta 1 (12-24 Hz), beta 2 (24-34 Hz), gamma 1 (34-55 Hz), gamma 2 (65-95 Hz), and gamma 3 (130-170 Hz) [22]. The band power was calculated by first performing common average referencing (CAR) and then using the short-time Fourier transform (STFT) with a 300 ms causal moving window every 50 ms. Taken together, in the experiment, the time-scale of spikes (i.e., Δ) was 10 ms and the time-scale of LFP log-power features was 50 ms.
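As a rough illustration of this feature-extraction step, the sketch below computes band log-powers for one CAR-referenced LFP channel using a 300 ms causal window slid every 50 ms; the exact spectral estimator, windowing, and normalization used in the experiments may differ, so this is only an assumed, simplified version.

```python
import numpy as np

BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta1": (12, 24), "beta2": (24, 34),
         "gamma1": (34, 55), "gamma2": (65, 95), "gamma3": (130, 170)}

def lfp_logpower_features(lfp, fs=1000, win_ms=300, step_ms=50):
    """Log-power of one LFP channel in the seven bands above (illustrative sketch)."""
    win, step = int(fs * win_ms / 1000), int(fs * step_ms / 1000)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    feats = []
    for end in range(win, len(lfp) + 1, step):
        seg = lfp[end - win:end] * np.hanning(win)        # causal window ending at `end`
        psd = np.abs(np.fft.rfft(seg)) ** 2               # simple periodogram of the window
        feats.append([np.log(psd[(freqs >= lo) & (freqs < hi)].sum() + 1e-12)
                      for lo, hi in BANDS.values()])
    return np.asarray(feats)                               # shape: (n_windows, 7)
```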
1.2.7 NHP decoding analyses

For each joint, we take the kinematic state x_t as the corresponding joint angle. For generality, we set the parameter matrix A = 1 and fit the noise covariance W using linear regression from the behavioral joint trajectory traces. To investigate whether we can combine information from spikes and LFP in the MSF while running at their different time-scales, we performed channel-addition analyses. In these analyses, we added various numbers of randomly selected spike channels to a fixed number of randomly selected LFP channels and vice versa. Overall, we performed the analyses across 7 experimental sessions. We group 40 electrodes as a channel pool in our channel-addition analyses for computational tractability. We use ML to fit the spike-field multiscale model parameters (C and Z in (1.3), and β_c and α_c in (1.1)).

1.2.7.1 Adding LFP activity to spiking activity

To see the benefit of adding LFP to spikes in the MSF, we first select 10 random spike channels from the channel pool. We then use a PPF to decode the 7 joint angular trajectories using these spike channels within 20-fold cross-validation. We assess performance using the cross-validated correlation coefficient between the true and decoded joint angular trajectories. We then add LFP channels (randomly selected from the same channel pool) to the spike channels one-by-one, decode the spike-LFP activity using the MSF, and again assess performance using 20-fold cross-validation. We repeat this process 50 times, each time selecting a different random set of spike and LFP channels from the pool. We compute the average improvement in decoding performance as a function of the number of LFP channels added to the 10 spike channels within the MSF. In addition to 10 spike channels, we repeat the entire process of adding random LFP channels to random spike channels for 20 and 30 spike channels as well.

1.2.7.2 Adding spiking activity to LFP activity

To see the benefit of adding spikes to LFP in the MSF, we repeat the above process by adding random spike channels one-by-one to 10, 20, or 30 random LFP channels. This time, we first decode the joint angular trajectories using the LFP channels alone within a KF, and then add the spike channels one-by-one and decode the combined spike-LFP activity within the MSF.

1.2.7.3 Investigating the effect of LFP and spike channel overlap

Finally, to assess whether the degree of overlap between the spike and LFP channels affects the improvement in the MSF, we form two non-overlapping pools of 40 channels. In our channel-addition analyses, we always select the spike channels from the first pool (called the overlapping pool). Then, we pick the LFP channels in one of two ways: (i) to study MSF decoding with overlapping channels, we select the LFP channels to be exactly the same as the spike channels; (ii) to study MSF decoding with non-overlapping channels, we pick the LFP channels from the second, non-overlapping pool. This construction allows us to assess whether the improvement when adding LFP to spikes, or vice versa, depends on whether the spikes and LFP come from the same channels or from different channels.

To enable a fair comparison between overlapping and non-overlapping MSF decoding, we design a two-step procedure to ensure that the overlapping LFP channels have the same decoding performance as the non-overlapping LFP channels. This way, within the MSF channel-addition analyses, the only difference between using the overlapping LFP pool and the non-overlapping LFP pool is the overlap with the spike channels. We define the quality of an LFP channel as the cross-validated (20-fold) KF decoding correlation coefficient obtained using that channel. In the first step, we pick the two 40-channel pools such that they have a similar LFP quality as a whole. In the second step, when selecting a subset of LFP channels from each pool, we ensure that these two LFP channel subsets have the same decoding performance using the iterative selection algorithm detailed below.

For the first step, we select a first pool of 40 LFP channels and measure the quality of each LFP channel in this first pool. Then, for each channel in the first pool, we find another channel not in the first pool whose LFP quality is closest to it. We repeat this process to select 40 channels (not in the first pool) and form the second pool, which we call the non-overlapping pool.

For the second step, using an iterative algorithm, each time we select k random LFP channels from the overlapping pool (k = 10, 20, or 30), we select k LFP channels from the non-overlapping pool with a similar population decoding performance. In the first iteration, we select k random LFP channels from the overlapping pool (the overlapping subset) and note their ranks in terms of individual channel quality. We then select the k LFP channels with the same ranks, this time from the non-overlapping pool, to construct the non-overlapping subset. In the second iteration, if the KF decoding performance of this non-overlapping subset is lower (higher) than that of the overlapping subset, we replace the worst (best) channel in the non-overlapping subset with the best (worst) channel in the rest of the non-overlapping pool. We repeat this iterative replacement until the decoding performance of the non-overlapping subset is higher (lower) than that of the overlapping subset. Then, in the last iteration, we try substituting the last replaced channel in the non-overlapping subset with any remaining channel in the rest of the non-overlapping pool to find the channel combination that makes the decoding performance of the overlapping and non-overlapping subsets as close as possible.
We require the difference between the decoding correlation coefficients of the non-overlapping and overlapping LFP channel subsets to be less than 0.02. If we cannot meet this requirement, we randomly select a new overlapping subset with k channels and repeat the above iterative procedure. In total, we simulate 50 paired overlapping and non-overlapping LFP channel subsets for each k, where k = 10, 20, or 30.

1.3 Results

To validate the multiscale framework and investigate its properties compared with single-scale decoding (i.e., using PPF for spikes and KF for fields), we use both Monte Carlo closed-loop BMI simulations and NHP decoding analyses.

1.3.1 Simulation validations

We first validate the multiscale framework using numerical simulations. We evaluate the performance of single-scale and multiscale filters by simulating a closed-loop BMI with a self-paced center-out-and-back reaching task. We assess decoding performance using the success rate and percent correct measures. We compare spike-field decoding with MSF, spike decoding with PPF, and field decoding with KF. Simulation details can be found in the Numerical Simulations section 1.2.4.

Figure 1.2: MSF performs better than PPF and KF in closed-loop BMI simulations. MSF improves the success rate and percent correct compared with spike decoding with PPF and field decoding with KF in closed-loop BMI simulations. Black vertical lines indicate the 95% confidence bounds of the means.

Our simulations demonstrate the validity of the derivation of the MSF and adaptive MSF, and their ability to run at the fast time-scale of spikes while combining information from spikes and fields. We also show that adaptive MSF can learn all the spike-field multiscale model parameters at their different time-scales, in closed loop, and simultaneously.

Figure 1.3: Adaptive MSF can learn all the spike-field multiscale model parameters simultaneously in real time. (A) Parameters corresponding to a sample neuron. (B) Parameters corresponding to a sample field feature. (C) The noise covariance of a sample field feature.

1.3.1.1 MSF runs at the fast time-scale of spikes and adds information from fields

We find that MSF can run at the fast time-scale of spikes and, moreover, combine information across spikes and fields to improve performance compared with both PPF and KF. Figure 1.2 shows the comparison of MSF with PPF and KF. Both the MSF success rate (figure 1.2A) and percent correct (figure 1.2B) are significantly higher than those of PPF and KF (one-sided t-test, P < 10^-9). These results show that MSF can run at the fast time-scale of spikes while adding information from fields at their slower time-scales.

1.3.1.2 Adaptive MSF can learn all the spike-field multiscale model parameters

We next examined whether adaptive MSF could learn all the spike-field multiscale model parameters in (1.1)-(1.4) adaptively during closed-loop BMI control. Figure 1.3 shows the adaptation profiles of all spike-field multiscale model parameters, i.e., how their estimates evolve in time. We see that all estimated parameters converge to their true values.
In figures 1.3A and 1.3B, we also used our calibration algorithm to select learning rates that minimized the convergence time to the true values while ensuring that the steady-state estimation error of each parameter is confined within 10% of its average value. We found that all parameters satisfied these constraints; the 95% confidence bound at steady state is within plus or minus 10% of the average value for each parameter. These results suggest that adaptive MSF can learn all the spike-field multiscale model parameters at their different time-scales, simultaneously, efficiently, and accurately in real time.

1.3.1.3 Adaptive MSF performance converges to the MSF with true parameters

To evaluate the accuracy of adaptive MSF in learning the parameters, we also examined the decoding performance during parameter adaptation. We tracked the success rate and percent correct averaged over moving windows of 50 trials (figure 1.4). As the parameters converge, decoded trajectories become straighter and less noisy (figure 1.4A). After 250 trials (approximately 500 seconds), and even before some spike-field multiscale model parameters fully converge (figure 1.3), the trajectories look straight. This proficient performance even before full convergence of all parameters is due to the feedback correction that occurs in closed-loop BMI, which compensates for the effect of slight parameter errors on decoding. The time evolution of percent correct and success rate in figure 1.4B is consistent with the trend of the trajectory evolution in figure 1.4A. Moreover, after convergence, percent correct and success rate are both close to those obtained from an MSF that knows the true parameters, as is evident from a comparison with the dashed horizontal lines in figure 1.4B. These results again indicate that adaptive MSF accurately learns the spike-field multiscale model parameters.

Figure 1.4: Adaptive MSF performance converges to the performance of an MSF that knows the true parameters. (A) Sample decoded trajectories as a function of time into adaptation, indicated by the trial number since the beginning of adaptation. (B) Percent correct and success rate as a function of time into adaptation. Both metrics converge after 250 trials. Transparent colored blocks mark the time periods corresponding to the trials indicated in the subfigures in (A) with the same color scheme. Dashed horizontal lines show the performance of an MSF that knows the true parameters.

1.3.2 NHP spike-field decoding of naturalistic 3D reaches with MSF

Our simulations showed the validity of our derivations in obtaining the MSF and adaptive MSF. Moreover, these simulations showed the ability of MSF and adaptive MSF to take into account the different time-scales and statistical profiles of multiscale activity.

Figure 1.5: Sample true and decoded trajectories. True and decoded angular trajectories of the 3 most active joints are shown (joints whose movement standard deviations divided by the peak-to-peak range of movement are the largest).
In all figures, spike implies spike channel and LFP implies LFP channel. (A) Adding LFP channels to spike channels within the MSF improves decoding accuracy. (B) Adding spike channels to LFP channels within the MSF improves decoding accuracy. 26 0 30 0 30 Pearson’s r number of LFP number of spike (A) 0.6 0.5 0.4 0.5 0.4 0.3 0.6 0.5 0.4 (B) (C) 0 30 0 30 number of LFP number of spike 0 30 0 30 number of LFP number of spike 10 spike 20 spike 30 spike 10 LFP 20 LFP 30 LFP Figure 1.6: MSF improves decoding by combining information from spikes and LFPs. Figures show the average correlation coefficients between decoded and true angular trajec- tories. Spike impliesspikechannel andLFPimpliesLFPchannel. Solid/dashedcurves show the mean correlation coefficient and shaded regions show the 95% confidence bound of the mean. (A–C)MSFperformanceasrandomLFPchannelsareaddedone-by-oneto10(blue), 20(red),or30(green)randomspikechannels(leftpanel),orviceversa(rightpanel)forthree sample sessions. Here the start of the curves (i.e., 0 on the x-axis) indicates pure single-scale decoding, i.e., spike decoding with PPF (left panel) or LFP decoding with KF (right panel). validate the multiscale framework and investigate whether MSF can combine information across spikes and fields, we used it to decode 7 arm joint angles from NHP multiscale spike- LFP activity recorded during 3D reaching movements (figure 1.5). We investigated (1) whether MSF can improve performance compared with spike decoding with PPF and LFP decoding with KF, (2) whether this improvement was due to addition of information across scalesordominanceofinformationinonescale, (3)whethertheimprovementwasafunction ofthenumberofspikeorLFPchannelsincluded,and(4)whethertheimprovementdepended on the degree of overlap between spike and LFP channels used in the MSF. 1.3.3 MSF improves performance due to addition of information across scales rather than dominance of information in one scale We first examined whether MSF can improve performance compared with spike decoding with PPF. We compared performance of PPF with a fixed number of spike channels (10, 20, or 30) to that of MSF that added LFP channels one-by-one to these spike channels. Figure 27 1.5A shows sample true and decoded joint trajectories when LFP channels are added to spike channels. We found that, regardless of the number of spike channels, MSF improved performance compared with spike decoding with PPF (paired one-sided t-test, P < 10 − 18 ). Figure 1.6A–C left panels show the improvement in performance in 3 example sessions as a function of the numbers of LFP channels added to a fixed number of spike channels in the MSF. We next examined whether the improvement in performance was due to addition of information. In particular, it is possible that performance is improved simply because the more strongly tuned scale of activity dominates the other scale. Alternatively, performance may be improved because MSF is able to add information from both spikes and LFP. Thus, to show that improvement is due to addition of information, we should demonstrate an improvementboth whenaddingLFPchannelsone-by-onetoafixednumberofspikechannels, and, vice versa, when adding spike channels one-by-one to a fixed number of LFP channels. Consequently, we also added spike channels one-by-one to a fixed number of LFP channels (10,20,or30)withintheMSF.Figure1.5Bshowssampletrajectoriesinthiscase. 
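Concretely, the channel-addition analysis can be sketched as follows. Here `decode_joint_angles` stands in for the full MSF/PPF/KF decoding pipeline described in the Methods; it is a placeholder, not an actual function from our codebase.

```python
import numpy as np
from scipy.stats import pearsonr

def channel_addition_curve(decode_joint_angles, true_angles, fixed_lfp, candidate_spk, rng):
    """Mean Pearson r between decoded and true joint angles as spike channels are added
    one-by-one (in random order) to a fixed set of LFP channels.
    decode_joint_angles(spike_idx, lfp_idx) is a placeholder for the decoder."""
    order = rng.permutation(candidate_spk)
    curve = []
    for n in range(len(order) + 1):          # n = 0 is pure LFP decoding with the KF
        decoded = decode_joint_angles(order[:n], fixed_lfp)
        r = np.mean([pearsonr(decoded[:, j], true_angles[:, j])[0]
                     for j in range(true_angles.shape[1])])
        curve.append(r)
    return np.asarray(curve)
```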
Wefound that, regardless of the number of LFP channels, adding spikes within the MSF improved performance compared with LFP decoding using the KF (figure 1.6A–C right panels, paired one-sided t-test, P < 10 − 32 ). The narrow 95% confidence bounds of the mean correlation coefficients indicate the significance of all improvements. Taken together, these results suggest that MSF combines non-redundant information contained within spikes and LFP, while running at their different time-scales. 1.3.4 MSFimprovementisgreatestinthelow-informationregime We studied how MSF improvement over PPF or KF changed as a function of the number of spike or LFP channels included in these single-scale decoders (i.e., PPF and KF, respec- tively). When adding spikes to LFP, the improvement was greatest when few LFP channels were available (figure 1.7A). Similarly, adding LFP to spikes was most beneficial when few 28 10 LFP 20 LFP 30 LFP 0 0.1 Correlation coefficient improvement (A) average 1 2 3 4 56 7 Dataset 10 20 30 number of channels (B) non-overlapping overlapping 0.1 0 Correlation coefficient improvement 10 spike 20 spike 30 spike 0 0.1 0.2 Correlation coefficient improvement (C) average 1 2 3 4 56 7 Dataset 10 20 30 number of channels (D) 0.1 0 Correlation coefficient improvement non-overlapping overlapping Figure 1.7: MSF improvement when adding spike to LFP channels or vice versa is similar regardless of the degree of overlap between spike and LFP channels, and is greatest in the low-information regime. (A)MSFimprovementwhen30spikechannelsareaddedto10(blue),20(red),or30(green) LFPchannels(i.e.,themaximalincreaseincorrelationcoefficientintherightpanelsoffigure 1.6). The bar on the far right shows the average of the 7 bars on the left corresponding to the 7 sessions. Black vertical lines indicate the 95% confidence bounds of the means. MSF improvement is greatest in the low-information regime, i.e., when there are 10 LFP channels (blue). (B) MSF improvement when adding k spike channels to k LFP channels for k = 10,20,or 30 with complete or no overlap. Hollow bars show the case where the spike and LFP channels are exactly the same (100% overlap), and solid bars show the case with no overlap. Black vertical lines indicate the 95% confidence bounds of the means over 7 sessions. MSF improvement is similar regardless of the degree of overlap. Note that in (B) the number of spike channels is equal to the number of LFP channels unlike (A) where there are always 30 spike channels. (C) Figure convention is the same as (A) except that we add 30 LFP channels to 10 (blue), 20 (red), or 30 (green) spike channels (i.e., the maximal increase in correlation coefficient in the left panels of figure 1.6). (D) Figure convention is the same as (B) except that we add k LFP channels to k spike channels for k =10,20,or 30 withcompleteornooverlap. MSFimprovementissimilarregardlessofthedegreeofoverlap. 29 spikechannelswereavailable(figures1.7C).Thelatterisinlinewithapriorstudy 23 thathas shownthebenefitofusingLFPwithinaKalmanfilterwhenfewspikechannelsareavailable. Finally, we explored which LFP bands contributed the most to the MSF improvement. Werepeatedthesamechannel-additionanalysesforeachofthe7LFPbandsseparately, i.e., by adding 7 LFP bands separately to spikes. We found that both the high gamma (gamma 2 and gamma 3) and the beta (beta 1 and beta 2) bands resulted in MSF improvement but the high-gamma (gamma 2 and gamma 3) contributed the most to this improvement. 
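For the band-wise analysis, the LFP log-power features can be grouped by band before repeating the channel-addition procedure above. A minimal sketch of such per-band feature extraction follows; the band names and frequency edges here are illustrative only and are not the exact definitions used in our Methods.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Hypothetical edges (Hz) for seven bands; the dissertation's own band definitions may differ.
BANDS = {"theta": (4, 8), "alpha": (8, 12), "beta1": (12, 24), "beta2": (24, 34),
         "gamma1": (34, 55), "gamma2": (65, 95), "gamma3": (130, 170)}

def band_log_powers(lfp, fs, win_samples):
    """Log-power of one LFP channel in each band, averaged in non-overlapping windows."""
    feats = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
        x = filtfilt(b, a, lfp)
        n_win = len(x) // win_samples
        power = (x[: n_win * win_samples] ** 2).reshape(n_win, win_samples).mean(axis=1)
        feats[name] = np.log(power)
    return feats
```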
This result is consistent with prior single-scale field decoding results that show the importance of these bands in motor representations 22,23,81–84 . 1.3.5 MSF improvement is similar regardless of the degree of overlap between spike and LFP channels WenextexaminedwhetherMSFimprovementdependedonthedegreeofoverlapbetweenthe channelsfromwhichspikesandLFPswererecorded. Forexample, itmaybethatspikesand LFPs on different channels contain more non-redundant information compared with those on the same channels. If so, the improvement may be greater if spikes and LFP come from different channels. We thus performed our channel addition analyses in two scenarios, when spikes and LFPs came from the exact same channels or when they came from completely different channels. Details can be found in the Decoding Analyses section 1.2.7. We found that when adding spikes to LFPs (figure 1.7B), MSF improvement was similar regardless of whether spikes and LFPs were recorded from the same channels or not (P > 0.37,two-sidedt-testover7sessions). SimilarresultsheldwhenaddingLFPstospikes(figure 1.7D, P > 0.42). These results suggest that there is benefit in multiscale decoding even if only the same channels are available for spike and LFP recording (e.g., if spikes are available on all channels). Furthermore, this result may suggest that spikes on a given channel do not fully determinetheLFPonthatchannel, i.e., thatLFPcontainsnon-redundantinformation compared with spikes. 30 1.4 Discussion We developed a multiscale modeling, adaptive learning, and decoding framework for spike- field activity that characterized the differences in spikes and fields in terms of statistical properties (binary spikes vs. continuous field features) and time-scales (e.g., milliseconds for spikes and tens of milliseconds for fields). We validated the framework within a motor task both using extensive closed-loop BMI simulations and with NHP motor cortical recordings duringa3Dnaturalisticreachtask. Themultiscaleencodingmodelconsistsofacombination of point process and Gaussian process likelihood functions. The multiscale filter (MSF) can run at the fast time-scale of spikes, while adding information from field features at their slower time-scales. Finally, the adaptive multiscale filter (adaptive MSF) learns the spike and field model parameters simultaneously, at their own time-scales, and in real time, which means that the processing of a new sample of neural data at the current time-step can be done causally and efficiently and be completed before the next data sample is obtained in the next time-step. Our closed-loop BMI simulations demonstrated that the MSF outperforms single-scale filters (figure 1.2) and that the adaptive MSF can accurately learn all the spike-field multi- scale model parameters simultaneously in closed loop (figures 1.3 and 1.4). We also decoded 7 3D joint angular trajectories of the NHP’s arm with motor cortical spike-LFP activity (figure 1.5). The NHP decoding results showed that MSF improved the single-scale decoder performances (KF and PPF) with various numbers of spike and LFP channels, that this improvement was due to addition of information across scales rather than dominance of in- formation in one scale (figures 1.6, 1.7A and 1.7C), and that the improvement was similar regardless of the degree of overlap between spike and LFP channels (figures 1.7B and 1.7D). 
ToexplorewhetherMSFcouldcombineinformationfromspikesandfields, itwascritical to show improvement in a bi-directional way: we had to show not only that adding spikes to a fixed number of LFP channels can improve performance, but vice versa. Otherwise, performance could have been improved simply due to a single scale dominating the other 31 scale. We thus performed extensive channel-addition and decoding analyses on the NHP spike-field data and showed that indeed the improvement was bi-directional. This result suggests that MSF combines non-redundant information from spikes and LFP. Further, this result may imply that the recorded spikes do not fully determine the recorded LFP on the same electrode array, thus giving rise to non-redundant information in LFP compared with spikes. One reason for this non-redundant information may be the different sources of LFP and spikes. Indeed, a main contributor to LFP are the synaptic currents while the spiking activity measures fast action potentials 5,6 . We also examined whether the improvement in MSF depends on the degree of overlap between the electrodes that record spikes and LFPs. We found that even in the two extreme cases where spikes and LFPs are recorded either from the exact same electrodes or from completely different electrodes, the MSF improvement was similar. One explanation for this result could again be that spikes and LFP contain non-redundant information even on the same electrode because of their different sources 5 . Another possible explanation could be that the cortical activity that gives rise to the recorded LFP covers a wider spatial range compared with the activity that generates an action potential on the same electrode. Prior studieshaveindicatedthatthespatialreachofthecorticalregionthatgeneratestheLFPcan be from a few hundred micrometers to several millimeters 6 . For example, this spatial range could depend on the degree of synchronization between the surrounding neural sources 6 . Thus the recorded LFP from an electrode may carry information about many more neurons than just the neurons recorded on that electrode (even potentially carry information about neurons not recorded by the entire array), thus again resulting in non-redundant motor- related information. To learn the spike-field multiscale model parameters, we developed adaptive MSF. Prior studies have shown that neural representations of behavior during BMI control could be different compared with during natural behavior 38–43 . Thus learning the decoder parameters adaptivelyandduringBMIcontrolcanimproveBMIperformance 12,37,38,43,54,58–65 . Moreover, 32 prior studies have shown that faster time-scales of adaptation result in faster convergence of model parameters 37,65 . We thus developed the adaptive MSF to fit the spike-field multiscale model parameters simultaneously, at the fastest time-scale possible, and in real time. In particular, MSF allows the spike model parameters to be updated at the millisecond time- scale of the spikes (i.e., with every 0 and 1 spike event) and the field model parameters to be updated at their own time-scale (i.e., at every time-step that a new field feature is computed). We showed that MSF could accurately learn all these parameters in closed-loop simulations. Here, we focused on developing the multiscale framework as a computational tool for studying information encoding and for developing future neurotechnologies. We then vali- dated the framework using closed-loop simulations and offline NHP data analyses. 
Future studies that implement the MSF in closed-loop BMI experiments are critical to show the extent to which MSF can benefit real-time BMI designs. MSF adds information from fields to information from spikes, while still running at the fast time-scale of spikes; it can thus provide a fast rate of control and feedback in closed loop. MSF also precisely models the spikes as point processes. Prior closed-loop BMI studies have shown the benefit of shorter latency 85,86 andfastercontrolandfeedbackrates 12,37 onclosed-loopBMIperformance. Also, point process models of spikes have been shown to be beneficial in spike-based closed-loop BMIs 12,37 . Thus,thesepriorclosed-loopstudiessuggestthatthefasttime-scale,thefastrate of control and feedback, and the precise modeling of the statistical properties of both binary spikes and continuous field features enabled by the MSF may help with future closed-loop BMIs that use multiple scales of activity. Investigating the properties of MSF in closed loop is an important area for future investigation. Taken together, we developed a multiscale computational framework consisting of mul- tiscale encoding models, adaptive learning algorithms, and decoders for simultaneous spike- field activity that takes into account the different statistical characteristics and time-scales of these signals. Regardless of the number of spike or field channels or the degree of overlap 33 between these channels, MSF could run at the fast time-scale while combining informa- tion. This multiscale framework has significant implications as a tool for developing future neurotechnologies such as motor BMIs. 34 Chapter 2 Learning Rate Calibration Algorithm for Adaptive PPF and KF 2.1 Introduction Aswementioninchapter1,recenttechnologicaladvanceshaveenabledthereal-timerecord- ing and processing of different neural signal modalities, including the electrocorticogram (ECoG), local field potentials (LFP), and spiking activity 5 . This real-time recording ca- pability has allowed for the motor brain-machine interfaces (BMI) to restore movement to disabledpatientsbyrecordingtheneuralactivityinrealtime,decodingfromthisactivitythe motor intent of the subject, and using the decoded intent to actuate and control an external device 24,25,27–34 . For example, the multiscale filter (MSF) we derive in chapter 1 is a kind of BMI. In general, the closed-loop BMI should learn system parameters adaptively, and this requires a good learning rate. In this chapter, we derive the corresponding learning rate calibration algorithm for selecting the learning rate optimally. The content of this chapter is a compact version of our published paper 69 . Please refer to it if you want to know more details. We first briefly introduce the architecture of the closed-loop neural systems, which need tolearnanencodingmodelthatrelatestheneuralsignal(e.g.,spikes)totheunderlyingbrain 35 User-specified objective Calibration algorithm Encoding model parameter estimation Brain state decoder Device (e.g., prosthetic, stimulator) Neural signal Optimal learning rate Next brain state Output New model parameter Brain states in training experiment Figure 2.1: Closed-loop neural system architecture. Closed-loop neural systems often need to learn an encoding model adaptively and in real time. Theencodingmodeldescribestherelationshipbetweenneuralrecordingsandthebrain state. For example, the relevant brain state in motor BMIs is the intended velocity and in DBS systems is the disease state, e.g., in Parkinson’s disease. 
The neural system uses the learned encoding model to decode the brain state. This decoded brain state is then used, for example, to move a prosthetic in motor BMIs or to control the stimulation pattern in DBS systems. Acriticalparameterforanyadaptivelearningalgorithmisthelearningrate, which dictates how fast the encoding model parameters are updated as new neural observations are received. An analytical calibration algorithm will enable achieving a predictable level of accuracy and speed in adaptive learning to improve the transient and steady-state operation of neural systems. state (e.g., motor intent) for each subject. The encoding model is often taken as a paramet- ric function and is used to derive mathematical algorithms, termed decoders, that estimate the subject’s brain state from their neural activity. These closed-loop neural systems run in real time and often require the encoding model parameters to be learned in closed loop, online and adaptively (figure 2.1). For example, in motor BMIs, neural encoding differs for movement of the BMI compared to that of the native arm or to imagined movements 38–41 . 36 Hence encoding model parameters are better learned adaptively in closed-loop BMI oper- ation 12,37,38,54,58,60–65 . Another reason for real-time adaptive learning is the non-stationary natureofneuralactivitypatternsovertime, forexampleduetolearninginmotorBMIs 38–40 , due to new experience in the hippocampus 87,88 , or due to stimulation-induced plasticity in DBS systems 89–91 . Adaptive learning algorithms in closed-loop neural systems, such as adaptive Kalman filters (KF), are typically batch-based. They collect batches of neural activity, fit a new set of parameters in each batch using maximum-likelihood techniques, and update the model parameters 54,61,65 . In addition to these methods, adaptive point pro- cess filters (PPF) have also been developed for tracking plasticity in offline datasets 9,87,88,92 . Recently, control-based state-space algorithms have been designed for adaptive learning of point process spike models during closed-loop BMI operation, and have improved the speed of real-time parameter convergence compared with batch-based methods 12,37 . A critical design parameter in any adaptive algorithm is the learning rate, which dictates how fast model parameters are updated based on a new observation of neural activity (Fig. 2.1). The learning rate introduces a trade-off between the convergence time and the steady- state error of the estimated model parameters 93 . Increasing the learning rate decreases the convergence time, allowing for parameter estimates to reach their final values faster. How- ever, thisfasterconvergencecomesatthepriceofalargersteady-stateparameterestimation error. Similarly, a smaller learning rate will decrease the steady-state error, but lower the speed of convergence. Hence principled calibration of the learningrate is critical for fast and accurate learning of the encoding model, and consequently for both the transient and the steady-state performance of the decoder. To date, however, adaptive algorithms have chosen the learning rate empirically. For example, in batch-based methods, once a new batch estimate is obtained, the parameter es- timatesfrompreviousbatchesareeitherreplacedwiththesenewestimates 54 oraresmoothly changedbyweighted-averagingbasedonadesiredhalf-life 61,65 . 
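To make the half-life idea concrete, such a smooth batch update can be sketched as follows; this is a minimal illustration with hypothetical names, not the exact rule used in the cited studies.

```python
import numpy as np

def blend_batch_estimate(theta_old, theta_batch, batch_len, half_life):
    """Exponentially weighted update of model parameters after a new batch fit.
    The weight is chosen so an old estimate loses half its influence after
    `half_life` time units (batch_len in the same units)."""
    w_old = 0.5 ** (batch_len / half_life)   # fraction of the old estimate retained
    return w_old * theta_old + (1.0 - w_old) * theta_batch

theta = np.zeros(3)
for theta_batch in [np.array([1.0, 0.5, -0.2])] * 5:    # toy stream of batch fits
    theta = blend_batch_estimate(theta, theta_batch, batch_len=60.0, half_life=120.0)
```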
Inadaptivestate-spacealgo- rithms, such as adaptive PPF, learning rate is dictated by the choice of the noise covariance 37 in the prior model of the parameter decoder, which is again chosen empirically 9,37,94 . Given the significant impact of the learning rate on both the transient and the steady-state per- formance of closed-loop neurotechnologies, it is important to develop a principled learning rate calibration algorithm that can meet a desired error or convergence time performance for any neural recording modality (such as spikes, ECoG, and LFP) and across applications. In addition to neurotechnologies, designing such a calibration algorithm is also of great im- portance in general signal processing applications. Prior adaptive signal processing methods have largely focused on non-Bayesian gradient decent algorithms. These algorithms, how- ever, do not predict the effect of the learning rate on error or convergence time (except for a limited case of scalar linear models; see Discussions) and hence can only provide heuristics for tuning the learning rate 95,96 . A calibration algorithm that can write an explicit function for the effect of the learning rate on error and/or convergence time for both linear and non- linear observation models would also provide a novel approach for learning rate selection in other signal processing domains 59,97–99 . For example, in image processing 99 , in electrocar- diography 97 , in anesthesia control 59 , and in unscented Kalman filters 98 , adaptive filters with learning rates are used in decoding system states or in learning system parameters in real time (see Discussions). Here, we develop a mathematical framework to optimally calibrate the learning rate for Bayesian adaptive learning of neural encoding models. We derive the calibration algorithm both for learning a point process model of discrete-valued spiking activity and for learning a Gaussian model of continuous-valued neural activities (e.g., LFP or ECoG). Our framework derives an explicit analytical function for the effect of learning rate on parameter estimation error and/or convergence time. Minimizing the convergence time and the steady-state error covariance are competing requirements. We thus formulate the calibration problem through the fundamental trade-off that the learning rate introduces between the convergence time and the steady-state error, and derive the optimal calibration algorithm for two alternative 38 objectives: satisfying a user-specified upper-bound on the steady-state parameter error co- variance while minimizing the convergence time, and vice versa. For both objectives, we derive analytical solutions for the learning rate. The calibration algorithm can pre-compute the learning rate prior to start of real-time adaptation. Weshowthatthecalibrationalgorithmcananalyticallysolvefortheoptimallearningrate for both point process and Gaussian encoding models. We use extensive Monte-Carlo simu- lations of adaptive Bayesian filters operating on both discrete-valued spikes and continuous- valuedneuralobservationstovalidatetheanalyticalpredictionsofthecalibrationalgorithm. With these simulations, we demonstrate that the learning rate selected analytically by the calibration algorithm minimizes the convergence time while satisfying an upper-bound on the steady-state error covariance or vice versa. Thus the algorithm results in fast and accu- ratelearningoftheencodingmodel. 
Inadditiontotheencodingmodel, wealsoexaminethe influenceofthecalibrationalgorithmondecodingbytakingamotorBMIsystem,whichuses discrete-valued spikes or continuous-valued neural activity (e.g., ECoG or LFP), as an ex- ample. Using extensive closed-loop BMI simulations that closely conform to our non-human primateexperiments 37,72,94 , weshowthatanalyticallyselectingtheoptimallearningratecan improve both the transient operation of the BMI by allowing its decoding performance to converge faster, and the steady-state performance of the BMI by allowing it to learn a more accuratedecoder. Wealsoshowthatlargelearningratesleadtoinaccurateencodingmodels anddecoders,andsmalllearningratesdelaytheconvergenceofencodingmodelsanddecoder performance. By providing a novel analytical approach for learning rate optimization. this calibration algorithm has significant implications for closed-loop neurotechnologies and for other signal processing applications (see Discussions). 39 Find the optimal learning rate Observation signal type? Input: The range of firing rates in (2.12). Input: The range of the noise covariance Z in (2.1). Objective? User-specified criteria (theorem 2): 1. Error tolerance at convergence time (E rest ). 2. Upper-bound of the convergence time (C bd ). User-specified criteria (theorem 2): Upper-bound of the error covariance (V bd ). Calculate H ave and h 1 in theorem 1. Continuous: calculate H ave and h 1 in theorem 1. Discrete: calculate M ave and a 1 in theorem 3. Continuous signal Discrete signal Convergence time Steady-state error covariance Objective: steady-state error covariance Find the optimal learning rate (theorem 2): Use equation (2.10). Find the optimal learning rate (theorem 2): Continuous: use equation (2.9). Discrete: take a 1 =h 1 and use equation (2.9). Figure 2.2: Flowchart of the calibration algorithm. 40 2.2 Methods We derive the calibration algorithm for adaptation of two widely-used neural encoding models—the linear Gaussian model of continuous-valued signals such as LFP and ECoG and the point process model of the spiking activity. In the former case, the calibration algo- rithm adjusts the learning rate of a new adaptive KF that we present here, and in the latter case it adjusts the learning rate of an adaptive PPF. We design the calibration algorithm for adaptivePPFandKF,asthesefiltershavebeenvalidatedinclosed-loopnon-humanprimate and human experiments both in our work and in other studies 12,37,38,54,58,60–65 . However, to date, the learning rates in these filters have been selected using empirical tuning. Instead, the new calibration algorithm provides a novel analytical approach for selecting the learning rate to achieve a predictable and desired level of parameter error and convergence time in these adaptive filters. In both the adaptive PPF and the new adaptive KF, the learning rate is dictated by the noise covariance of the decoder’s prior model for the parameters. In what follows, we derive calibration algorithms for two possible objectives: to keep the steady-state parameter error covariancesmallerthanauser-specifiedupper-boundwhileminimizingtheconvergencetime, or to keep the convergence time faster than a user-specified upper-bound while minimizing the steady-state error covariance. 
We first derive analytical expressions for both the steady- state error covariance and the convergence time as a function of the learning rate by writing the recursive error dynamics and the corresponding recursive error covariance equations for the adaptive PPF and adaptive KF. By taking the limit of these recursions as time goes to infinity, we find the analytical expressions for the steady-state error covariance and the convergence time as a function of the learning rate. We then find the inverse map of these functions, which provide the optimal learning rate for a desired objective. We also introduce the numerical simulation setup used to evaluate the effect of the calibration algorithm on both encoding models and decoding. The flowchart of the calibration algorithm is in figure 2.2. Readers mainly interested in the results can skip the rest of this section. 41 2.2.1 The calibration algorithm for continuous neural signals In this section, we derive the calibration algorithm for continuous signals such as LFP and ECoG. We first present the observation model and the adaptive KF for these signals. We then find the steady-state error covariance and the convergence time as functions of the learning rate. Finally, we derive the inverse functions to select the optimal learning rate. 2.2.1.1 Adaptive KF We denote the continuous observation signal, such as ECoG or LFP, from channel c by y c t . This continuous signal can, for example, be taken as the LFP or ECoG log-power in a desired frequency band as these powers have been shown to be related to the underlying brain states 6,100 . As in various previous work 15,19 , we construct the continuous observation model as a linear Gaussian function of the underlying brain state y c t =(ψ c ) ′ ˜ v t +z c t (2.1) Here ˜ v t = [1,v ′ t ] ′ is a column vector, where v t is the encoded brain state. Also, ψ c = [ξ c ,(η c ) ′ ] ′ is a column vector containing the encoding model parameters to be learned. In particular, ξ c is the baseline power andη c depends on the application. Finally, z c t is a white Gaussian noise with variance Z c . As an example, in motor BMIs, we take the brain state v t as the intended velocity command whether in moving one’s arm or in moving a BMI. We thus select η c = [η c x ,η c y ] ′ = ∥η c ∥[cos(θ c ),sin(θ c )] ′ with ∥η c ∥ the modulation depth and θ c the preferred direction of channel c. The goal of adaptation is to learn the encoding model parameters in (2.1), i.e., ψ c . In some cases, it may also be desired to learn Z c adaptively. Here, wefirstfocusonadaptivelearningoftheparameters ψ c andthederivation of the calibration algorithm. We then present a method to learn Z c concurrently with the parameters. 42 WewritearecursiveBayesiandecodertolearntheparametersψ recursivelyinrealtime. Assuming that all channels are conditionally independent 12,37 , we can adapt the parameters for each channel separately. For convenience, we drop the superscript of the channel in what follows. A recursive Bayesian decoder consists of a prior model for the parameters, which modelstheiruncertainty; italsoconsistsofanobservationmodelthatrelatestheparameters to the neural activity. The observation model is given by (2.1). We build the prior model by modeling the uncertainty ofψ as a random-walk 37 ψ t =ψ t− 1 +s t (2.2) wheres t is a white Gaussian noise with covariance matrix S =sI n (s>0), whereI n is the identity matrix and n is the parameter dimension. 
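As a point of reference for this generative model, training data consistent with (2.1) for a single channel can be simulated as follows. All numbers are illustrative; the random walk (2.2) enters only as the decoder's prior model in the recursions presented next.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, dt = 2000, 0.1
Z = 350.0                                  # observation noise variance in (2.1), illustrative
psi_true = np.array([3.0, 8.0, 0.0])       # true [baseline ξ, η_x, η_y] to be learned

# encoded brain state: a 2D velocity trace (here a circle), one column per time-step
phase = 2 * np.pi * np.arange(n_steps) * dt / 2.0
v = 0.3 * np.vstack([np.cos(phase), np.sin(phase)])
v_tilde = np.vstack([np.ones(n_steps), v])          # ṽ_t = [1, v_t']'

# continuous observations from the linear Gaussian model (2.1) with fixed true parameters
y = v_tilde.T @ psi_true + np.sqrt(Z) * rng.standard_normal(n_steps)
```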
We define s as the learning rate since it dictateshowfastparametersareupdatedintheBayesiandecoderasnewneuralobservations are made 101 . Our goal is to solve for the optimal s that achieves a desired trade-off between the steady-state error covariance and convergence time. Combining(2.1)and(2.2)andsinceboththepriorandobservationmodelsarelinearand Gaussian, wecanderivearecursiveKFtoestimateψ t fromy 1:t includingallobservationsup to time t. KF finds the minimum mean-square error (MMSE) estimate of the parameters, which is given by the mean of the posterior density. Denoting the posterior and prediction means by ψ t|t and ψ t|t− 1 , and their covariances by S t|t and S t|t− 1 , respectively, the KF recursions are given as ψ t|t− 1 =ψ t− 1|t− 1 (2.3) S t|t− 1 =S t− 1|t− 1 +S (2.4) S − 1 t|t =S − 1 t|t− 1 + ˜ v t ˜ v ′ t Z − 1 (2.5) ψ t|t =ψ t|t− 1 +S t|t ˜ v t Z − 1 (y t − ˜ v ′ t ψ t|t− 1 ) (2.6) 43 Hence given the encoded brain/behavioral state ˜ v t in a training session, we can learn the parameters adaptively using (2.3)–(2.6). In our motor BMI example, the encoded brain state is the intended velocity and can be either observed or inferred using a supervised training session 15,37,38,54,55,60,61 as we describe in simulation setup. In applications such as motor BMIs, there is typically a second decoder that takes the estimated parameters from (2.3)–(2.6) to decode the brain state, e.g., the kinematics (figure 2.1). However, this brain statedecoderdoesnotaffecttheparameterdecoder 37,54,61 . Wediscussthesimulationdetails later in this Methods section. 2.2.1.2 Overview of the two objectives for the calibration algorithm We define ψ ∗ as the unknown true value of the parameters ψ to be learned. Under mild conditions 69 which are satisfied in our problem setup, ψ t|t in (2.6) is an asymptotically unbiased estimator (lim t→∞ E[ψ t|t ] = ψ ∗ ). There are two objectives that the calibration algorithm can be designed for. First, we can minimize the convergence time—defined as the time it takes for the difference ( ψ ∗ − E[ψ t|t ]) to converge to 0—subject to an upper-bound constraint on the steady-state error covariance of the estimated parameters. Second, we can minimize the steady-state error covariance of the estimated parameters, i.e., the Euclidean 2-norm ∥Cov[ψ t|t ]∥, while keeping the convergence time below a desired upper-bound. We derive the calibration algorithm for each of these objectives and provide them in Theorems 1 and 2. 2.2.1.3 Calibration algorithm: analytical functions to predict the learning rate effect on parameter error and convergence time Regardlessoftheobjective,toderivethecalibrationalgorithmwefirstneedtowritetheerror dynamics in terms of the learning rate s. We denote the estimation error byg t =ψ ∗ − ψ t|t . We denote the estimation error covariance at time t by S ∗ t|t = E[g t g ′ t ] = Cov[ψ t|t ] since ψ t|t is asymptotically unbiased 69 . We denote the limit of S ∗ t|t in time, which is the steady-state 44 error covariance, byS ∗ + . Our goal is to express the steady-state error covarianceS ∗ + and the convergence time of E[g t ] as functions of the learning rate s. To find the steady-state error covariance S ∗ + as a function of s, we first derive a recursive equationtocomputeS ∗ t|t − S t|t from(2.3)–(2.6)asafunctionofthelearningrate. Bysolving thisrecursiveequationandtakingthelimitas t→∞withsomeapproximations, weexpress S ∗ + as a function of the learning rate s. 
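For reference, the recursions (2.3)–(2.6) amount to only a few lines per channel. The sketch below continues the illustrative simulation above; the function and variable names are ours and hypothetical, not from the actual implementation.

```python
import numpy as np

def adaptive_kf(y, v_tilde, Z, s, psi0=None):
    """Adaptive KF of (2.3)-(2.6) for one channel: estimate psi from observations y and ṽ_t.
    s is the learning rate (drift variance of the random-walk prior (2.2))."""
    n = v_tilde.shape[0]
    psi = np.zeros(n) if psi0 is None else psi0.copy()
    S_post = np.eye(n)                       # initial posterior covariance
    S_drift = s * np.eye(n)
    history = []
    for t in range(len(y)):
        vt = v_tilde[:, t]
        S_pred = S_post + S_drift                                              # (2.4)
        S_post = np.linalg.inv(np.linalg.inv(S_pred) + np.outer(vt, vt) / Z)   # (2.5)
        psi = psi + S_post @ vt / Z * (y[t] - vt @ psi)                        # (2.6); (2.3) keeps psi
        history.append(psi.copy())
    return np.array(history)

psi_hat = adaptive_kf(y, v_tilde, Z=350.0, s=5e-5)   # reuses y, v_tilde from the sketch above
```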
Similarly, by finding a recursive equation for E[ g t ] as a function of s and solving it using an approximation, we express the convergence time of E[g t ] as a function of the learning rate s. To make the derivation tractable, we assume that the encoded behavioral state v t during the training session (i.e., the experimental session in which parameters are being learned adaptively) is periodic with period T. This holds in many cases, for example in motor BMIs in which the training session involves making periodic center-out-and-back movements 37,54,61 . We will show later that even in cases where the behavioral state is not periodic, our derivations of the steady-state error covariance as a function of the learning rate allow for accurate calibration to achieve the desired objectives. Thederivationsarelengthyandarethusprovidedinourpublishedwork 69 . Belowwepresent the conclusions of the derivations in the following theorem. This theorem is the basis for the calibration algorithm in the case of adaptive KF for continuous neural signal modalities. Theorem 1. Assume that the encoded statev t is periodic with period T. We define H ave = 1 T P T t=1 ˜ v t ˜ v ′ t Z − 1 and write its eigenvalue decomposition as H ave =U× diag(h 1 ,...,h n )× U ′ with (0<h i ≤ h i+1 ). We also define κ m = p h 2 m s 2 +4h m s− h m s 2h m (m=1,...,n) 45 The steady-state error covariance, S ∗ + , can be expressed as S ∗ + =U κ 2 1 +sκ 1 2κ 1 +s . . . κ 2 n +sκ n 2κ n+s U ′ (2.7) where κ 2 m +sκ m 2κ m+s = s √ h 2 m s 2 +4hms = 1 √ h 2 m + 4hm s is monotonically increasing with respect to s. The convergence dynamics of the expected error E[g t ] can be expressed as E[g t ]=(U κ 1 κ 1 +s . . . κ n κ n+s U ′ )× E[g t− 1 ] (2.8) where κ m κ m+s =1− √ h 2 m s 2 +4hms− hms 2 is monotonically decreasing with respect to s. From (2.8), the behavior of the expectation of the estimation error E[g t ] across time is dominated by the largest diagonal term, κ 1 κ 1 +s , whose inverse we define as the convergence rate. Since U is the unitary matrix of the eigenvalue decomposition of H ave , which is not related to s, U is independent of the learning rate s and the diagonal terms of S ∗ + are strictlyincreasingfunctionsofs. Thisisintuitivelysoundsinceahigherlearningrateresults in a larger error covariance at steady state. Also, the inverse of convergence rate in (2.8) is monotonically decreasing with respect to s. This monotonically decreasing relationship is also intuitively sound: a faster convergence rate requires a larger learning rate. These relationships clearly show the trade-off between the steady-state error covariance S ∗ + and the convergence time. All these properties will be confirmed in the Results section. Now that we have an analytical expression for the steady-state error covariance and the convergence rate as functions of the learning rate s ((2.7) and (2.8), respectively), all we 46 need to do is to find the inverse of these functions to solve for the optimal learning rate s from a given upper-bound onS ∗ + or on the convergence time. 2.2.1.4 Calibration algorithm: the inverse functions to compute the learning rate We now derive the inverse functions of equations (2.7) and (2.8) to compute the optimal learning rate s for each of the two objectives in the calibration algorithm. 
To derive the inverse function for computing the learning rate corresponding to a given steady-state error covariance, weformulatetheoptimizationproblemasthatofcalculatingthelargestlearning rate s that satisfies ∥S ∗ + ∥=lim t→∞ ∥Cov[ψ t|t ]∥≤ V bd , where V bd is the desired upper-bound on the steady-state error covariance. We want the largest learning rate that satisfies this relationship because the convergence time is a decreasing function of the learning rate and hence will benefit from larger rates. The key step in solving this inequality is observing that the 2-norm∥S ∗ + ∥=lim t→∞ ∥Cov[ψ t|t ]∥ is the largest singular value ofS ∗ + , which is also the largest eigenvalue ofS ∗ + due to its positive definite property. Since the eigenvalues of S ∗ + are analyticfunctionsofthelearningrateinTheorem1, wecansolvetheinequalityanalytically. The details of this derivation are in our published work 69 . For the learning rate optimization to satisfy a given convergence time upper-bound, the goal is to calculate the smallest learning rate s that makes ∥E[gt]∥ ∥E[g 0 ]∥ ≤ E rest within the given time C bd , where C bd is the upper-bound of the convergence time and E rest is the relative estimationerror(e.g.,5%)atwhichpointweconsidertheparameterstohavereachedsteady state. We want the smallest learning rate that satisfies the convergence time constraint because the steady-state error decreases with smaller learning rates. The key in solving this inequality is noting that ∥E[g t ]∥ converges exponentially with the inverse convergence rate definedinTheorem1. So ∥E[gt]∥ ∥E[g 0 ]∥ canbewrittenasafunctionofthelearningrate sexplicitly. The derivation details are in our publication 69 . 47 We provide the conclusions of the above derivations resulting in the inverse functions for both objectives in the following theorem: Theorem 2. Calibration objective 1 to constrain steady-state error: Assume the time-step in (2.8) is ∆ seconds and h 1 is the smallest eigenvalue of H ave defined in Theorem 1. The optimal learning rate to achieve an upper-bound V bd on the steady-state error covariance while allowing for the fastest convergence time is given by s= 4h 1 1 V 2 bd − h 2 1 with 1 V 2 bd >h 2 1 (2.9) Calibration objective2 toconstrain convergencetime: Define C T = 1 4h 1 × (E rest ) ∆ C bd , which is independent of the learning rate s. The optimal learning rate to achieve an upper-bound C bd on the convergence time, defined to be the time-point at which the relative parameter error is E rest , is given by s= C T 4h 2 1 × ( 1 C T − 4h 1 ) 2 (2.10) To summarize, if the objective is to bound the steady-state error covariance, then the user will select the upper-bound V bd , calculate H ave defined in Theorem 1, and apply (2.9) to find the optimal learning rate s. If the objective is to bound the convergence time, the user will select the upper-bound C bd , what percentage of error at convergence time they are willing to tolerate E rest , calculateH ave , and use (2.10) to find the optimal learning rate s. 2.2.1.5 Concurrent estimation of the noise variance Z Sofarwehaveassumedthattheobservationnoisevariance,Z,in(2.1)isknown(forexample through offline learning). However, this variance may need to be estimated online just like the encoding parameters ψ . We can address this scenario by using our knowledge of the range of possible Z’s, i.e., (Z min and Z max ) and use the calibration algorithm to compute 48 the learning rate for both Z min and Z max . 
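In code, the two inverse maps in Theorem 2 reduce to a few lines each. The sketch below reuses the illustrative ṽ_t from the earlier simulation and uses hypothetical numbers; it also shows how evaluating (2.9) at Z_min and Z_max yields the two candidate rates referred to next.

```python
import numpy as np

def smallest_eigenvalue_Have(v_tilde, Z):
    """h_1: smallest eigenvalue of H_ave = (1/T) sum_t ṽ_t ṽ_t' / Z (Theorem 1, scalar Z)."""
    H_ave = (v_tilde @ v_tilde.T) / (v_tilde.shape[1] * Z)
    return np.linalg.eigvalsh(H_ave)[0]

def rate_for_error_bound(h1, V_bd):
    """Equation (2.9): largest s keeping the steady-state error covariance norm <= V_bd."""
    assert 1.0 / V_bd**2 > h1**2, "V_bd is too small to be achievable"
    return 4.0 * h1 / (1.0 / V_bd**2 - h1**2)

def rate_for_convergence_bound(h1, C_bd, E_rest=0.05, dt=0.1):
    """Equation (2.10): smallest s whose relative error reaches E_rest within C_bd seconds."""
    C_T = (E_rest ** (dt / C_bd)) / (4.0 * h1)
    return C_T / (4.0 * h1**2) * (1.0 / C_T - 4.0 * h1) ** 2

# bracketing an unknown noise variance: evaluate (2.9) at Z_min and Z_max (illustrative values)
s_candidates = [rate_for_error_bound(smallest_eigenvalue_Have(v_tilde, Z_val), V_bd=0.05)
                for Z_val in (320.0, 380.0)]
```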
Then for the first calibration objective, we can select the smaller of the two s’s corresponding to Z min and Z max . This smaller s gives the mostconservativechoicetoassureagivenupper-boundforthesteady-stateerrorcovariance. Similarly,forthesecondcalibrationobjective,wecanselectthelargerofthetwos’stoassure agivenupper-boundontheconvergencetime. Thismethodisvalidsincethelearningrateis a monotonic function of Z. We can see this by noting thatH ave in theorem 1 is monotonic with respect to Z, and so are its eigenvalues{h 1 ,...,h n }. From (2.9) and (2.10), the learning rate s is also a monotonic functions of h 1 . Together, these imply that the learning rate is a monotonic function of Z. Finally, to adaptively estimate Z in real time, we can use the covariance matching tech- nique 70 . Denoting q t = y t − ˜ v ′ t ψ t|t− 1 , we can estimate Z adaptively by adding the following equation to the recursions in (2.3)–(2.6): Z t|t = 1 L− 1 t X j=t− L+1 (q j − ¯q) 2 − 1 L t X j=t− L+1 ˜ v ′ j S j|j− 1 ˜ v j (2.11) where ¯q = 1 L P t j=t− L+1 q j is the sample mean, and L is the number of samples used in estimatingZ. Here(2.11)isderivedusingthecovariancematchingtechnique. Thederivation detail can be found in literature 70 . 2.2.2 The calibration algorithm for discrete-valued spikes We now follow the same formulation used for continuous-valued signals, such as LFP or ECoG, to derive the calibration algorithm for the discrete-valued spiking activity. The derivation follows similar steps but, due to the nonlinearity in the observation model, has some differences that we point out. Given the nonlinearity, in this case, the calibration algorithm can be derived for the main first objective, i.e., to keep the steady-state error covariance below a desired upper-bound while minimizing convergence time (figure 2.2). 49 2.2.2.1 Adaptive PPF The spiking activity can be modeled as a time-series of 0’s and 1’s, representing the lack or presence of spikes in consecutive time-steps, respectively. This discrete-time binary time- series can be modeled as a point process 8,10,13,68,87,88 . A point process is specified by its instantaneous rate function. Prior work have used generalized linear models (GLM) to model the firing rate as a log-linear function of the encoded state (e.g., the intended velocity in a motor BMI)v t 8–10,12,37,68 . Denoting the binary spike event of neuron c at time t by N c t , and the time-step by ∆ as before, the point process likelihood function is given by 8,10 p(N c t |v t )=(λ c (v t )∆) N c t e − λ c (vt)∆ (2.12) Here λ c (·) is the firing rate of neuron c and is taken as λ c (v t )=exp(β c +(α c ) ′ v t ) (2.13) whereϕ c =[β c ,(α c ) ′ ] ′ are the encoding model parameters to be learned. For spikes, a PPF can estimate the parameters given a training dataset in which the encoded brain state can be either observed or inferred 8–10,92 . For example, adaptive PPF has been used to track neural plasticity in the rat hippocampus 87,88,92 . For motor BMIs, a closed-loop adaptive PPF has been developed to learnϕ c using an optimal feedback-control model to infer the intended velocity, resulting in fast and robust parameter convergence 37 . As in the adaptive KF case, the adaptive PPF assumes that all neurons are conditionally independent so every ϕ c can be updated separately 9,37,92 . From now on, we remove the superscript c for convenience. Denote the true unknown value of ϕ by ϕ ∗ . 
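As a point of reference, binary spike trains consistent with (2.12)–(2.13) can be simulated as follows. Parameter values are illustrative, and the standard approximation P(N_t = 1) ≈ λΔ for small Δ is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.005                                      # 5 ms bins
phi_true = np.array([np.log(8.0), 1.5, -0.7])   # illustrative [β, α_x, α_y] for one neuron

# encoded behavioral state: 2D velocity, one column per bin
n_bins = 4000
phase = 2 * np.pi * np.arange(n_bins) * dt / 2.0
v = 0.3 * np.vstack([np.cos(phase), np.sin(phase)])
v_tilde = np.vstack([np.ones(n_bins), v])       # ṽ_t = [1, v_t']'

rate = np.exp(v_tilde.T @ phi_true)             # λ(v_t) from (2.13), in Hz
p_spike = np.clip(rate * dt, 0.0, 1.0)          # P(N_t = 1) ≈ λΔ for small Δ, per (2.12)
N = rng.binomial(1, p_spike)                    # binary spike train
```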
We model our uncertainty aboutϕ in time as a random-walk 37 ϕ t =ϕ t− 1 +q t (2.14) 50 where q t is a white Gaussian noise with covariance matrix Q = rI n (r > 0) and r is the learningratehere. Denotingtheposteriorandpredictionmeansbyϕ t|t andϕ t|t− 1 , and their covariances by Q t|t and Q t|t− 1 , respectively, the adaptive PPF finds ϕ t with the following recursions 37 ϕ t|t− 1 =ϕ t− 1|t− 1 (2.15) Q t|t− 1 =Q t− 1|t− 1 +Q (2.16) Q − 1 t|t =Q − 1 t|t− 1 + ˜ v t ˜ v ′ t λ (t|ϕ t|t− 1 )∆ (2.17) ϕ t|t =ϕ t|t− 1 +Q t|t ˜ v t [N t − λ (t|ϕ t|t− 1 )∆] (2.18) Hereλ (t|ϕ t|t− 1 )=exp(˜ v ′ t ϕ t|t− 1 )andasbefore ˜ v t =[1,v ′ t ] ′ ,wherev t istheencodedbehavioral state (e.g., rat position in a maze or intended velocity in BMI), which is either observed or inferred. In studying the hippocampal place cell plasticity, for example, rat position can be observed. In motor BMIs, the intended velocity can be inferred using a supervised training session 37,54,61 as we present in the simulation setup. We now derive a calibration algorithm for the learning rate r in the adaptive PPF (2.15)–(2.18). The calibration algorithm mini- mizes the estimated parameter convergence time of E[ϕ t|t ]→ϕ ∗ under a given upper-bound constraint on the steady-state error covariance∥Cov[ϕ ∗ − ϕ t|t ]∥. 2.2.2.2 Calibration algorithm: analytical function and inverse function Learning rate calibration for spikes can again be posed as an optimization problem. We denote the error vector by e t = ϕ ∗ − ϕ t|t and the error covariance by Cov[e t ] = Q ∗ t|t . We can show that ϕ t|t , which is PPF’s estimate of the parameters, is asymptotically unbiased (lim t→∞ E[ϕ t|t ] = ϕ ∗ ) under some mild conditions 69 . We define the steady-state error co- variance as Q ∗ + = lim t→∞ Q ∗ t|t . Thus the goal of the optimization problem is to select the optimal learning rate r that minimizes the convergence time of E[e t ]→0 while keeping the 2-norm of the steady-state error covariance Q ∗ + smaller than the user-defined upper-bound. 51 We derive the calibration algorithm similar to the case of continuous signals. We first find a recursive equation for Q t|t − Q ∗ t|t using (2.15)–(2.18). We then solve this equation and take the limit t→∞ with some approximations to write the steady-state error covariance Q ∗ + as an analytic function of the learning rate r. For tractability of derivations, for now we assume that the behavioral state in the training set, e.g., the intended velocity {v t }, is periodic with period T. As we also mentioned in the case of continuous signals, this assumptionisreasonableinmanyapplicationssuchasmotorBMI.However, wewillshow in the Results section that the calibration algorithm still works even when this assumption is violated. The derivation detail is presented in our publication 69 . The derivation shows that the steady-state error covariance Q ∗ + is given as follows: Theorem3. Assume the encoded statev t is periodic with period T. We write the eigenvalue decomposition of M ave = 1 T P T t=1 ˜ v t ˜ v ′ t λ (t|ϕ ∗ )∆ as U × diag(a 1 ,...,a n )× U ′ with (0 < a i ≤ a i+1 ) and we denote b m = p a 2 m r 2 +4a m r− a m r 2a m (m=1,...,n) The steady-state error covariance, Q ∗ + , can be expressed as Q ∗ + =U b 2 1 +b 1 r 2b 1 +r . . . b 2 n +bnr 2bn+r U ′ (2.19) Compared with the steady-state error covariance S ∗ + for continuous signals in (2.7), the steady-state error covariance for spikes Q ∗ + in (2.19) has exactly the same form when re- placing h i with a i and s with r. 
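As in the continuous case, the recursions (2.15)–(2.18) are compact in code. The per-neuron sketch below continues the spike simulation above; names and the chosen learning rate are illustrative.

```python
import numpy as np

def adaptive_ppf(N, v_tilde, dt, r, phi0=None):
    """Adaptive PPF of (2.15)-(2.18) for one neuron: estimate phi from the spike train N.
    r is the learning rate (drift variance of the random-walk prior (2.14))."""
    n = v_tilde.shape[0]
    phi = np.zeros(n) if phi0 is None else phi0.copy()
    Q_post = np.eye(n)
    Q_drift = r * np.eye(n)
    history = []
    for t in range(len(N)):
        vt = v_tilde[:, t]
        Q_pred = Q_post + Q_drift                                                     # (2.16)
        lam = np.exp(vt @ phi)                                                        # λ(t | φ_{t|t-1})
        Q_post = np.linalg.inv(np.linalg.inv(Q_pred) + np.outer(vt, vt) * lam * dt)   # (2.17)
        phi = phi + Q_post @ vt * (N[t] - lam * dt)                                   # (2.18); (2.15) keeps φ
        history.append(phi.copy())
    return np.array(history)

phi_hat = adaptive_ppf(N, v_tilde, dt=0.005, r=1e-6)   # reuses N, v_tilde from the sketch above
```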
Hence to compute the optimal learning rate r from (2.19), we can again apply (2.9) while replacing h i with a i and s with r. Note that M ave includes the firing rate λ (t|ϕ ∗ ), which is related to the unknown true parameter ϕ ∗ . Thus we use our knowledge of the minimum and maximum possible firing rates to calculate the extreme 52 values of the learning rate r from (2.9), and select the minimum of the two s’s as the most conservative value to keep the steady-state error covariance under the given bound V bd . 2.2.3 Calibration algorithm for non-periodic state evolution For both discrete and continuous signals, we considered a periodic behavioral state (e.g., intended velocity) in the training data for the derivations to satisfy the mild conditions 69 . However, the derivation of (2.7), (2.8), and (2.19) are based on H ave and M ave for the continuous and discrete signals, respectively, which are simply the average value of functions of the state {v t }. So the core information needed in the calibration algorithm is not the state periodicity, but its expected value, which we can compute empirically for any state evolution. We show using simulations in the Results section that the calibration algorithm works even in the case of random evolution for the states{v t } in the training experiment. 2.2.4 Numerical simulations To validate the calibration algorithm, we run extensive closed-loop numerical simulations. We show that the calibration algorithm allows for fast and precise learning of encoding model parameters, and subsequently for a desired transient and steady-state behavior of the decoders (figure 2.1). While the calibration algorithm can be applied to learn encoding models and decoders for any brain state, as a concrete example, we use a motor BMI to validate the algorithm. InmotorBMIstherelevantbrainstateistheintendedmovement. TheBMIneedstolearn an encoding model that relates the neural activity to the subject’s intended movement. We simulate a closed-loop BMI within a center-out reaching task with 8 targets. In this task, the subject needs to take a cursor on a computer screen to one of 8 peripheral targets, and then return it to the center to initiate another trial 12,55 . To simulate how subjects generate a pattern of neural activity to control the cursor, we use an optimal feedback- control (OFC) model of the BMI that has been validated in prior experiments 37,71,72,94 . We 53 then simulate the spiking activity as a point process using the encoding model in (2.12) and simulate the ECoG/LFP log-powers as a Gaussian process 15 using the encoding model in (2.1). We test the calibration algorithm for adaptive learning of the ECoG/LFP and the spikemodelparameters. Weassesstheabilityofthecalibrationalgorithmtoenablefastand accuratelearningoftheencodingmodels, andtoleadtoadesiredtransientandsteady-state performance of the decoder. To simulate the intended movement, we use the OFC model. We assume that movement evolves according to a linear dynamical model 37,50,72 x t =Ax t− 1 +Bu t +w t (2.20) where x t = [d ′ t ,v ′ t ] ′ is the kinematic state at time t, with d t and v t being the position and velocity vectors in the two-dimensional space, respectively. Hereu t is the control signal that the brain decides on to move the cursor and w t is white Gaussian noise with covariance matrixW. Also,A andB are coefficient matrices that are often fitted to subjects’ manual movements 12,37,54,55,61 . 
Similar to prior work 37,54,61,72,102, we write (2.20) as

\begin{bmatrix} d_t \\ v_t \end{bmatrix} =
\begin{bmatrix} I_2 & \Delta I_2 \\ 0 & \alpha I_2 \end{bmatrix}
\begin{bmatrix} d_{t-1} \\ v_{t-1} \end{bmatrix} +
\begin{bmatrix} 0 \\ I_2 \end{bmatrix} u_t +
\begin{bmatrix} 0 \\ I_2 \end{bmatrix} w_t
\qquad (2.21)

where ∆ is the time-step and α is selected according to our prior non-human primate data 12,37. The OFC model assumes that the brain quantifies the task goal within a cost function and decides on its control commands by minimizing this cost. For the center-out movement task, the cost function can be quantified as 37,71,72,102

J = \sum_{t=1}^{\infty} \lVert d_{t-1} - d^{*} \rVert^{2} + w_v \lVert v_{t-1} \rVert^{2} + w_r \lVert u_t \rVert^{2} \qquad (2.22)

where d* is the target position, and w_v and w_r are weights selected to fit the profile of manual movements. For the linear dynamics in (2.20) and the quadratic cost in (2.22), the optimal control command is given by the well-known infinite-horizon linear quadratic Gaussian (LQG) solution as 37,50,72,76

u_t = −L(x_{t−1} − x*)   (2.23)

where x* = [d*′, 0′]′ is the target for position and velocity (as the subject needs to reach the target position and stop there). Here L is the gain matrix, which can be found recursively and offline by solving the discrete-time Riccati equation 76. By substituting (2.23) in (2.20), we can compute the intended kinematics of the subject in response to visual feedback of the current decoded cursor kinematics x_t in our simulations 37. Details are provided in our prior work 37,72,94.

The subject's intended movement is in turn encoded in neural activity. We first test the performance of the calibration algorithm for continuous ECoG/LFP recordings and then for discrete spike recordings. For the continuous signals, we simulate 30 LFP/ECoG features whose baseline powers and preferred directions in (2.1) are randomly selected in [1,6] dB and [0,2π], respectively. The modulation depth, ∥η∥, in each channel is randomly selected in [7,10] and the noise variances are randomly selected in [320,380]. The initial value, ψ_{0|0}, and the true value, ψ*, of each channel are selected randomly and independently. The eight targets are around a circle with radius 0.3. Each trial, including the forward and the back movement in the center-out-and-back task, takes 2 sec. During the training experiment, the subject reaches the targets in counter-clockwise order repeatedly. To assess whether the calibration algorithm can analytically compute the steady-state error covariance and convergence time for a given learning rate accurately, we simulate 3000 trials under each learning rate considered.

For spikes, we simulate 30 neurons. Here, since the state v_t is the intended velocity, we can also interpret (2.13) as a modified cosine-tuning model by writing it as

λ^c(v_t) = exp(β^c + ∥α^c∥ ∥v_t∥ cos(θ_t − θ^c))   (2.24)

where θ_t is the direction of v_t, θ^c is the preferred direction of the neuron (or the direction of α^c = ∥α^c∥[cos θ^c, sin θ^c]′), and finally ∥α^c∥ is the modulation depth. For each neuron, we select the baseline firing rate randomly between [4,10] Hz and the maximum firing rate randomly between [40,80] Hz. We select each neuron's preferred direction in (2.24) randomly between [0,2π]. The task setup is equivalent to the continuous signal case. We simulate 1000 trials for each learning rate considered.

2.3 Results

We first investigate whether the calibration algorithm can analytically approximate two quantities well: the steady-state error covariance and the convergence time of the encoding model parameters. We do so by running multiple closed-loop BMI simulations with different learning rates.
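Schematically, this comparison amounts to the loop below, which reuses `adaptive_kf` from the earlier sketch with a simplified open-loop toy setup; a real run would use the full closed-loop OFC simulation, longer adaptation, and many more repetitions. All numbers are illustrative.

```python
import numpy as np

# rebuild the regressors of the continuous-signal sketch (ṽ_t = [1, v_t']')
n_steps, dt, Z = 2000, 0.1, 350.0
phase = 2 * np.pi * np.arange(n_steps) * dt / 2.0
v_tilde_c = np.vstack([np.ones(n_steps), 0.3 * np.cos(phase), 0.3 * np.sin(phase)])
psi_true = np.array([3.0, 8.0, 0.0])

results = {}
for s in [5e-7, 5e-5, 5e-3]:
    runs = []
    for seed in range(20):
        rng = np.random.default_rng(seed)
        y_sim = v_tilde_c.T @ psi_true + np.sqrt(Z) * rng.standard_normal(n_steps)
        runs.append(adaptive_kf(y_sim, v_tilde_c, Z=Z, s=s))     # adaptive_kf from the sketch above
    err = np.stack(runs) - psi_true                              # (runs, time, params)
    emp_cov_norm = np.linalg.norm(np.cov(err[:, -1, :].T), 2)    # error covariance across runs at the end
    h1 = np.linalg.eigvalsh((v_tilde_c @ v_tilde_c.T) / (n_steps * Z))[0]
    pred_cov_norm = 1.0 / np.sqrt(h1**2 + 4.0 * h1 / s)          # largest diagonal entry of (2.7)
    results[s] = (emp_cov_norm, pred_cov_norm)
```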
These Monte-Carlo simulations allow us to compute the true value of the two quantities. We then compare these true values with the analytically-computed val- ues from the calibration algorithm. We find that, for both continuous and discrete signals, the calibration algorithm accurately computes its desired quantity (i.e, either the error co- variance or the convergence time) for any type of behavioral state trajectory in the training data (i.e., periodic or not). Thus the calibration algorithm can find the optimal learning rate for a desired trade-off between the parameter convergence time and error covariance. We also show how the inverse function can be used to compute the learning rate for a desired trade-off. Moreover, we examine how the calibration algorithm—and consequently the learned encoding model—affects decoding performance. We show that, by finding the optimal learning rate, the calibration algorithm results in fast and accurate decoding. In 56 particular, compared to the optimal rate, larger learning rates could result in inaccurate steady-state decoding performance and smaller learning rates result in slow convergence of the decoding performance. 2.3.1 The calibration algorithm computes the convergence time and the error covariance accurately with continuous signals We first assess the accuracy of the analytically-computed error covariance and convergence time by the calibration algorithm. As described in detail in Numerical simulations section, we run a closed-loop BMI simulation in which the subject performs a center-out-and-back task to eight targets in counter-clockwise order. We simulate 30 LFP/ECoG features. We define the convergence time as the time when the estimated parameters reach within 5% of the their true values, i.e., ∥ψ t|t − ψ ∗ ∥ ≤ 0.05×∥ ψ 0|0 − ψ ∗ ∥ (so E rest = 0.05; as defined before ψ t|t , ψ ∗ , and ψ 0|0 are the current parameter estimate, the true parameter value, and the initial parameter estimate, respectively.) Figure 2.3A shows the true and the analytically-computed error covariance and convergence time as a function of the learning rate. The analytically-computed values are close to the true values. From figure 2.3A, the average normalized root-mean-square errors (RMSE) between the true and the analytically- computed values for the convergence time and the steady-state error covariance are 3.6% and 1.6%, respectively (where normalization is done by dividing by the range of possible convergence time and covariance values). Figure 2.3A shows that as the learning rate s increases, the error covariance increases and the convergence time decreases. Also, the error covariance is inversely related to the convergence time. These trends demonstrate the fundamental trade-off between steady-state error covariance and convergence time. In the above we considered estimating the encoding model parameters ψ t|t in (2.6). As derived in (2.11), when the noise variance Z in (2.1) is unknown, we can also estimate this variance in real time and simultaneously with the parameters. 
We thus repeated our closed-loop BMI simulations, this time simultaneously estimating the noise variance Z_{t|t}, to show that it converges to the true value regardless of the learning rate s. Figure 2.3B shows that Z_{t|t} converges to the true value with all tested learning rates, which cover a large range (5 × 10^{-7} to 5 × 10^{-3}). Moreover, even when estimating both ψ_{t|t} and the noise variance Z_{t|t} jointly, the analytically-computed error covariance is still close to the true one (normalized RMSE is 4.5%). Overall, the analytically-computed error covariance is robust to the uncertainty in Z_{t|t} because Z_{t|t} converges to the true value at steady state regardless of the learning rate (figure 2.3B).

Figure 2.3: The calibration algorithm accurately computes the steady-state error covariance and convergence time for continuous signals. (A) The analytically-computed and the true error covariance and convergence time of the encoding model parameters (baseline, η_x, and η_y in (2.1)) for different learning rates s. The top left panel shows the relation between the three quantities. The other three panels are projections of this plot onto three planes, showing each of the three pairwise relationships. All axes are in log scale. True quantities are computed from BMI simulations with periodic center-out-and-back training datasets. The analytically-computed values are obtained by the calibration algorithm according to equations (2.7) and (2.8). The analytically-computed and true values match tightly, showing that the calibration algorithm can accurately compute the learning rate for a desired trade-off between steady-state error and convergence time. (B) Adaptive estimation of the unknown observation noise variance using (2.11) under different learning rates (s = 5 × 10^{-7}, 5 × 10^{-5}, and 5 × 10^{-3}). The bottom three panels are zoomed-in versions of the top panels to show the transient behavior of the estimated noise variance, which converges to its true value in all cases.

2.3.2 Use of the inverse function to compute the learning rate

Here we show how the inverse functions in Theorem 2 can be used to select the learning rate. In our example, we require the 95% confidence bound of the estimated encoding model parameters (i.e., ±2 standard deviations of error) to be within 10% of their average value. This constraint provides the desired upper-bound on the steady-state error covariance, V_{bd}. In general, V_{bd} can be selected in any manner desired by the user. Once V_{bd} is specified, we use (2.9) and find the optimal value of the learning rate as s_1 = 5.6 × 10^{-5}. Hence the calibration algorithm dictates that the learning rate should be smaller than s_1 to satisfy the desired error-covariance upper-bound. Let's now suppose that we also want to ensure that the convergence time is within a given range. In our example, we require the estimation error to converge within 7 minutes, where convergence is defined as reaching within 5% of the true value (E_{rest} = 0.05). This constraint sets the upper-bound on the convergence time to C_{bd} = 7 min = 420 sec.
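Although (2.10) provides the corresponding learning-rate bound in closed form, the same selection can be cross-checked numerically, since the convergence time decreases monotonically as s increases (figure 2.3A). The following is only a schematic sketch; convergence_time is a hypothetical stand-in for an implementation of the analytical forward function (2.8) and is not part of the thesis notation.

import numpy as np

def rate_for_convergence_bound(convergence_time, C_bd, s_lo=1e-8, s_hi=1e-2, iters=60):
    """Find the smallest learning rate whose predicted convergence time meets C_bd.

    `convergence_time(s)` is assumed to implement the analytical function in (2.8);
    because it decreases monotonically as s increases (cf. figure 2.3A), a simple
    bisection in log-space over the learning rate suffices.
    """
    lo, hi = np.log(s_lo), np.log(s_hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if convergence_time(np.exp(mid)) > C_bd:
            lo = mid          # too slow: need a larger learning rate
        else:
            hi = mid          # fast enough: try a smaller rate
    return np.exp(hi)

# Example (hypothetical): s_min = rate_for_convergence_bound(convergence_time, C_bd=420.0)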
Using (2.10), the calibration algorithm dictates that the learning rate needs to be larger than 4.75 × 10^{-5}. Taken together, for the above constraints on the error covariance and the convergence time, any learning rate between 4.75 × 10^{-5} and 5.6 × 10^{-5} satisfies both requirements. [...] 0.36; figures 2.8 and 2.9). Moreover, large learning rates result in poor and unstable steady-state decoding due to inaccurate estimation of the model parameters. This is evident from the fact that, for large learning rates, the BMI decoding RMSE oscillates widely as a function of the time at which adaptation stops, both for continuous ECoG/LFP observations and for discrete spike observations (figures 2.8B and 2.9B, respectively). This result shows that, due to the large steady-state error, the steady-state parameter estimates change widely depending on exactly when we stop the adaptation. Thus the decoder does not converge to a stable performance.

It is interesting to note that, due to feedback-correction in closed-loop BMI, the decoder can tolerate a larger steady-state parameter error than we would typically allow if our only goal were to track the encoding model parameters. This is evident by noting, for example, that using a learning rate of s = 5 × 10^{-3} for continuous signals results in a relatively large steady-state parameter error as shown in figure 2.4 (the 95% confidence bound is about ±30% of the modulation depth). However, for the purpose of BMI decoding, this learning rate results in no loss of performance at steady state compared to smaller learning rates, and allows for a faster convergence time (figure 2.8). Hence the user-defined upper-bound on the steady-state error covariance depends on the application and the goal of adaptation. For closed-loop decoding, a larger error covariance can be tolerated and, as a result, a faster convergence time can be achieved. In contrast, if the goal is to accurately track the evolution of encoding models over time, for example to study learning and plasticity, a lower steady-state error covariance should be targeted. Regardless of the desired upper-bound on the error covariance, the calibration algorithm can closely approximate the corresponding learning rate that satisfies this upper-bound while allowing for the fastest possible convergence.

2.4 Discussion

Developing closed-loop neurotechnologies to treat various neurological disorders requires adaptively learning accurate encoding models that relate the recorded activity (whether in the form of spikes, LFP, or ECoG) to the underlying brain state. Fast and accurate adaptive learning of encoding models is critically affected by the choice of the learning rate 93, which introduces a fundamental trade-off between the steady-state error and the convergence time of the estimated model parameters. Despite the importance of the learning rate, a principled approach for its calibration is currently lacking. Here, we developed a principled analytical calibration algorithm for optimal selection of the learning rate in adaptive methods. We designed the calibration algorithm for two possible user-specified adaptation objectives: either to keep the parameter estimation error covariance smaller than a desired value while minimizing the convergence time, or to keep the parameter convergence time faster than a given value while minimizing the error. We also derived the calibration algorithm both for discrete-valued spikes modeled as nonlinear point processes and for continuous-valued neural recordings, such as LFP and ECoG, modeled as linear Gaussian processes.
We showed that thecalibrationalgorithmallowsforfastandaccuratelearningofencodingmodelparameters (figures 2.4 and 2.7), and enables fast convergence of decoding performance and accurate steady-state decoding (figures 2.8 and 2.9). We also demonstrated that larger learning rates 69 make the encoding model and the decoding performance inaccurate, and smaller learning rates delay their convergence. The calibration algorithm provides an analytical approach to predict the effect of the learning rate in advance, and thus to select its optimal value prior to real-time adaptation in closed-loop neurotechnologies. Toderivethecalibrationalgorithm,weintroducedaformulationbasedonthefundamen- tal trade-off that the learning rate dictates between the steady-state error and the conver- gence time of the estimated parameters. Calibrating the learning rate analytically requires deriving two functions that describe how the learning rate affects the convergence time and thesteady-stateerrorcovariance, respectively. However, currentlynoexplicitfunctionsexist forthese two relationships forBayesianfilters, such asthe Kalman filteror thepointprocess filter. We showed that the two functions can be analytically derived (equations (2.7), (2.8), and (2.19)) and can accurately predict the effect of the learning rate (figures 2.3 and 2.6). We obtained the calibration algorithm by deriving two inverse functions that solve for the learning rate based on a given upper-bound of the error covariance (equation (2.9)) or the convergence time (equation (2.10)), respectively. To allow for tractable analytical solutions for the learning rate, we performed the deriva- tionsforthecaseinwhichthebehavioralstateinthetrainingexperimentevolvedperiodically over time. This is the case in many applications; for example, in motor BMIs, models are often learned during a training session in which subjects perform a periodic center-out-and- back movement. However, we found that the calibration algorithm only depended on an average value of the behavioral state rather than on its periodic characteristics. We thus showed that the calibration algorithm could accurately predict the effect of the learning rate on parameter error and convergence time for a general behavioral state evolution in the training experiments (figures 2.5 and 2.6). The match between the analytical prediction of the calibration algorithm and the simulation results suggest the generalizability of the calibration algorithm across various behavioral state evolutions. 70 The selected learning rate in the calibration algorithm depends on the user-specified upper-bound on the error covariance or convergence time. The values of these upper-bounds could be chosen by the user based on the goal of adaptation. If the adaptation goal is to accuratelytracktheencodingmodelparameters(e.g.,tostudylearning),thentheacceptable error upper-bound may be selected to be small. In such a case, the calibration algorithm would select a small learning rate. However, we showed that if the goal of calibration is to enableaccuratedecodinginaclosed-loopBMI,thenlargererrorsintheestimatedparameters may be tolerated. This is due to feedback-correction in BMIs, which can compensate for the parameter estimation error (figures 2.8 and 2.9). The calibration algorithm would then select larger learning rates to improve how fast decoding performance converges to high values. However, even in this case, there is a limit to how large the learning rate can be. 
A learning rate that is too large will result in unstable and inaccurate performance of the decoder (figures 2.8 and 2.9). This result shows the importance of the calibration algorithm regardless of the goal of adaptation. The calibration algorithm may also serve as a tool to help examine the interaction be- tween model adaptation and neural plasticity. In closed-loop neurotechnologies, learning or stimulation-induced plasticity can change neural representations over time and result in neural adaptation. For example, in motor BMIs, the brain can change its encoding of move- ment (e.g., the directional tuning of neurons) to improve neuroprosthetic control 38–40,42,55 . Neural and model adaptation result in a “two-learner system” and can interact 55 . It is important to study whether model adaptation interferes with neural adaptation in these closed-loop systems, and if so whether this interference depends on how fast models are adapted. By accurately adjusting the convergence time and hence the speed of model adap- tation, the calibration algorithm may provide a useful tool in studying such interference in careful experiments. Moreover, if neural plasticity is significantly affected by the speed of 71 model adaptation, the calibration algorithm could help carefully adjust this speed for a de- sired neural adaptation outcome. It is also important to examine this interference problem theoretically 103 . While our main goal was to derive the calibration algorithm for closed-loop neurotech- nologies, this algorithm can also be used in other domains of signal processing. We derived the calibration algorithm to select the learning rate and predict its effect on error and con- vergencetimeinadaptiveBayesianfilters. Priorworkinothersignalprocessingapplications have focused vastly on the non-Bayesian LMS or steepest-decent adaptive filters 93,96 . How- ever,LMSisonlyapplicabletolinearobservationmodels 93 . Moreover,steepest-decentfilters that use non-linear cost functions to specify the goal of adaptation cannot predict the effect of learning rate on error or convergence time and thus only provide heuristics for learning rate selection 93 . Finally, LMS or steepest-decent filters are not Bayesian filters, unlike the KF or the PPF (equations (2.3)–(2.6) and (2.15)–(2.18)). Using a Bayesian filter for pa- rameter adaptation has the advantage that it can extend to nonlinear stochastic observation models (such as the point process model of spikes) and to cases where the dynamics of the unknown parameter, ψ or ϕ , is not a simple random-walk 9,37,70 . Here, we derived a learn- ing rate calibration algorithm for Bayesian filters both with continuous linear observation models (KF) and with discrete nonlinear observations models (PPF). Importantly, we de- rived explicit analytical functions (2.9) and (2.10) to predict the effect of the learning rate on steady-state error and convergence time for a Bayesian filter. This allowed us to com- pute an optimal value for the learning rate analytically to achieve a desired user-specified performance metric. Here our focus was on deriving an analytical calibration algorithm both for nonlinear point process models of spikes and linear Gaussian models of continuous neural recordings. Thus to validate our analytical approach, we used extensive closed-loop Monte-Carlo sim- ulations. These simulations allowed us to examine the generalizability of the calibration 72 algorithm across different neural signal modalities. 
The closed-loop simulations closely con- formed to our prior non-human primate experiments 12,37 . Prior studies have shown that these closed-loop simulations can mimic the observed experimental effects and thus provide a useful validation testbed for algorithms 37,72,86,94 . Moreover, the calibration algorithm ad- justed the learning rate of adaptive PPF and adaptive KF decoders, which have been shown tobesuccessfulforreal-timeBMItrainingandcontrolusingspikesorLFPinnon-humanpri- mateandhumanexperimentsbothinourworkandotherstudies 12,15,37,38,54,58,60–65 . However, prior experiments, including ours, selected the learning rates empirically in these decoders. Given that the calibration algorithm is run prior to experiments, and based on the success of adaptive PPF and KF in prior animal and human experiments, we expect our calibration algorithm to be seamlessly incorporated in BMIs regardless of the neural signal modality. The calibration algorithm allows the optimal learning rate to be computed prior to running the adaptation experiments to achieve a predictable speed and accuracy in adaptive learn- ing. Implementing the calibration algorithm in animal models of adaptive BMIs using both spikes and LFP is the topic of our future investigation. Finally, the calibration algorithm has the potential to be generalized to Bayesian filters beyond the KF and PPF, e.g., the unscented Kalman filter 98 or an adaptive filter with a binomial distribution as the observation model 59 . The derivations of equations (2.7) and (2.8)inTheorems1and3arebasedontherecursiveequationforestimationerrordynamics, which is derived from the desired Bayesian filter. This implies that for other observation models different from a linear Gaussian model in KF or a nonlinear point process model in PPF, once we write down their corresponding Bayesian adaptive filters 104 , we can derive the calibration algorithms by writing the corresponding recursive error equations. Thus this calibrationalgorithmhasthepotentialtobegeneralizedandappliedtoothertypesofsignals with various stochastic models. This will be a topic of our future investigation. 73 Chapter 3 Geometric Dynamical Modeling 3.1 Introduction In chapters 1 and 2, we develop a multiscale model (1.1)–(1.5) for jointly modeling various types of neural signals (spikes, local field potentials, electrocorticogram, etc.) with hetero- geneous characteristics (discrete/continuous, fast/slow time rate, etc.). We also derive the multiscale filter (MSF) with its corresponding adaptive learning algorithm plus the optimal learning rate computed by our calibration algorithm. We validate that the MSF can benefit brain machine interface (BMI) by many numerical simulations and data analysis. Overall, our MSF is a successful engineering approach in decoding subjects’ intention from brain signals. In this chapter, we switch gears and explore neural dynamics from a more scientific angle. UnlikeMSFwhichmodelsneuralsignalsbygeneralizedlinearmodels,inthischapter, we model neural population activity nonlinearly with geometric guidance. This geometric strategy creates an interpretable, analytical, and nonlinear model of neural population ac- tivity with a tractable dynamical model, but these fabulous properties come with the price of many new challenges because our manifold is with a hole, and the whole dynamical mode is on a non-Euclidean geometry! This is the source of those fabulous properties and difficult challenges. 
In the following, we explain the context behind this geometric strategy step by step. 74 Thefirstquestionis: whywemodeltheneuralpopulationactivitywithalow-dimensional (low-D, compared to the number of recorded neural signals) nonlinear manifold? This is because that even though hundreds of neural signals can be recorded simultaneously in real time, on the contrary, the literature consistently shows that such rich and complex neural population activity is actually embedded in a low-D subspace 105 . This fact is critical considering the significant number of neurons we recorded, and it motivates researchers to categorize this low-D subspace by various dimensionality reduction methods. The standard methods are based onlinear projection with various costfunctions, like principal component analysis (PCA) 106 , demixed PCA (dPCA) 107 , jPCA 108 , reduced rank regression (RRR) 109 , etc. These methods are mature, and they really capture most of the variance in neural population activity by a relatively low-D subspace compared to the number of recorded neural signals. Recently, people have started realizing that this neural population activity is not just lived in a low-D linear subspace, but a low-D nonlinear manifold 110 . Following this obser- vation, many multidimensional scaling (MDS) typed methods are applied to various kinds of neural population activity 111 . Exampled methods are isometric mapping (Isomap) 112 , lo- callylinearembedding(LLE) 113 ,localtangentspacealignment(LTSA) 114 ,uniformmanifold approximation and projection (UMAP) 115 , etc. Besides various geometric properties (and the corresponding cost functions) that each method focuses, all of them root on the same algorithmic structure and geometric assumption: they are lookup tables and their geometry of the latent state is topologically homotopy equivalent to Euclidean space R n . The algo- rithmic structure of the lookup table limits these methods in extending to new, unobserved data, which is necessary for applications with real-time new observation such as filtering. For the geometric space of the latent state, the question is: what if the geometry underlying neural population activity is not homotopy equivalent toR n ? In the last ten years, a pervasive feature, rotation, has been found in various types of neuralactivity,includingreaching 108 ,smelling 113 ,sleeping 112 ,etc. Thispointsoutaquestion: 75 what’s the mechanism behind this rotating phenomenon all over the place? One common approach is modeling it by nonlinear dynamics in R n , but we have another hypothesis: this rotation is guided by a geometry with holes 116 . Apparently, in math, complete space- exploringtrajectoriesonamanifoldwithholeswillrotatenaturallyeventhoughtheyarejust guidedbysimpledynamics. Toyexamplesaremovingona1Dloop(S 1 )ora2Dsphere(S 2 ). However, this intuitive but insightful idea leads to many questions: (1) How do we know if there is really a hole underlying neural population activity in the high-dimensional (high-D) space? (2) How to learn this manifold and build a static geometric model analytically? (3) How about the dynamics on top of this manifold? (4) Which algebraic properties of this manifold must we take into account while building the model? All these questions are very challenging but critical if we want to explore this direction deeply and rigorously. Currently, to the best of our knowledge, only the 1 st question has been dealt with in literature. 
The solution is the topological data analysis (TDA) based on the algebraic topology 117,118 . Given recorded neural data in the high-D space, TDA generates multi- ple bar plots, and each bar corresponds to the birth-death of a hole throughout the scanning progress in TDA. People applied TDA to neural data and found rich high-D hole-relevant structures 112,119,120 . But TDA doesn’t say where the hole is or how the hole looks like. This kind of existence result is less than enough for an analytical geometric model. BesidesTDA,otherworksprojectneuraldatato2D/3Dsubspacebylinear/nonlineardi- mensionality reductions and then recognize holes visually 121–124 . However, different high-D manifolds can have the same low-D projection, so we cannot completely rely on projec- tions to identify the geometric shape visually. In general, an analytical multi-dimensional (multi-D)geometricdynamicalmodel(GDM)isessentialforquantitativelyexploringthege- ometricpropertiesunderlyingneuralpopulationactivity. Thisallowsustoanswertheabove questions (2)–(4) in detail with numerical verification. For example, how many geometric dimensionsdoweneedtocapturetheneuraldatavariance? Howdoestheneuralpopulation activity move on the manifold? Which coordinate does the neural deviation (i.e., the neural 76 noise) follow, the extrinsic Euclidean coordinate or the intrinsic geometric coordinate? All these questions must be answered with quantification. Tothebestofourknowledge,priorworksfittingthemanifoldwithholesandtheintrinsic geometric coordinate are by either connecting knots from data linearly 112 or by manifold Gaussian process latent variable model (mGPLVM), which maps the manifold to the neural recording space by GPLVM plus the Lie algebra of the targeted manifold 125 . The former model is limited to 1D, which is not enough to capture the variance in neural population activity; Thelattermodel, mGPLVM,issufferedfromthecurseofdimensionalityduetothe Monte Carlo process in learning the model. Following this, mGPLVM has only be verified in low n-D manifolds in literature (n ≤ 3 in simulations and n = 1 in neural data) 125 . Therefore, we need a new efficient algorithm that learns the multi-D manifold with holes analytically. This is the starting point of what we achieve in this chapter. In this chapter, we present an all-in-one solution for questions (2)–(4) above. We first develop an analytical multi-D geometric model (GM) in section 3.2.1. The GM algorithm includesfittingthemaintopologicalstructure,the1Dhole(i.e.,theloop),andtheappending coordinates around it so the model dimension is not limited to one. Following our GM, we design a 2-stage GDM with a corresponding learning algorithm, the 2-stage (paired) expectation-maximization (EM) algorithm, in section 3.2.3. The modeling and learning are challenging because the quotient algebra and the geometric property of the GM must be treated specifically. These algebraic and geometric properties also motivate us developing a special filter, calibrated particle-Kalman filter (C-PFKF), for GDM in section 3.2.4. Finally, following GM and GDM, we develop a manifold-specific dimensionality reduction algorithm, thelinearprojectedregression(LPR),insection3.2.5. ItsroletoGMisthesameasPCAto R n . WeapplytheGMandGDMonneuralpopulationactivityfromtwonon-humanprimates (NHPs) while they did the reach-and-grasp task. Our results show that the GM describes theneuralpopulationactivitymoreefficientlythanlinearmodel(LM)amongmulti-Dspace. This also holds between GDM and linear dynamical model (LDM). 
Surprisingly, GDM only 77 works on data when not only the modeled geometric manifold but also the neural deviation (i.e., the neural noise) follow the intrinsic geometric coordinate. These results suggest that the intrinsic geometric coordinate provides a more natural space to describe the dynamics of neural population activity than the extrinsic Euclidean coordinate, and our analytical GM and GDM make verifying this geometric idea quantitatively come true. 3.2 Methods GDMconsistsoftwostatedynamicmodels,onefortheloopcoordinate(loop-C)andanother fortheappendingcoordinate(app-C),withtheircorrespondingmanifoldobservationmodels representing the geometric structure in data. In the following, we will first show the steps for identifying and fitting the manifold including loop-C and app-C. Second, we develop an unsupervisedEMalgorithmwithparameterconstraintstolearnthefullGDMthatdescribes the dynamics both on the nonlinear loop geometry and on the appending dimension space. Third, we develop a new filter, two-stage calibrated particle-Kalman filter (C-PFKF), with specifically designed steps considering the manifold geometry. Finally, we develop a geomet- ric dimensionality reduction algorithm, the linear projected regression (LPR), for both GM and GDM. 3.2.1 Learning the manifold underlying the neural activity Oneofthecoreideasinourgeometricapproachisthatneuralactivityevolvesoveranonlinear manifold which is not homotopy equivalent to Euclidean space R n . Instead, this manifold is homotopy equivalent to a nonlinear manifold with holes. Here this manifold is homotopy equivalent to a loop (i.e., a 1D hole). We express the fact that neural activity evolves on this nonlinear loopy manifold as y t =g(d l t ,d a t )+ϵ t (3.1) 78 wherey t istheneuralactivity,andg isthemodelofthemanifoldwithvariablesd l t represent- ing the loop coordinate (the main geometric structure) and d a t representing the appending coordinate (the auxiliary geometric structure). ϵ t models the noise. To learn g from y t , we should (1) denoise y t to make the underlying manifold emerge clearly, (2) investigate whether the underlying manifold is homotopy equivalent to a loop, (3) learn the main loop structure in g, and (4) add app-C along the loop continuously to complete g. We develop these steps in detail below. 3.2.1.1 Denoising the neural activity with trial averaging When the noiseϵ t is large, manifoldg may be covered by the noise cloud. To more robustly identify the manifold from y t , we denoise y t using trial average following the steps below and call the denoised activity ˜ y t . Note that when noiseϵ t is small, this denoising process is not necessary, and ˜ y t =y t . Assuming that there are totally N trials, we categorize them into M groups according to the behavioral condition in the trial and denote the number of trials in group m by N m ( P m N m =N). In each group, we select k trialsy (1) 1:n 1 ,y (2) 1:n 2 ,...,y (k) 1:n k randomly without replacement. Every trial y (i) has a different number of time-steps n i . To average the trials, we first align their beginning and ending points by shifting the time indexes from 1 : 1 : n i to 0: 1 n i − 1 :1, so all trials have the same time index at the beginning and the ending point. Next, we connect trial i samples following the time order (i.e., a linear interpolation among y (i) 1:n i ), so trial i becomes a continuous function in the domain [0,1]. Finally, we average these k trials at the chosen sampling points (we use 0: 1 999 :1, 1000 points in total). 
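As a concrete illustration of this denoising step, a minimal Python sketch of the time-normalization, interpolation, and averaging just described is given below. The function name and the (time-steps × channels) array layout are illustrative assumptions, not part of the thesis code.

import numpy as np

def average_trials(trials, n_points=1000):
    """Average k trials of unequal length on a common normalized time axis [0, 1].

    Each trial is an (n_i x channels) array. Time indices are rescaled to [0, 1],
    values are linearly interpolated at n_points common samples, and the
    interpolated trials are averaged, as described in section 3.2.1.1.
    """
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = []
    for trial in trials:
        t = np.linspace(0.0, 1.0, trial.shape[0])        # shift indices 1:n_i to [0, 1]
        resampled.append(np.column_stack(
            [np.interp(grid, t, trial[:, c]) for c in range(trial.shape[1])]))
    return np.mean(resampled, axis=0)                     # (n_points x channels) trial average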
Group m has about ⌊ Nm k ⌋ trial averages, and we collect all trial averages across M groups as the clean neural activity ˜ y t . In all subsequent steps in learning the manifold g, the order of ˜ y t in time is not important. That is why we can collect trial averages across groups without worrying about their order, and the subscript t of ˜ y t is just the index of the samples, not time anymore. 79 3.2.1.2 Discovering the type of the underlying manifold using sequential max- min and TDA Afterdenoising,themanifoldunderlying ˜ y t canbemorerobustlyidentified. Thenextstepis determiningthetypeofmanifoldandinparticularifthismanifoldishomotopyequivalentto aloop. Thiscanbedonebyapplyingtopologicaldataanalysis(TDA),analgorithmdetecting all m-D holes (m∈N) from samples inR n using methods in algebraic topology 117,118 . Due tothecomputationalcomplexityofTDA,wefirstreducethenumberofsamplesbyselecting a few landmark points ψ TDA 1:M (we set M = 500 in all analyses) from ˜ y t through sequential max-min algorithm 126 ; selecting these landmarks allows us to describe the outline of the manifold. Then based on the Betti number computed by TDA over the landmarks, we know whether the manifold is homotopy equivalent to a loop or not. If so, the next step is fitting this loop analytically. 3.2.1.3 Learning the main loop structure using two K-means clustering plus a cubic spline A common method for fitting a curve smoothly is cubic spline 127 . To do so, we need to generatemlandmarksψ loop 1:m forthesplinefrom ˜ y t . Theselandmarksmustberepresentative. They respect the manifold shape in the embedding space R n and the distance between ˜ y t samples along the loop, not their Euclidean distance inR n . Therefore, we develop a method leveraging two K-means clusterings. From the circular coordinate 128 , we can assign a loop coordinate to TDA landmarks ψ TDA 1:M along the TDA-found loop, so every ψ TDA i has two coordinate values: a loop coordinate (scalar) and a embedding space coordinate (vector) in R n . We apply the 1st K-means clustering 129 to the loop coordinate and get m clusters of ψ TDA 1:M . Then we average the embedding space coordinate (not the loop coordinate) of ψ TDA i in each cluster. Now we have m average points in R n . They are the seeds of the 2 nd K-means clustering which is applied to ˜ y t . After that, the average of ˜ y i in each cluster becomes our cubic spline landmark ψ loop 1:m . The connecting order of ψ loop 1:m is determined by 80 solving the travelling salesman problem (TSP) 130 . Connect ψ loop 1:m through cubic spline, and we have an analytical model of the loop, which is the main structure or geometric feature of the manifold g. This loop is described by g(s,0) such that s∈R with period L, which is the length of the loop. 3.2.1.4 Adding app-C on the loop to complete manifold g by a polynomial algorithm At this point, we already know that manifold g is homotopy equivalent to a loop, and we have fitted this main structure analytically. However, g is not homeomorphic to a loop in general. This means that g is not 1D locally. For example, g can be a tube, which is 2D locally, or even a higher dimensional manifold with a main loop. In practice, the neural activity cannot be captured by a 1D manifold completely, so we must increase the dimension ofg while keeping it homotopy equivalent to a loop at the same time. This is the motivation of developing the appending coordinates (app-C) with corresponding appending vectors (app-vectors) in the embedding space. 
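Before constructing these appending coordinates, the loop-fitting step of section 3.2.1.3 can be summarized in a brief sketch. It assumes the spline landmarks ψ^loop_{1:m} have already been selected and ordered (e.g., by the TSP step above); the arc-length parameterization and SciPy's periodic cubic spline used here are one simple way to realize the described fit, not necessarily the exact implementation.

import numpy as np
from scipy.interpolate import CubicSpline

def fit_loop(landmarks):
    """Fit a closed (periodic) cubic spline through ordered loop landmarks.

    `landmarks` is an (m x n) array of ordered points in the embedding space.
    Returns the spline phi(s) with s in [0, L), where L is the loop length
    measured along the landmark polygon (a simple arc-length parameterization).
    """
    closed = np.vstack([landmarks, landmarks[:1]])         # repeat the first point to close the loop
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)  # segment lengths between landmarks
    s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arc length
    L = s[-1]
    phi = CubicSpline(s, closed, bc_type='periodic', axis=0)
    return phi, L

# phi(s) evaluates the loop at any s; phi(s, 1) gives the tangent vector.

The resulting φ(s) plays the role of the analytical loop model g(s,0) with period L, the loop length.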
Since the 1D loop is embedded inR n , we add n− 1 app-vectors, φ a 1 (s),...,φ a n− 1 (s), at each pointg(s,0) along the loop. Then we have a n-D coordinate system, 1 tangent vector plus n− 1 app-vectors, at each point s∈R. However, the app-vectors must satisfy following conditions: 1. Continuity and uniqueness: φ a i (s) is continuous and periodic (mod L). 2. Orthogonality: the tangent vector and φ a 1:n− 1 are mutually orthogonal. 3. Computational efficiency: the expression of each φ a i is simple. Following these criteria, we cannot use Frenet–Serret formulas 131 , because solving a n-D ordinarydifferentialequation(ODE)withnon-stationarycoefficientsisimpractical. Another common method 132 which rotates the coordinate system along the curve while minimizing the rotating angle in computer-aided design (CAD) is also not suitable here. This is because 81 in the latter method, the beginning and the ending coordinate may not be consistent (i.e., φ a i (0)̸=φ a i (L)). Therefore, we need to design a new method. We name this algorithm the polynomial coordinate generator (PCG). It generates n− 1 app-vectors given the tangent vector at s and satisfies three criteria mentioned above. The first step is reorganizing the embedding space coordinate for R n . To do this, we sample s ∈ [0,L] densely and uniformly, so these samples cover the whole loop. Then we utilize principle component analysis (PCA) to find the coordinate transformation matrix Q where QQ ′ =I n . We rename the loop g(s,0) after Q-transformation as c(s) = [c 1 (s),...,c n (s)] ′ for simplicity. Now the 1 st coordinate c 1 (s) explains most of the variance of the loop, and then the 2 nd coordinate, and so on. The 2 nd step of PCG is generating n− 1 app-vectors from the tangent vector of c(s), ˙ c(s). Defining e i = [0,...,0,1,0,...,0] (only 1 at i th position) as the i th basis vector of the Q-transformed embedding space, we apply Gram–Schmidt orthonormalization to the basis (˙ c(s),e 2 ,e 3 ,...,e n ) in an efficient manner to get the app-vectors. To make it numerically stable and computationally efficient, we use the explicit formula as follows to get the app- vectors. For simplicity, we write ˙ c= ˙ c(s) here. ˙ c= ˙ c 1 ˙ c 2 ˙ c 3 . . . . . . . . . ˙ c n ⇒ ˙ c 2 ˙ c 3 ˙ c 1 ··· ˙ c m ˙ c 1 ··· ˙ c n ˙ c 1 − ˙ c 1 ˙ c 3 ˙ c 2 . . . . . . 0 − P 2 j=1 ˙ c 2 j ˙ c m ˙ c m− 1 . . . . . . 0 − P m− 1 j=1 ˙ c 2 j . . . . . . . . . 0 . . . . . . . . . . . . ˙ c n ˙ c n− 1 0 0 0 − P n− 1 j=1 ˙ c 2 j (3.2) The app-vector φ a i (s) is the i th column vector of the matrix in equation (3.2) after normal- ization. Since all terms ofφ a i (s) are polynomials of tangent vector ˙ c(s) and ˙ c(s)= ˙ c(s+L) for ∀s ∈ R, φ a i (s) is continuous, periodic, and computationally efficient. The fact that 82 (˙ c(s),φ a 1 (s),...,φ a n− 1 (s)) are mutually orthogonal can be checked simply by direct compu- tation. Finally,wemusttransferthecoordinatebacktotheembeddingspaceR n byQ − 1 ,and we define the appending coordinate matrix Φ a (s)∈R n× (n− 1) and the geometric coordinate matrix Φ g (s)∈R n× n as Φ a (s)= φ a 1 (s) ∥φ a 1 (s)∥ , ..., φ a n− 1 (s) ∥φ a n− 1 (s)∥ n× (n− 1) (3.3) Φ g (s)= ˙ c(s) ∥˙ c(s)∥ , φ a 1 (s) ∥φ a 1 (s)∥ , ..., φ a n− 1 (s) ∥φ a n− 1 (s)∥ n× n = ˙ c(s) ∥˙ c(s)∥ Φ a (s) n× n (3.4) so Φ a (s) determines the appending coordinate system at loop coordinate s by collecting all normalized app-vectors φ a 1:n− 1 (s). 
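As a concrete illustration of the PCG formula (3.2), the sketch below builds the n−1 columns from a tangent vector ċ(s) in the PCA-rotated coordinates and normalizes them; their orthogonality to the tangent and to each other can be checked by direct computation, as noted above. The sketch assumes the leading entries of ċ(s) are not all zero (which the PCA reordering is intended to encourage), and the rotation back to the embedding space via Q^{-1} in (3.3) is omitted.

import numpy as np

def pcg_app_vectors(c_dot):
    """Polynomial coordinate generator (PCG): n-1 appending vectors from a tangent.

    `c_dot` is the tangent vector of the PCA-rotated loop at some s. Column k of
    the returned matrix is [c_dot[k] * c_dot[:k], -sum(c_dot[:k]**2), 0, ..., 0],
    normalized, matching the pattern of equation (3.2).
    """
    n = c_dot.size
    Phi = np.zeros((n, n - 1))
    for k in range(1, n):                        # k = 1, ..., n-1
        col = np.zeros(n)
        col[:k] = c_dot[k] * c_dot[:k]           # polynomial entries above the diagonal term
        col[k] = -np.sum(c_dot[:k] ** 2)         # diagonal term; rows below stay zero
        Phi[:, k - 1] = col / np.linalg.norm(col)
    return Phi                                   # columns are the normalized app-vectors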
Similarly, Φ g (s) is the geometric coordinate system at s by including the tangent vector ˙ c(s) and the appending vectors, so the dimension of this basis is n, the same as the embedding Euclidean spaceR n . Therefore, we can describe every point (e.g., y t ) in this space by either the Euclidean or the geometric coordinate. This is important in dynamical modeling, learning, and filtering, as we will discuss in section 3.2.3. Above all, PCG can generate app-vectors satisfying the above three criteria along the loop g(s,0). Following the appending coordinate provided by Φ a (s) at each point s, we expand manifold g from a 1D loop to a n-D manifold homotopy equivalent to a loop. The complete manifold modelg(d l t ,d a t ) in equation (3.1) is now completed as below. y t =g(d l t ,d a t )+ϵ t =ϕ (d l t )+Φ a (d l t )× d a t +ϵ t =ϕ (d l t )+Φ a t d a t +ϵ t (3.5) where ϕ (d l t ) = g(d l t ,0) is the model of the main topological structure, the loop, fitted in section 3.2.1.3 and Φ a t d a t takes the rest appending dimension part. We define Φ a t = Φ a (d l t ) for simplifying the notation, meaning the appending coordinate system at time t. In the following sections, equation (3.5) is the basic form we will refer to as the geometric model. 83 3.2.2 Regressing neural activity onto the manifold: the Hausdorff distance From equation (3.1), we can regress each neural activity data point y t to the manifold to get its manifold coordinate composed of loop-C (d l t ) and app-C (d a t ) through minimizing the Hausdorffdistance. Sincetheloopisthemaingeometryfeatureofthemanifold,wecompute d l t first and then d a t . The explicit math equations for this computation are as follow. d l t (y t )=argmin s∈[0,L) ∥y t − g(s,0)∥ (3.6) =argmin s∈[0,L) ∥y t − ϕ (s)∥ (3.7) d a t (y t )=argmin x∈R n− 1 ∥y t − g(d l t (y t ),x)∥ (3.8) =argmin x∈R n− 1 ∥y t − ϕ d l t (y t ) − Φ a d l t (y t ) × x)∥ (3.9) where equations (3.7) and (3.9) are from the exact geometric model (3.5). However, this regression is pointwise. It doesn’t consider or model the evolution of y t across time, so the regressed d l t (y t ) and d a t (y t ) may be noisier than what they actually are. This can happen, for example, when noiseϵ t in equation (3.1) is large. To model the time evolution, we need to model the dynamics of y t on manifold g. We develop a paired EM algorithm with two dynamics for d l t andd a t , respectively. 84 3.2.3 Paired EM algorithm: identify the GDM model parameters Intheregression,wefirstcompute d l t andthend a t sincetheloopisthemaingeometryfeature. Wefollowthisprincipleinbuildingthedynamicmodel. First, wemodelthedynamicsystem of loop-C with the corresponding observation model as d l t v l t = 1 ∆ 0 α × d l t− 1 v l t− 1 +w l t =A l x l t− 1 +w l t (3.10) y t =g(d l t ,0)+G l t r l t =ϕ (d l t )+G l t r l t (3.11) where superscript l marks the parameter belonging to the loop-C dynamic system. We denote x l t = [d l t ,v l t ] ′ as the loop latent state composed of position d l t and velocity v l t along the loop. Since d l t is periodic with period L (the length of the loop), the first term of the dynamic matrixA l , A l 11 , must satisfy condition that A l 11 d l t− 1 is periodic with period L. This implies that A l 11 = 1. Therefore, we include loop velocity v l t in the loop latent state and modelthedynamics(i.e.,α )ontopofit. w l t andr l t arewhiteGaussiannoisewithcovariance matricesW l andR l , respectively. The coordinate systemG l t of noiser l t at time t is a little complex. 
In LDM,G l t =I n all the time. This means that the observation noise follows the coordinateoftheembeddingEuclideanspaceR n . InGDM,however,therearetwocoordinate systems: theextrinsicEuclideancoordinateandtheintrinsicgeometriccoordinate. Thenoise r l t can follow either coordinates, the Euclidean covariance (EuCOV) and geometry-aligned covariance(AlCOV),andtheybecometwodifferentobservationmodels. Whenit’sEuCOV, G l t = I n , which is the same as LDM; When it’s AlCOV, G l t = Φ g t following equations (3.4) and (3.5). To be general, we use notation G l t throughout the modeling, learning, and filtering algorithms, so all equations can be applied to both EuCOV and AlCOV cases. An important note is that mathematically, neither EuCOV nor AlCOV is included in another 85 one, so we can learn the dynamical mechanism underlying the neural population activity (y t ) by comparing the performance of EuCOV and AlCOV in data analysis. We will see this in the section Results. We write the 2 nd dynamic system with the corresponding observation model for app-C as d a t =A a d a t− 1 +w a t (3.12) y t =g(d l t ,d a t )+G a t r a t =ϕ (d l t )+Φ a t d a t +G a t r a t (3.13) where x a t = d a t is the appending latent state composed of the appending position. This is because app-C is linear, not periodic, so we can build the appending dynamic on top of the position directly, unlike the loop-C in equation (3.10). The superscript a indicates the parameters belonging to the appending dynamic system. A a is a dynamic matrix. w a t and r a t are white Gaussian noise with covariance matricesW a andR a , respectively. G a t is equal to I n or Φ g t (here the superscript is still g, not a!) when the observation model is EuCOV or AlCOV, respectively. Its role is the same asG l t in (3.11). Note that in (3.13), d l t is input. It’s not a random variable here. Combining (3.10)–(3.13), we have the GDM. The next step is learning all parameters in the 1 st and the 2 nd dynamic systems in order, so we develop a paired EM algorithm to do so. Due to the difference between dynamic matrices A l and A a , the EM algorithms for the loop and the appending dynamic systems are separately developed as shown in the following sections. For numerical stability, y t can be normalized using z-scoring before performing the paired EM, and thus the manifold g is normalized because of the same z-scoring procedure. 86 3.2.3.1 Learningthedynamicsystemonthemaingeometrystructure,i.e.,loop: constrained EM algorithm We first learn all parameters θ l ={∆ ,α, W l ,R l ,µ l ,Λ l } in the loop dynamic system (3.10) and (3.11) by developing a constrained EM algorithm. Note that G l t is not a parameter to be learned but I n in EuCOV or Φ g t which depends on d l t in AlCOV, respectively. µ l and Λ l are the mean and the covariance of the initial loop latent state x l 0 . Remember that A l is not completely a tunable parameter, but only ∆ and α are. Therefore, the EM here is constrained. We develop the expectation (E-step) and the maximization (M-step) of this special EM step by step below. 
Assuming there are T observations, the log-likelihood of x l 0:T andy 1:T is ζ =logp(x l 0:T ,y 1:T |θ l ) =logp(x l 0:T |θ l )+logp(y 1:T |x l 0:T ,θ l ) =logp(x l 0 |θ l )+ T X i=1 logp(x l i |x l i− 1 ,θ l )+ T X i=1 logp(y i |x l i ,θ l ) = h log|Λ l |+∥x l 0 − µ l ∥ 2 (Λ l ) − 1 +T log|W l |+ T X i=1 ∥x l i − A l x l i− 1 ∥ 2 (W l ) − 1 +T log|R l |+ T X i=1 ∥y i − ϕ (d l i )∥ 2 G l i R l (G l i ) ′ − 1 i × (− 1 2 )+constants (3.14) where operator ∥x∥ 2 H = xHx ′ is the square norm of x with weighted matrix H. Note that log|G l i R l (G l i ) ′ | = log|R l | because G l i is always an orthonormal matrix, so |G l i | = 1. Constants are terms irrelevant to θ l in ζ . For conciseness, we will remove them from now on as they don’t influence the subsequent steps. We now demonstrate one iteration of the constrained EM, which can be performed mul- tiple times to converge on the final parameters. Denoting θ l k = {∆ k ,α k ,W l k ,R l k ,µ l k ,Λ l k } as the parameter set learned from the k th iteration of constrained EM, we compute ζ k+1 = 87 E[ζ |y 1:T ,θ l k ] using a calibrated particle filter (C-PF, we’ll derive it in section 3.2.4) with Rauch-Tung-Striebel (RTS) smoother 133 . x l t|T and Λ l t|T are the mean and the covariance of x l t conditioned on y 1:T from the smoother, respectively. Furthermore, we define the condi- tioned cross-covariance Λ l t− 1,t|T = Cov(x l t− 1 ,x l t |y 1:T ). G l t|T is I n under EuCOV or Φ g (d l t|T ) under AlCOV, respectively. Following our notations, we express ζ k+1 as ζ k+1 = tr h (Λ l ) − 1 P l 0|T − x l 0|T (µ l ) ′ − µ l (x l 0|T ) ′ +µ l (µ l ) ′ i + T X i=1 tr h (W l ) − 1 P l i|T − A l P l i− 1,i|T − (A l P l i− 1,i|T ) ′ +A l P l i− 1|T (A l ) ′ i + T X i=1 tr h G l i|T R l (G l i|T ) ′ − 1 y i y ′ i − ϕ (d l i )| T × y ′ i − y i × ϕ ′ (d l i )| T +ϕϕ ′ (d l i )| T i +log|Λ l |+T log|W l |+T log|R l | × (− 1 2 ) (3.15) where P l i|T = Λ l i|T +x l i|T (x l i|T ) ′ and P l i− 1,i|T = Λ l i− 1,i|T +x l i− 1|T (x l i|T ) ′ are moments of the state conditioned on all neural activity observations within the smoother. This gives us the distribution of p(x l i |y 1:T ), which is a Gaussian with the above smoothed means and covariances, i.e. x l i|T and Λ l i|T . Also, in the above and for conciseness, we have abbreviated E[ϕ (d l i )|y 1:T ]andE[ϕ (d l i )ϕ ′ (d l i )|y 1:T ]toϕ (d l i )| T andϕϕ ′ (d l i )| T ,respectively. Wenowcompute ϕ (d l i )| T and ϕϕ ′ (d l i )| T by sampling d l i from distribution p(x l i |y 1:T ) using a Monte Carlo method. This concludes the E-step of the iteration. Next, we maximize ζ k+1 by finding the optimal value of the variables in θ l . Sinceµ l ,Λ l , andR l arewithoutconstraints, theyhaveanalyticaloptimalsolutions 134 , whichwenowlist. For the initial condition, we get µ l k+1 =x l 0|T and Λ l k+1 =Λ l 0|T (3.16) 88 and for the observation model, we get R l k+1 = 1 T × T X i=1 (G l i|T ) ′ × y i y ′ i − ϕ (d l i )| T × y ′ i − y i × ϕ ′ (d l i )| T +ϕϕ ′ (d l i )| T × G l i|T (3.17) For ∆ and α , which are free variables in constrained A l , we find their optimal solutions by looking at ∂ζ k+1 ∂A l first, which is ∂ζ k+1 ∂A l = T X i=1 (W l ) − 1 h (P l i− 1,i|T ) ′ − A l P l i− 1|T i =(W l ) − 1 × M 11 M 12 M 21 M 22 − A l 11 A l 12 A l 21 A l 22 N 11 N 12 N 21 N 22 (3.18) whereM = P T i=1 (P l i− 1,i|T ) ′ ,N = P T i=1 P l i− 1|T , andM ij is the ij-block of matrixM. 
Note that for the A l matrix, only A l 12 and A l 22 are unknown and thus we only need the last column of ∂ζ k+1 ∂A l , which is ∂ζ k+1 ∂A l 12 and 22 =(W l ) − 1 × M 12 M 22 − A l 11 A l 12 A l 21 A l 22 N 12 N 22 =0 (3.19) Solving (3.19), the optimal solutions of ∆ and α are ∆ k+1 =optimalA l 12 =(M 12 − A l 11 N 12 )N − 1 22 (3.20) α k+1 =optimalA l 22 =(M 22 − A l 21 N 12 )N − 1 22 (3.21) Equations (3.20) and (3.21) are general for all matricesA l with fixed A l 11 andA l 21 . For our loop dynamic model (3.10), we set A l 11 = 1 and A l 21 = 0. Denoting A l k+1 = h 1 ∆ k+1 0 α k+1 i , then the optimal solution ofW l is W l k+1 = 1 T × T X i=1 P l i|T − A l k+1 P l i− 1,i|T − (A l k+1 P l i− 1,i|T ) ′ +A l k+1 P l i− 1|T (A l k+1 ) ′ (3.22) 89 This completes the M-step of the EM iteration and thus concludes a full EM iteration by providing the new parameter set θ l k+1 ={∆ k+1 ,α k+1 ,W l k+1 ,R l k+1 ,µ l k+1 ,Λ l k+1 }. We can now proceed to the next iteration and repeat the process, i.e. perform the E-step and then the M-step. We run this constrained EM iteratively until the series{θ l k } converges. 3.2.3.2 Learning app-C dynamic system: standard EM algorithm After learning all parameters of θ l in the loop dynamic system (3.10) and (3.11), we have a smoothestimationofthelatentstateontheloopx l 1:T|T =[d l 1:T|T ,v l 1:T|T ] ′ . Nowweimplement the2 nd EMtolearnallparametersthatalsoconsistoftheonesfortheappendingdimensions, i.e. θ a = {A a ,W a ,R a ,µ a ,Λ a }. There is no constraint for the parameters of appending dimensions, so we can use the standard EM algorithm 135 . Briefly, we perform the E-step using the Kalman filter plus RTS-smoother, and the optimal parameter solutions in the M-step can then be found as listed below. For the app-C initial condition, we get µ a k+1 =x a 0|T and Λ a k+1 =Λ a 0|T (3.23) For the app-C observation model, we get R a k+1 = 1 T × T X i=1 (G a i|T ) ′ ρ i|T ρ ′ i|T − Φ a i|T x a i|T ρ ′ i|T − ρ i|T (Φ a i|T x a i|T ) ′ +Φ a i|T P a i|T (Φ a i|T ) ′ G a i|T (3.24) whereρ i|T =y i − ϕ (d l i|T ). For the app-C dynamic model, we get A a k+1 = T X i=1 (P a i− 1,i|T ) ′ ! × T X i=1 P a i− 1|T ! − 1 (3.25) W a k+1 = 1 T × T X i=1 P a i|T − A a k+1 P a i− 1,i|T − (A a k+1 P a i− 1,i|T ) ′ +A a k+1 P a i− 1|T (A a k+1 ) ′ (3.26) Again, we run this EM with inputs iteratively till the series{θ a k } converges. 90 3.2.4 Real-timelatentstateestimationusingacalibratedparticle- Kalman filter Up to this step, we have now fitted all parameters in GDM for the loop-C and the app-C from neural activity y 1:T . To estimate the latent states x l t and x a t in real time, we develop anarchitecturetermedcalibratedparticle-Kalmanfilter(C-PFKF),whichbasicallyrunsPF with a new calibrated step for loop-C and then KF for app-C every time step. The idea is that at time t, we first compute x l t|t by applying a calibrated PF (C-PF) on the loop dynamic system (3.10) and (3.11). After that, we use x l t|t as the input of the appending dynamic system (3.12) and (3.13). Then we decode x a t|t with a KF. Here, the only new terminology is the calibrated PF. We develop it for accommodating the quotient algebra on the loop. The detail is explained below. Intheloop-Cdynamicmodel(3.10),looppositiond l t isperiodicwithperiodL(thelength of the loop). Mathematically, d l t is inR, the covering space 136 of the loop S 1 . The algebra on this quotient space is a quotient algebra, which means all values are under the equivalent relationof”modL”, soβ =β +mLfor∀β ∈Rand∀m∈Z. 
Thisquotient-basedequivalent relation causes problems when we compute the mean x l t|t and the covariance matrix Λ l t|t from the weights and particles of standard PF by weighted average. For example, 0 and L are values of two particles representing the same point in the covering space (and the loop), but their average L 2 doesn’t represent the same point! To fix this, we develop the C-PF, so all particles and weights are calibrated in each time step t to guarantee that they are well-behaved with standard algebraic operators. The steps of standard PF 104 is the prediction, the update, and the resampling. Our C-PF adds a new step, the calibration, between the update and resampling. Assuming the particles and weights from the update step arex i(u) t|t and w i(u) t|t (∀i=1∼ N) respectively, the calibration step is as follow. 1. Compute the update meanx (u) t|t = P N i=1 w i(u) t|t x i(u) t|t . 91 2. Select a bounded region less than L, so there is no quotient problem in this region. For example, we select bd(x (u) t|t , L 4 )=[x (u) t|t − L 4 , x (u) t|t + L 4 ]. 3. Only keep the particles within bd(x (u) t|t , L 4 ) and the corresponding weights. Reorder them asx j(u) t|t and w j(u) t|t (∀j =1∼ M) where M ≤ N. 4. Define x j(c) t|t = x j(u) t|t and w j(c) t|t = w j(u) t|t P M j=1 w j(u) t|t for ∀j = 1 ∼ M. This is the pool of calibrated particles and weights that the resampling step should use. By this calibration step, all particles and weights from each step t are within bd(x (u) t|t , L 4 ), so there is no problem of the average operation over this quotient algebra space. 3.2.5 The geometric dimensionality reduction algorithm: linear projected regression Both the geometric static (GM) and dynamical (GDM) models represent each observation y t by its corresponding latent state following the intrinsic geometric coordinate. In GM, the latent states d l t (y t ) andd a t (y t ) come from the regression (section 3.2.2). In GDM, the latent statescancomefromeitherthepredicting(d l t|t− 1 andd a t|t− 1 ),thefiltering( d l t|t andd a t|t ),orthe smoothing(d l t|T andd a t|T ). Forgeneralizability,wedenotez l t andz a t forthelatentstatesfrom any geometric cases. An issue of these latent states is that they are fully dimensional. That is, dim(z l t )+dim(z a t ) = dim(y t ) = n (or equivalently, dim(z a t ) = n− 1). but in general, we want a low-dimensional model given the observation and the geometric coordinate system. This is similar to what PCA does in the Euclidean space R n . Therefore, we develop the linear projected regression (LPR), a geometric dimensionality reduction algorithm for GM and GDM. To derive LPR, we must define the cost function first. In the Euclidean space, PCA minimizes the residual root-mean-square error (RMSE) given the number of dimensions. 92 We borrow this idea and define the LPR cost function as ”given the number of appending dimensions, minimize the residual RMSE.” This cost function in math is L LPR = T X i=1 ∥y i − ϕ (z l i )− Φ a i Q× Q ′ z a i ∥ 2 =tr ( T X i=1 h ϱ i − Φ a i QQ ′ z a i i × h ϱ ′ i − (z a i ) ′ QQ ′ (Φ a i ) ′ i ) =tr h T X i=1 ϱ i ϱ ′ i i +tr Q ′ × T X i=1 h z a i (z a i ) ′ − z a i ϱ ′ i Φ a i − (Φ a i ) ′ ϱ i (z a i ) ′ i × Q (3.27) whereϱ i =y i − ϕ (z l i )istheobservationwithouttheloop-relatedcomponent. Φ a i isthelocal appending coordinate in equations (3.3) and (3.5) at z a i . Q∈R n× m represents m principal appending components (m ≤ n) for dimensionality reduction. Therefore, its columns are mutually orthonormal (i.e., Q ′ Q = I m ). 
From line 2 to line 3 in (3.27), we use the equalities tr(AB) = tr(BA) and (Φ^a_i)'Φ^a_i = I_{n-1}. To minimize L_{LPR}, the columns of Q must be the eigenvectors corresponding to the m smallest eigenvalues of J_{LPR}, where

J_{\mathrm{LPR}} = \sum_{i=1}^{T} \left[ z^a_i (z^a_i)' - z^a_i \varrho_i' \Phi^a_i - (\Phi^a_i)' \varrho_i (z^a_i)' \right] \qquad (3.28)

In summary, given the geometric model, the observations y_{1:T}, and the geometric latent states z^l_{1:T} and z^a_{1:T}, we compute the m-dimensional geometric representation by finding the m smallest eigenvalues and the corresponding eigenvectors of J_{LPR} in (3.28). The m-dimensional appending latent state is Q'z^a_t, expressed in the principal appending vectors Φ^a_t Q at each loop coordinate z^l_t.

3.2.6 NHP experimental setup and validation

We test the GM and the GDM on two NHPs' (Monkey J and Monkey C) neural population activity, namely the firing rates of spiking activity recorded from motor cortical areas during a naturalistic 3D reach-and-grasp task towards targets placed at various locations in the 3D environment 44. As our performance metric for GM, we use the normalized mean-squared error (NMSE) between the true y_t and the reconstructed neural population activity ŷ_t obtained from the geometric regression d^l_t(y_t) and d^a_t(y_t) (section 3.2.2) plus LPR (section 3.2.5). For GDM, the metric is the normalized root-mean-squared error (NRMSE) between the true y_t and the one-step-ahead prediction y_{t|t-1} obtained from the predicted latent states d^l_{t|t-1} and d^a_{t|t-1} by C-PFKF (section 3.2.4) plus LPR. For clarity, the two metrics are

\mathrm{NMSE} = \frac{\sum_{t=1}^{T} \lVert y_t - \hat{y}_t \rVert^2}{\sum_{t=1}^{T} \lVert y_t \rVert^2}, \qquad
\mathrm{NRMSE} = \frac{\sqrt{\sum_{t=1}^{T} \lVert y_t - y_{t|t-1} \rVert^2}}{\sqrt{\sum_{t=1}^{T} \lVert y_t \rVert^2}}

For Monkey J, an electrode array with 137 electrodes (large-scale microdrive system, Gray Matter Research, USA) covering parts of dorsal premotor cortex (PMd), ventral premotor cortex (PMv), and primary motor cortex (M1) was used to record spike-field activity. For Monkey C, two micro-electrode arrays with 32 electrodes each (SC32, Gray Matter Research, USA) were used to record spike-field activity; the covered cortical areas for Monkey C are PMd and PMv. For both monkeys, raw neural signals were recorded at a 30 kHz sampling rate in real time as the subject performed the reaching task. Reflective markers were placed on the subject's right arm and monitored using infrared and near-infrared cameras at a sampling rate of 100 frames/sec. We extracted spikes by passing the raw neural signals through a band-pass filter (0.3-6.6 kHz), finding threshold-crossing events with a threshold of 3.5 standard deviations below the mean filtered signal, and binning them in 10 ms bins so the spikes are binary (0/1) in each bin. The firing rate is computed with a 300 ms non-causal (±150 ms) rectangular moving-average window every 50 ms. This firing rate is y_t in our GM and GDM throughout the data analysis.

Figure 3.1: GM describes the neural population activity more efficiently than LM. (A) The NMSE of the reconstructed ŷ_t under different dimensions for Monkey J. The dimensionality reduction is done by LPR and PCA for GM and LM, respectively. The shaded regions show the 95% confidence bound of the mean. In LM, the coordinates are defined in the Euclidean space, so the coordinates after PCA are still linear (PCA 1 to PCA 8). In contrast, the first coordinate of LPR is always the loop, the main topological structure, and the remaining dimensions lie in the appending vector space. (B) is the same as (A) but on Monkey C.
3.3 Results

To validate that our GM and GDM capture the intrinsic geometric structure underlying neural population activity, we compare them with their corresponding linear versions, LM and LDM.

3.3.1 Intrinsic geometric coordinate is a natural coordinate for neural population activity

From figure 3.1, we see that GM has a lower NMSE compared to LM across all dimensions. The difference between GM and LM is especially clear in the low-D region. This shows that the main nonlinear topological structure, the loop, is critical for modeling and describing the neural population activity efficiently. The fact that all follow-up LPR dimensions in the GM appending vector space still beat the PCA dimensions in LM further confirms the importance of the loop, since all appending vectors are appended to this topological structure. It also shows that our multi-D manifold-building algorithm for GM can find the intrinsic geometric coordinate underlying neural population activity. Figures 3.1A and 3.1B have the same trend across Monkeys J and C. This demonstrates the generalizability of our GM algorithm across subjects.

Figure 3.2: Neural deviation is better described by the intrinsic geometric coordinate. (A) We compute the NRMSE between the true y_t and the one-step-ahead prediction y_{t|t-1} under different dimensions for Monkey J. For LDM, this is done by running the standard EM with different dimensions of the latent state. For GDM, we apply the paired EM to the neural population activity, run C-PFKF to get y_{t|t-1}, and then reduce the dimension by LPR. The shaded regions show the 95% confidence bound of the mean. The difference between the blue and the red lines is that their observation noise covariances in equations (3.11) and (3.13) are AlCOV and EuCOV, respectively. Similar to figure 3.1, the coordinates of LDM are linearly defined in the Euclidean space, and the coordinates of GDM are the loop plus the LPR dimensions in the appending vector space. (B) is the same as (A) but on Monkey C.

3.3.2 Intrinsic geometric coordinate dominates the neural noise deviation

From figure 3.2, our GDM outperforms LDM, especially in the low-D space, which is the same conclusion as for GM versus LM in figure 3.1. This shows that not only the distribution of neural population activity but also the dynamics on top of it follow the intrinsic geometric coordinate better than the extrinsic Euclidean coordinate. Interestingly, the GDM with the AlCOV observation noise model performs much better than the GDM with the EuCOV model. This answers a new question raised by our geometric framework: does the neural deviation noise also have its own geometry different from the Euclidean space? The answer is yes! In the literature using LDM, this is not even a question because everything, from the system parameters to the latent state, follows the only coordinate system in LDM, the Euclidean one. With GDM, this is the first time that we can explore this question because we have two coordinate systems for describing the neural population activity. The fact that AlCOV beats EuCOV across all dimensions demonstrates that the whole neural dynamics, both the modeled and the non-modeled parts, follows the intrinsic geometric coordinate.
[Figure 3.1. Panels: (A) Monkey J, (B) Monkey C. Vertical axis: NMSE (0 to 1). Horizontal axis: dimension, PCA 1 through 8 for LM versus the loop and app-D1 through app-D7 for GM.]

Figure 3.1: GM describes the neural population activity more efficiently than LM. (A) The NMSE is computed for the reconstructed $\hat{y}_t$ at different dimensions. The dimensionality reduction is done by LPR for GM and by PCA for LM. The shaded regions show the 95% confidence bound of the mean. In LM, the coordinates are defined in the Euclidean space, so the coordinates after PCA are still linear (PCA 1 to PCA 8). In contrast, the first coordinate of LPR is always the loop, the main topological structure, and the remaining dimensions lie in the appending vector space. (B) is the same as (A) but for Monkey C.

3.3 Results

To validate that our GM and GDM capture the intrinsic geometric structure underlying neural population activity, we compare them with their corresponding linear versions, LM and LDM.

3.3.1 Intrinsic geometric coordinate is a natural coordinate for neural population activity

From figure 3.1, we see that GM has a lower NMSE than LM across all dimensions. The difference between GM and LM is especially clear in the low-D region. This shows that the main nonlinear topological structure, the loop, is critical for modeling and describing the neural population activity efficiently. The fact that all subsequent LPR dimensions in the GM appending vector space still beat the PCA dimensions in LM further confirms the importance of the loop, since all appending vectors are appended to this topological structure. It also shows that our multi-D manifold-building algorithm for GM can find the intrinsic geometric coordinate underlying neural population activity. Figures 3.1A and 3.1B show the same trend across Monkeys J and C, which demonstrates the generalizability of our GM algorithm across subjects.

[Figure 3.2. Panels: (A) Monkey J, (B) Monkey C. Vertical axis: NRMSE (0 to 1). Horizontal axis: dimension, 1 through 8 for LDM versus the loop and app-D1 through app-D7 for GDM; curves compare LDM, GDM with AlCOV, and GDM with EuCOV.]

Figure 3.2: Neural deviation is better described by the intrinsic geometric coordinate. (A) We compute the NRMSE between the true $y_t$ and the one-step-ahead prediction $y_{t|t-1}$ at different dimensions. For LDM, this is done by running the standard EM with different dimensions of the latent state. For GDM, we apply the paired EM to the neural population activity, run C-PFKF to get $y_{t|t-1}$, and then reduce the dimension by LPR. The shaded regions show the 95% confidence bound of the mean. The difference between the blue and the red lines is that their observation noise covariances in equations (3.11) and (3.13) are AlCOV and EuCOV, respectively. Similar to figure 3.1, the coordinates of LDM are linearly defined in the Euclidean space, and the coordinates of GDM are the loop plus the LPR dimensions in the appending vector space. (B) is the same as (A) but for Monkey C.

3.3.2 Intrinsic geometric coordinate dominates the neural noise deviation

From figure 3.2, our GDM is better than LDM, especially in the low-D space, matching the conclusion from GM versus LM in figure 3.1. This shows that not only the distribution of neural population activity but also the dynamics on top of it follow the intrinsic geometric coordinate better than the extrinsic Euclidean coordinate. Interestingly, the GDM with the AlCOV observation noise model performs much better than the GDM with the EuCOV model. This answers a new question raised by our geometric framework: does the neural deviation noise also have its own geometry, different from the Euclidean space? The answer is yes! In the literature using LDM, this is not even a question, because everything, from the system parameters to the latent state, follows the only coordinate system in LDM, the Euclidean one. With GDM, this is the first time we can explore this question, because we have two coordinate systems for describing the neural population activity. The fact that AlCOV beats EuCOV across all dimensions demonstrates that the whole neural dynamics, both the modeled and the non-modeled parts, follows the intrinsic geometric coordinate. The crossing of the GDM-with-EuCOV and LDM curves strengthens this argument: the GDM only works when everything in it follows the intrinsic geometric coordinate, which is the natural way to describe the neural population activity. Figures 3.2A and 3.2B show the same trend across Monkeys J and C, which demonstrates the generalizability of the above conclusion about GDM.

3.4 Discussion

The literature shows that high-D neural population activity is actually embedded in a low-D subspace. Recently, this low-D subspace has been replaced by the idea of a low-D manifold, but a data-driven analytical geometric model for this manifold is largely lacking. Furthermore, a pervasive dynamical feature, rotation, is found in neural population activity under diverse circumstances, including reaching, smelling, and sleeping. Our hypothesis about this phenomenon is that the underlying manifold has holes, so it is not homotopy equivalent to the Euclidean space $\mathbb{R}^n$. Combining these two ideas, we develop a new data-driven geometric framework for modeling neural population activity. This framework includes four new components: (1) an analytical multi-D geometric model (GM) composed of the main topological hole, a loop, and the appending vectors around it; (2) a geometric dynamical model (GDM) on top of the GM together with its learning algorithm, the paired EM; (3) a geometry-specific filter, the calibrated particle-Kalman filter (C-PFKF), for filtering the latent state from neural population activity in real time; and (4) a geometric dimensionality reduction algorithm, the linear projected regression (LPR), for sorting the geometric coordinates in GM and GDM by minimizing the residual error between the true and the reconstructed/estimated neural population activity.
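For component (4), the derivation in (3.28) reduces LPR to an eigendecomposition: $Q$ collects the eigenvectors of the $m$ smallest eigenvalues of the symmetric matrix $J_{\mathrm{LPR}}$, and the reduced appending state is $Q' z^a_t$. Below is a minimal NumPy sketch of that step, assuming $J_{\mathrm{LPR}}$ has already been accumulated as in (3.28); the function names and the random placeholder matrix are illustrative only, not code from the thesis.

```python
import numpy as np

def lpr_projection(J_lpr, m):
    """Return Q whose columns are the eigenvectors of the m smallest
    eigenvalues of the symmetric matrix J_lpr (shape (n-1, n-1))."""
    eigvals, eigvecs = np.linalg.eigh(J_lpr)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :m]                      # (n-1, m)

def reduce_appending_states(z_a, Q):
    """Project appending latent states z^a_t (shape (T, n-1)) onto Q' z^a_t."""
    return z_a @ Q                             # (T, m)

# Illustrative usage with a random symmetric placeholder for J_LPR
n_minus_1, T, m = 20, 500, 3
A = np.random.randn(n_minus_1, n_minus_1)
J_lpr = A + A.T                                # placeholder; in practice accumulate (3.28)
Q = lpr_projection(J_lpr, m)
z_a = np.random.randn(T, n_minus_1)
z_reduced = reduce_appending_states(z_a, Q)
```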
Our static data analysis shows that the GM explains more variance in neural population activity than LM at the same dimensions. This result shows that the intrinsic geometric coordinate is more natural than the extrinsic Euclidean coordinate. Furthermore, GM plus LPR provides a way to explore the contribution of each geometric coordinate. This is analogous to the Euclidean space plus PCA, but it cannot be done with the model in the literature, which only includes the main topological hole, a loop 112. That is why our GM is not 1D but multi-D. Our result shows that the loop captures much more variance than the 1st PCA coordinate. This indicates that the loop, a 1D hole, is a dominant geometric structure in neural population activity. It also explains why rotation is so common across various types of neural recordings. Furthermore, the appending LPR vectors beat their corresponding PCA coordinates. This shows that the deviation of the neural population activity from the main loop also follows the geometric coordinate. Again, this emphasizes the importance of the intrinsic geometric coordinate in studying neural recordings.

Our dynamical data analysis reaches the same conclusion as the static analysis above: the dynamical system works better when it follows the intrinsic geometric coordinate, not the extrinsic Euclidean one. In addition, we learn something from GDM that GM cannot tell us, namely about the neural noise deviation. In LDM, the only coordinate system is the Euclidean one (and its linear combinations), so the modeled and non-modeled (i.e., noise) parts in LDM follow the same coordinate system by default. However, we now have two coordinate systems, the geometric one and the Euclidean one, and the modeled part in GDM already uses the geometric coordinate. The new question is: which coordinate system should the non-modeled part follow, the geometric (AlCOV) or the Euclidean (EuCOV)? Our GDM data analysis reveals a surprising result: the non-modeled neural noise also follows the geometric coordinate! Moreover, GDM with EuCOV performs worse than LDM even though its modeled part is geometric. Once again, the intrinsic geometric coordinate demonstrates its dominant role in interpreting neural population activity.

In summary, our new geometric framework reveals the importance of the intrinsic geometric coordinate underlying neural population activity. More importantly, this framework, including GM, GDM, C-PFKF, and LPR, provides a toolbox for exploring neural recordings quantitatively, and quantification is the first step in digging into a problem deeply. Many clues in the literature imply that neural population activity lies on a low-D manifold with holes. However, merely observing this shape visually in 2D/3D projections is not enough to justify this hypothesis. Indeed, we can only verify the hypothesis by building a geometric model (GM) and comparing it with its linear counterpart (LM). GM plus LPR shows the importance of geometry across all dimensions. Furthermore, GDM shows the generalizability of this geometric idea by demonstrating that the non-modeled part, the neural noise deviation, also follows the intrinsic geometric coordinate. These results can only be achieved with analytical models, because every quantity can be computed and examined, and such computation often yields results beyond what we expect. Our next steps are to apply this geometric toolbox to other neural recordings collected under different circumstances and to develop the corresponding control algorithm on top of GDM. We believe that the geometric viewpoint, together with the analytical tools we develop, offers a new way to explore neural population activity in neuroscience and neurotechnology.

Chapter 4
Conclusion and ongoing work

Neural population dynamics in the motor cortex is a research hot spot in neuroscience and neurotechnology, and the motor cortex is valuable in both science and engineering. On the one hand, the motor cortex controls the motion of the limbs, which is essential in our daily routine, so it is critical. This also means that the effect of neural signals in the motor cortex can be observed directly by measuring the subject's limb motion quantitatively (e.g., tracking the hand's position with cameras), so the meaning of the neural signals becomes concrete. On the other hand, the connection between the motor cortex and limb motion implies that we can detect a subject's motion intention if we can decode the hidden information in the neural signals. This opens the opportunity for many applications. For example, we can decode the movement intention of paralyzed patients and help them achieve the motion with robotic arms/legs, or even reactivate their limbs by stimulating the corresponding muscles. This is the brain-machine interface (BMI). Overall, understanding the dynamical mechanism underlying neural population activity in the motor cortex and extracting information from it is important. In this thesis, we achieve this through two approaches.

The first approach is engineering: decoding the subject's motion intention from all simultaneously recorded neural signals jointly. Current technology can record multiple types of neural signals (e.g., EEG, ECoG, LFP, and spikes) across various resolutions in time and space. If we can utilize them together, the decoding accuracy can be higher than when using only a single type of signal. The question is how to do this. In this thesis, we develop a new filter, the multiscale filter (MSF), to achieve this in real time (chapter 1). MSF can handle various neural signals that are (1) continuous or discrete, (2) fast or slow, and (3) missing at some time points. We also develop the adaptive MSF, which learns the system parameters with a Kalman filter (KF) and a point process filter (PPF) whose optimal learning rates are calibrated by our new calibration algorithm (chapter 2). Our NHP data analysis shows that MSF can combine different neural signals to obtain better intention estimates in real time. Furthermore, this improvement persists even when the different neural signals come from the same electrodes. Based on these results, our MSF can improve BMI performance even when the patient is implanted with only a few electrodes. Also, the ability to utilize various kinds of neural signals may extend the lifetime of a BMI, since different neural signals have different decay periods in their signal quality. This is another improvement that MSF contributes to BMI.
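As a rough, simplified illustration of the multiscale idea summarized above (and not the exact MSF equations of chapter 1), the sketch below runs at the fast spike time-scale, applies a standard Gaussian-approximation point-process update for each binned spike channel under an assumed log-linear rate model, and applies a Kalman update for field features only on steps where a slower field sample is available; all matrix and variable names are illustrative assumptions.

```python
import numpy as np

def msf_step(x, P, A, W, spikes, beta, dt, lfp=None, C=None, R=None):
    """One fast-time-scale step of a simplified multiscale filter (sketch only).

    x, P      : latent state mean and covariance from the previous step
    A, W      : linear state dynamics and process noise covariance
    spikes    : binary spike counts for this bin, one entry per spike channel
    beta      : per-channel rate parameters, lambda_c = exp(b[0] + b[1:] @ x)
    lfp, C, R : field-feature sample and its linear-Gaussian observation model
                (pass lfp=None on bins where no field sample arrives)
    """
    # Predict with the linear dynamics
    x = A @ x
    P = A @ P @ A.T + W

    # Point-process (Gaussian-approximation) update, one spike channel at a time
    for n_c, b in zip(spikes, beta):
        lam = np.exp(b[0] + b[1:] @ x)      # conditional intensity
        g = b[1:]                           # gradient of log-rate w.r.t. x
        P = np.linalg.inv(np.linalg.inv(P) + np.outer(g, g) * lam * dt)
        x = x + P @ g * (n_c - lam * dt)

    # Kalman update for field features, applied only when a sample is present
    if lfp is not None:
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (lfp - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
    return x, P
```

In this sketch, missing or slower field samples are handled simply by passing lfp=None on the corresponding bins, so the spike updates continue at the fast time-scale regardless.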
The second approach is scientific: modeling neural population activity while accounting for its underlying geometry. Currently, most models in the literature, even the nonlinear ones, are homotopy equivalent to the Euclidean space. However, the pervasive rotation in neural signals across various recording scenarios (sleeping, walking, smelling, etc.) implies that the underlying geometry has holes, so it is not homotopy equivalent to the Euclidean space. We develop a new analytical geometric framework (chapter 3), including a static model (GM), a dynamical model (GDM), a geometric filter (C-PFKF), and a dimensionality reduction method (LPR). This framework is a toolbox for exploring the geometry underlying neural population activity quantitatively. We apply this geometric toolbox to two NHPs' neural population activity and confirm the hypothesis that a manifold with a hole really does underlie the neural population activity. Furthermore, we discover something unexpected: even the neural deviation noise (i.e., the non-modeled part in GDM) follows the intrinsic geometric coordinate! This shows that the generalizability of this geometry with a hole is broader than we expected at the beginning, and this discovery could only be made by modeling the geometry analytically. This demonstrates the value of our analytical geometric framework/toolbox.

In summary, this thesis demonstrates an engineering and a scientific way to explore neural population activity, and both directions can be extended. For the multiscale filter, some extensions have already been developed by my colleagues in the lab, for example, making the latent state abstract 137,138 or switching between different multiscale models 139. More extension projects are currently in progress in our lab. For the geometric framework, we plan to apply it to neural population activity recorded under various scenarios to see how broadly the GM/GDM framework applies. Also, the torus is another geometry that has been found underlying some neural population activity, and we plan to extend our GM/GDM to this manifold.

References

1. Truccolo, W., Hochberg, L. R. & Donoghue, J. P. Collective dynamics in human and monkey sensorimotor cortex: predicting single neuron spikes. Nature neuroscience 13, 105 (2010).

2. Perel, S., Sadtler, P. T., Oby, E. R., Ryu, S. I., Tyler-Kabara, E. C., Batista, A. P., et al.
Single-unit activity, threshold crossings, and local field potentials in motor cortex differentially encode reach kinematics. Journal of neurophysiology 114, 1500–1512 (2015). 3. Wong, Y. T., Fabiszak, M. M., Novikov, Y., Daw, N. D. & Pesaran, B. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision. Nature neuroscience 19, 327 (2016). 4. Womelsdorf, T., Schoffelen, J.-M., Oostenveld, R., Singer, W., Desimone, R., Engel, A. K., et al. Modulation of neuronal interactions through neuronal synchronization. science 316, 1609–1612 (2007). 5. Buzs´ aki, G., Anastassiou, C. A. & Koch, C. The origin of extracellular fields and currents—EEG, ECoG, LFP and spikes. Nature reviews neuroscience 13, 407–420 (2012). 6. Einevoll, G. T., Kayser, C., Logothetis, N. K. & Panzeri, S. Modelling and analysis of local field potentials for studying the function of cortical circuits. Nature Reviews Neuroscience 14, 770–785 (2013). 7. Hsieh, H.-L. & Shanechi, M. M. Multiscale brain-machine interface decoders in Engi- neering in Medicine and Biology Society (EMBC), 2016 IEEE 38th Annual Interna- tional Conference of the (2016), 6361–6364. 8. Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C. & Wilson, M. A. A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. J. Neurosci. 18, 7411–7425 (1998). 9. Eden, U. T., Frank, L. M., Barbieri, R., Solo, V. & Brown, E. N. Dynamic analysis of neural encoding by point process adaptive filtering. Neural Comput. 16, 971–998 (2004). 103 10. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P. & Brown, E. N. A point process framework for relating neural spiking activity to spiking history, neural en- semble, and extrinsic covariate effects. J. Neurophysiol. 93, 1074–1089 (2005). 11. Shanechi, M. M., Hu, R. C., Powers, M., Wornell, G. W., Brown, E. N. & Williams, Z. M. Neural population partitioning and a concurrent brain-machine interface for sequential motor function. Nat. Neurosci. 15, 1715–1722 (2012). 12. Shanechi, M., Orsborn, A., Moorman, H., Gowda, S., Dangi, S. & Carmena, J. Rapid Control and Feedback Rates Enhance Neuroprosthetic Control. Nature Communica- tions, 13825 (2017). 13. Kass, R. E. & Ventura, V. A Spike-Train Probability Model. Neural Computation 13, 1713–1720 (2001). 14. Citi, L., Ba, D., Brown, E. N. & Barbieri, R. Likelihood methods for point processes with refractoriness. Neural computation 26, 237–263 (2014). 15. So, K., Dangi, S., Orsborn, A. L., Gastpar, M. C. & Carmena, J. M. Subject-specific modulation of local field potential spectral power during brain-machine interface con- trol in primates. J. Neural Eng. 11, 026002 (2014). 16. Hotson, G., McMullen, D. P., Fifer, M. S., Johannes, M. S., Katyal, K. D., Para, M. P., et al. Individual finger control of a modular prosthetic limb using high-density electrocorticography in a human subject. Journal of neural engineering 13, 026017 (2016). 17. Yanagisawa, T., Hirata, M., Saitoh, Y., Kishima, H., Matsushita, K., Goto, T., et al. Electrocorticographic control of a prosthetic arm in paralyzed patients. Annals of neurology 71, 353–361 (2012). 18. Markowitz, D. A., Wong, Y. T., Gray, C. M. & Pesaran, B. Optimizing the decoding of movement goals from local field potentials in macaque cortex. J. Neurosci. 31, 18412–18422 (2011). 19. Pistohl, T., Ball, T., Schulze-Bonhage, A., Aertsen, A. & Mehring, C. Prediction of armmovementtrajectoriesfromECoG-recordingsinhumans. 
Journal of neuroscience methods 167, 105–114 (2008). 20. Moran, D. Evolution of brain–computer interface: action potentials, local field poten- tials and electrocorticograms. Current opinion in neurobiology 20, 741–745 (2010). 21. Wang, W., Collinger, J. L., Degenhart, A. D., Tyler-Kabara, E. C., Schwartz, A. B., Moran, D. W., et al. An electrocorticographic brain interface in an individual with tetraplegia. PloS one 8, e55344 (2013). 104 22. Bundy,D.T.,Pahwa,M.,Szrama,N.&Leuthardt,E.C.Decodingthree-dimensional reaching movements using electrocorticographic signals in humans. Journal of neural engineering 13, 026021 (2016). 23. Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high per- forming brain–machine interface driven by low-frequency local field potentials alone and together with spikes. Journal of neural engineering 12, 036009 (2015). 24. Andersen, R. A., Kellis, S., Klaes, C. & Aflalo, T. Toward more versatile and intuitive cortical brain–machine interfaces. Current Biology 24, R885–R897 (2014). 25. Shenoy, K. V. & Carmena, J. M. Combining decoder design and neural adaptation in brain-machine interfaces. Neuron 84, 665–680 (2014). 26. Shanechi, M. M. Brain–Machine Interface Control Algorithms. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25, 1725–1734 (2017). 27. Brandman,D.M.,Cash,S.S.&Hochberg,L.R.Review:HumanIntracorticalrecord- ing and neural decoding for brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering (2017). 28. Schwartz,A.B.,Cui,X.T.,Weber,D.J.&Moran,D.W.Brain-controlledinterfaces: movement restoration with neural prosthetics. Neuron 52, 205–220 (2006). 29. Lebedev, M. A. & Nicolelis, M. Brain-machine interfaces: past, present and future. Trends in Neurosciences 29, 536–546 (2006). 30. Sajda, P., Muller, K.-R. & Shenoy, K. Brain-Computer Interfaces [from the guest editors]. IEEE Signal Processing Magazine 25, 16–17 (2008). 31. Donoghue, J. P. Bridging the brain to the world: a perspective on neural interface systems. Neuron 60, 511–521 (2008). 32. Nicolelis, M. A. & Lebedev, M. A. Principles of neural ensemble physiology underly- ing the operation of brain-machine interfaces. Nature reviews. Neuroscience 10, 530 (2009). 33. Hatsopoulos, N. G. & Suminski, A. J. Sensing with the motor cortex. Neuron 72, 477–487 (2011). 34. Thakor, N. V. Translating the brain-machine interface. Science translational medicine 5, 210ps17–210ps17 (2013). 105 35. Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field potentials. Journal of neurophysiology 107, 1337–1355 (2011). 36. Mehring, C., Rickert, J., Vaadia, E., de Oliveira, S. C., Aertsen, A. & Rotter, S. Inference of hand movements from local field potentials in monkey motor cortex. Nature neuroscience 6, 1253 (2003). 37. Shanechi, M. M., Orsborn, A. L. & Carmena, J. M. Robust Brain-Machine Inter- face Design Using Optimal Feedback Control Modeling and Adaptive Point Process Filtering. PLoS Comput Biol 12, e1004730 (2016). 38. Taylor, D. M., Tillery, S. I. H. & Schwartz, A. B. Direct cortical control of 3D neuro- prosthetic devices. Science 296, 1829–1832 (2002). 39. Carmena, J. M., Lebedev, M. A., Crist, R. E., O’Doherty, J. E., Santucci, D. M., Dimitrov, D. F., et al. Learning to Control a Brain-Machine Interface for Reaching and Grasping by Primates. PLoS Biol. 1, e42 (2003). 40. Ganguly, K. & Carmena, J. M. 
Emergence of a Stable Cortical Map for Neuropros- thetic Control. PLoS Biol. 7 (2009). 41. Baranauskas, G. What limits the performance of current invasive Brain Machine In- terfaces? Frontiers in Systems Neuroscience 8 (2014). 42. Sadtler, P. T., Quick, K. M., Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C., et al. Neural constraints on learning. Nature 512, 423 (2014). 43. Bishop, W., Chestek, C. C., Gilja, V., Nuyujukian, P., Foster, J. D., Ryu, S. I., et al. Self-recalibrating classifiers for intracortical brain–computer interfaces. Journal of neural engineering 11, 026001 (2014). 44. Wong,Y.T.,Putrino,D.,Weiss,A.&Pesaran,B.Utilizingmovementsynergies toim- prove decoding performance for a brain machine interface in Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE (2013), 289–292. 45. Hsieh, H.-L., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale modeling and decoding algorithms for spike-field activity. Journal of neural engineering 16, 016018 (2018). 46. Shanechi,M.M.,Hu,R.C.&Williams,Z.M.Acortical-spinalprosthesisfortargeted limb movement in paralysed primate avatars. Nat. Commun. 5 (2014). 106 47. Kang, X., Sarma, S. V., Santaniello, S., Schieber, M. & Thakor, N. V. Task- independent cognitive state transition detection from cortical neurons during 3-D reach-to-grasp movements. IEEE Transactions on Neural Systems and Rehabilitation Engineering 23, 676–682 (2015). 48. Coleman, T. P. & Sarma, S. S. A computationally efficient method for nonparametric modeling of neural spiking activity with point processes. Neural Computation 22, 2002–2030 (2010). 49. Smith, A. C. & Brown, E. N. Estimating a state-space model from point process observations. Neural Comput. 15, 965–991 (2003). 50. Shanechi, M. M., Williams, Z. M., Wornell, G. W., Hu, R. C., Powers, M. & Brown, E.N.Areal-timebrain-machineinterfacecombiningmotortargetandtrajectoryintent using an optimal feedback control design. PloS one 8 (2013). 51. Agarwal, R., Chen, Z., Kloosterman, F., Wilson, M. A. & Sarma, S. V. A novel non- parametricapproachforneuralencodinganddecodingmodelsofmultimodalreceptive fields. Neural computation 28, 1356–1387 (2016). 52. Sani, O. G., Yang, Y., Lee, M. B., Dawes, H. E., Chang, E. F. & Shanechi, M. M. Mood variations decoded from multi-site intracranial human brain activity. Nature biotechnology (2018). 53. Yang,Y.,Connolly,A.T.&Shanechi,M.M.Acontrol-theoreticsystemidentification framework and a real-time closed-loop clinical simulation testbed for electrical brain stimulation. Journal of Neural Engineering 15, 066007 (2018). 54. Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Yu, B. M., Fan, J. M., et al. A High-Performance Neural Prosthesis Enabled by Control Algorithm Design. Nat. Neurosci. 15, 1752–1757 (2012). 55. Orsborn, A. L., Moorman, H. G., Overduin, S. A., Shanechi, M. M., Dimitrov, D. F. & Carmena, J. M. Closed-loop decoder adaptation shapes neural plasticity for skillful neuroprosthetic control. Neuron 82, 1380–1392 (2014). 56. Koyama, S., Eden, U. T., Brown, E. N. & Kass, R. E. Bayesian decoding of neural spike trains. Annals of the Institute of Statistical Mathematics 62, 37 (2010). 57. MacKay,D.J. Information theory, inference and learning algorithms (Cambridgeuni- versity press, 2003). 58. Gilja,V.,Pandarinath,C.,Blabe,C.H.,Nuyujukian,P.,Simeral,J.D.,Sarma,A.A., etal.Clinicaltranslationofahigh-performanceneuralprosthesis.Naturemedicine21, 1142–1145 (2015). 107 59. Yang, Y. & Shanechi, M. M. 
An adaptive and generalizable closed-loop system for control of medically induced coma and other states of anesthesia. Journal of neural engineering 13, 066019 (2016). 60. Velliste, M., Perel, S., Spalding, M. C., Whitford, A. S. & Schwartz, A. B. Cortical control of a prosthetic arm for self-feeding. Nature 453, 1098–1101 (2008). 61. Orsborn, A. L., Dangi, S., Moorman, H. G. & Carmena, J. M. Closed-Loop Decoder AdaptationonIntermediateTime-ScalesFacilitatesRapidBMIPerformanceImprove- ments Independent of Decoder Initialization Conditions. IEEE Trans. Neural Syst. Rehabil. Eng. 20, 468–477 (2012). 62. Collinger,J.L.,Wodlinger,B.,Downey,J.E.,Wang,W.,Tyler-Kabara,E.C.,Weber, D. J., et al. High-performance neuroprosthetic control by an individual with tetraple- gia. The Lancet 381, 557–564 (2013). 63. Mahmoudi, B. & Sanchez, J. C. A symbiotic brain-machine interface through value- based decision making. PLOS ONE 6, e14760 (2011). 64. Hochberg, L. R., Bacher, D., Jarosiewicz, B., Masse, N. Y., Simeral, J. D., Vogel, J., et al. Reach and grasp by people with tetraplegia using a neurally controlled robotic arm. Nature 485 (2012). 65. Dangi, S., Gowda, S., Moorman, H. G., Orsborn, A. L., So, K., Shanechi, M., et al. Continuous Closed-Loop Decoder Adaptation with a Recursive Maximum Likelihood Algorithm Allows for Rapid Performance Acquisition in Brain-Machine Interfaces. Neural Comput. 26, 1811–1839 (2014). 66. Fan, J. M., Nuyujukian, P., Kao, J. C., Chestek, C. A., Ryu, S. I. & Shenoy, K. V. Intention estimation in brain–machine interfaces. Journal of neural engineering 11, 016004 (2014). 67. Willett, F. R., Pandarinath, C., Jarosiewicz, B., Murphy, B. A., Memberg, W. D., Blabe, C. H., et al. Feedback control policies employed by people using intracortical brain–computer interfaces. Journal of Neural Engineering 14, 016001 (2017). 68. Brown, E. N., Barbieri, R., Ventura, V., Kass, R. & Frank, L. The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput. 14, 325–346 (2001). 69. Hsieh, H.-L. & Shanechi, M. M. Optimizing the learning rate for adaptive estimation of neural encoding models. PLOS Computational Biology 14, 1–34. doi:10.1371/ journal.pcbi.1006168 (2018). 108 70. Myers, K. & Tapley, B. Adaptive sequential estimation with unknown noise statistics. IEEE Transactions on Automatic Control 21, 520–523 (1976). 71. Todorov, E. Optimality principles in sensorimotor control. Nat. Neurosci., 907–915 (2004). 72. Shanechi,M.M.,Wornell,G.W.,Williams,Z.M.&Brown,E.N.Feedback-controlled parallel point process filter for estimation of goal-directed movements from neural signals. IEEE Trans. Neural Syst. Rehabil. Eng. 21, 129–140 (2013). 73. Shadmehr, R. & Krakauer, J. W. A computational neuroanatomy for motor control. Exp. Brain Res., 359–381 (2008). 74. Golub, M. D., Byron, M. Y. & Chase, S. M. Internal models for interpreting neural population activity during sensorimotor control. Elife 4, e10015 (2015). 75. Huh, D. & Sejnowski, T. J. Conservation law for self-paced movements. Proceedings of the National Academy of Sciences 113, 8831–8836 (2016). 76. Jeffrey, B. B. Linear Optimal Control 1999. 77. Moran, D. W. & Schwartz, A. B. Motor cortical representation of speed and direction during reaching. Journal of neurophysiology 82, 2676–2692 (1999). 78. Kim, S.-P., Simeral, J. D., Hochberg, L. R., Donoghue, J. P. & Black, M. J. Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia. J. Neural Eng. 
5, 455–476 (2008). 79. Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A high-performance neural prosthesis incorporating discrete state selection with hidden Markov models. IEEE Transactions on Biomedical Engineering 64, 935–945 (2017). 80. Delp, S. L., Anderson, F. C., Arnold, A. S., Loan, P., Habib, A., John, C. T., et al. OpenSim: open-source software to create and analyze dynamic simulations of move- ment. IEEE transactions on biomedical engineering 54, 1940–1950 (2007). 81. Heldman,D.A.,Wang,W.,Chan,S.S.&Moran,D.W.Localfieldpotentialspectral tuning in motor cortex during reaching. IEEE Transactions on Neural Systems and Rehabilitation Engineering 14, 180–183 (2006). 82. Ball, T., Schulze-Bonhage, A., Aertsen, A. & Mehring, C. Differential representation of arm movement direction in relation to cortical anatomy and function. Journal of neural engineering 6, 016006 (2009). 109 83. Kilavik, B. E., Ponce-Alvarez, A., Trachel, R., Confais, J., Takerkart, S. & Riehle, A. Context-related frequency modulations of macaque motor cortical LFP beta oscilla- tions. Cerebral cortex 22, 2148–2159 (2011). 84. Muller, L., Chavane, F., Reynolds, J. & Sejnowski, T. J. Cortical travelling waves: mechanisms and computational principles. Nature Reviews Neuroscience (2018). 85. Willett, F. R., Suminski, A. J., Fagg, A. H. & Hatsopoulos, N. G. Improving brain- machine interface performance by decoding intended future movements. J. Neural Eng. 10, 026011 (2013). 86. Cunningham, J. P., Nuyujukian, P., Gilja, V., Chestek, C. A., Ryu, S. I. & Shenoy, K.V.AClosed-LoopHumanSimulatorforInvestigatingtheRoleofFeedbackControl in Brain-Machine Interfaces. J. Neurophysiol. 105, 1932–1949 (2011). 87. Frank, L. M., Eden, U. T., Solo, V., Wilson, M. A. & Brown, E. N. Contrasting patterns of receptive field plasticity in the hippocampus and the entorhinal cortex: an adaptive filtering approach. Journal of Neuroscience 22, 3817–3830 (2002). 88. Frank, L. M., Stanley, G. B. & Brown, E. N. Hippocampal plasticity across multi- ple days of exposure to novel environments. Journal of Neuroscience 24, 7681–7689 (2004). 89. Van Hartevelt, T. J., Cabral, J., Deco, G., Møller, A., Green, A. L., Aziz, T. Z., et al. Neural plasticity in human brain connectivity: the effects of long term deep brain stimulation of the subthalamic nucleus in Parkinson’s disease. PloS one 9, e86496 (2014). 90. Santaniello, S., Montgomery Jr, E. B., Gale, J. T. & Sarma, S. V. Non-stationary discharge patterns in motor cortex under subthalamic nucleus deep brain stimulation. Frontiers in integrative neuroscience 6, 35 (2012). 91. Little, S., Pogosyan, A., Neal, S., Zavala, B., Zrinzo, L., Hariz, M., et al. Adaptive deep brain stimulation in advanced Parkinson disease. Annals of neurology 74, 449– 457 (2013). 92. Brown, E. N., Nguyen, D. P., Frank, L. M., Wilson, M. A. & Solo, V. An analysis of neural receptive field plasticity by point process adaptive filtering. Proceedings of the National Academy of Sciences 98, 12261–12266. doi:10.1073/pnas.201409398 (2001). 93. Haykin, S. S. Adaptive filter theory (Pearson Education India, 2008). 110 94. Shanechi, M. M. & Carmena, J. M. Optimal feedback-controlled point process decoder for adaptation and assisted training in brain-machine interfaces inNeural Engineering (NER), 2013 6th International IEEE/EMBS Conference on (2013), 653–656. 95. Jacobs, R. A. Increased rates of convergence through learning rate adaptation. Neural networks 1, 295–307 (1988). 96. Luo, Z.-Q. 
On the convergence of the LMS algorithm with adaptive learning rate for linear feedforward networks. Neural Computation 3, 226–245 (1991). 97. Xue, Q., Hu, Y. H. & Tompkins, W. J. Neural-network-based adaptive matched fil- tering for QRS detection. IEEE Transactions on Biomedical Engineering 39, 317–329 (1992). 98. Wan,E.A.&VanDerMerwe,R.TheunscentedKalmanfilterfornonlinearestimation in Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000 (2000), 153–158. 99. Polesel, A., Ramponi, G. & Mathews, V. J. Image enhancement via adaptive unsharp masking. IEEE transactions on image processing 9, 505–510 (2000). 100. Belitski, A., Gretton, A., Magri, C., Murayama, Y., Montemurro, M. A., Logothetis, N. K., et al. Low-Frequency Local Field Potentials and Spikes in Primary Visual Cortex Convey Independent Visual Information. Journal of Neuroscience 28, 5696– 5709. doi:10.1523/JNEUROSCI.0009-08.2008 (2008). 101. Haykin, S. Kalman filtering and neural networks (John Wiley & Sons, 2004). 102. Liu, D. & Todorov, E. Evidence for the flexible sensorimotor strategies predicted by optimal feedback control. J. Neurosci., 9354–9368 (2007). 103. Merel, J., Pianto, D. M., Cunningham, J. P. & Paninski, L. Encoder-Decoder Opti- mization for Brain-Computer Interfaces. PLoS Comput Biol 11, e1004288 (2015). 104. Arulampalam, M. S., Maskell, S., Gordon, N. & Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on signal processing 50, 174–188 (2002). 105. Cunningham,J.P.&Yu,B.M.Dimensionalityreductionforlarge-scaleneuralrecord- ings. Nature neuroscience 17, 1500–1509 (2014). 106. Chapin, J. K. & Nicolelis, M. A. Principal component analysis of neuronal ensemble activity reveals multidimensional somatosensory representations. Journal of neuro- science methods 94, 121–140 (1999). 111 107. Kobak, D., Brendel, W., Constantinidis, C., Feierstein, C. E., Kepecs, A., Mainen, Z. F., et al. Demixed principal component analysis of neural population data. Elife 5, e10989 (2016). 108. Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., et al. Neural population dynamics during reaching. Nature 487, 51–56 (2012). 109. Stephen,E.P.,Li,Y.,Metzger,S.,Oganian,Y.&Chang,E.F.Latentneuraldynamics encode temporal context in speech. bioRxiv (2021). 110. Chung, S. & Abbott, L. Neural population geometry: An approach for understanding biological and artificial neural networks. Current opinion in neurobiology 70, 137–144 (2021). 111. Mitchell-Heggs, R., Prado, S., Gava, G. P., Go, M. A. & Schultz, S. R. Neural manifold analysis of brain circuit dynamics in health and disease. arXiv preprint arXiv:2203.11874 (2022). 112. Chaudhuri,R.,Ger¸ cek,B.,Pandey,B.,Peyrache,A.&Fiete,I.Theintrinsicattractor manifold and population dynamics of a canonical cognitive circuit across waking and sleep. Nature neuroscience 22, 1512–1520 (2019). 113. Saha, D., Leong, K., Li, C., Peterson, S., Siegel, G. & Raman, B. A spatiotemporal codingmechanismforbackground-invariantodorrecognition.Natureneuroscience16, 1830–1839 (2013). 114. Zhang, Y., Xu, G., Wang, J. & Liang, L. An automatic patient-specific seizure onset detection method in intracranial EEG based on incremental nonlinear dimensionali- tyreduction. Computers in biology and medicine 40, 889–899 (2010). 115. Gardner,R.J.,Hermansen,E.,Pachitariu,M.,Burak,Y.,Baas,N.A.,Dunn,B.A.,et al. Toroidal topology of population activity in grid cells. 
Nature 602, 123–128 (2022). 116. Kriegeskorte, N. & Wei, X.-X. Neural tuning and representational geometry. Nature Reviews Neuroscience 22, 703–718 (2021). 117. Carlsson, G. Topology and data. Bulletin of the American Mathematical Society 46, 255–308 (2009). 118. Zomorodian, A. & Carlsson, G. Computing persistent homology. Discrete & Compu- tational Geometry 33, 249–274 (2005). 112 119. Singh, G., Memoli, F., Ishkhanov, T., Sapiro, G., Carlsson, G. & Ringach, D. L. Topologicalanalysisofpopulationactivityinvisualcortex. Journal of vision 8,11–11 (2008). 120. Sizemore,A.E.,Karuza,E.A.,Giusti,C.&Bassett,D.S.Knowledgegapsintheearly growth of semantic feature networks. Nature human behaviour 2, 682–692 (2018). 121. Sabatini, D. A. & Kaufman, M. T. A curved manifold orients rotational dynamics in motor cortex. bioRxiv (2021). 122. Low, I. I., Williams, A. H., Campbell, M. G., Linderman, S. W. & Giocomo, L. M. Dynamic and reversible remapping of network representations in an unchanging envi- ronment. Neuron 109, 2967–2980 (2021). 123. Deitch, D., Rubin, A. & Ziv, Y. Representational drift in the mouse visual cortex. Current biology 31, 4327–4339 (2021). 124. Fu, Z., Beam, D., Chung, J. M., Reed, C. M., Mamelak, A. N., Adolphs, R., et al. Thegeometryofdomain-generalperformancemonitoringinthehumanmedialfrontal cortex. Science 376, eabm9922 (2022). 125. Jensen, K., Kao, T.-C., Tripodi, M. & Hennequin, G. Manifold gplvms for discov- ering non-euclidean latent structure in neural data. Advances in Neural Information Processing Systems 33, 22580–22592 (2020). 126. DeSilva,V.&Carlsson,G.E.Topologicalestimationusingwitnesscomplexes. SPBG 4, 157–166 (2004). 127. Bartels, R. H., Beatty, J. C. & Barsky, B. A. Hermite and cubic spline interpolation. An Introduction to Splines for Use in Computer Graphics and Geometric Modelling, 9–17 (1998). 128. De Silva, V., Morozov, D. & Vejdemo-Johansson, M. Persistent cohomology and cir- cular coordinates. Discrete & Computational Geometry 45, 737–759 (2011). 129. Hastie, T., Tibshirani, R. & Friedman, J. The elements of statistical learning: data mining, inference, and prediction (Springer Science & Business Media, 2009). 130. Hahsler, M. & Hornik, K. TSP-Infrastructure for the traveling salesperson problem. Journal of Statistical Software 23, 1–21 (2007). 131. K¨ uhnel, W. Differential geometry (American Mathematical Soc., 2015). 132. Klok,F.Twomovingcoordinateframesforsweepingalonga3Dtrajectory. Computer Aided Geometric Design 3, 217–229 (1986). 113 133. Kailath, T., Sayed, A. H. & Hassibi, B. Linear estimation BOOK (Prentice Hall, 2000). 134. Ghahramani, Z. & Hinton, G. E. Parameter estimation for linear dynamical systems tech.rep.(TechnicalReportCRG-TR-96-2,UniversityofTotronto,Dept.ofComputer Science, 1996). 135. Ma, J., Wu, O., Huang, B. & Ding, F. Expectation maximization estimation for a class of input nonlinear state space systems by using the Kalman smoother. Signal Processing 145, 295–303 (2018). 136. Hatcher, A. Algebraic topology. Cambridge Univ. Press (2002). 137. Abbaspourazad,H.,Hsieh,H.-L.&Shanechi,M.M.Amultiscaledynamicalmodeling and identification framework for spike-field activity. IEEE Transactions on Neural Systems and Rehabilitation Engineering 27, 1128–1138 (2019). 138. Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale low-dimensional motor cortical state dynamics predict naturalistic reach- and-grasp behavior. Nature communications 12, 1–19 (2021). 139. Song, C. Y., Hsieh, H.-L., Pesaran, B. & Shanechi, M. 
M. Modeling and inference methods for switching regime-dependent dynamical systems with multiscale neural observations. Journal of Neural Engineering (2022). 114
Abstract
Neural population dynamics in the motor cortex underlying naturalistic movement is nonlinear, multi-dimensional, and encoded across multiple spatiotemporal scales of neural population activity. To model and decode this complex biological system, we take two approaches.

The first approach is engineering: we develop a new adaptive multiscale filter (MSF, chapter 1) for decoding a non-human primate's (NHP) movement intentions in real time by running at the millisecond time-scale of spikes while adding information from fields at their slower time-scales. This joint decoding idea is motivated by the fact that modern technology can simultaneously record neural signals at various scales, from the spiking of individual neurons to large neural populations measured with local field potentials (LFP), and by the information-theoretic argument that multiscale decoding can do no worse than single-scale decoding. We then extend the filter to an adaptive MSF that learns all spike-field multiscale model parameters simultaneously in real time at their different time-scales. Here, both learning speed and accuracy are controlled by a hyper-parameter, the learning rate. For the best performance, we develop a novel analytical calibration algorithm for optimal selection of the learning rate (chapter 2). The key is deriving explicit analytical functions that predict the effect of the learning rate on the estimation error and the convergence time. We validate the adaptive MSF with numerical simulations and with experiments on an NHP whose spikes and LFPs are recorded from motor cortex during a naturalistic 3D reach-and-grasp task. Our simulations show that the adaptive MSF can add information across scales while accurately learning all model parameters in real time. The NHP data show that MSF outperforms single-scale decoding; this improvement is largest in the low-information regime and is similar regardless of the degree of overlap between spikes and LFPs.

The second approach is scientific: we develop a new geometric dynamical model (GDM, chapter 3) guided by the nonlinear manifold underlying the neural population activity of the motor cortex. It is inspired by a prominent feature, rotation, that has been observed in linear and nonlinear low-dimensional projections of neural population activity across many datasets. We hypothesize that these rotations arise because nonlinear multi-dimensional (multi-D) manifolds with holes underlie neural population activity. We therefore develop GDM, which learns a nonlinear multi-D manifold with a hole and fits an intrinsic dynamical model on top of this manifold. We apply GDM to the NHP's neural firing rates, and the results show that the intrinsic geometric coordinate provides a more natural space for describing the dynamics of neural population activity than the extrinsic Euclidean coordinate.

Overall, our adaptive MSF and GDM provide engineering and scientific tools for studying neural population activity and for future exploration in neurotechnology and neuroscience.