Detection and Decoding of Cognitive States from Neural Activity
to Enable a Performance-Improving Brain-Computer Interface
by
Nitin Sadras
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2024
Copyright 2024 Nitin Sadras
Acknowledgements
I thank my advisor, Maryam Shanechi, for her guidance and wisdom, and for the opportunities she
has provided me.
I thank our collaborators Davide Valeriani, Caterina Cinel, and Nick Yeung for their advice
on EEG recording and analysis. I also sincerely thank Bijan Pesaran for providing the invaluable
datasets that I used for my research.
I thank my colleagues in the NSEIP lab for their support, both academic and emotional. I
am especially grateful to Omid Sani for his mentorship throughout my PhD. I also thank Lucine
Oganesian, Christian Song, and Parima Ahmadipour for being my support network in the lab.
I thank Han-Lin Hsieh, Dongkyu Kim, Eray Erturk, Alireza Ziabari, Parsa Vahidi, Rahul Nair,
Mustafa Avcu, and Christoph Schneider for their assistance with data collection.
Finally, I thank my family - Mahalakshmi, Balakrishnan, Vignesh, and Momo - for their unconditional love and support.
Table of Contents

Acknowledgements
List of Figures
Abstract

Chapter 1: Introduction
    1.1 Motivation
    1.2 Event Detection from Neural Activity
    1.3 Cognitive State Decoding

Chapter 2: Event Detection and Decoding from Spiking Neural Activity
    2.1 Methods
        2.1.1 Point Process Model
        2.1.2 Estimating Event Times: The Matched Filter
        2.1.3 Threshold Selection
        2.1.4 GLM Maximum Likelihood Parameter Estimation
        2.1.5 Parameter Estimation for High-Dimensional STRFs
        2.1.6 Point Process Decoder for Neurons with STRF
        2.1.7 Performance Measures
            2.1.7.1 Event Detection
            2.1.7.2 Saliency Decoding
            2.1.7.3 Saccade Decoding
            2.1.7.4 Neuronal Predictive Power
    2.2 Results
        2.2.1 Simulated Data: Visual Saliency
            2.2.1.1 PPMF Accurately Detects Stimulus Events
            2.2.1.2 PPF Can Decode Saliency Maps
            2.2.1.3 Number of Neurons vs Image Resolution
            2.2.1.4 Effect of SRF Arrangement
            2.2.1.5 Estimated STRFs Match True Values
        2.2.2 Real Data: Saccade Detection and Decoding
            2.2.2.1 STRF Model is Significantly Predictive of Spikes
            2.2.2.2 PPMF Successfully Detects Saccade Events
            2.2.2.3 ML Classifier Can Predict Saccade Direction
            2.2.2.4 Increasing the Number of Neurons Improves Detection and Decoding Performances
    2.3 Discussion
        2.3.1 Comparison to Existing Methods
        2.3.2 The Choice of the STRF Model
        2.3.3 Future Directions

Chapter 3: Event Detection and Decoding from Multimodal Neural Activity
    3.1 Methods
        3.1.1 Multimodal Model
        3.1.2 Maximum Likelihood Estimate of Event Times and Classes
        3.1.3 Model Parameter Estimation
        3.1.4 Multimodal Model of Saccade-Sensitive Neural Activity
        3.1.5 Nonhuman Primate Saccade Task
        3.1.6 Performance Evaluation
    3.2 Results
        3.2.1 Simulation Results
        3.2.2 Nonhuman Primate (NHP) Data Results
    3.3 Discussion
        3.3.1 Bi-Directional Performance Improvement
        3.3.2 Cross-Modal Scaling
        3.3.3 Applications and Future Directions

Chapter 4: The Neural Correlates of Decision Confidence and Their Potential for Use in a Performance-Enhancing Brain-Computer Interface
    4.1 Methods
        4.1.1 Experimental task
        4.1.2 Data collection
        4.1.3 Data pre-processing
        4.1.4 Data analysis
            4.1.4.1 ERP analysis
            4.1.4.2 EEG source localization
        4.1.5 Single-trial decoding
        4.1.6 BCI simulation framework
    4.2 Results
        4.2.1 Reported confidence is predictive of accuracy
        4.2.2 Confidence-related ERPs are stimulus-locked
        4.2.3 Cortical sources of confidence-related activity
        4.2.4 Confidence can be decoded from single trial stimulus-locked pre-response EEG activity
        4.2.5 Decoded confidence can be used to improve task performance in a simulated BCI
    4.3 Discussion
        4.3.1 The importance of a post-stimulus gap
        4.3.2 Neural sources of confidence
        4.3.3 Confidence classifier is viable for use in a real-time BCI
        4.3.4 BCI Applications
        4.3.5 Future directions

Chapter 5: Conclusion

References

Appendices
    A The Neural Correlates of Decision Confidence: Additional Details and Control Analyses
        A.1 Cluster-Based Permutation Test
        A.2 Confidence Threshold Sensitivity Analysis
        A.3 Source Localization Control Analysis
        A.4 Temporal Split for Classifier Cross-Validation
List of Figures

1.1 Components of a performance-improving brain-computer interface
2.1 Point process matched filter data pipeline
2.2 Point process saliency and saccade encoding models
2.3 Time invariance of the area under the firing rate curve
2.4 Point process event detection flowchart
2.5 Point process event detection and decoding example on simulated saliency data
2.6 Point process matched filter decoding performance on simulated saliency data (1/2)
2.7 Point process matched filter decoding performance on simulated saliency data (2/2)
2.8 True vs estimated point process model parameters
2.9 Non-human primate saccade task
2.10 Spatiotemporal response fields in nonhuman primate frontal eye fields
2.11 Point process model evaluation on nonhuman primate data
2.12 Point process matched filter performance on nonhuman primate spiking activity
2.13 Point process matched filter performance increases with more neurons
3.1 Multimodal event detector sample output
3.2 Multimodal saccade encoding model
3.3 True vs estimated multimodal model parameters
3.4 Multimodal event detector performance on simulated data
3.5 Multimodal event detector performance on non-human primate data
4.1 A confidence-based brain-computer interface for improved task performance
4.2 Experimental protocol for the gap / no gap stimulus discrimination tasks
4.3 Confidence classifier architecture details
4.4 BCI simulation flowchart
4.5 Behavior analysis results
4.6 EEG analysis results
4.7 EEG source localization analysis
4.8 Confidence classification results using pre-response activity in the gap task
4.9 BCI simulation results
5.1 ERP analysis, using all trials split at 80th percentile
5.2 ERP analysis, using all trials split at 50th percentile
5.3 eLORETA source localization control analysis
5.4 Confidence classification with temporal cross-validation split
Abstract
Brain-computer interfaces (BCIs) consist of hardware and software that create a communication
channel between a user’s brain and external devices. BCI technology has typically been used to
restore functionality to injured or impaired patients. Outside of clinical applications, there is also
growing interest in developing BCIs that can improve a user’s capabilities. In this work, we develop
key components and show proof-of-concept of a BCI that can improve a user’s performance on a
given task by providing feedback based on decoded cognitive states.
First, we develop a method to detect stimulus- and behavior-related events from neural activity,
addressing the challenge of multiple event types that have varying spatiotemporal signatures. This
method enables neural decoding in cases where task-related event times, such as stimulus onsets,
are unknown ahead of time. We develop event-detection algorithms for binary spiking data and
for multimodal data consisting of both spike and local field potential (LFP) signals. Second, we
investigate the neural correlates of decision confidence in electroencephalogram (EEG) activity.
We discover that neural activity is modulated by confidence in the post-stimulus epoch, but not in
the post-response epoch, and show that confidence can be decoded from single-trial post-stimulus
EEG activity. We then design a simulated BCI framework to show that this confidence decoding is
accurate enough to build a BCI that can improve performance on a decision-making task, especially
when the difficulty and error cost are high.
The advancements made in this work can facilitate the development of cognitive BCIs that can
be used in naturalistic settings without constrained tasks or prior knowledge of event times.
Chapter 1
Introduction
1.1 Motivation
Brain-computer interfaces (BCIs) consist of hardware and software that create a direct control
pathway between a subject’s brain and an external device [1–15]. While BCI technology has
largely been used to restore functionality to injured or impaired patients, there is growing interest
in developing BCIs that can improve a user’s capabilities [16–20]. As such, the primary motivation
of this work is the development of a real-time BCI that can improve a user’s abilities by providing
them information based on decoded cognitive states. The envisioned BCI is composed of several
key components, shown in Figure 1.1. The two primary goals of this work are 1) to develop event
detection and decoding methods to enable real-time BCIs and 2) to identify cognitive states that
can be decoded from neural activity, and show that these decoded states can be used to provide
feedback that is capable of improving a user’s performance on a given task.
1.2 Event Detection from Neural Activity
The core component of a BCI is the decoder, an algorithm that estimates a subject’s brain state
from neural activity. The brain state, which can represent intended behavior, an affective state, or
a perceived stimulus, can then be translated into a control command for a device such as a prosthetic, electrical stimulation device, or computer interface. For example, motor decoders estimate intended movement to control a prosthetic device [1–13], and affective decoders estimate mood to adjust electrical stimulation in neuropsychiatric disorders [21, 22].

Figure 1.1: Key components of a brain-computer interface for improved task performance. Here, the user is performing a task that involves making a decision about a stimulus. A continuous stream of neural data is passed to the event detector, which can detect task-relevant events that are encoded in neural activity, such as stimulus onset or shifts in attention. The decoder then uses data aligned to the detected event to estimate the user's cognitive state. Based on this estimated cognitive state, feedback can be given to the user in the form of sensory input in order to help improve task performance.
Many of these decoders are run on trial-based neural activity that is aligned to a known event,
and therefore implicitly require knowledge of task-related event times. Such events include the
start of a task trial, stimulus onset, movement onset, or the time of a user’s response. In realistic
applications, however, it is possible that these event times will not be known to the decoder, and
that event-aligning is not possible. As such, a method to detect events from neural activity is
needed in order to enable BCI decoders in situations when event times are unknown. In chapters 2
and 3, we develop methods to detect events from various modalities of neural activity. Based on
parametric models of neural activity, we derive real-time maximum-likelihood estimators for event
times and classes, and show that these estimators can successfully detect and classify events from
both simulated and real neural activity.
1.3 Cognitive State Decoding
Next, we investigate a specific cognitive state, decision confidence, and evaluate its viability for
use in a performance-improving BCI. Prior studies have shown that when making decisions, humans can reliably evaluate how likely they are to be correct [23–25]. If this subjective confidence
can be reliably decoded from brain activity, it would be possible to build a BCI that improves decision accuracy by automatically providing more information to the user if needed based on their
confidence. But this possibility depends on whether confidence can be decoded right after stimulus
presentation and before the response so that a corrective action can be taken in time. Although prior
work has shown that decision confidence is represented in brain signals, it is unclear if the representation is stimulus-locked [26–28] or response-locked [23, 29, 30], and whether stimulus-locked
pre-response decoding is sufficiently accurate for enabling such a BCI.
In chapter 4, we investigate the neural correlates of confidence by collecting high-density EEG
during a perceptual decision task with realistic stimuli. Importantly, we design our task to include
a post-stimulus gap that prevents the confounding of stimulus-locked activity by response-locked
activity and vice versa, and then compare with a task without this gap. We perform event-related
potential (ERP) and source-localization analyses. Our analyses suggest that the neural correlates
of confidence are stimulus-locked, and that an absence of a post-stimulus gap could cause these
correlates to incorrectly appear as response-locked. By preventing response-locked activity from
confounding stimulus-locked activity, we then show that confidence can be reliably decoded from
single-trial stimulus-locked pre-response EEG alone. We also identify a high-performance classification algorithm by comparing a battery of algorithms. Lastly, we design a simulated BCI
framework to show that the EEG classification is accurate enough to build a BCI and that the decoded confidence could be used to improve decision-making performance, particularly when the task difficulty and cost of errors are high. Our results show the feasibility of non-invasive EEG-based BCIs to improve human decision making.
Chapter 2
Event Detection and Decoding from Spiking Neural Activity
Neurons encode information by modulating their firing rates in response to stimuli and other behavioral processes. This dependence is characterized by the spatial response field (SRF), which
describes the region of sensory or behavioral space in which a stimulus causes a neuron to fire. For
example, motor neurons have SRFs that represent a preferred direction of movement [31], while
neurons in the visual pathway have SRFs that represent regions of the visual field to which the neuron is sensitive [32]. Some neurons also exhibit a transient response to the onset of a behavioral
or stimulus event. This means that the activity of these neurons depends not only on the spatial
location of behavior or stimulus, but also on the time that has elapsed since the occurrence of the
behavior or stimulus. In other words, the neural activity will follow a temporal pattern relative
to the behavior/stimulus event time (i.e., the time at which the event occurs), regardless of how
long the stimulus or behavior persists. This dependence on time is characterized by the temporal response field (TRF) [32, 33]. Neurons that exhibit TRFs include neurons in the auditory cortex [34,
35], saccade-sensitive neurons in the frontal eye fields [36], and visually sensitive neurons in the
superior colliculus (SC) [37, 38]. Together, these neurons are described by the spatio-temporal
response field (STRF) [32, 34, 35, 39]. In the case where a neuron exhibits the same transient
response for all spatial locations of a stimulus or behavior, it can be modeled as having a purely
temporal receptive field, with no spatial component [36].
Existing decoders, such as Kalman filters or the point process filter (PPF), have been designed
for neurons with SRFs. These decoders use parameterized representations of the SRF to estimate
underlying stimulus or behavioral states (e.g., movement direction) from neural activity in real
time. For example, the Kalman filter models the firing rate as a linear function of the state (e.g.,
[40, 41]). The PPF, on the other hand, represents spike trains as a binary-valued 0-1 time series—
with a 1 representing a spike and a 0 representing the absence thereof—that is modeled as a point
process whose instantaneous firing rate is a log-linear function of the state (e.g., the direction of
intended reach) [36, 42–49]. These decoders, however, have not yet been extended for use with
neurons that also exhibit TRFs. One approach to representing STRFs is to describe the spiking
activity as a point process whose instantaneous rate is a function both of the stimulus/behavioral
state and of the time at which they occur. Prior studies have shown that these point process models
are significantly predictive of the spiking activity of neurons with STRFs [33, 36]. However, a
decoder for these STRF point process models has not yet been developed.
Developing a decoder for neurons with STRFs is challenging due to the transient nature of
their firing rate modulation, which necessitates both detecting the stimulus/behavioral event by
estimating the time at which it occurs (i.e., event times) and decoding or classifying these events
from spikes. Indeed, any decoder for neurons with STRFs, whether for discrete or continuous
state estimation, would need to first estimate the event times. Prior studies that categorize discrete
stimulus or behavioral events using discrete classifiers such as support vector machines (SVM)
have required that data be trial- or stimulus-aligned in order to be classified, i.e., they assumed
knowledge of the event times [50–52]. In real-time applications, the event times may be unknown
ahead of time, and these methods cannot be used without augmentation. Because of this constraint,
even for discrete classification purposes and without using the STRF point process model, an
algorithm that can detect events by estimating event times is needed.
Prior studies have developed decoders that successfully differentiate phases of neural activity,
e.g., movement planning and execution phases, using a hidden Markov model (HMM) [53, 54].
This model, however, is designed for neural activity that is stationary during each phase. Therefore,
it does not aim to detect events that are represented transiently in neural activity as is the case with
STRFs. Prior studies have also developed important non-parametric methods for the detection of
Figure 2.1: Data processing pipeline for behavioral/stimulus event detection and decoding from
neural activity. The PPMF is used to detect the events and estimate their times, and then a decoder
such as the PPF is used to decode the brain state. First, the STRF model parameters are estimated and the decoding algorithm is trained using labeled training data for which the brain state is
known. At test time, unlabeled spike data is passed through the PPMF in order to estimate stimulus/behavioral event times. These event times are then used by the decoder to decode brain states.
In this figure, the brain states being decoded are visual saliency maps, which identify regions of
the visual field that stand out from their surroundings.
events using the spectrogram of a neural signal, such as a single-unit firing rate signal or a local
field potential (LFP) signal, when brain states are discrete-valued [55]. However, this method does
not aim to perform event detection for point process models of population spiking activity. Because
the point-process models of neurons with STRFs in [36] have been shown to be highly predictive
of spiking patterns, it is important to develop novel detectors and decoders that are designed for
STRF point process models. Further, it is important to also enable event detection when brain states
are continuous-valued (e.g., saliency maps in response to visual stimuli). Thus, a new method is
needed to integrate event time information into a spatial decoder for continuous-valued brain states.
Finally, for detection and decoding reliability, it is essential to design such a method to estimate the
onset times of stimulus or behavioral events and decode these events by aggregating information
across a neuronal population.
In this chapter, we develop a novel method that takes advantage of the temporal structure of
neural responses to detect events and estimate event times, and to decode brain states from measured population spiking activity (figure 2.1). This method, termed the point-process matched filter
(PPMF), uses a parameterized point process model of the TRF to compute the maximum likelihood estimate of event times. We show that the maximum-likelihood solution requires applying
a linear filter, which is matched to the TRF, directly to the measured binary-valued spike trains.
The peaks in the output of the matched filter correspond to the maximum-likelihood estimate of
the event times. Once event times have been estimated, we then perform brain-state decoding by
developing a PPF for the STRF model in [36]. The PPMF can estimate event times in real-time,
and is therefore suitable for use in real-time BCIs.
We validate our method using both extensive numerical simulations and real neural recordings.
We first apply the PPMF to detect visual stimuli (i.e., estimate their onset times) and to decode
saliency maps from simulated spikes of neurons representing visual saliency. These neurons have
been shown to exhibit both spatial selectivity and transient responses to stimuli [37, 38, 56, 57].
We then evaluate the method on spiking activity, recorded from the prefrontal cortex (PFC) of
a macaque monkey performing a delayed saccade task, on its ability to detect the saccades (i.e.,
estimate their onset times) and decode the saccade direction. We show that our method successfully detects the events and decodes the brain states – both continuous saliency maps and discrete
saccade directions – even though the event times are unknown. This method offers new opportunities for BCIs that decode brain states represented in neurons with STRFs, and for optimal event
detection from population spiking activity.
2.1 Methods
We first present a point-process model of how information is encoded in neural spiking activity
through the STRF. We then derive the PPMF from this model to detect the events and decode
the brain states. Moreover, we describe how the parameters of the point-process model can be
estimated, and how the PPMF can be used with other decoding algorithms. Figure 2.1 shows the
complete data-processing pipeline involved in applying the PPMF.
2.1.1 Point Process Model
The spiking activity of neurons can be described as a binary time series of 0’s and 1’s, with a
1 indicating a spike and a 0 indicating the absence of a spike. Such binary time series can be
modeled as point processes [58]. Point processes are characterized by the conditional intensity
function (CIF), which models the instantaneous firing rate of a neuron at any given time as a
function of external and internal covariates [44].
If we model the logarithm of the CIF or firing rate λ(t) as a linear function of the covariates, the point process model can be analyzed within the Poisson generalized linear model (GLM)
framework. GLMs are an extension of linear regression models that allow a linear combination of
regressors to be related to a response variable through an appropriate nonlinear function. Poisson
GLMs in particular have proven useful in modeling the dependence of spike trains on external
covariates, such as stimuli or behavior [42–46, 48, 49, 59]. In a manner similar to [36], we model
the dependence of the firing rate on stimuli through an STRF as
$$\lambda(t) = \exp\big(\alpha + f_t(T)\, f_s(S)\big). \tag{2.1}$$
Here, $f_t(\cdot)$ is the TRF, $f_s(\cdot)$ is the SRF, $T$ is the set of stimulus onset times, $S$ is the brain state, and $\alpha$ characterizes the baseline firing rate. Here 'brain state' can refer to either a perceived stimulus or a planned behavior. Note that the brain state $S$ can vary with time, but we refer to it as $S$ instead of $S(t)$ for notational simplicity.
For a neuron with a temporal response whose amplitude is a function of the brain state value
(e.g., stimulus location), we can write the TRF term as
$$f_t(T) = \sum_{t_s \in T} r(t - t_s) \tag{2.2}$$

and the SRF term as

$$f_s(S) = \phi^T S \tag{2.3}$$

where $\phi$ is the SRF parameter vector to be estimated in model training and $r(\cdot)$ specifies the shape of the temporal response for the neuron (again to be estimated in model training). Note that (2.2) involves a sum over a finite set $T$ of stimulus times $t_s$, each of which is continuous-valued. If the brain state $S$ has dimension $n$, then both $S$ and the SRF parameter vector $\phi$ are length-$n$ vectors.

Plugging (2.2) and (2.3) into (2.1), we get

$$\log \lambda(t) = \alpha + \Big( \sum_{t_s \in T} r(t - t_s) \Big)\, \phi^T S. \tag{2.4}$$
The interpretation of this CIF is that a stimulus $S$ at time $t_s \in T$ will elicit a transient response proportional to $r(t)$, with an amplitude modulated by $\phi^T S$. If we model an observed spike train as having some firing rate $\lambda(t)$ and use the inhomogeneous Poisson model [60], the likelihood of an observed spike train with spike times $\{t_i\}_{i=1:M} = \{t_1, \dots, t_M\}$ is given by

$$p(\{t_i\}_{i=1:M}) = e^{-Q} \prod_{i=1}^{M} \lambda(t_i) \tag{2.5}$$

where $Q = \int_0^{\tau} \lambda(t)\, dt$, and $\tau$ is the total duration of the spike train. Note that this is a product over a set of $M$ spike times $t_i$, each of which is continuous-valued. Together, (2.4) and (2.5) characterize our STRF point process model.
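To make the generative model concrete, the following minimal Python sketch (ours, not from the thesis; the parameter values, Gaussian TRF shape, and saccade-style state vector are all illustrative assumptions) simulates a binary spike train from (2.4) and (2.5) by discretizing time and drawing each bin as Bernoulli with probability $\lambda(t)\Delta$, a standard approximation to an inhomogeneous Poisson process:

```python
# Minimal sketch: simulate a spike train from the STRF point-process model (2.4).
# All numbers (rates, TRF shape, SRF, event times) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
dt = 0.001                                   # 1 ms bins
t = np.arange(0.0, 5.0, dt)                  # 5 s of recording

alpha = np.log(5.0)                          # baseline firing rate of ~5 Hz
phi = np.array([1.0, 0.0])                   # hypothetical SRF: preferred direction 0 rad
S = np.array([np.cos(0.3), np.sin(0.3)])     # brain state: saccade at 0.3 rad

def r(tau):
    """Hypothetical TRF: truncated Gaussian bump peaking 150 ms after the event."""
    g = 2.0 * np.exp(-0.5 * ((tau - 0.15) / 0.05) ** 2)
    return np.where(np.abs(tau - 0.15) < 0.25, g, 0.0)

event_times = [1.0, 3.0]                               # the set T
f_t = sum(r(t - ts) for ts in event_times)             # TRF term of (2.4)
lam = np.exp(alpha + f_t * (phi @ S))                  # CIF lambda(t)
spikes = (rng.random(t.size) < lam * dt).astype(int)   # per-bin Bernoulli approximation
print(spikes.sum(), "spikes over 5 seconds")
```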
As motivating examples for the PPMF, we use visual saliency and saccadic eye movements.
Visual saliency measures the ability of an object to stand out from its surroundings in a visual
scene [61], and saccades are rapid eye movements between fixation points. In figure 2.2 we show
how the general model in (2.4) can be specialized for neurons that encode saliency and saccades,
respectively. For saliency-sensitive neurons (figure 2.2A), φ represents the region of the visual
field that the neuron is sensitive to, while for saccade-sensitive neurons (figure 2.2B), the SRF φ
represents a preferred direction of eye movement.
Figure 2.2: Firing rate models for neurons that encode saliency (left) and saccades (right). The
TRF consists of a transient response r(t) every time there is an event. The set of event times is
denoted by T. The SRF characterizes the sensitivity of a neuron to the brain state. (A) In the
case of saliency-tuned neurons, the SRF is a region of the visual field. In the example saliency SRF above, the neuron is sensitive to a circular region of space in the top left corner of the visual
field - this is represented by a sensitivity map, where high values (white) correspond to regions
of space that the neuron is sensitive to, and low values (black) indicate regions that the neuron is
not sensitive to. The columns of this sensitivity map are concatenated to obtain φ. Similarly, the
columns of the saliency map are concatenated to obtain the brain state vector S. (B) In the case
of saccade-tuned neurons, the SRF is parameterized by a preferred direction. Specifically, φ is a
length-2 vector of the form [cos(θp) sin(θp)], where θp is the preferred direction. Similarly, the
brain state is a length-2 vector of the form [cos(θ) sin(θ)], where θ is the saccade angle.
2.1.2 Estimating Event Times: The Matched Filter
Based on our point process model of neural activity in (2.4) and (2.5), we develop a method to estimate event times ts from observed population spike trains. To do so, we use maximum-likelihood
(ML) estimation, which has been successful in finding various parameters of point process and
Poisson models, such as the influence of covariates or delays [44, 60]. Here, we derive the ML
estimate of event times, which maximize the likelihood of the observed population spike trains
under the model in (2.4) and (2.5). This derivation is done entirely in continuous time.
For the derivation, we first consider the case in which we have spike observations from a single
neuron on the interval [0, τ]. Suppose that a single event occurs at an unknown time ts ∈ [0, τ].
If we assume that the support of the transient response r(t) is shorter than the interval between
events, then the result of the derivation for a single event also applies to multiple events (see more
below). For a single event, we can write the CIF in (2.4) as a function of $t_s$:

$$\lambda(t, t_s) = \exp\big(\alpha + r(t - t_s)\, \phi^T S\big). \tag{2.6}$$
Figure 2.3: $Q(t_s)$, the area under the firing rate curve, does not change with $t_s$, as long as $t_s$ is not near the edges of our measurement window. Here, $Q(t_1)$, $Q(t_2)$, and $Q(t_3)$ are all equal.
This change of notation is done because we wish to maximize the likelihood of our data over all possible values of $t_s$. Substituting this into our point-process likelihood function in (2.5), we get

$$p(\{t_i\}_{i=1:M}) = e^{-Q(t_s)} \prod_{i=1}^{M} \lambda(t_i, t_s) \tag{2.7}$$

where $Q(t_s) = \int_0^{\tau} \lambda(t, t_s)\, dt$.
With our firing rate and likelihood models established, we can now formalize the event detection problem: we seek the time $t_s$ that maximizes the likelihood function (2.7). This can be written as the following optimization problem:

$$\hat{t}_s = \arg\max_{t_s}\ \log p(\{t_i\}_{i=1:M}) \tag{2.8a}$$
$$= \arg\max_{t_s}\ \sum_{i=1}^{M} \log \lambda(t_i, t_s) - Q(t_s) \tag{2.8b}$$
$$= \arg\max_{t_s}\ \sum_{i=1}^{M} \big(\alpha + r(t_i - t_s)\, \phi^T S\big) - Q(t_s). \tag{2.8c}$$

If we assume that the support of $r(t)$ is small compared to $\tau$, then $Q(t_s)$ does not vary with $t_s$. This is because delaying $\lambda(t)$ by $t_s$ does not change the value of its integral over the duration $\tau$, as illustrated in figure 2.3, barring edge cases where $t_s$ is close to $0$ or $\tau$. With this assumption, we can remove $Q$ from the optimization. We now define $u(t)$ as the observed spike train, given by

$$u(t) = \sum_{i=1}^{M} \delta(t - t_i). \tag{2.9}$$

Here, $\delta$ is the Dirac delta function. This is an impulse train in which each impulse corresponds to an observed spike. Making use of this representation of the spike train, we can further simplify our optimization:

$$\hat{t}_s = \arg\max_{t_s}\ \sum_{i=1}^{M} r(t_i - t_s)\, \phi^T S \tag{2.10a}$$
$$= \arg\max_{t_s}\ \sum_{i=1}^{M} r(t_i - t_s) \tag{2.10b}$$
$$= \arg\max_{t_s}\ \int_{-\infty}^{\infty} u(t)\, r(t - t_s)\, dt \tag{2.10c}$$
$$= \arg\max_{t_s}\ u(t_s) * r(-t_s). \tag{2.10d}$$
Here $*$ denotes convolution, so (2.10d) amounts to applying a linear filter matched to the transient response $r(t)$. The ML estimate of the event time $t_s$ can thus be obtained by passing the observed spike train through this filter and finding the time at which the output signal peaks. Note that in simplifying from (2.10a) to (2.10b), we assume that the neuron increases its firing rate when stimuli are presented ($\phi^T S$ nonnegative). If instead the neuron decreases its firing rate in response to stimuli ($\phi^T S$ nonpositive), then either the argmax becomes an argmin, or alternatively $r(t)$ can be multiplied by $-1$ so that the operation remains an argmax. In section 2.2 (Results), we confirm that all responses are indeed nonnegative in our PFC neuronal dataset.
The result can be extended to multiple events. In particular, if the support of the transient
response r(t) is shorter than the interval between events, then the same matched filter can be run
continuously to detect multiple peaks in the output signal, corresponding to multiple events. This
is because there will be no overlap between the transient responses to consecutive events. Thus the
same derivation applies to multiple events (see section 2.1.3).
The result can also be extended to spike trains from multiple neurons, i.e., population spike trains. For a population of neurons, the likelihood function becomes

$$p(\{t_{i,c}\}_{i=1:M,\, c=1:C}) = \prod_{c=1}^{C} e^{-Q_c(t_s)} \prod_{i=1}^{M} \lambda_c(t_i, t_s). \tag{2.11}$$

Here the subscript $c$ is the neuron index, $C$ is the total number of neurons, and $t_{i,c}$ is the time of the $i$th spike from the $c$th neuron. Note that as in prior work [42–44, 46, 62–67], the above likelihood function assumes that the activity of neurons is conditionally independent given the brain state.
Performing the same steps as for a single neuron, we arrive at our maximum likelihood estimator:

$$\hat{t}_s = \arg\max_{t_s}\ \sum_{c=1}^{C} \phi_c^T S\, \big( u_c(t_s) * r_c(-t_s) \big). \tag{2.12}$$

Here, terms with the subscript $c$ are specific to each neuron: $\phi_c$ is the $c$th neuron's SRF parameter vector, $u_c(t)$ is the spike train observed from the $c$th neuron, and $r_c(t)$ is the transient response of the $c$th neuron. Thus (2.12) is a sum of matched filters, weighted by each neuron's spatial response $\phi_c^T S$. The spatial responses are in turn a function of the stimulus $S$, which is unknown when performing event detection and event time estimation. However, since all neurons respond to the same event, by definition, each individual matched-filtered signal $u_c(t_s) * r_c(-t_s)$ peaks at the same time, because each peak corresponds to the same event onset time. More specifically, the transient responses $r_c(t)$ are aligned with the event onset. Although the firing rates of individual neurons may peak at different times relative to the event, this information is captured in each $r_c(t)$. Thus, once the matched filter is applied, the peaks in each output signal $u_c(t_s) * r_c(-t_s)$ correspond to the same event time, not to peaks in firing rate. Our estimate therefore does not depend on any of the $\phi_c^T S$ terms, so we can remove them. We thus obtain the PPMF as:

$$\hat{t}_s = \arg\max_{t_s}\ \sum_{c=1}^{C} u_c(t_s) * r_c(-t_s). \tag{2.13}$$
Finally, note that if all $r_c(t)$'s are zero outside of the interval $[t_1, t_2]$, where $t = 0$ indicates the stimulus/behavioral event time, then we should introduce a time-shift of $t_2$ seconds in the matched filter kernels $r_c(-t)$ to compute the convolution causally. Here, $t_1$ and $t_2$ are properties of the neural population we are measuring. If all neurons complete their transient responses before the stimulus or behavioral event occurs, then $t_2$ will be negative. Consequently, the event time estimate would be known before the occurrence of the event. If $t_2$ is positive, we would have a delay of $t_2$ in the output signal – the peaks in the matched filter will occur $t_2$ seconds after the event. However, even in this case, if $t_2$ is sufficiently small, then we can perform suitable event detection in real-time BCI settings. For example, for the saccade-sensitive neurons that we measured, $t_2$ was 0.5 seconds. The saliency-sensitive neurons studied in [38] and [56] have a $t_2$ that varies from around 0.2 to 0.5 seconds. Note that this $t_2$ delay is due to a fundamental property of the temporal response of neurons with STRFs, and not due to the algorithm design – we need to observe the duration of the neuronal response before we can know that an event has occurred.
2.1.3 Threshold Selection
In practice, multiple stimuli may be presented during a recording session. This means that instead
of searching for a single global maximum in the PPMF output corresponding to a single stimulus,
we must find multiple local maxima corresponding to multiple stimuli. If we assume that no more
than one stimulus can occur within a window of size TS, then we can declare that a stimulus has
occurred at points where the PPMF output is above a threshold K, and is the local maximum
in a window of TS seconds. TS can be chosen based on the duration of r(t), or based on prior
knowledge of how often events can occur. The selection of the threshold K is an important factor
in the PPMF’s performance, for which we must devise a principled approach.
We wish to select the lowest threshold such that baseline neural activity (parameterized by α
in (2.1)) does not cause the filter to surpass the threshold. We consider the case in which we have
a single neuron that fires at λ0 Hz at baseline, and our matched filter kernel r(t) is a truncated
Gaussian with full-width at half-maximum (FWHM) W, and maximum value H. The FWHM is
the distance between the two points on a curve that achieve half the curve’s maximum value, and
is a commonly-used measure for the ‘width’ of pulse-like waveforms. For a Gaussian curve, the
FWHM is given by 2.355σ, where σ is the standard deviation of the Gaussian [68]. Note that the
matched filter kernels can be selected from any other set of basis functions or function dictionaries,
and our method is not specific to Gaussian kernels. We use Gaussian kernels for illustrative purposes and because they have been shown to successfully capture the shape of the transient response in neurons with STRFs (see also below).
If we pass the baseline activity of our single neuron through this Gaussian filter, the mean value of the output will be roughly $\mu = H W \lambda_0$. To arrive at this approximation, we notice that the Gaussian filter is essentially counting spikes in a window of length $W$ and scaling that count by $H$; if $r(t)$ were a rect function with width $W$ and height $H$, this approximation would be exact. The standard deviation of the output can be approximated as $\sigma = H\sqrt{W \lambda_0}$, since the standard deviation of the number of spikes in $W$ units of time is $\sqrt{W \lambda_0}$ under a homogeneous Poisson distribution.
By the linearity of convolution, this approximation can easily be generalized to a kernel $r(t)$ that is a weighted sum of truncated Gaussians, and to a sum over multiple neurons. For $C$ neurons and $b$ Gaussian basis functions, these approximations are given by

$$\mu = \sum_{c=1}^{C} \sum_{i=1}^{b} H_{c,i}\, W \lambda_c \tag{2.14a}$$
$$\sigma = \sqrt{\, \sum_{c=1}^{C} \sum_{i=1}^{b} H_{c,i}^2\, W \lambda_c \,}. \tag{2.14b}$$

Here $H_{c,i}$ is the height of the $i$th basis function for the $c$th neuron, and we assume that all $b$ basis functions have width $W$. Given these analytical expressions for the mean and standard deviation of the matched filter output during baseline neural activity, a threshold can be chosen based on any desired confidence bound under a Gaussian distribution, i.e., to ensure that the probability of false detection is below a desired upper bound.
Note that while we have derived these threshold equations for an r(t) that is a weighted sum of
truncated Gaussian functions, any other function dictionary or basis set can be used to parameterize
r(t). We chose to use Gaussian basis functions in this work due to their previous success in [36],
and due to their ability to describe the peaky shape of the TRFs in our dataset. With suitably
defined measures of width and height (i.e., just as FWHM and max value provided these measures
for Gaussian functions), the threshold equations (2.14a) and (2.14b) can be used as-is for any other
function dictionary.
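To make the threshold rule concrete, the following minimal sketch (ours, not from the thesis; the function name and the default z = 4 cutoff are illustrative assumptions) evaluates (2.14a) and (2.14b) for Gaussian basis functions of common width $W$ and returns $K = \mu + z\sigma$:

```python
# Minimal sketch of threshold selection from (2.14); names and z are ours.
import numpy as np

def ppmf_threshold(H, W, baseline_rates, z=4.0):
    """H: (C, b) array of basis-function heights H_{c,i}.
    W: common FWHM of the basis functions, in seconds.
    baseline_rates: length-C baseline rates lambda_c, in Hz.
    z: Gaussian confidence bound on the false-detection probability."""
    lam = np.asarray(baseline_rates)[:, None]      # (C, 1) for broadcasting
    mu = np.sum(H * W * lam)                       # (2.14a)
    sigma = np.sqrt(np.sum(H ** 2 * W * lam))      # (2.14b)
    return mu + z * sigma
```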
2.1.4 GLM Maximum Likelihood Parameter Estimation
Our model of neural activity in (2.4) is parameterized by the baseline firing rate α, the transient
response r(t), and the SRF φ. If the brain state S(t) is a length-n vector, then φ is a vector of the
same size. α is a scalar value, and r(t) can be any continuous function with finite support. In order
to use the PPMF, we must estimate these parameters.
Neural parameters in point-process models are typically estimated by maximizing the point-process likelihood of a training dataset over the parameters. This can be done efficiently for GLMs
by numerical methods such as the well-known iteratively reweighted least squares [44]. However,
these methods require that our log CIF (2.4) is linear in the parameters. We must therefore rewrite
our model in a manner that is suitable for parameter estimation. In [36], an STRF firing-rate model
is written in terms of linear and bilinear forms. Here, we write a similar form, starting from our
model in (2.4).
While the SRF φ is already in a parametric form, the transient response r(t) is not, and cannot
be estimated as-is. To solve this problem, we can represent the transient response as a weighted
sum of b basis functions:
$$r(t) = \sum_{i=1}^{b} w_i f_i(t). \tag{2.15}$$
With this representation, the TRF is completely specified by $w_1, \dots, w_b$. Next, we define a signal $d_i(t)$ that represents both the (spatial) value of the behavior/stimulus $S$ and the timings of the behavior/stimulus events $t_s$:

$$d_i(t) = \sum_{t_s \in T} \delta(t - t_s)\, S_i(t_s) \tag{2.16}$$
where $S_i(t)$ is the $i$th element of the brain-state vector at time $t$. Using the basis-function decomposition of $r(t)$ in (2.15) and the brain-state signals $d_i(t)$ in (2.16), the term $\big(\sum_{t_s \in T} r(t - t_s)\big)\, S(t)$ in (2.4) can be rewritten as a matrix multiplication:

$$
\begin{aligned}
\Big(\sum_{t_s \in T} r(t - t_s)\Big) S^T(t)
&= \Big[ \Big(\sum_{t_s \in T} r(t - t_s)\Big) S_1(t), \ \dots, \ \Big(\sum_{t_s \in T} r(t - t_s)\Big) S_n(t) \Big] \\
&= \big[\, r(t) * d_1(t), \ \dots, \ r(t) * d_n(t) \,\big] \\
&= \Big[ \sum_{i=1}^{b} w_i\, f_i(t) * d_1(t), \ \dots, \ \sum_{i=1}^{b} w_i\, f_i(t) * d_n(t) \Big] \\
&= [w_1 \ \cdots \ w_b]
\begin{bmatrix}
f_1 * d_1(t) & \cdots & f_1 * d_n(t) \\
\vdots & \ddots & \vdots \\
f_b * d_1(t) & \cdots & f_b * d_n(t)
\end{bmatrix}
= w^T X(t).
\end{aligned}
\tag{2.17}
$$

Here, $*$ represents convolution, $w$ is the vector of basis function weights $w_1, \dots, w_b$, which we refer to as the TRF parameter vector, and the matrix containing the convolutions is denoted by $X(t)$. Substituting this into (2.4), we obtain a generalized bilinear model, similar to that described in [36]:

$$\exp\big(\alpha + w^T X(t)\, \phi\big). \tag{2.18}$$
X(t) is specified by the stimulus values, onset times, and choice of basis functions, all of which
are known for the training dataset. w and φ are parameter vectors that represent the TRF and
SRF, respectively, and must be estimated from the training dataset. Estimation of these parameter
vectors can be performed via maximum likelihood estimation and coordinate ascent, as described
in [36]. Briefly, this involves iteratively performing the following two steps: 1) fixing the TRF
parameter vector w and estimating the baseline parameter α and SRF parameter vector φ, and
2) fixing the SRF parameter vector and estimating the baseline parameter and the TRF parameter
vector. Coordinate ascent is necessary because (2.18) is a generalized bilinear model, but becomes
a generalized linear model if we fix either the TRF or SRF parameter vectors. In order to avoid
overfitting when the number of parameters is large, it is possible to use regularization in both of
the coordinate ascent steps, such as L1 Lasso regularization [69].
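As a concrete illustration of the estimation setup, the following minimal Python sketch (ours; the discrete time grid, the helper name build_X, and the 1/dt approximation of the Dirac delta are assumptions, and the GLM fit itself is omitted) assembles the regressor matrix $X(t)$ of (2.17) by convolving each basis function with each event-marked state signal $d_j(t)$:

```python
# Minimal sketch of building X(t) in (2.17) on a discrete time grid (our assumption).
import numpy as np

def build_X(t, dt, event_times, S_events, basis):
    """t: time grid; event_times: onsets t_s; S_events: (num_events, n) state
    values S(t_s); basis: list of b 1-D kernels f_i sampled at the same dt.
    Returns X with shape (len(t), b, n)."""
    n = S_events.shape[1]
    d = np.zeros((len(t), n))
    for ts, s in zip(event_times, S_events):
        d[int(round(ts / dt))] += s / dt          # Dirac delta approximated by 1/dt
    X = np.empty((len(t), len(basis), n))
    for i, f in enumerate(basis):
        for j in range(n):
            X[:, i, j] = np.convolve(f, d[:, j])[: len(t)] * dt   # f_i * d_j
    return X

# The model log-rate of (2.18) is then, for given parameters w and phi:
#   log_lam = alpha + np.einsum("b,tbn,n->t", w, X, phi)
```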
2.1.5 Parameter Estimation for High-Dimensional STRFs
Maximum likelihood GLM parameter estimation provides a mathematically principled method to
estimate STRF parameter vectors from training data. In cases where the brain state S and SRF
vector φ are high-dimensional, however, prohibitively large amounts of data are required to obtain
good parameter estimates. One such case is the decoding of visual saliency maps. If we wish to
decode a 9×16 pixel saliency map, then both the brain state and the SRF vector φ (see figure 2.2A)
would have a dimension of 144. With such a large brain-state size, the above method for parameter
estimation is impractical.
In such a case, an experimental receptive-field (RF) mapping procedure, described in [70], can
be used instead. The procedure consists of displaying a grid of point stimuli, one point at a time.
For each point, the maximum firing rate of each neuron is recorded. This allows us to construct a
map of firing rates for each point in the grid, for each neuron. If the grid of points is sufficiently
dense, this firing rate map is very much like our desired SRF vector - a map indicating which
regions of space the neuron is sensitive to. If we wanted our SRF to be 144-dimensional, then we
would need 144 point stimuli. The final step in turning this map into an SRF parameter vector is
to rescale it to a fixed range, such as [0, 1]. This allows us to dissociate the region of space the
neuron is sensitive to from the intensity of its response, which is characterized by the TRF vector.
Values below some threshold can be set to zero in order to eliminate baseline firing rate effects.
Once this procedure is complete, we can vectorize our normalized firing rate maps to obtain
the SRF vector φ in (2.18). The TRF vector w and baseline parameter α can then be estimated via
maximum likelihood estimation, as before.
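The normalization described above amounts to a few array operations. A minimal sketch (ours; the floor value of 0.2 is an assumed choice, not a value from the thesis):

```python
# Minimal sketch of converting an RF-mapping rate grid into the SRF vector phi.
import numpy as np

def srf_from_rate_map(rate_map, floor=0.2):
    """rate_map: 2-D grid of peak firing rates from the RF-mapping procedure.
    Returns the normalized, vectorized SRF parameter vector phi."""
    m = (rate_map - rate_map.min()) / (rate_map.max() - rate_map.min())
    m[m < floor] = 0.0                    # suppress baseline firing-rate effects
    return m.flatten(order="F")           # concatenate columns, as in figure 2.2A
```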
2.1.6 Point Process Decoder for Neurons with STRF
Once the stimulus times have been estimated by the PPMF, we should design a point-process
decoding algorithm to estimate the brain state S (i.e., stimulus/behavior) based on the model in
(2.4). Here, we design a PPF to compute the minimum mean-squared error (MMSE) estimate
of S in real time for neurons with STRFs, where the brain state S is continuous-valued. We also
show how a maximum likelihood point process classifier, which categorizes S into one of n distinct
classes given a segment of trial-aligned data, can be used with the PPMF for discrete-valued brain
states [45].
In general, the PPF is a real-time decoding algorithm that can be thought of as a point-process
analogue of the Kalman Filter [43]. The PPF, designed for neurons with SRFs, has played a key
role in spike-train decoding [2, 42–45, 47–49, 65–67], and in BCI designs [46, 63]. However,
while PPFs have previously been designed for SRF point-process models, they have not yet been
designed for STRF models. Here, we show how to construct the PPF equations for our STRF
point-process model (2.4) and likelihood function (2.5).
The PPF is a recursive algorithm that consists of a prediction step and an update step. To write
the PPF, we bin the spikes in time bins of length ∆, which should be chosen small enough to at
most contain 1 spike (typically 1-5 ms). We denote the binary (0-1) value of the spike train from
neuron c at time t by Nc(t). The general prediction equations are given by [43]
$$S_{t|t-1} = A\, S_{t-1|t-1} \tag{2.19a}$$
$$W_{t|t-1} = A\, W_{t-1|t-1} A^T + Q \tag{2.19b}$$
where $S_{t|t-1}$ and $W_{t|t-1}$ are the prediction mean and covariance of the brain state, respectively. $A$ is a state transition matrix that models how we expect $S$ to evolve over time, and $Q$ is the covariance matrix of the state transition noise, which models the uncertainty of our prediction. The general update equations are given by [43]

$$W_{t|t}^{-1} = W_{t|t-1}^{-1} + \sum_{c=1}^{C} \left[ \left(\frac{\partial \log \lambda_c}{\partial S_t}\right)^{\!T} \lambda_c \Delta \left(\frac{\partial \log \lambda_c}{\partial S_t}\right) - \frac{\partial^2 \log \lambda_c}{\partial S_t \partial S_t^T} \big(N_c(t) - \lambda_c \Delta\big) \right]_{S_t = S_{t|t-1}} \tag{2.20a}$$

$$S_{t|t} = S_{t|t-1} + W_{t|t} \sum_{c=1}^{C} \left[ \left(\frac{\partial \log \lambda_c}{\partial S_t}\right)^{\!T} \big(N_c(t) - \lambda_c \Delta\big) \right]_{S_t = S_{t|t-1}} \tag{2.20b}$$

where $S_{t|t}$ and $W_{t|t}$ denote the posterior mean and covariance of the brain state at time $t$, respectively, $C$ is the number of neurons being observed, $\lambda_c$ is the firing rate function of the $c$th neuron, $[\cdot]_a$ denotes the evaluation of the inside expression at value $a$, and for simplicity, we have denoted $S(t)$ by $S_t$.
In order to find the PPF equations for the special case of our STRF model (2.4), we must compute the derivatives of $\lambda$ with respect to the brain state $S_t$:

$$\left(\frac{\partial \log \lambda_c}{\partial S_t}\right)^{\!T} = \phi_c \sum_{t_s \in T} r_c(t - t_s) \tag{2.21a}$$
$$\frac{\partial^2 \log \lambda_c}{\partial S_t \partial S_t^T} = 0. \tag{2.21b}$$

Here we use $A = I$, the $n \times n$ identity matrix, which allows us to enforce continuity in the evolution of the brain state. This corresponds to a random-walk state transition model and gives us the following prediction equations:

$$S_{t|t-1} = S_{t-1|t-1} \tag{2.22a}$$
$$W_{t|t-1} = W_{t-1|t-1} + Q. \tag{2.22b}$$
Now plugging the firing rate derivatives (2.21) into (2.20a) and (2.20b), we arrive at our update equations:

$$W_{t|t}^{-1} = W_{t|t-1}^{-1} + \sum_{c=1}^{C} \left[ \Big(\phi_c \sum_{t_s \in T} r_c(t - t_s)\Big)\, \lambda_c(t) \Delta\, \Big(\phi_c \sum_{t_s \in T} r_c(t - t_s)\Big)^{\!T} \right]_{S_t = S_{t|t-1}} \tag{2.23a}$$

$$S_{t|t} = S_{t|t-1} + W_{t|t} \sum_{c=1}^{C} \left[ \Big(\phi_c \sum_{t_s \in T} r_c(t - t_s)\Big) \big(N_c(t) - \lambda_c(t) \Delta\big) \right]_{S_t = S_{t|t-1}}. \tag{2.23b}$$
Intuitively, the update step changes its estimate of the brain state based on observed spikes from neuron $c$ when $\sum_{t_s \in T} r_c(t - t_s)$ is large. On the other hand, when $\sum_{t_s \in T} r_c(t - t_s)$ is close to zero, the observed spikes are ignored and the state is not updated. This reflects the fact that when $\sum_{t_s \in T} r_c(t - t_s)$ is small, neuron $c$'s transient response to a behavioral/stimulus event has completed, and it is no longer firing in response to the behavioral/stimulus state. It is important to note that in both update equations, the event times $t_s$ must be known. That is why the matched filter component of the PPMF must be used to estimate them.
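For concreteness, the following minimal sketch (ours, not from the thesis; the flat array layout and variable names are assumptions) performs one recursion of this PPF, i.e., the prediction (2.22) followed by the update (2.23):

```python
# Minimal sketch of one PPF recursion for the STRF model, per (2.22)-(2.23).
import numpy as np

def ppf_step(S_prev, W_prev, Q, N_t, phis, r_t, alphas, dt):
    """S_prev, W_prev: posterior mean/covariance from the previous bin.
    Q: state-noise covariance. N_t: length-C binary spikes N_c(t).
    phis: (C, n) SRF vectors. r_t: length-C values of sum_{t_s} r_c(t - t_s)
    at the current bin. alphas: length-C baseline parameters."""
    S_pred, W_pred = S_prev, W_prev + Q                    # prediction (2.22)
    W_inv = np.linalg.inv(W_pred)
    innov = np.zeros_like(S_prev)
    for phi_c, r_c, N_c, a_c in zip(phis, r_t, N_t, alphas):
        lam_c = np.exp(a_c + r_c * (phi_c @ S_pred))       # CIF at the prediction
        g = phi_c * r_c                                    # gradient of log lambda_c
        W_inv += np.outer(g, g) * lam_c * dt               # update (2.23a)
        innov += g * (N_c - lam_c * dt)
    W_post = np.linalg.inv(W_inv)
    return S_pred + W_post @ innov, W_post                 # update (2.23b)
```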
In cases when the stimulus or behavioral state is discrete-valued, instead of the PPF, we design a
second decoder referred to as the Maximum Likelihood (ML) Point Process Classifier. This second
decoder builds a separate inhomogeneous Poisson model for each discrete class (e.g. saccade
direction), and classifies observed data by evaluating the likelihood under each of these models.
The data is classified as the discrete class whose associated model results in the greatest likelihood
for the observed spikes [45].
During training, we estimate the firing rate response $\lambda_{c,d}(t)$ for each neuron $c$ and discrete class $d$. In our application, $d$ is a saccade direction that can take one of eight possible values. There are several ways of estimating these firing rates, as demonstrated in [45]. In this work, we simply compute saccade-aligned peristimulus time histograms (PSTHs), using 50 ms non-overlapping windows, over all training trials of each specific class $d$ [71]. We use the population spiking activity in a 1-second window centered on the saccade time for classification.
Figure 2.4: Flowchart for event detection and decoding using the PPMF. Relevant sections and
equations are indicated for each step.
Given saccade-aligned data of an unknown class, the ML Point Process Classifier computes the point-process likelihood $L_d$ as:

$$L_d = \prod_{c=1}^{C} \prod_{t=t_1}^{t_2} \big(\lambda_{c,d}(t)\, \Delta\big)^{N_c(t)}\, e^{-\lambda_{c,d}(t)\, \Delta} \tag{2.24}$$

where $C$ is the number of neurons, $t_1$ and $t_2$ are the start and end times of the data relative to the saccade, $\Delta$ is the sampling period, and $N_c(t)$ is the binary spike train observed from neuron $c$. This is a discrete-time approximation to the continuous-time point process likelihood, as shown in [44] and used extensively in [42–47, 59, 63–65, 72]. For consistency across both of our decoding algorithms, we use it here as well. The classifier's output $\hat{d}$ is the class with the highest likelihood:

$$\hat{d} = \arg\max_{d}\ L_d. \tag{2.25}$$
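In log space, the classifier reduces to a few lines. A minimal sketch (ours, not from the thesis; the array layout is an assumption):

```python
# Minimal sketch of the ML point-process classifier (2.24)-(2.25), computed in
# log space for numerical stability.
import numpy as np

def classify_event(spikes, psth_rates, dt):
    """spikes: (C, T) binary array aligned to the detected event.
    psth_rates: (D, C, T) PSTH-estimated rates lambda_{c,d}(t), in Hz.
    Returns the index of the most likely class d-hat."""
    lam_dt = np.clip(psth_rates * dt, 1e-12, None)   # floor to avoid log(0)
    # log L_d = sum_c sum_t [ N_c(t) log(lambda dt) - lambda dt ], from (2.24)
    loglik = np.sum(spikes[None] * np.log(lam_dt) - lam_dt, axis=(1, 2))
    return int(np.argmax(loglik))                    # (2.25)
```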
Figure 2.4 shows how all the above methods are used in conjunction to perform the overall task of
event detection and decoding.
2.1.7 Performance Measures
2.1.7.1 Event Detection
To measure the performance of our event detection algorithm, we use standard confusion-matrix-based metrics: precision, recall, and F-score [73]. These metrics are commonly used for binary classification algorithms. Let $FP$ be the number of false positives, $TP$ be the number of true positives, and $P$ be the number of positive instances in the test set. Then, these metrics are defined as

$$\text{precision} = \frac{TP}{TP + FP} \tag{2.26}$$
$$\text{recall} = \frac{TP}{P} \tag{2.27}$$
$$\text{F-score} = \frac{2}{\text{precision}^{-1} + \text{recall}^{-1}}. \tag{2.28}$$
In the context of this work, a positive instance is the occurrence of a behavioral/stimulus event.
We consider an event detection to be a true positive if it occurs within a 250 ms window of an
actual event, and a false positive otherwise. In addition to precision, recall and the F-score, we use
two metrics introduced in [55]: the attempt frequency (AF), and the null positive (NP) rate. These
are defined as
$$\text{AF} = \frac{\#\text{ detected events}}{\text{total session duration}} \tag{2.29}$$

and

$$\text{NP} = \text{AF} \cdot \tau \tag{2.30}$$

where $\tau$ is the duration of the detection window (here, 250 ms). The NP rate indicates the percentage of events that will be detected by chance, given that events occur at an equal rate throughout the duration of the session. A comparison of the recall and NP rate allows us to judge how much better than chance the detector is performing. For our simulated saliency dataset, we do this via a binomial test, which tests the null hypothesis that our method is performing at the NP rate. For our saccade dataset, we do this via a paired t-test between recall and NP-rate values across cross-validation folds. This is equivalent to computing the difference between recall and NP rate and testing whether this difference is significantly different from zero.
In the case of saccade detection, we additionally consider the per-direction recall (also referred to as per-direction detection accuracy), which is simply the recall for all saccades of a specific direction. In other words, this is the percentage of saccades of a particular direction that were detected by the PPMF.
2.1.7.2 Saliency Decoding
In order to quantify the ability of the PPF to decode saliency maps, we use Pearson’s correlation
coefficient [74] between the decoded and the true saliency values. Correlation coefficients near
1 indicate that the decoded values are highly correlated with the true values, while values near 0
indicate that they are not.
2.1.7.3 Saccade Decoding
To measure the ability of the ML Point-Process Classifier to classify saccade directions, we use
per-direction classification accuracy:
\text{acc}_d = \frac{\#\text{ saccades correctly classified as direction } d}{\#\text{ detected saccades of direction } d} .    (2.31)
This is the percentage of detected saccades of a particular direction that are correctly classified.
Note that saccades that were not detected do not contribute to the denominator. Because of this,
the per-direction classification accuracy is a measure of the classifier alone, and not of the event
detection method.
2.1.7.4 Neuronal Predictive Power
We assess the predictability of neuronal spikes with our estimated STRF point process models
using Receiver Operating Characteristic (ROC) curve analysis [73]. This method quantifies the
ability of a firing rate model to predict spikes, and is a standard tool used to assess the predictability of neural firing activity [72, 75]. The ROC plots the probability of true detection of spikes
against the probability of false detection, and the area under this curve (AUC) is used to define
the predictive power (PP). Specifically, \text{PP} = 2 \cdot \text{AUC} - 1. A PP value of 1 indicates perfect spike prediction, while a PP value of 0 indicates that the model does no better than random chance at predicting spikes.
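A minimal sketch of this PP computation, assuming scikit-learn's roc_auc_score and treating each time bin's spike/no-spike as the label and the model's predicted rate as the score; the binning choice is ours.

import numpy as np
from sklearn.metrics import roc_auc_score

def predictive_power(spikes, predicted_rate):
    # spikes: binary array of observed spikes per time bin.
    # predicted_rate: model-predicted firing rate per time bin, same shape.
    auc = roc_auc_score(np.ravel(spikes), np.ravel(predicted_rate))
    return 2.0 * auc - 1.0  # PP = 2*AUC - 1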
2.2 Results
We validated the PPMF using both numerical simulations and a real neural dataset. In order to
demonstrate a variety of use cases, the simulated data involves neurons that encode visual saliency,
while the real dataset involves neurons that encode eye movement. Further, we use different decoders in the two cases - the PPF in the former, and the ML Point Process Classifier in the latter.
The PPF is used to decode continuous-valued saliency, while the ML Classifier is used to decode
discrete-valued saccade directions.
2.2.1 Simulated Data: Visual Saliency
Visual saliency is the quality of an object that makes it stand out from its surroundings [61]. A
saliency map is a topological map that indicates which regions of the visual field are salient [76]. It
is widely accepted that many, if not all, brain regions involved in visual processing contain representations of saliency [38, 77, 78]. Further, there exist several algorithms that can compute saliency
maps from input images [76, 79]. Neurons in the SC brain structure have been shown to exhibit
firing activity in response to salient stimuli in a manner that is consistent with these computational
saliency maps [38]. Further, they have been observed to have transient responses to the onset of
stimuli [37, 56, 57]. This indicates that SC neurons may be well-modeled as having STRFs.
Motivated by these studies, we model and simulate firing activity from neurons sensitive to
visual saliency. We collected 1132 images from several publicly available datasets (BSDS500,
CalTech101, ImgSal, MIT Saliency Benchmark) and computed their saliency maps using the Hypercomplex Fourier Transform (HFT) algorithm [79]. For consistency, all images were resized to
90×160 pixels. Saliency maps were blurred with a Gaussian kernel with a standard deviation of
16 pixels so that salient regions of the visual field were comparable in size to the size of simulated
SRFs. The effect of blurring on decoder performance is discussed in section 2.2.1.3.
We then computed population firing rates λc(t|S) for each neuron c in response to these saliency
maps using the model in figure 2.2A, where the brain state S is a vectorized saliency map, the SRF
φ is a vectorized spatial map that indicates a neuron’s sensitivity to each region of space, and the
term r(t) describes the transient response to the appearance of a visual stimulus. Since in [38] it
is shown that SC neurons increase their firing rates in response to visual stimuli, we choose φ^T S to be nonnegative in our simulations. Once these firing rates were computed, binary spike trains
were generated from these rates using the time-rescaling theorem developed in [80].
The SRFs of our simulated neurons were chosen to be circular regions of space, as in figure
2.2, since prior research has shown superior colliculus receptive fields to be approximately circular
[57, 81]. We performed various simulations with different numbers, sizes, and layouts of SRFs as
described in the following sections. To parameterize the TRFs, we chose the basis functions f_i(t) in (2.15) to be 5 truncated Gaussians, each with a standard deviation of 0.05 s, and with means evenly spread out from 0.0 to +0.5 seconds (s) relative to the stimulus presentation. The weights w_i were randomly selected for each neuron, and scaled in order to have a maximum firing rate of
200 Hz, since SC neurons have been previously observed to have firing rates on the order of 200
Hz [82]. The baseline firing rate was set to 1 Hz for all neurons.
In order to test the PPMF’s ability to estimate event times and the PPF’s ability to decode
saliency maps, the following experiment was simulated: For each image in our compiled dataset,
we simulated a two-second-long trial in which the image presentation time was chosen uniformly at
random between .5 and 1.5 seconds. For each trial, we used randomly generated STRF parameters
to simulate spiking activity from a population of neurons. We used the PPMF to estimate the image
presentation times, and the PPF to decode saliency maps once presentation times were estimated.
Figure 2.5: Example of the PPMF and PPF being run on simulated saliency data. The PPMF
was used to detect image stimuli and estimate their presentation times, and the PPF was then
used to decode saliency maps. (A) Example image stimuli and decoded saliency maps. From left
to right, the columns correspond to input images, true saliency maps corresponding to the input
images (as computed by HFT), and PPF-decoded saliency maps. (B) An example SRF layout.
For each saliency map corresponding to an image stimulus, 50 circular SRFs with a radius of 25
pixels were placed uniformly at random throughout the visual field. Each circle corresponds to
the SRF of a single simulated neuron. (C) PPMF output on simulated spiking activity. Peaks in
the output correspond to estimates of stimulus onset time, and are marked by vertical dotted lines.
(D) Pearson’s correlation coefficient between decoded saliency values and true saliency values,
computed for each pixel across 1132 decoded saliency maps. (E) Histogram of timing estimation
errors, across 1132 detected stimuli.
2.2.1.1 PPMF Accurately Detects Stimulus Events.
We initially simulated the above experiment with 50 neurons (figure 2.5). Each neuron had a
circular SRF with a radius of 25 pixels, which was placed uniformly at random within the field
of view for each image. An example SRF layout is shown in figure 2.5B. This means that we
simulated different independent SRF layouts for the 50-neuron population. Using the true TRF
parameters, the matched filter successfully detected 100% of the image stimuli. PPMF detections
were considered to be true positives if they occurred within 250 ms of an actual image stimulus.
There were 1132 true positives and 39 false positives, resulting in a precision of .967, a recall of 1, and an F-score of .983. The NP rate was .259, indicating that if our method were detecting
events uniformly at random, we would expect a recall of .259 (chance level for recall). The recall
is significantly higher than the NP rate (p < 1e-324, binomial test), indicating that our method
performs significantly above chance level.
Figure 2.5E shows a histogram of time estimation errors for the true positives - these are the
differences between the true event times and the detected event times for all detections that were
within 250 ms of an actual event. The PPMF was able to identify stimulus times very precisely:
96.6% of the true detections were within 20 ms of the true event time.
2.2.1.2 PPF Can Decode Saliency Maps.
Once the PPMF detected stimulus events, the PPF in (2.22a) – (2.23b) was used to decode a 9×16
pixel saliency map from the simulated neural activity. For this simulation, no parameter estimation
was done and the true values of the parameters (used for simulation) were also used for decoding;
we will discuss model estimation in section 2.2.1.5. The results of the simulation along with
example decoded saliency maps are shown in figure 2.5. In figure 2.5A, we can see that the salient
regions in the PPF-decoded saliency maps are in the same locations as the corresponding ground-truth saliency maps. As described in section 2.1.7, we use Pearson's correlation coefficient to
quantify the PPF’s decoding performance. The correlation coefficient for each decoded region of
Figure 2.6: PPF decoding performance on simulated saliency map data as a function of neuron
count and blur factor. The correlation coefficient increases as both the neuron count and the blur
factor are increased. With fewer neurons, only coarse regions of saliency can be decoded well. As
the neuron count increases, increasingly detailed saliency maps can be decoded.
space across all 1132 stimuli is shown in figure 2.5D. The correlation coefficient across all pixels
and all saliency maps was .67.
We note that decoding performance decreases slightly at the boundary of the field of view, as
seen in figure 2.5D. Since points near the boundary are missing part of their surrounding, they
are on average covered by fewer receptive fields, which in turn results in decreased decoding
performance. The reduced SRF coverage toward the boundaries can be seen in the example SRF
in figure 2.5B.
This simulation shows that under ideal conditions, the PPMF can take advantage of TRF shapes
in order to accurately estimate event times. Further, the PPF derived for STRF neurons can be used
alongside the PPMF to decode visual saliency from simulated neural activity.
2.2.1.3 Number of Neurons vs Image Resolution.
As mentioned previously, input saliency maps were blurred with a Gaussian kernel so that salient
regions were comparable in size to the size of the neuron SRFs. We reasoned that if the saliency
maps contained small salient features and sharper edges, more neurons would be required in order
to have acceptable decoding performance. With a large enough number of neurons, each region of
the visual field would be covered by several SRFs, and small regions of overlap between multiple
SRFs would allow for the decoding of salient regions smaller than the SRF size.
In order to test this hypothesis, we simulated spiking data for neural populations of varying
sizes, and for different blur factors. Here, the blur factor is the standard deviation of the Gaussian
kernel used to perform blurring – a higher standard deviation corresponds to more blur. All neuron
SRFs were circles with a radius of 25 pixels and were placed uniformly at random throughout the
field of view for each image stimulus. The results of this analysis are shown in figure 2.6. As before, PPF decoding performance was measured via the correlation coefficient between the decoded
saliency maps and the true saliency maps. Each point on the surface is the correlation coefficient
computed across all pixels and all 1132 decoded saliency maps for a particular blur/neuron-count
pair. This correlation plot supports our hypothesis that performance increases as both the number
of neurons and the blurring factor increase. In essence, there is a correlation between the desired
level of detail in the decoded saliency maps, and the number of neurons required to achieve good
decoding performance for that level of detail: decoding more detail requires more neurons.
In order to show that a larger number of neurons can be used to achieve good performance with
higher-resolution (lower blur factor) saliency maps, we display the results of a simulation with 550
neurons and no blurring in figure 2.7A, B, showing that more detail can be decoded in this case.
2.2.1.4 Effect of SRF Arrangement.
In the above simulations, the simulated SRFs were circles of radius 25 pixels, distributed throughout the visual field uniformly at random for each image stimulus. SRFs in the superior colliculus
are generally round in shape, but SRFs near the center of the visual field (fovea) are smaller in
size than those near the periphery. Further, there is a higher density of SRFs near the center than
toward the periphery [57, 81, 83]. In order to simulate SRFs that are more biologically plausible,
we ran another simulation in which we chose the SRF centers at random according to a Gaussian
Figure 2.7: Decoder performance with 550 simulated neurons and no blurring, with uniform and
foveated SRF layouts. A) From left to right: Input image, saliency map of input image, decoded
saliency map with uniform SRF layout where SRFs are distributed throughout the visual field uniformly at random, and decoded saliency map for foveated SRF layout where SRFs are placed with
more density near the center of the visual field. B) Example SRF visualization and correlation
coefficient map for uniform SRF layout. C) Example SRF visualization and correlation coefficient map for foveated SRF layout. Note that for the foveated layout, the correlation coefficient
decreases toward the edges of the image. This is because of the increased SRF size and decreased
SRF density near the edges.
distribution centered at the center of the visual field for each image stimulus. We also chose the
SRF radii as a linear function of their distance from the center, ranging from 5 pixels at the center
to 20 pixels at maximum eccentricity. We refer to this layout as the foveated layout. Although
this is a simplification of how SRF sizes and density change with position, it allows us to illustrate
the idea that a more biologically plausible SRF arrangement would contain less information about
saliency in the periphery of the visual field. To compare this foveated layout with a uniform layout,
we also ran a simulation where neurons with radii of 5, 10, and 20 pixels were placed uniformly at
random throughout the visual field for each image stimulus.
The comparison of the uniform and foveated SRF layouts for 550 simulated neurons and no
saliency map blur is shown in figure 2.7. For the foveated arrangement, the performance decreases
toward the periphery due to the larger size and increased sparsity of receptive fields (figure 2.7C,
average correlation coefficient is 0.71). On the other hand, performance for the uniform layout
Figure 2.8: Saliency decoding with STRF parameter estimation. A) Estimated SRFs (right column)
compared to their true values (left column). B) Estimated TRFs (cyan) compared to their true values (purple). C) Correlation coefficients between PPF-decoded saliency values and true saliency
values when estimated parameters were used (right bar) vs when true parameters were used (left
bar). Correlation coefficients were computed across pixels for each decoded saliency map; error
bars are the standard deviation across all decoded saliency maps. Decoding with estimated parameters resulted in performance that was nearly as good as decoding with the true parameters.
is relatively constant throughout the field of view (figure 2.7B, average correlation coefficient is
0.82).
2.2.1.5 Estimated STRFs Match True Values.
In all the analyses until this point, the decoders have used the true STRF and baseline parameters (φ, w, and α) in order to estimate stimulus times and decode saliency maps. In real-world
applications, a parameter estimation procedure would be required to find these parameters.
In order to estimate saliency STRF parameters from simulated neural activity, we used the
RF mapping procedure described in [70], and above in section 2.1.5. We simulated the neural
responses to a 9 by 16 grid of 144 point stimuli – one point for each of the 144 pixels in the 9 ×
Figure 2.9: Experimental protocol for the delayed-saccade task. The task required the monkey to
make a saccade to a peripheral target after a ‘Go’ cue. The baseline period lasted for 500-800ms,
and the delay period lasted for 1000-1500 ms.
16 pixel decoded saliency map. The firing-rate maps constructed from this procedure were then
normalized to the range [0, 1], and values below .2 were zeroed out. This threshold was chosen
arbitrarily, and in practice would depend on the variability of each neuron’s response to stimuli
within its receptive field, and on each neuron’s baseline firing rate.
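A minimal sketch of this normalization and thresholding step; the function name is illustrative.

import numpy as np

def threshold_srf(rate_map, threshold=0.2):
    # Normalize a firing-rate map to [0, 1] and zero out weak responses.
    m = rate_map - rate_map.min()
    if m.max() > 0:
        m = m / m.max()
    m[m < threshold] = 0.0
    return m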
Once the SRFs were estimated, GLM maximum likelihood estimation [44] was used to fit the
TRF parameter vectors w and the baseline parameters α as described in section 2.1.4. The training
data for this procedure was obtained by simulating neural responses to circular stimuli placed
within the estimated SRF of each neuron. We simulated 10 such trials per neuron.
The results of the parameter estimation procedure are shown in figure 2.8. The simulation was
of 50 neurons with evenly distributed SRFs of radius 25 pixels, and saliency maps were blurred
with a 16-pixel Gaussian kernel before simulation. Estimated SRF and TRF parameters were
nearly identical to the true values, and decoding performance was also nearly identical when the
true parameters were used versus when the estimated parameters were used in the PPF.
2.2.2 Real Data: Saccade Detection and Decoding
In order to validate the PPMF on real neural activity, we used spike data collected from the prefrontal cortex (PFC) of a rhesus macaque monkey while it performed a delayed-saccade task. This
data has been previously reported in [84]. The task required the monkey to maintain its gaze on
a central point on a screen (baseline period), and then make a saccade to one of eight peripheral
targets after a ‘go’ cue. The experimental design is detailed in figure 2.9.
Figure 2.10: The directional and temporal tuning of 5 example neurons. PSTHs are plotted for
each neuron, for each of 8 saccade directions. The saccade time is indicated by a vertical dashed
line. Each row corresponds to a saccade direction, indicated by the arrows on the left. Each column
corresponds to a neuron. Light purple regions are 95% confidence intervals. Neurons have clear
transient responses that vary by direction.
A movable electrode array consisting of 32 electrodes (Gray Matter Research, USA) was placed
over the prearcuate gyrus of the lateral PFC to record neural activity. Electrodes were spaced
1.5 mm apart in the X and Y directions, and their depths were adjusted individually in order to
maximize recording quality. Raw neural signals were sampled at 30 kHz. In order to isolate single-unit activity, the raw data was pre-processed by high-pass filtering at 300 Hz and thresholding at
3.5 standard deviations below the signal mean. Spike-sorting was then performed via principal
component analysis and k-means clustering. Eye position was recorded using an infrared eye-tracking system (ISCAN, USA) with a sampling rate of 120 Hz. Further details can be found
in [84].
We used the PPMF to estimate the timing of task-relevant saccades, and then used the ML
Point-Process Classifier to classify detected saccades to one of eight directions. In order to verify
that the recorded units were sensitive to saccades and exhibited TRFs, we computed saccade-aligned PSTHs for each saccade direction. The histograms for 5 exemplary neurons are shown in
figure 2.10. Each row represents a saccade direction, each column represents a single unit, and
the dashed purple vertical line represents the saccade onset time. We can see that these neurons
exhibit transient responses to the saccade onset, and that the amplitude of this response varies per
saccade direction, i.e., they exhibit directional tuning. Together, these observations suggest that
these neurons exhibit STRFs.
We modeled the activity of the recorded neurons using the CIF in figure 2.2B. The brain state
is a length-2 vector S = [cos(θ) sin(θ)]^T, where θ is the saccade direction. Similarly, the SRF parameter vector is φ = [cos(θ_p) sin(θ_p)], where θ_p is the preferred direction of the neuron. As in
the saliency simulation, the transient response r(t) is represented as a weighted sum of truncated
Gaussians. The CIF was written parametrically as (2.18), and the parameters φ and w were estimated for each neuron via GLM maximum likelihood estimation [44]. Because (2.18) is a bilinear
form, the parameters were estimated iteratively via coordinate ascent, as described in section 2.1.4.
A subset of the data originally analyzed in [84] was selected for analysis based on the number
of saccade-tuned neurons that were present per recording session. Neurons were considered to be
untuned if either 1) the 95% confidence intervals of both SRF parameters contained zero, or 2) the
95% confidence intervals of all 5 TRF parameters contained zero. These conditions indicate that
a neuron did not have significant spatial or temporal tuning. We chose to analyze days with > 5
significantly tuned neurons, which resulted in three days with 6334 total saccades being selected
for analysis. Although our firing rate model in (2.4) and our PPMF derivations allow for φ^T S to be
either non-positive or non-negative, we found that all the neurons exhibited non-negative transient
responses for all saccade directions, i.e., the firing rate either increased from baseline or remained
at baseline in response to stimuli. This can be seen for a subset of neurons in figure 2.10.
Figure 2.11: Estimated STRF model is significantly predictive of PFC neuronal spikes. A) Model
PP averaged across neurons for each day, compared to average chance-level PP. Chance-level PP
values were computed by retraining the models with shuffled saccade directions. 90% of neurons
used for analysis had PP values significantly greater than chance level (p < .01, 2-sample t-test).
Error bars are the SEM across neurons and cross-validation folds. B) Spike-Prediction ROC curves
for a representative set of neurons recorded on day 3. The top two ROC curves correspond to
neurons near the average value, while the bottom two ROC curves correspond to the two best-case
neurons. Each curve is labeled with its corresponding PP value. PP values closer to 1 indicate that
a model predicts spikes well, while values near 0 indicate that a model performs no better than
chance.
The data from the selected days consisted of 8 continuous recording sessions per day. In order
to test the ability of the PPMF to detect events in a continuous, un-epoched stream of data, we
performed leave-one-session-out cross validation by training our models on all the sessions except
one, and testing the PPMF and decoder on the session that was left out. This process was repeated
so that each session was used for testing once.
2.2.2.1 STRF Model is Significantly Predictive of Spikes.
As discussed in section 2.1.7, we measure the neuronal predictive power of our fitted models with
ROC-curve analysis (figure 2.11). In figure 2.11A, we compare the average PP across neurons
for each day to chance level. In figure 2.11B, we show the ROC curves and PP values for a
representative sample of neurons recorded on day 3.
To perform statistical analysis on the PP, we obtained the mean and variance of chance-level PP
by training and evaluating the point-process model on 100 random shuffles of the original dataset.
Specifically, we shuffled the labeled saccade direction for each trial, such that each trial would be
labeled with a random saccade direction after shuffling. For each randomly shuffled dataset, the
point-process model was re-trained, and the PP was computed. The mean and variance of the PP
was taken across the 100 shuffles.
Figure 2.11A compares the average PP with chance level for each day. Of the 20 neurons
selected for analysis, the trained models for 18 had PP values that were significantly higher than
chance (p < .01, 2-sample t-test). Averaged across neurons, the chance-level PP was .037, with a
standard deviation of .034. The average PP for the trained models was .26, with a standard deviation of .19 across neurons. This indicates that the learned STRF parameters are indeed predictive
of spiking activity.
2.2.2.2 PPMF Successfully Detects Saccade Events.
Using the PPMF with estimated TRF parameters (figure 2.12), we were able to successfully detect
70.46% of all task-relevant saccades (figure 2.12C, recall). This value is significantly higher (p <<
.01, paired t-test) than the NP rate of 26.7%, which is the percentage of saccades we would expect
to detect if estimated saccade times were chosen at random. As mentioned previously, a detected
saccade was considered to be a true positive if it occurred within 250ms of an actual saccade. The
performance measures for event detection defined in section 2.1.7 are shown in figure 2.12C.
Figure 2.12: PPMF successfully detects saccade events from PFC neuronal spiking activity. A) A
slice of the PPMF’s output (black trace), along with indicators for true (purple dashed vertical lines)
and estimated (cyan dashed vertical lines) saccade times. B) Saccade detection and classification
accuracies across directions. Performance varies by direction due to the small number of neurons.
C) Performance metrics for event detection. Error bars indicate the SEM across all 24 recording
sessions (8 per day for 3 days). D) Error histogram for estimated saccade times. The detection
window, and hence the maximum possible timing error for a true detection, is 250 ms.
In figure 2.12D, we show a histogram of time estimation errors for all detected saccades. The
error is narrowly concentrated, with 80% of detected saccades being within 150ms of a true saccade. Additionally, the histogram tapers off toward the edges, which indicates that small changes
to the detection window duration (250 ms) would not have much impact on detector performance.
2.2.2.3 ML Classifier Can Predict Saccade Direction.
A discrete ML Point-Process classifier was used to classify detected saccades into one of eight
possible directions. Spike data from a 1-second window around each detected saccade was used
as input to the classifier. The likelihood of the spike data was computed for each direction, and the
saccade was classified as the direction with the highest likelihood. The per-direction classification
accuracy of the ML classifier, along with the per-direction detection accuracy of the PPMF are
Figure 2.13: PPMF detection significantly improves with more (27) neurons in a combined dataset.
A) Saccade detection and classification accuracy for each saccade direction. B) Performance metrics for event detection. Error bars indicate the SEM across 5 cross-validation folds. C) Error histogram for estimated saccade times. The improvement in performance from the individual datasets
in figure 2.12 is due to the increased number of neurons in the combined dataset.
shown in figure 2.12B. The variation across directions is likely due to the fact that very few neurons
(5-7 per day) were used for saccade detection and classification, and the SRFs of these neurons do
not evenly cover the 8 possible directions.
2.2.2.4 Increasing the Number of Neurons Improves Detection and Decoding Performances.
In order to confirm that performance would increase with a larger population of neurons, we created
a synthetic dataset by combining data across days. Specifically, we combined saccade-aligned trial
data from different days that shared the same saccade direction. This resulted in a combined dataset with 27 total neurons and 828 total delayed-saccade trials. 5-fold cross-validation
was run on this combined dataset, and the results are summarized in figure 2.13. To prepare data
for testing, test trials were buffered with 1 second of empty space with no spikes, and were then
concatenated together. The PPMF and ML classifier were run on this concatenated data. The 1
second buffer was added in order to prevent firing rate discontinuities between trials from affecting
the PPMF. Performance on the combined dataset was drastically better than on any single day,
as expected. The precision improved significantly (p << .01, 2-sample t-test). The recall also
increased (p = .053, 2-sample t-test). Average classification accuracy improved to 94.6%. Additionally, the timing error curve became much tighter, with 80% of detected saccades being within
40 ms of a true saccade, as opposed to 150 ms in the single-day case.
2.3 Discussion
In this work we developed the PPMF, the first event detection algorithm based on a point process
model of neurons with STRFs [36]. We also designed a PPF for decoding neurons with STRFs.
By detecting events and estimating their time, the PPMF allows various decoders and classifiers to
be applied to neurons that exhibit TRFs even when event times are unknown. Indeed, without the
PPMF, decoders and discrete classifiers would require stimulus- or behavior-aligned data and thus
a-priori knowledge of the event times, which is not available in real-time applications. The PPMF
combined with the PPF can be run in real-time with a fixed delay, making them suitable for use in
neurotechnologies and BCIs.
The effectiveness of the PPMF was demonstrated with simulated saliency-tuned neural data
and with real saccade-tuned neural data recorded from a macaque monkey. In the former case, first
image stimuli were detected by PPMF and then the PPF was used to decode saliency maps with
both the true and the estimated STRF parameters. In the later case, we used coordinate ascent on
the Poisson likelihood function to estimate the parameters of a Generalized Bilinear Model from
neuronal spikes, and then used the PPMF to estimate saccade times with these parameters. An ML
Point Process Classifier was then used to decode saccade direction.
2.3.1 Comparison to Existing Methods
It is important to note that PPMF serves a different purpose compared with methods that aim to
differentiate various stationary phases of neural activity in a motor task as in [53] and [54]. For
example, in a motor task, these phases are the baseline before any planning, a planning phase prior
to movement, and movement phase. An HMM is used to model probabilistic transitions between
these phases, which are modeled as discrete states, and a homogeneous Poisson model is learned
for each state. Although this method is quite useful and can successfully detect transitions between
different phases of neural activity, it is designed for cases when neural activity can be divided into
phases of finite duration during which firing rates are not time-varying (i.e., are stationary). When
neurons exhibit TRFs, their response to a stimulus or behavioral event is transient, and therefore
not stationary. It is because of these transient responses that it is necessary to use a matched filter
to estimate event times, as performed by the PPMF.
It is also important to note that the PPMF provides a novel event detection algorithm for a point
process model of population spiking activity. Prior work in [55] presents an important method for
event detection, which is instead designed for the continuous firing rate signal of a neuron or an
LFP signal. This prior method is based on spectral analysis of the firing rate, referred to as the
cepstrum event detector (CED), and thus does not aim to do event detection for point process
models. For this reason, the PPMF and CED have several different functions. First, the PPMF
is designed for a parametric point process model of STRF neurons, because these point process
models have been shown to be highly predictive of neuronal activities that exhibit STRF [36]. Thus
it is important to also design event detection algorithms for point process models. Second, the CED
is designed to decode brain states that are discrete-valued by building a model for each possible
discrete brain-state value. The PPMF is instead designed to also decode continuous-valued brain
states such as saliency by performing event detection independently of the decoding method. Once
event times are estimated by the PPMF, discrete and continuous brain state decoders can then be
used. Third, the CED operates on spectrograms of firing rates while the PPMF operates directly on
spike trains; for this reason, the PPMF entails a detection window and latency that is equal to the
TRF response of the neurons and the CED latency is instead dictated by the amount of data needed
to compute a reliable spectrogram, e.g., 1 second in [55]. The latency consideration could become
important in some applications when the transient response occurs after the behavioral/stimulus
event and is shorter than 1 second. For example, transient responses in the superior colliculus
neurons that encode saliency have been shown to last less than .5 seconds [37, 38, 56]. Lastly, the
PPMF estimated the event times by aggregating information from a population of spiking neurons.
2.3.2 The Choice of the STRF Model
Point processes have been shown to be powerful models of neurons with STRFs in prior studies [36]. Thus developing principled event detection and decoding methods for these point process
models can benefit future neurotechnologies. Prior studies on point process decoding have largely
focused on decoding neurons without transient responses, i.e., those with SRFs alone (e.g., [42,
46, 62, 63, 66, 85, 86]). These prior studies have shown that point process decoders can enhance
BCI systems by increasing the adaptation, control and feedback rates and by providing an accurate
encoding model for binary spike events [46, 63]. This is because point process methods directly
model the binary spiking data on the fastest possible time scale unlike methods that require binning
and counting spikes.
Motivated by the benefit of point process decoders for neurons with SRFs as well as the demonstrated power of point process encoding models of neurons with STRFs, we developed the first
event detection algorithm for point process models, i.e., the PPMF. The PPMF enables real-time
decoding from point process models of neurons from any brain region that exhibits STRFs. We
also devised a PPF decoder for neurons with STRFs. The PPMF can be combined with this PPF
to enable continuous state decoding, or combined with various classification methods for discrete
state decoding from neurons with STRFs.
2.3.3 Future Directions
Implementing the PPMF in real-time BCIs is an important direction of future research. Specifically, the PPMF could enable the development of novel BCIs that take advantage of decoded brain
states in order to restore or augment cognitive function in users. For example, a saliency-based BCI
could decode a user’s covert focus of attention, and use this to present information to the user in a
way that minimizes reaction time. It may also be possible to predict shifts of attention before they
happen, which would further increase the utility of attentional decoding. Such a BCI could also
be extended to decode multisensory attention, combining information from visual regions, auditory
regions, and multisensory regions. Both visual and auditory regions exhibit TRFs [32, 34, 35, 39],
thereby making event detection a necessary step in developing such cognitive decoders.
Our event-detection and decoding methods are based on a rate-coding model of neural activity,
i.e., a model where neurons encode information by modulating their firing rates [31, 32, 36, 38]. An
alternative model of neural coding is that of temporal coding, where neurons encode information in
the precise timing of their spikes [87–90]. One possible future direction is to investigate whether
or how models of temporal coding can result in successful event detection algorithms.
In this work, we developed the PPMF for event detection and decoding from population spiking
activity. Due to advances in neurophysiological recording technology, however, it is now possible to simultaneously record multiple scales of neural activity, from small-scale spiking activity
of individual neurons to large-scale network activity measured through field signals such as local field potentials (LFPs) and electrocorticogram (ECoG) [47, 91–98]. Thus there is potential
in combining information across these scales to improve BCI performance [47, 94, 96–98]. Recently, multiscale decoders for neural signals with SRFs have been developed that model spike
and field data simultaneously while taking into account their different statistical profiles and time
scales [47]. These decoders can combine information across scales of activity to improve decoding performance compared to when using each scale in isolation [47, 94]. The success of these
methods suggests that developing multiscale methods for event detection and decoding when there
are transient responses in neural activity can enhance neurotechnologies. In particular, the PPMF
may also benefit from including information from neural scales besides from binary spikes, such
as LFPs and ECoG. A multiscale event detection algorithm could potentially enable the development of multiscale decoders that take advantage of transient responses in both the spike and LFP
domains. Extending the PPMF by exploring these directions and implementing them in real-time
BCIs are important future directions of our research.
Chapter 3
Event Detection and Decoding from Multimodal Neural
Activity
In chapter 2, we developed event detection and decoding methods for point-process signals. However, neuroscience experiments can simultaneously record multiple types of neural signals such as
local field potentials (LFPs), which can be modeled as Gaussian, along with neuronal spikes, which
can be modeled as point processes. Further, prior work has shown that motor and cognitive states
are represented across multiple spatiotemporal modalities of neural activity, and that BCI decoding
of these states can significantly benefit from using multimodal data [47, 91, 95, 96, 98–103]. Although event detection methods have been developed for each of these modalities in isolation [55,
60, 104, 105], no such method exists for multimodal time series containing a mixture of Gaussian
and point-process signals. As discussed previously, BCI decoders are often run on data where
task-relevant event times are known to the decoder, but in many applications it is possible that
these event times are unknown, and therefore event-locking cannot be performed during decoding.
In order to enable high-performance multimodal motor or cognitive BCIs when either event times
are unknown or tasks are not stereotyped, a method to simultaneously detect and decode events
from multimodal signals in real time is necessary.
The problem of event detection from multimodal time series is challenging because of the differences in statistical properties of Gaussian and point-process signals [102, 106, 107] and because
the events with unknown times can also have different yet unknown classes. In order to address
this challenge, we develop the multimodal event detector (MED), an algorithm which performs
event detection in multimodal time series with multiple unknown event classes. We do this by
deriving the maximum-likelihood estimator of simultaneous event times and classes from multimodal data. We first write a parametric likelihood model of Gaussian and point-process time series
that encode an event with an unknown time and class, learn the model parameters from data, and
then derive the estimate of the time and class that maximize this likelihood function. We validate
this method in simulations and in spike-LFP neural datasets recorded from a monkey performing a
rapid eye-movement (saccade) task, with the goal of detecting the times and directions of saccades.
In simulated and real data, we show that the MED can successfully detect and classify eye movements from multimodal spike-field data. We further show that the MED successfully integrates
information from both data modalities, with performance increasing as signal channels of either
modality are added.
3.1 Methods
We first describe our model of how event times and classes are encoded in multimodal time series,
then derive the MED as the maximum-likelihood estimator of both event times and classes.
3.1.1 Multimodal Model
Point process signals can be described as a time-series of 0’s and 1’s. These point processes can
be characterized by a conditional intensity function (CIF) that models the rate λ at which nonzero
values, or ’spikes’, occur as a function of external covariates. We model point processes via a
Poisson generalized linear model (GLM), where the logarithm of the CIF is a linear function of the
covariates. Poisson point-process models have been successfully used to model neuronal spiking
activity [42–44, 48, 49, 108]. Further, prior work has developed Poisson GLMs that explicitly
encode event times and classes, and showed that such models are a good fit to the spiking of
neurons that encode stimulus-related events [36]. However, such models have not yet been used to
develop multimodal detection and decoding algorithms, as we do here.
In the context of event detection, the covariates that we model are the event time t_s ∈ [0, T] and the event class s, where [0, T] is the interval on which the event can occur and s is a 'one-hot' vector in R^S, where S is the number of possible classes. In a manner similar to [36], we write the point process CIF as

\log \lambda_i(t) = r_i(t - t_s)\, \phi_i^T s + \alpha_i    (3.1)

Here, φ_i ∈ R^S and r_i(t) are, respectively, the spatial and temporal responses of channel i, while α_i characterizes the baseline rate. The interpretation of this CIF is that an event at time t_s will evoke a temporal response r_i(t) that is modulated by a spatial response φ_i^T s. Given the CIF λ_i(t) and assuming inhomogeneous Poisson statistics, the likelihood of observing a binary signal at channel i with spikes at times {t_{i,m}}_{m=1:M_i} = {t_{i,0}, ..., t_{i,M_i}} is given by

p(\{t_{i,m}\}_{m=1:M_i}) = e^{-Q_i} \prod_{m=1}^{M_i} \lambda_i(t_{i,m})    (3.2)

where Q_i = \int_0^T \lambda_i(t)\, dt, m is the spike time index, and M_i is the total number of spikes observed from channel i [109].
Similarly to our point process model, we model continuous channels as having spatial and temporal responses to an event at time t_s of class s:

y_j(t) = r_j(t - t_s)\, \phi_j^T s + w_j(t), \quad w_j(t) \sim \mathcal{N}(0, \sigma_j^2)    (3.3)

where w_j(t) is Gaussian noise and σ_j^2 is the variance of this noise for channel j. The corresponding likelihood is

p(y_j(t)\big|_{t=1:T}) = \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{\left( y_j(t) - r_j(t - t_s)\, \phi_j^T s \right)^2}{2\sigma_j^2} \right)    (3.4)
In order to form a joint model of Gaussian and Poisson channels, we assume that they are conditionally independent given the stimulus time t_s and class s. With this assumption, the joint likelihood for I Poisson channels and J Gaussian channels is

\mathcal{L}(t_s, s) = p(\{t_{i,m}\}_{i=1:I,\, m=1:M_i}, \{y_j(t)\}_{j=1:J,\, t=1:T} \,|\, t_s, s) = \prod_{i=1}^{I} p(\{t_{i,m}\}_{m=1:M_i} \,|\, t_s, s) \times \prod_{j=1}^{J} p(\{y_j(t)\}_{t=1:T} \,|\, t_s, s)

Substituting in our likelihood models from (3.2) and (3.4), we get

\mathcal{L}(t_s, s) = \prod_{i=1}^{I} e^{-Q_i} \prod_{m=1}^{M_i} \exp\left( r_i(t_{i,m} - t_s)\, \phi_i^T s + \alpha_i \right) \times \prod_{j=1}^{J} \prod_{t=1}^{T} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{\left( y_j(t) - r_j(t - t_s)\, \phi_j^T s \right)^2}{2\sigma_j^2} \right)    (3.5)
This provides a multimodal model of Gaussian and point-process signals that encode both event
times, via a temporal response, and event classes, via a spatial response. We note that prior work
has modeled spatiotemporal responses for point-process signals [36, 105]; the model above extends this treatment to multimodal signals in order to solve the previously unaddressed problem of simultaneous event detection and classification, as we do next.
3.1.2 Maximum Likelihood Estimate of Event Times and Classes
Using the models defined in section 3.1.1, we can now formalize the problem of simultaneous
event detection and classification as an optimization problem. Specifically, our goal is to find the
event time and class that maximize the log-likelihood of the observed multimodal data:
(\hat{t}_s, \hat{s}) = \operatorname{argmax}_{t_s,\, s} \log \mathcal{L}(t_s, s) = \operatorname{argmax}_{t_s,\, s} \sum_{i=1}^{I} \left[ -Q_i(s) + \sum_{m=1}^{M_i} \left( r_i(t_{i,m} - t_s)\, \phi_i^T s + \alpha_i \right) \right] + \sum_{j=1}^{J} \sum_{t=1}^{T} \left[ -\frac{1}{2}\log(2\pi\sigma_j^2) - \frac{1}{2\sigma_j^2}\left( y_j(t) - r_j(t - t_s)\, \phi_j^T s \right)^2 \right]

Removing additive terms that do not vary with t_s or s, and therefore do not impact the argmax, this simplifies to:

(\hat{t}_s, \hat{s}) = \operatorname{argmax}_{t_s,\, s} \sum_{i=1}^{I} \left[ -Q_i(s) + \sum_{m=1}^{M_i} r_i(t_{i,m} - t_s)\, \phi_i^T s \right] + \sum_{j=1}^{J} \sum_{t=1}^{T} \frac{1}{\sigma_j^2} \left[ y_j(t)\, r_j(t - t_s)\, \phi_j^T s - \frac{1}{2}\left( r_j(t - t_s)\, \phi_j^T s \right)^2 \right]    (3.6)
To further simplify this expression, we define u_i(t) as the Poisson binary time series

u_i(t) = \sum_{m=1}^{M_i} \delta(t - t_{i,m})    (3.7)

where δ(t) is the Dirac delta function. We can now re-write the following terms from (3.6) as convolutions:

\sum_{m=1}^{M_i} r_i(t_{i,m} - t_s)\, \phi_i^T s = \sum_{t=-\infty}^{\infty} u_i(t)\, r_i(t - t_s)\, \phi_i^T s = \phi_i^T s \left[ u_i(t_s) \ast r_i(-t_s) \right]

and

\sum_{t=1}^{T} y_j(t)\, r_j(t - t_s)\, \phi_j^T s = \phi_j^T s \left[ y_j(t_s) \ast r_j(-t_s) \right]

where \ast represents convolution. Substituting these convolution expressions into (3.6), we get

(\hat{t}_s, \hat{s}) = \operatorname{argmax}_{t_s,\, s} \sum_{i=1}^{I} \left[ \phi_i^T s \left( u_i(t_s) \ast r_i(-t_s) \right) - Q_i(s) \right] + \sum_{j=1}^{J} \frac{1}{\sigma_j^2} \left[ \phi_j^T s \left( y_j(t_s) \ast r_j(-t_s) \right) - \frac{1}{2} \sum_{t=1}^{T} \left( r_j(t - t_s)\, \phi_j^T s \right)^2 \right]    (3.8)
Finally, we note that if the support of r(t) is much smaller than T, then the term \sum_{t=1}^{T} r_j(t - t_s) does not vary with t_s. This is because delaying r_j(t) by t_s samples does not change the value of its summation over the entire duration T, as long as the entire support of r_j(t - t_s) lies within [0, T]. Assuming that this is true, we can further simplify the second term in the sum over the J Gaussian channels in (3.8):

(\hat{t}_s, \hat{s}) = \operatorname{argmax}_{t_s,\, s} \sum_{i=1}^{I} \left[ \phi_i^T s \left( u_i(t_s) \ast r_i(-t_s) \right) - Q_i(s) \right] + \sum_{j=1}^{J} \frac{1}{\sigma_j^2} \left[ \phi_j^T s \left( y_j(t_s) \ast r_j(-t_s) \right) - \frac{(\phi_j^T s)^2}{2} \sum_{t=1}^{T} r_j^2(t) \right]    (3.9)
This is a sum of linear matched filters that are matched to the temporal responses r(t) and scaled by the spatial responses φ^T s. Point-process channels are vertically shifted by the term Q_i(s), while Gaussian channels are vertically shifted by \frac{(\phi_j^T s)^2}{2} \sum_{t=1}^{T} r_j^2(t) and then scaled by their inverse noise variance 1/σ_j^2. The shift terms set the baseline levels for each channel's contribution to the MED output. The inverse noise variance scaling for Gaussian channels means that noisier channels have a smaller contribution to the MED output.
An additional challenge in multimodal integration when some modalities are discrete and some are continuous is that of proper scaling of likelihoods. In particular, while the point process likelihood for discrete random variables provides a probability measure (i.e., the probability of a number of spikes), the Gaussian likelihood for continuous random variables provides a density measure that needs to be integrated over a range of values to provide a probability measure (in and of itself, it is not a probability). This distinction between discrete and continuous modalities makes the multimodal approach sensitive to the scaling of the continuous signals (see details in section 3.3). To address this challenge, and to additionally account for model mismatch in real datasets, we introduce a single cross-modal scaling parameter k that weighs the contributions of the two data modalities in the likelihood model:

(\hat{t}_s, \hat{s}) = \operatorname{argmax}_{t_s,\, s} \sum_{i=1}^{I} \left[ \phi_i^T s \left( u_i(t_s) \ast r_i(-t_s) \right) - Q_i(s) \right] + k \sum_{j=1}^{J} \frac{1}{\sigma_j^2} \left[ \phi_j^T s \left( y_j(t_s) \ast r_j(-t_s) \right) - \frac{(\phi_j^T s)^2}{2} \sum_{t=1}^{T} r_j^2(t) \right]    (3.10)
This scaling parameter can be learned via a grid search on training data by maximizing any
performance metric of interest. Here, we choose the value of k that maximizes the sum of the
event detection AUC (see section 3.1.6) and event classification accuracy on training data.
We can interpret the output of the MED as S separate signals, one for each possible event class. At any given time, the maximum-valued output signal corresponds to the maximum-likelihood estimate of the event class, while the peaks in this maximum-valued signal correspond to the maximum-likelihood estimates of the event time. This idea is illustrated using simulated spike-field data in Figure 3.1, where correct saccade classification is indicated by the background color matching the color of the peak signal.
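A minimal sketch of evaluating (3.10) for all candidate times and classes at once is shown below; the convolution with the time-reversed temporal response is implemented as a correlation, and the alignment convention of mode='same', along with the array layout, is a simplification of this sketch rather than a fixed interface.

import numpy as np

def med_output(u_list, y_list, r_p, phi_p, Q, r_g, phi_g, sigma2, k=1.0):
    # u_list/y_list: lists of point-process / Gaussian channel signals, each length T.
    # r_p/r_g: temporal responses per channel; phi_p/phi_g: (n_channels, S) spatial responses.
    # Q: (n_channels, S) integrated rates Q_i(s); sigma2: Gaussian noise variances.
    T, S = len(u_list[0]), phi_p.shape[1]
    out = np.zeros((S, T))
    for i, u in enumerate(u_list):      # point-process terms of (3.10)
        mf = np.correlate(np.asarray(u, float), r_p[i], mode='same')  # u_i * r_i(-t)
        out += phi_p[i][:, None] * mf[None, :] - Q[i][:, None]
    for j, y in enumerate(y_list):      # Gaussian terms of (3.10)
        mf = np.correlate(np.asarray(y, float), r_g[j], mode='same')  # y_j * r_j(-t)
        energy = 0.5 * (phi_g[j] ** 2) * np.sum(r_g[j] ** 2)
        out += k / sigma2[j] * (phi_g[j][:, None] * mf[None, :] - energy[:, None])
    return out  # (S, T): argmax over this array gives (s_hat, t_hat)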
In order to use the MED in a real dataset, the parameters of our model in (3.5) must be estimated
from training data. A maximum-likelihood method for estimating these parameters is described in
section 3.1.3.
3.1.3 Model Parameter Estimation
In order to make use of the MED, the model parameters of each channel must be estimated. Prior
work has developed a way to estimate spatial and temporal parameters for Poisson GLMs [36].
Figure 3.1: Sample output of the MED on simulated multimodal neural activity. In this example,
the events are rapid eye movements (saccades), and the event classes are one of 8 eye movement
directions. The MED produces 8 output signals, one for each class, which are shown plotted here.
The color of the background indicates the true saccade direction at a given time, while the color of
the maximum-valued output signal corresponds to the maximum-likelihood estimate of the saccade
direction. Peaks in the maximum-valued signal correspond to the maximum-likelihood estimate of
the saccade times. The mapping between colors and saccade directions is shown in the top left.
This method can also be used with Gaussian linear models. Building on this prior work [36] for
Poisson GLMs, we estimate the parameters in our multimodal model as follows.
The parameters that we must estimate are the temporal responses r(t), the spatial responses φ,
the Poisson baseline parameters α_i, and the Gaussian noise variances σ_j. We can estimate these
parameters by maximizing the likelihood of a training dataset over these parameters. For Poisson
signals, this can be done efficiently via iteratively re-weighted least squares, and for Gaussian
signals, this can be done via simple linear regression. However, these methods require that our
models are log-linear in the parameters for Poisson channels, and linear in the parameters for
Gaussian channels. Consequently, we must rewrite our multimodal model in a way that is suitable
for parameter estimation.
In our models (3.1) and (3.3) the temporal responses r(t) are not in a parametric form. We can
address this by parameterizing them as a weighted sum of B basis functions:
r(t) = \sum_{b=1}^{B} w_b f_b(t) .    (3.11)
In this way, the temporal response is parameterized by the weights w_1, ..., w_B. In this work, the basis functions f_b(t) are 10 truncated Gaussians with a standard deviation of 100 ms, with means evenly spread out from -500 ms to 500 ms relative to the event time t_s. Now that our model has been parameterized, we need a way to represent the independent variables, the event classes s and times t_s, in a way that is suitable for generalized linear regression. To do this, we define a signal d_n(t) that encodes both the event classes and the event times:

d_n(t) = \sum_{t_s \in \tau} \delta(t - t_s)\, s_n(t_s)    (3.12)
Here, s_n(t) is the nth element of the one-hot vector s at time t, and τ is the set of event times. Using the parameterization of r(t) in (3.11) and the event-encoding signal d(t) in (3.12), the terms \sum_{t_s \in \tau} r(t - t_s)\, \phi^T s(t) in (3.5) can be rewritten as a matrix multiplication, in a manner similar to [36]:

\sum_{t_s \in \tau} r(t - t_s)\, \phi^T s(t) = \sum_{t_s \in \tau} r(t - t_s)\, s^T(t)\, \phi
 = \left[ \sum_{t_s \in \tau} r(t - t_s)\, s_1(t), \; \ldots, \; \sum_{t_s \in \tau} r(t - t_s)\, s_n(t) \right] \phi
 = \left[ r(t) \ast d_1(t), \; \ldots, \; r(t) \ast d_n(t) \right] \phi
 = \left[ \sum_{i=1}^{b} w_i f_i(t) \ast d_1(t), \; \ldots, \; \sum_{i=1}^{b} w_i f_i(t) \ast d_n(t) \right] \phi
 = [w_1 \; \ldots \; w_b] \begin{bmatrix} f_1 \ast d_1(t) & \ldots & f_1 \ast d_n(t) \\ \vdots & \ddots & \vdots \\ f_b \ast d_1(t) & \ldots & f_b \ast d_n(t) \end{bmatrix} \phi
 = w^T X(t)\, \phi .    (3.13)
Here, \ast is the convolution operation and w = [w_1 \ldots w_b]^T is a vector of the temporal response parameters. The matrix X(t) encodes the independent variables t_s and s, which will be known for training data, along with our chosen basis functions f_b(t). Substituting this into (3.1) and (3.3), we obtain a generalized bilinear model for point process channels:

\log \lambda_i(t) = w_i^T X(t)\, \phi_i + \alpha_i    (3.14)

and a bilinear model for Gaussian channels:

y_j(t) = w_j^T X(t)\, \phi_j + w_j(t), \quad w_j(t) \sim \mathcal{N}(0, \sigma_j^2)    (3.15)
Here, w and φ are parameter vectors that represent the temporal and spatial responses, respectively,
and must be estimated from the training dataset. Generalized bilinear models have previously been
used to model the spatiotemporal responses of spiking neurons [33, 36].
If we fix the value of w, then we can learn φ by fitting a linear model or Poisson GLM with
w_i^T X(t) as the independent variable and the training data, either Gaussian or point-process, as the dependent variable. We can similarly learn w by fixing φ and using X(t)φ_i as the independent
variable. In this work, we fit parameters by iteratively fixing either the spatial or temporal response
and fitting the other, as in prior works [33, 36]. Parameters were initialized to vectors of all ones.
For point-process signals, α is learned as the bias parameter in the GLM, and for Gaussian
signals, σ is the standard deviation of the residuals after the parameters are learned.
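A minimal sketch of this coordinate-ascent procedure for a single Gaussian channel follows, using ordinary least squares for each sub-problem; for a point-process channel, each least-squares step would be replaced by a Poisson GLM fit (e.g., via iteratively re-weighted least squares). The design-tensor layout X of shape (T, B, S), holding the basis-filtered event signals f_b * d_s, is our own convention.

import numpy as np

def fit_bilinear_gaussian(X, y, n_iters=20):
    # Coordinate ascent for y(t) = w^T X(t) phi + noise, eq. (3.15).
    # X: (T, B, S) design tensor; y: (T,) observed Gaussian channel.
    T, B, S = X.shape
    w, phi = np.ones(B), np.ones(S)     # initialize to all ones, as in the text
    for _ in range(n_iters):
        # Fix w: regress y on (w^T X)(t), a length-S regressor, to update phi
        Zs = np.einsum('b,tbs->ts', w, X)
        phi = np.linalg.lstsq(Zs, y, rcond=None)[0]
        # Fix phi: regress y on (X phi)(t), a length-B regressor, to update w
        Zb = np.einsum('tbs,s->tb', X, phi)
        w = np.linalg.lstsq(Zb, y, rcond=None)[0]
    sigma = np.std(y - np.einsum('b,tbs,s->t', w, X, phi))  # residual noise std
    return w, phi, sigma

Note that a bilinear model determines w and φ only up to a shared scale factor, so the alternating fits converge to one of many equivalent solutions.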
3.1.4 Multimodal Model of Saccade-Sensitive Neural Activity
Prior work has shown that saccadic eye movements elicit transient responses in neural activity that
depend on the saccade direction [36, 105]. This provides an ideal validation setting for the MED by
allowing us to test whether it can be used to detect and classify saccades from multimodal neural
activity. In the dataset described in Section 3.1.5, there are 8 possible eye movement targets, and
as such we consider the case where there are eight possible saccade directions. We model this
by making our event class s a one-hot vector in R^8, with each entry representing one possible saccade direction. The spatial responses φ ∈ R^8 encode how saccades to each possible direction
Figure 3.2: Model of saccade-sensitive multimodal neural activity. A) r(t) is the temporal response
of the channel to an event at time t_s, while the spatial parameter φ encodes the magnitude of the
channel’s response to each possible saccade direction s. B) Simulated Poisson (top) and Gaussian
(bottom) data based on the model in A. Saccade times are indicated by vertical dashed lines, and
the corresponding saccade directions are indicated by the circled arrows.
change the magnitude of the temporal response r(t). This saccade encoding model is illustrated in
Figure 3.2A. Simulated multimodal data based on this model is shown in Figure 3.2B.
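A minimal sketch of simulating one Gaussian and one Poisson channel under (3.1) and (3.3) is given below; all parameter values here are illustrative rather than taken from the analyses that follow.

import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.001, 4.0
t = np.arange(0, T, dt)
S = 8                                                        # 8 saccade directions
r = np.exp(-0.5 * (np.arange(-0.5, 0.5, dt) / 0.1) ** 2)     # example temporal response

def place(signal, kernel, t_event):
    # Insert a response kernel starting 0.5 s before the event time.
    out = np.zeros_like(signal)
    i0 = int((t_event - 0.5) / dt)
    out[i0:i0 + len(kernel)] = kernel
    return out

s = np.eye(S)[3]                                             # one-hot class: direction 4
phi_g, phi_p = rng.normal(size=S), rng.normal(size=S)
resp_g = place(t, r * (phi_g @ s), 2.0)                      # saccade at t = 2 s
y = resp_g + 0.5 * rng.normal(size=len(t))                   # Gaussian channel, eq. (3.3)
lam = np.exp(place(t, r * (phi_p @ s), 2.0) + np.log(10.0))  # CIF, eq. (3.1), 10 Hz baseline
u = rng.random(len(t)) < lam * dt                            # binary point-process channel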
3.1.5 Nonhuman Primate Saccade Task
In addition to simulations, we validated the MED on multimodal spike-LFP data collected from the
prefrontal cortex (PFC) of a macaque monkey performing a delayed saccade task. The task design
is as follows. First, the monkey was required to maintain its gaze on a central point on a screen
for 500-800ms. After this fixation, a target cue appeared in one of eight possible locations. After
a delay period of 1000-1500ms, a ’go’ cue prompted the monkey to make a saccade to the target.
A movable electrode array consisting of 32 electrodes (Gray Matter Research, USA) was placed
over the prearcuate gyrus of the lateral PFC to record neural activity. Raw neural signals were
sampled at 30 kHz. In order to isolate single-unit activity, the raw data was preprocessed by high-pass filtering at 300 Hz and thresholding at 3.5 standard deviations below the signal mean. Spike
sorting was then performed via principal component analysis and k-means clustering. LFP signals
were acquired by band-pass filtering the raw data from 0.5 to 300 Hz. Both spike and LFP signals
were then downsampled to 1 kHz. Eye position was recorded using an infrared eye-tracking system
(ISCAN, USA) with a sampling rate of 120 Hz. All surgical and experimental procedures were in
compliance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and
were approved by the New York University Institutional Animal Care and Use Committee. Further
details can be found in [84].
3.1.6 Performance Evaluation
We assess the ability of the MED to detect event times using Receiver Operating Characteristic
(ROC) curve analysis [73]. The ROC plots the probability of true detection of events against the
probability of false detection as a threshold for detection is varied. The area under this curve
(AUC) is used as a threshold-free performance metric, with an AUC of 0.5 indicating chance-level
detection and an AUC of 1 indicating perfect detection. We use a modified version of the AUC
that counts a detection as a true positive if it occurs within 200 ms of a true event.
The details of this modified AUC measure are as follows. Let x(t) be the maximum value of the
MED’s output at any time t, and let h be an event detection threshold. We consider all peaks in x(t)
that are above the threshold h. If a peak is within 200ms of an actual saccade, then it is considered
a true positive. If it is not, then it is considered a false positive. In this way, we record the true
positive rates (TPRs) and false positive rates (FPRs) as the threshold h varies from min(x(t)) to
max(x(t)). We can then plot these FPR and TPR values against each other to construct an ROC
curve, and the area under this curve is our AUC metric.
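A sketch of this modified ROC procedure in Python is shown below. The peak detector, the threshold-sweep resolution, and the normalization of the false-positive count (here, false peaks per second) are illustrative assumptions; the definitions in the text above are what matter.

```python
import numpy as np
from scipy.signal import find_peaks

def modified_roc(x, event_times, fs=1000.0, tol=0.2, n_thresh=100):
    """Sweep a detection threshold over x(t) and score peaks against true events."""
    event_times = np.asarray(event_times)
    duration = len(x) / fs
    fprs, tprs = [], []
    for h in np.linspace(x.min(), x.max(), n_thresh):
        peak_times = find_peaks(x, height=h)[0] / fs
        # A peak within `tol` seconds of a true event counts as a true positive.
        is_true = np.array([np.abs(event_times - t).min() <= tol
                            for t in peak_times], dtype=bool)
        detected = [np.abs(peak_times - e).min() <= tol for e in event_times
                    ] if len(peak_times) else [False] * len(event_times)
        tprs.append(np.mean(detected))
        fprs.append((~is_true).sum() / duration)  # false peaks per second
    return np.array(fprs), np.array(tprs)
```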
3.2 Results
We validated the MED in the context of saccade detection from neural activity, using both numerical simulations and non-human primate neural signals. In both cases, the goal of the MED is to
detect saccade onset time and direction.
Figure 3.3: Comparison of true and estimated parameters based on simulated multimodal data. We
simulated multimodal neural activity that encodes saccade time and direction, and used maximum
likelihood estimation to estimate the model parameters. A) Ground truth (top row) and estimated
(bottom row) spatiotemporal responses for five example Gaussian channels. Each trace indicates
the channel’s response to a specific saccade direction, indicated by its color. The mapping between
trace colors and saccade directions is shown on the top right. Our maximum likelihood estimation
procedure resulted in parameters that closely match the ground truth. B) Same as A but for five
example Poisson channels.
3.2.1 Simulation Results
We simulated the activity of five Poisson and five Gaussian channels using the model described in
Section 3.1.4. We generated neural signals corresponding to 20 saccades, with a jittered 2 second
gap between each saccade. The simulated point process channel signals had a minimum firing rate
of 10 Hz and a maximum firing rate of 100 Hz, while Gaussian signals had an SNR of 0.1. These
values were chosen in order to roughly match the properties of the real dataset we analyze in section
3.2.2. We repeated this simulation 104 times and for each repetition the true neural parameters,
saccade directions, and saccade time jitters were randomized. The true neural parameters were
Figure 3.4: MED performance on simulated saccade-sensitive multimodal neural activity. All
measures are shown as a function of the number of channels of each modality and are averaged
over 104 simulation repetitions. A) saccade time detection area under the curve (AUC); higher
AUC indicates better saccade detection performance. B) saccade time detection normalized root-mean-square error (NRMSE); lower NRMSE indicates better performance. C) saccade direction
classification accuracy. The MED successfully combines information across data modalities, with
performance increasing monotonically as channels of either type are added.
used to generate training and test data for each iteration, and neural parameters were estimated
from the training data using the method described in section 3.1.3. These estimated parameters
were then used to perform simultaneous saccade time detection and direction classification on the
test data.
A comparison of the estimated parameters with the ground truth is shown in Figure 3.3. The
event detection and classification performance metrics, averaged across simulation iterations and
as a function of multimodal channel counts, are shown in Figure 3.4. This simulation analysis
shows that: 1) we can successfully estimate spatial and temporal response parameters from simulated multimodal data and 2) the MED can use these estimated parameters to successfully perform
simultaneous event detection and classification from multimodal data. Importantly, the MED successfully combines information across data modalities, with performance increasing monotonically
as channels of either modality are added.
Figure 3.5: MED performance on multimodal non-human primate (NHP) neural data. A) Saccade
time detection AUC. B) Saccade direction classification accuracy. Results in A and B are averaged over 8 cross-validation folds and 10 shuffles of channel order. C) Performance benefit of
using multimodal data over spike-only data, as a function of the number of spike channels. Performance benefit is shown for both saccade time detection AUC (top) and direction classification
accuracy (bottom). Bars indicate the mean percent improvement from spike-only to multimodal
performance, and error bars indicate the standard error of the mean (SEM). Multimodal data includes all LFP channels, with different numbers of spike channels (i.e., spike count) as indicated on
the x-axis. For all spike-count values, there was significant benefit in adding LFP channels (p <
.001, paired t-test, N=80). D) Same as C, but showing the benefit of using multimodal data over
LFP-only data. For all LFP-count values, there was significant benefit in adding spike channels (p
< 1e-14, paired t-test, N=80).
3.2.2 Nonhuman Primate (NHP) Data Results
To validate the MED on a real-world dataset, we used spike-LFP neural data recorded from a
macaque monkey performing a delayed-saccade task. In this task, detailed in Section 3.1.5, the
monkey first had to maintain its gaze on a central fixation point on a screen, and then make a
saccade to one of eight peripheral targets after a ’go’ cue [84]. We used a subset of this data
consisting of eight continuous recording sessions containing a total of 1871 trials. We evaluated
the MED via leave-one-out cross validation over sessions, where all sessions except for one were
used to train the MED, and the remaining session was used to test it. This cross-validation was
repeated eight times so that each session could be used for testing. In order to explore how the MED
integrates information across modalities, we measured performance as channels of both modalities
were added one at a time. For each cross-validation fold, we shuffled the order in which spike and
LFP channels were added 10 times. These 10 shuffles were the same for each fold.
The MED was able to successfully detect and classify saccades simultaneously from multimodal spike-LFP data. First, even with a relatively low number (10) of spike and LFP channels,
the MED achieved a saccade time detection AUC of 0.96 (chance level = 0.5) and a saccade direction classification accuracy of 0.55 (chance level = 0.125). These are both significantly higher
(p < 5e-5, paired t-test, N=80) than the corresponding chance levels. Second, the MED successfully achieved multimodal fusion. Both detection AUC and classification accuracy increase as
channels of either type are added, as shown in Figure 3.5A and B. Indeed, multimodal performance
was significantly greater than both spike-only (p < 1e-3, Hochberg-corrected paired t-test, N=80)
and LFP-only performance (p < 1e-14, Hochberg-corrected paired t-test, N=80) for all unimodal
channel counts, as shown in Figure 3.5C and D. While this performance increase was indeed significant for all unimodal channel counts, the increase was larger in the low-information regime,
i.e., when unimodal channel counts were smaller.
Interestingly, despite LFP-only performance being lower than spike-only performance, the
MED was still able to improve upon spike-only performance by adding LFPs. This indicates
that the MED truly integrated information from both modalities, taking advantage of information
present in LFP channels that was not present in spike channels. This result also suggests that spiking and LFP activities carry non-redundant information about both the timing and the class of a
saccade event.
3.3 Discussion
In this work we developed the MED, which solves the unaddressed problem of event detection
and classification from multimodal point-process and Gaussian time series. We showed that both
in simulated data and in an NHP neural dataset, the MED was able to simultaneously detect and
classify eye movements in a way that successfully combined information across data modalities.
3.3.1 Bi-Directional Performance Improvement
A successful multimodal decoder should take advantage of the information present in all available
data modalities. To show that the MED does this, we performed an extensive performance analysis
in which channels of both modalities were incrementally included as input to the MED. On both
simulated and real NHP data, this analysis showed that MED performance improves bidirectionally
– both when LFP channels are added to a fixed number of spike channels, and when spike channels
are added to a fixed number of LFP channels. In the NHP dataset, although spike-only performance
was generally better than LFP-only performance, multimodal performance was still significantly
better than both. These results suggest that LFPs contain information about both event timing and
event class that is not present in spikes, and vice versa.
3.3.2 Cross-Modal Scaling
As mentioned in section 3.1.2, we modified the maximum-likelihood estimator of event times and
classes by adding a cross-scale combination parameter. As we explained, this is necessary when
we have a combination of discrete and continuous modalities: the likelihood of the former is a probability mass, while the likelihood of the latter is a density that must be integrated over a range of values to yield a probability. Thus, combining the likelihoods of discrete and continuous modalities is sensitive to scalings of the latter. Indeed, scaling a Gaussian signal will in turn scale its likelihood, even though its signal-to-noise ratio (SNR) is unchanged.
To see why this is, consider a signal y = s + n, where s = b is a constant signal and n ∼ N(0, σ²) is noise. The SNR of this signal is b²/σ², and the likelihood of observing y = b is given by the peak of the Gaussian PDF, 1/(σ√(2π)). If b = 1 and σ² = 1, then the SNR of y is 1 and the likelihood of observing y = 1 is 1/√(2π) ≈ 0.4. If we then multiply y by 2, this is equivalent to setting b = 2 and σ² = 4. The SNR is still 1, but the likelihood of observing y = 2 is now 1/(2√(2π)) ≈ 0.2. So, even though scaling y does not change its SNR, it does change its likelihood. This is problematic because ideally the contribution of Gaussian channels to the MED
output should depend only on their SNR, and not on their scale. In practice, we found that a single
parameter weighing the contribution of all Gaussian channels together was sufficient to properly
combine their information with the point-process channels, thereby alleviating this issue.
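The numerical example above can be verified directly; the following snippet is only a quick sanity check and is not part of the MED implementation.

```python
from scipy.stats import norm

# Original signal: b = 1, sigma^2 = 1, so SNR = b^2 / sigma^2 = 1.
lik_original = norm.pdf(1.0, loc=1.0, scale=1.0)  # 1/sqrt(2*pi) ~= 0.399

# Signal scaled by 2: b = 2, sigma^2 = 4 (sigma = 2). SNR is still 1.
lik_scaled = norm.pdf(2.0, loc=2.0, scale=2.0)    # 1/(2*sqrt(2*pi)) ~= 0.199

print(lik_original, lik_scaled)  # same SNR, different likelihoods
```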
In addition to the above mathematical reason, another practical reason for using this scaling
parameter is model mismatch in real neural datasets. If any of the model parameters are incorrectly
estimated in a systematic way across a single modality, then the learned model will not properly
weigh the contributions of each modality. For example, it is possible that there are events unrelated
to eye movements that also elicit spatiotemporal responses in both types of channels. For Gaussian
channels, this can be accounted for by the noise variance parameter σ, but Poisson models do not
have a way to account for this kind of variance. This phenomenon could cause one modality to be ignored even if it has information to contribute.
3.3.3 Applications and Future Directions
The application of the MED in a real-time BCI is an important research direction. While motor
BCIs to date have focused on continuous movement decoding, the development of cognitive BCIs
will in many contexts require event detection in order to perform decoding. For example, in the
context of a stimulus-based decision task, one would first need to detect a stimulus onset in order
to then decode decision-related information, such as the decision itself or the associated confidence [26, 110–112]. Thus, the MED can enable multimodal cognitive BCIs for decision making
in real-world applications where event times are unknown. The MED can also help extend multimodal motor BCIs to naturalistic setups where task-related events, such as movement onset, must
be detected before decoding is possible.
In section 3.1.2, we assumed conditional independence between the spiking and LFP modalities, conditioned on event times and classes, in order to make the derivations tractable. Since
the MED successfully combined information across data modalities in both event detection and
classification, we can conclude that this is a reasonable assumption. The fact that this conditional
independence assumption between LFPs and spikes can be reasonable has also been shown in various prior studies (e.g., [47, 102]). Further, without this assumption, the number of parameters
can become prohibitively large if interdependencies between every channel pair are considered,
thus raising the possibility of overfitting to training data – especially in neural datasets that can be
limited in sample size. Nevertheless, future work can consider modeling such interdependencies
while regularizing or sparsifying them to avoid overfitting.
Chapter 4
The Neural Correlates of Decision Confidence and Their
Potential for Use in a Performance-Enhancing Brain-Computer
Interface
Having developed event detection and decoding algorithms to enable decoding in realistic settings,
we now investigate how cognitive states can be decoded and used in a BCI system to enhance a
user’s capabilities. As mentioned in chapter 1, BCI technology is typically used to restore functionality to injured or impaired patients. While many such BCIs have been developed using invasive
neurophysiology modalities [1, 3, 10, 11, 113], non-invasive electroencephalography (EEG) based
BCIs have also been used to restore functions such as locomotion, motor control, and communication to impaired patients [114–123]. Beyond the treatment of pathological conditions such
as motor impairment and neuropsychiatric disorders, there is also growing interest in developing
non-invasive BCIs that can improve individual capabilities [16–20].
Prior work on such BCIs includes memory enhancement [124], drowsiness detection [125–
127], driving assistance [128], remote robot control [129], cursor control [130], trust evaluation
[131], group decision making [132], error detection [133], and attention monitoring [134, 135].
Within this class, one important application of interest is to develop BCIs that help subjects make critical decisions with increased reliability, speed, and accuracy, especially in stressful or time-pressured situations [15].
Figure 4.1: Schematic of a confidence-based brain-computer interface for improved task performance. While the user performs a perceptual decision task, post-stimulus (pre-response) neural
activity is passed to the decoder, which estimates the user’s confidence in their decisions. Based
on this estimated confidence, the BCI can generate feedback to assist the user and improve their
performance.
In this chapter, we envision a non-invasive confidence-based BCI that aims to improve decision
accuracy. To be realized in the future, such a BCI will first need to decode the user’s decision confidence, such that decoded confidence can be used to determine when to provide sensory feedback
designed to increase the user's decision accuracy, as shown in Figure 4.1. Indeed, self-reported confidence has been shown to be highly predictive of decision accuracy [23–25]; it is also reflected in neural activity [23, 26, 29] and could therefore be an appropriate brain state for use in a BCI for improved decision accuracy. However, developing such a confidence-based BCI for decision making is challenging because it requires answering two questions whose answers remain elusive.
First, we should determine if confidence can be reliably decoded prior to the user’s decision,
and how accurate such decoding can be. This pre-response decoding is necessary in order for
the BCI to have time to influence the user’s decision via sensory feedback. Second, we should
determine if the accuracy of such a pre-response decoder is sufficient to enable a BCI to improve
the user’s decision accuracy, and under what task conditions this improvement is observed. Here,
we address these two standing questions.
Regarding the first question, prior work has shown that decision confidence is reflected in
neural activity, and can be decoded at above-chance levels [23, 26, 29, 111, 136, 137]. However, it
is still largely unclear whether the neural encoding of confidence is stimulus-locked or responselocked. This distinction is of critical importance for BCI development, as it determines the ways
in which a BCI can intervene in the decision-making process. If confidence can be decoded prior
to the decision, then a BCI can provide feedback to influence the very same decision, for example
to discourage decision making when the user is not confident. If confidence can only be decoded
after the decision, a BCI would not be able to influence the same decision, but could still be useful
for performance monitoring, for example to keep track of how well-calibrated a user’s confidence
is to their performance. While some studies show that neural activity is modulated by confidence
in the stimulus-locked epoch [26–28], others suggest that this encoding is stronger in the response-locked epoch [23, 29, 30]. We thus first focus on the question of whether confidence is a stimulus- or response-locked phenomenon, and if it is stimulus-locked, how well it can be decoded.
In order to dissociate stimulus-locked and response-locked correlates of confidence, we design
a novel experimental task that collects subjective confidence reports while clearly separating the
stimulus processing from the response using a time gap. This gap is important as stimulus-locked
activity can interfere with response-locked activity without a sufficient post-stimulus gap [138–
141]. Results in prior studies differ from each other on both the presence of a post-stimulus gap,
and on whether the neural representation of confidence is found to be response-locked or stimulus-locked. For example, in [23] and [29], no post-stimulus gap is present, and confidence appears to
be more strongly represented in the response-locked epoch. In [111], several tasks with very short
(250ms) post-stimulus gaps are studied to show that transfer learning can decode confidence across
tasks, but the gap is not sufficiently long to prevent stimulus-response overlap. In comparison, in
[26], a post-stimulus gap is used, and it is shown that confidence can be decoded from stimulus-locked neural activity, while the existence of any response-locked confidence modulation is not
explored.
A study that compares stimulus- and response-locked EEG activity collected during tasks both
with and without a stimulus-response gap is necessary to reconcile these different conclusions, but
no such study has been done so far. Existing works that do include a sufficiently long post-stimulus
gap do not compare stimulus-locked and response-locked activity, and therefore cannot rule out
the possibility of response-locked correlates of confidence. Here we design novel experiments and
use event-related potential (ERP) analysis to systematically probe the effect of this gap on neural
correlates. Our ERP analysis reveals the confidence-related activity to be stimulus-locked, which is promising for the use of confidence in a BCI. Critically, our novel task design allows us to show that this activity is stimulus-locked rather than response-locked in the presence of a post-stimulus gap.
We then use EEG source localization analysis to explore which brain regions reflect sources of
confidence-related activity.
Having characterized the neural correlates of confidence and their stimulus vs response-locked
nature, we then investigate how well confidence can be decoded from single-trial neural activity
prior to any response. Prior studies have shown that confidence can be decoded from single-trial
EEG activity, but it is unclear which of the many existing classification methods is best suited for
EEG decoding in this setting. Further, here we can assess decoding that is driven purely by pre-response
activity. This is because the gap introduced in our experiment ensures that the neural activity
used for classification does not contain components related to response execution, which may be
influenced by the response modality (verbal, button press, etc.) and which can help or confound the
decoding. Thus, we perform a systematic comparison of classification methods for pre-response
EEG, including various neural networks, in terms of their pre-response decoding of confidence.
Among the considered alternatives, we find a logistic classifier to have the best performance, likely
because of its data-efficient training.
Regarding the second question, even if confidence can be decoded from single trials pre-response, it remains unclear if, and under what conditions, the achieved decoder accuracies can
enable a BCI to improve a user’s decision accuracy. We thus next explore this question by devising
a simulated BCI framework. Our simulation framework not only allows us to determine whether
a BCI can improve task performance, but also under what task conditions such improvement is
more pronounced. Our simulation analysis shows that neural classifiers of confidence that have the
same accuracy as that observed in our data can indeed improve task performance within the BCI.
Further, this improvement is largest for difficult and high-risk task conditions.
4.1 Methods
4.1.1 Experimental task
All experiments were approved by USC’s Institutional Review Board, and all participants gave
written informed consent to participate in the study. We developed an experimental paradigm in
order to evoke different levels of confidence in response to realistic stimuli. The 3D models and
images used in our experiment are similar to those used in [136]. The timeline for a single trial
of the experiment is shown in Figure 4.2A. The stimulus, a character wearing either a cap or a
helmet, was shown against a corridor background for 250ms. After a 1.75 s post-stimulus gap,
subjects were asked whether they saw a cap or a helmet. This post-stimulus gap was necessary
in order to dissociate stimulus- and response-locked phenomena, and is discussed in detail below.
Following the post-stimulus gap, subjects reported their response via mouse click – left click for
helmet, right click for cap. There was no time limit for this response. After a 1.5 s post-response
gap, subjects were then asked to report their confidence on a scale from 1 to 10, with a 1 being a
random guess and a 10 being complete confidence. The prompt for confidence report was shown
for a fixed duration of 2 seconds, after which the value indicated by the subject was recorded as
their confidence. Subjects performed a total of 640 trials in 40-trial blocks. We now describe the
rationale behind the gap sizes and the method by which task difficulty and thus confidence level
were varied.
In order to vary the difficulty of the task, we blurred the stimuli by different amounts on each
trial. Specifically, the stimuli were blurred to 5 different levels (Figure 4.2B) using Gaussian kernels of varying sizes, with each blur level appearing an equal number of times in the first block.
After the first block, the number of maximally- or minimally-blurred stimuli in each subsequent
block was adjusted based on the subject’s accuracy in the most recent block. If the accuracy was
below 65%, 5 maximally-blurred stimuli were replaced with minimally-blurred stimuli in order to make the task easier. If the accuracy was above 85%, 5 minimally-blurred stimuli were replaced with maximally-blurred stimuli in order to make the task harder. This was done in order to keep the accuracy in the range of 65%-85% because this range results in roughly equal numbers
of confident and unconfident trials as follows. For highly confident trials, we would expect near
100% accuracy, while for minimally confident trials, we would expect near 50% accuracy (random
chance). Therefore, with an equal number of confident and unconfident trials, the total accuracy
should be around 75%. We refer to this task as the gap task.
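A minimal sketch of this block-wise difficulty adjustment is given below; the function and variable names are illustrative, and the sketch assumes the simple swap rule described above.

```python
def adjust_blur_counts(block_accuracy, n_max_blur, n_min_blur, n_swap=5):
    """Adjust the composition of the next 40-trial block based on accuracy."""
    if block_accuracy < 0.65:
        # Task too hard: replace maximally-blurred stimuli with minimally-blurred ones.
        moved = min(n_swap, n_max_blur)
        n_max_blur -= moved
        n_min_blur += moved
    elif block_accuracy > 0.85:
        # Task too easy: replace minimally-blurred stimuli with maximally-blurred ones.
        moved = min(n_swap, n_min_blur)
        n_min_blur -= moved
        n_max_blur += moved
    return n_max_blur, n_min_blur
```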
Our 1.75 s post-stimulus gap was necessary in order to separate stimulus- and response-locked
phenomena. ERPs such as the motor-related cortical potential (MRCP) have been shown to start
up to 1.5s prior to a response-related movement [142], so it was necessary for the post-stimulus
gap to be longer than this. This gap prevents response-locked activity which may be influenced by
the response modality (verbal, button press, etc.) from interfering with stimulus-locked activity.
Therefore, this gap allows us to observe stimulus-locked activity that is independent of the response
modality. Further, without this gap, it is possible that stimulus-related activity can leak into the
response-locked epoch, thereby causing stimulus-locked activity to be seen as response-locked.
In order to assess the impact of the post-stimulus gap, we also collected data during a stimulus-discrimination task that is identical to the gap task but with no post-stimulus gap, referred to here
as the no-gap task (Figure 4.2A).
4.1.2 Data collection
We recruited 16 subjects to participate in our study (average age: 26 years; 2 female, 14 male; 1
left-handed, 15 right-handed). All subjects had normal or corrected-to-normal vision. 11 subjects
participated in the gap task, while 8 subjects participated in the no-gap task. 3 subjects that participated in the no-gap task also participated in the gap task. Two gap-task subjects were removed from the ERP and source localization analyses due to excessive eye movement artifacts. We collected EEG data using a 256-electrode BioSemi ActiveTwo system. In addition to scalp EEG, 2 electrodes were placed on the ears for re-referencing. Further, 4 electrodes were placed around the subjects' eyes in order to collect electrooculogram (EOG) data. This EOG data was used to correct for eye-movement artifacts, as explained in Section 4.1.3. Stimuli were presented on an 11-inch laptop screen with a 60 Hz refresh rate, using the PsychoPy Python toolbox [143]. Subjects were seated approximately 70 cm from the screen, and used a USB-connected mouse for all task-related responses. All subjects controlled the mouse with their dominant hand.

Figure 4.2: Experimental protocol for the gap / no-gap stimulus discrimination tasks. (A) Users were presented with an image of a character wearing either a cap or a helmet. In the gap task, a blank screen with a fixation cross was shown for 1.75s before the subject was prompted for a response. In the no-gap task, the subject was prompted for a response immediately. Following a second gap of 1.5s, subjects were then asked to report their confidence via scroll bar on a scale from 0 (unconfident) to 10 (confident). A 0 corresponds to a random guess, while a 10 corresponds to complete certainty. Subjects had a total of 2 seconds to complete their confidence report, but there was no time limit for the initial response. (B) Stimuli were blurred to one of five levels to modulate task difficulty. In the first block of trials, all blur levels occurred with equal frequency. In subsequent blocks, the proportion of blurred and unblurred stimuli was adjusted based on the user's performance to have roughly equal numbers of confident and unconfident trials.
4.1.3 Data pre-processing
We first downsampled the raw EEG data from 2048 Hz to 256 Hz. For the ERP analysis, we applied
a 1-40 Hz noncausal bandpass filter (zero-phase order 1406 Chebyshev FIR filter) to remove DC
offsets and high-frequency noise. For the classification analysis, we instead used a 1-40 Hz causal
bandpass filter (order 12 Butterworth IIR filter), which is more appropriate in a BCI setting. In
order to remove eye-movement artifacts, we performed independent component analysis (ICA)
using the InfoMax algorithm [144, 145]. We regressed ICs onto vertical and horizontal EOG signals and removed the ICs with r² values above a threshold of .4. This procedure resulted in 1-3 ICs being identified for removal. Next, EEG channels that were identified based on a visual
inspection as too noisy (mean=13.5, std=11.8 channels removed) were replaced via spatial spline
interpolation. Data was then re-referenced to the average of signals from the two ear electrodes. Finally, trials identified based on a visual inspection as containing high noise levels were removed
from the analysis. For the gap task, 101 of 6840 trials were removed, and for the no-gap task, 0 of
5120 trials were removed. All preprocessing was done using the FieldTrip toolbox [146].
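As an illustration of the causal filtering step used for the classification analysis, the following SciPy sketch designs and applies a 1-40 Hz causal Butterworth band-pass. The helper name is ours, and this is a sketch rather than the exact FieldTrip-based implementation.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def causal_bandpass(eeg, fs=256.0, low=1.0, high=40.0):
    """Causal 1-40 Hz Butterworth band-pass along the time axis."""
    # SciPy doubles the order for band-pass designs, so N=6 yields an
    # order-12 IIR filter, matching the filter described in the text.
    sos = butter(6, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, eeg, axis=-1)

# Example: 256 channels x 10 s of data at 256 Hz.
filtered = causal_bandpass(np.random.randn(256, 2560))
```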
4.1.4 Data analysis
4.1.4.1 ERP analysis.
We performed a grand average ERP analysis (i.e., an average over subject averages) in order to investigate the neural correlates of confidence. Data was segmented into 3.2 second stimulus-locked
epochs and 1.2 second response-locked epochs. Stimulus-locked epochs started .2s before stimulus onset and ended 3s after. Response-locked epochs started .4s before the response and ended .8
seconds after. We grouped trials by confidence and compared confident trials with unconfident trials during stimulus- and response-locked epochs. Specifically, we performed two analyses. In our
first/main ERP analysis, the ‘confident’ group contained trials within the top 20% of confidence
reports across recording sessions, while the ‘unconfident’ group contained trials within the bottom
20% of confidence reports across sessions. The remaining trials that had intermediate confidence
levels were excluded from both groups due to the ambiguity in the ground truth. For the gap task, 3306 out of 6739 trials were excluded. For the no-gap task, 2158 out of 5120 trials were excluded.
We performed the ERP analysis for both the gap and no-gap tasks, and for both stimulus- and
response-locked epochs, with the goal of investigating how response- and stimulus-locked correlates of confidence change in the presence of a post-stimulus gap.
To test the statistical significance of the difference in ERP between confident and unconfident
trials, we used a cluster-based permutation test [147]. This method addresses the multiple-comparisons problem, which arises when many statistical tests are performed simultaneously. The procedure for the cluster-based permutation test is explained in Appendix A.1.
In order to ensure that our conclusions from ERP analyses are not sensitive to the exclusion of
trials or the specific choice of confidence threshold, we performed supplementary analyses where
no trials were excluded, and with various confidence thresholds. These additional analyses are
described in Appendix A.2 and led to consistent conclusions.
4.1.4.2 EEG source localization.
To understand the neural sources of the confidence-related activity seen in our ERP analysis, we
performed source localization. The high-density 256 channel EEG system makes our experiments
well-suited for this localization compared to experiments with lower channel counts [148, 149].
EEG activity measured on the scalp is caused by intracranial electrical currents. Distributed source
localization algorithms attempt to solve the inverse problem of estimating the 3D distribution of intracranial current density from scalp EEG activity [150–152]. One such algorithm, eLORETA, has
been shown to solve this inverse problem with exact, zero-error localization even in the presence of
noise [153]. Further, several studies have shown that sources identified by the LORETA family of
algorithms are consistent with those identified by neuroimaging methods such as MRI [154–158].
It has also been shown that EEG systems with high electrode density, such as the one used in our
study, improve localization performance [148, 149]. In order to localize the sources of confidence-related EEG activity, we used the eLORETA algorithm as implemented in the FieldTrip MATLAB
toolbox [146]. The eLORETA algorithm requires a volume conduction model of the head (head
model) to determine how electrical activity generated at the source points propagates to the scalp.
We used a template head model provided by the FieldTrip toolbox that is described in [159]. We
also performed a control analysis in Appendix A.3 and Figure 5.3 using a public dataset with simultaneous EEG and intracranial stimulation, with the stimulation electrode positions serving as
the ground truth for source localization analysis. Using this control analysis, we validated our
source localization pipeline to show its accuracy and further found that performance when using a
subject-specific head model did not significantly differ from performance when using a template
head model.
4.1.5 Single-trial decoding
We compared six classification algorithms in terms of their ability to decode confidence from single
trials of the gap task (Figure 4.3). Specifically, we used logistic regression (log), support-vector
machine (SVM), a Riemannian geometry (RG) classifier [160, 161], and three neural network
(NN) architectures. The neural networks included a multi-layer perceptron (MLP) network, a
spatial convolutional NN (CNN), and EEGNet, an NN that was designed specifically for use with
EEG data [162].
For SVM classification, we used a nonlinear SVM with a Gaussian kernel. SVM and logistic
regression classifiers were implemented using the scikit-learn Python library [163].
RG classifiers have been shown to be effective in various EEG classification tasks [160, 161,
164]. In this work, we use a similar approach to that used in [162]. This approach involves
dimensionality reduction via xDAWN spatial filtering [165], a nonlinear transformation based on
Riemannian geometry [160, 161], and classification via logistic regression.
We selected hyperparameters for the logistic, SVM and RG classifiers via grid search on training data within each cross-validation fold. This was done for the SVM’s Gaussian kernel size and
regularization parameters, the logistic classifier’s regularization parameter, and the RG classifier’s
regularization parameter.
NN classifiers have been shown to have better classification performance than traditional methods in several domains, and have gained popularity for their ability to approximate arbitrary nonlinear functions [166–169]. CNNs are a modification to standard NNs that use 2D convolution to
take advantage of spatial structure in input, thereby reducing the number of trainable parameters.
CNNs have shown great success in tasks such as image classification, where the input has spatial structure [170–172]. Lastly, EEGNet uses 1D temporal convolutions and has been shown to
outperform other classification methods on several EEG datasets [162].
The input to our spatial CNN is a three-dimensional tensor, where the first two dimensions are
spatial, i.e., corresponding to EEG electrode location, and the third dimension is time. In order to
use this spatial CNN with our EEG data, we needed a mapping from our 256 three-dimensional
electrode locations onto pixels in a two-dimensional 16 by 16 image, which we took as the first
two dimensions of the CNN input. It was important that this mapping preserve the spatial structure
of the electrode coordinates, i.e., that electrodes that were near each other on the EEG cap were
also near each other in the 16 by 16 image. Our method for doing this is illustrated in Figure 4.3C.
Essentially, we placed a 16 x 16 point grid under the 3-D electrode coordinates, and assigned each
grid point to the electrode nearest to it. In order to prevent multiple electrodes from mapping to the
same grid point, grid points were assigned sequentially, with each grid point being assigned to the nearest unassigned electrode. A rectified linear unit (ReLU) activation was used as the activation
in all hidden units in both MLP and CNN classifiers, and both classifiers were implemented using
the tensorflow library [173]. The architectures of the MLP and CNN classifiers are shown in
Figure 4.3B and Figure 4.3D, respectively.
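A sketch of the greedy electrode-to-grid assignment described above is given below, assuming the 3-D electrode coordinates have first been projected onto a 2-D plane; the function and variable names are illustrative.

```python
import numpy as np

def map_electrodes_to_grid(elec_xy, grid_size=16):
    """Greedily assign each grid point to the nearest not-yet-assigned electrode.

    elec_xy: (n_electrodes, 2) array of 2-D projected electrode coordinates;
    assumes n_electrodes >= grid_size**2 (here, 256 electrodes for a 16x16 grid).
    Returns a dict {(row, col): electrode_index}.
    """
    # Build a grid spanning the electrode layout.
    xs = np.linspace(elec_xy[:, 0].min(), elec_xy[:, 0].max(), grid_size)
    ys = np.linspace(elec_xy[:, 1].min(), elec_xy[:, 1].max(), grid_size)
    grid_pts = [(i, j, np.array([x, y]))
                for i, x in enumerate(xs) for j, y in enumerate(ys)]

    assignment, used = {}, set()
    for i, j, p in grid_pts:
        dists = np.linalg.norm(elec_xy - p, axis=1)
        dists[list(used)] = np.inf          # skip electrodes already assigned
        nearest = int(np.argmin(dists))
        assignment[(i, j)] = nearest
        used.add(nearest)
    return assignment
```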
For the EEGNet classifier, we used 256 Hz data from 0-1s post-stimulus, or 256 samples per
channel, as input (Figure 4.3A, right). For all other classifiers, we used 8 Hz data from 0.2-0.7s
post-stimulus, or 4 samples per channel, as input (Figure 4.3A, left). We used a different sampling
rate and longer time duration for the EEGNet input in order to match the characteristics of the data
used to validate it in prior work. Indeed, more samples are required to get the most out of EEGNet’s
temporal convolution layers. Note that despite this, EEGNet was not the most performant classifier
(see results). To obtain ground-truth binary class labels, we thresholded confidence reports at
the 80th percentile across all subjects – trials with a reported confidence at or above this value
had a ground-truth label of ‘confident’, while those below this value had a ground-truth label of
‘unconfident’.
We assessed the performance of all the classifiers via the area under the curve (AUC) measure [73]. Briefly, each classifier takes a single trial of neural activity as input, and computes
a continuous-valued score as output. This score is then thresholded to determine the predicted
discrete class (confident or unconfident) of the associated input. By varying this classification
threshold, we also vary the true-positive rate (TPR) and false-positive rate (FPR) of the classifier.
The receiver operating characteristic (ROC) curve plots the TPR of a classifier against its FPR for
various classification thresholds, and the AUC is the area under this curve. An AUC of 1 indicates
perfect classification, while an AUC of .5 indicates that a classifier does no better than chance. All
classifiers were evaluated via 5-fold stratified cross-validation. This was done using the scikit-learn
implementation of cross-validation [163].
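The following sketch illustrates this evaluation pipeline for the logistic classifier with scikit-learn. The synthetic data shapes and fixed regularization strength are illustrative stand-ins for the per-subject data and the per-fold hyperparameter grid search described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

# X: trials x features (256 channels x 4 time samples, flattened);
# conf: reported confidence per trial.
X = np.random.randn(500, 1024)
conf = np.random.rand(500)

# Binary labels: top 20% of confidence reports are 'confident'.
y = (conf >= np.percentile(conf, 80)).astype(int)

aucs = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(C=1.0, max_iter=1000).fit(X[train], y[train])
    scores = clf.decision_function(X[test])   # continuous classifier score
    aucs.append(roc_auc_score(y[test], scores))
print(np.mean(aucs))
```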
In order to ensure that non-causal training, i.e., test trials occurring before training trials, was
not providing an unrealistic advantage to the classifiers, we performed a control analysis where
for each subject, classifiers were trained on the first 80% of trials and tested on the final 20% of
trials. In order to control for the effect of stimulus blur level, we performed an additional analysis
where the most and least blurred stimuli were excluded. The results of these control analyses are
presented in Appendix A.4.
4.1.6 BCI simulation framework
In addition to studying the neural correlates of confidence and their decoding, our next goal was
to determine if confidence decoders with the same accuracy as that observed in our data could be
used as part of a BCI to improve a user’s task performance. To do this, we devised a simulated
BCI framework based on behavioral data collected in our experimental task, including decision
accuracies and confidence reports (Figure 4.4). We simulated a general stimulus-discrimination
task where a user must determine if a stimulus belongs to one of two categories. The simulated
BCI aims to help the user with making the correct decision. To do so, the BCI decodes the user’s
confidence after each stimulus presentation and repeats the stimulus until the decoded confidence is
above a given threshold. Once this confidence threshold is reached, the user is allowed to respond.
The details of the developed simulation framework are as follows: when the stimulus is first
presented, the simulated user’s true confidence, c, is drawn from a distribution based on real confidence reports collected during the gap task. The simulated decoder will then determine if the user’s
confidence is above threshold (confident) or below threshold (unconfident). In order to simulate a
realistic, imperfect decoder, we set the false-positive and true-positive rates (FPR and TPR) of the
simulated decoder to be the same as what the best classifier achieves for confidence classification
on the real EEG data. Specifically, we chose the point on the classifier’s ROC curve that maximizes decoder accuracy when 20% of trials are confident and 80% are unconfident – this reflects
the fact that our threshold for confidence is placed at the 80th percentile of the initial confidence
distribution.

Figure 4.3: Confidence classifier architecture details. (A) 8 Hz data from .2 to .7 s post-stimulus was used as input to the logistic, support vector machine, Riemannian geometry, multi-layer perceptron (MLP), and convolutional neural network (CNN) classifiers. For the EEGNet classifier, we used 256 Hz data from 0-1 seconds post-stimulus. We used a different sampling rate and longer time duration for the EEGNet input in order to match the characteristics of the data used to validate it in prior work. Indeed, more samples are required to get the most out of EEGNet's temporal convolution layers (despite this, EEGNet was not the best-performing classifier; see results). An anti-aliasing filter was applied before downsampling. (B) MLP architecture. Blocks indicate data shape at each layer. Four samples from all 256 channels were flattened into a vector of length 1024 before being fed through fully-connected (FC) layers. In both panel B and panel D, the numbers beneath FC layers indicate the number of units in that layer. For the MLP network, the first hidden layer has 10 units, the second hidden layer has 5 units, and the output layer has 1 unit. (C) EEG electrodes were mapped onto a 16x16 grid for use with the CNN. Each grid point was mapped to the electrode nearest to it. Since 4 time-samples were used (panel A, left), the resulting tensor was of size 16x16x4. (D) CNN architecture. Input data was fed through 2 convolutional layers followed by a single hidden FC layer.

We found this maximum accuracy point on the ROC curve as follows: for a given number of positive (confident) samples p and negative (unconfident) samples n, the point on the ROC curve with the highest accuracy is the point whose tangent line of slope n/p passes closest to the top-left corner [174].
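For an empirical ROC curve, this operating point can equivalently be found by evaluating the accuracy at each ROC point directly, as in the sketch below (names and the toy curve are illustrative):

```python
import numpy as np

def max_accuracy_point(fpr, tpr, p, n):
    """Pick the ROC point maximizing accuracy for p positives and n negatives."""
    acc = (tpr * p + (1.0 - fpr) * n) / (p + n)
    best = int(np.argmax(acc))
    return fpr[best], tpr[best]

# Example with a 20/80 confident/unconfident split over 1000 trials.
fpr = np.linspace(0, 1, 101)
tpr = np.sqrt(fpr)              # a toy concave ROC curve
print(max_accuracy_point(fpr, tpr, p=200, n=800))
```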
Given the FPR and TPR corresponding to the chosen point on the ROC curve, the decoder
(Figure 4.4, ‘Decoder’ box) is simulated as follows. If the user’s confidence is above threshold
(confident), then the decoder’s output is drawn from a Bernoulli distribution with parameter p
equal to the TPR – because TPR is the probability that the decoder will correctly output ‘confident’
when the subject is actually confident. If the user’s confidence is below threshold (unconfident),
then the decoder’s output is drawn from a Bernoulli distribution with parameter p equal to the truenegative rate (TNR), which is equal to 1 minus the FPR – because TNR is the probability that the
decoder will correctly output ‘unconfident’ when the subject is truly unconfident. For both cases,
a Bernoulli outcome of ‘1’ corresponds to a decoder output of ‘confident’ while an outcome of ‘0’
corresponds to a decoder output of ‘unconfident’. If the decoder outputs ‘unconfident’, then the
BCI repeats the stimulus. We then update the confidence according to the following rule:
c_{i+1} = c_i + u(1 − c_i)     (4.1)

where c_i ∈ [0, 1] is the user's confidence after the i-th stimulus repetition, and u ∈ [0, 1] is a confidence update parameter that determines how much the confidence increases after each stimulus presentation (Figure 4.4, 'Stimulus Presentation' box). Low values of u indicate a difficult task
where several repeated presentations of the stimulus are required to gain above-threshold confidence, while high values indicate an easier task where fewer stimulus repetitions are needed. For
example, a version of our gap task where all the stimuli have the maximum blur level would correspond to a low value of u. The BCI continues the stimulus repetitions until the decoder outputs
‘confident’, at which point the user is allowed to respond. The user’s decision is simulated by a
weighted coin flip, where the probability of correctness linearly increases from .5 at c = 0 to .98 at
c = 1. This linear confidence-accuracy relationship closely mirrors the curve observed in our gap-task behavioral data (Figure 4.4, 'Decision' box).
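A condensed sketch of one simulated trial under this framework is given below. The helper names are illustrative and reflect our reading of the procedure above; the per-repetition time of 0.7 s matches the BCI repetition time described in the next paragraph.

```python
import random

def simulate_trial(c0, u, thresh, tpr, fpr, t_rep=0.7):
    """Simulate one BCI trial; returns (correct, elapsed_time_seconds)."""
    c, t = c0, 0.0
    while True:
        t += t_rep                                # one stimulus presentation
        is_confident = c >= thresh
        p_say_confident = tpr if is_confident else fpr
        if random.random() < p_say_confident:     # decoder outputs 'confident'
            break                                 # user is allowed to respond
        c = c + u * (1.0 - c)                     # Eq. (4.1): stimulus repeats
    p_correct = 0.5 + 0.48 * c                    # linear accuracy: .5 at c=0, .98 at c=1
    return random.random() < p_correct, t
```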
To measure the relative effectiveness of the BCI decoder, we also simulated a control task where
the stimulus was repeated randomly instead of based on a BCI decoder. For the control simulation,
after each stimulus presentation, there was a 50% chance that the user would be allowed to respond,
and a 50% chance that the stimulus would be repeated. This corresponds to a control decoder with
an FPR and TPR of .5. Importantly, to account for the fact that a post-stimulus gap is needed
to decode confidence but not needed for this random control BCI, we designed each repetition
of the control task to take less time than each repetition of the BCI task. Specifically, each BCI
repetition takes 700 ms, while each control repetition takes 450 ms. The BCI repetition time was
chosen because our logistic classifier uses data from up to 700 ms after stimulus onset. The control
repetition time was chosen based on a stimulus duration of 250ms and a reaction time of 200ms,
for a total of 450ms. This is roughly the amount of time that a subject would need to complete a
trial without any BCI intervention.
Task performance within the simulation was measured via a penalized bitrate, which we define as

b = (n_corr − k · n_incorr) / T     (4.2)

where n_corr is the number of correct responses, n_incorr is the number of incorrect responses, T is the total time taken to complete all trials, and k ∈ [0, ∞) is the error penalty. Large values of k
indicate that errors are very costly, while small values indicate that they are not. This penalized
bitrate measures the number of correct answers per unit time, but with an additional term that
serves to penalize incorrect responses. By varying the parameters u (4.1) and k (4.2), we can
simulate tasks with varying levels of difficulty and error cost. We varied u from .01 to 1 in steps
of .01 and varied k from 0 to 10 in steps of 1. For each combination of parameters, we simulated
100,000 BCI trials and 100,000 control trials.
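Building on the per-trial sketch above, the penalized bitrate of equation (4.2) for a batch of simulated trials could then be computed as follows (assumed helper names):

```python
def penalized_bitrate(results, k):
    """results: list of (correct, elapsed_time) tuples from simulate_trial."""
    n_corr = sum(1 for correct, _ in results if correct)
    n_incorr = len(results) - n_corr
    total_time = sum(t for _, t in results)
    return (n_corr - k * n_incorr) / total_time  # Eq. (4.2)
```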
Figure 4.4: BCI simulation flowchart. First, an initial confidence value is drawn from a distribution determined by subjects’ reported confidence in the collected data. Upon stimulus presentation,
confidence is updated based on u, the confidence update parameter. Based on this updated confidence value, we probabilistically simulated whether the decoder classifies the subject as confident
or unconfident. We simulate the decoder decision based on the confusion matrix we obtained from
real EEG data. The confusion matrix contains the probability of each decoder outcome (top) given
the user’s true confidence (left), that is, the true positive rate (TPR), false negative rate (FNR), false
positive rate (FPR) and true negative rate (TNR) of the decoder on real EEG data. If the decoder
outputs ‘unconfident’, then the stimulus is repeated. If the decoder outputs ‘confident’, the subject
is allowed to respond. The subject’s response is simulated as a Bernoulli random variable, with the
probability of correctness being a linear function (denoted by f(.)) of their final confidence value.
This confidence-accuracy function is described in Section 4.1.6 and is based on experimental data.
4.2 Results
4.2.1 Reported confidence is predictive of accuracy
Behavioral analysis showed that our subjects successfully performed the task as instructed, and
that their subjective confidence reports were good indicators of correctness on individual trials.
The Brier score [175], defined as the mean-squared error between reported confidence and correctness, was significantly better than chance level for all blur levels (p < .05, paired t-test, Figure 4.5A,E). This indicates that confidence was predictive of correctness across blur levels. Subjects also had significantly above-chance accuracy for each blur level (p < .05, paired t-test, Figure 4.5B,F). Accuracy increased monotonically with reported confidence, further indicating that
subjects’ confidence reports were well-calibrated (Figure 4.5C,G). Lastly, reaction times were significantly faster for confident trials than for unconfident trials in both the gap task and no-gap task
(p < .05, Wilcoxon rank sum test) (Figure 4.5D,H). For the no-gap task, the difference in median reaction times between confident and unconfident trials was 205ms, while for the gap task, the difference was 12ms.

Figure 4.5: Behavior analysis results. (A) Brier Score, defined as the mean-squared error between reported confidence and correctness, for each blur level in the no-gap task. Smaller values indicate that confidence is more predictive of accuracy. Stars indicate Brier Scores significantly below chance level. Results indicate that subjects were able to successfully assess how likely they were to be correct across all blur levels. (B) Average accuracy per blur level for the no-gap task. Stars indicate accuracy significantly above chance level. Accuracy was significantly above chance level for all blur levels. (C) Average accuracy for each confidence quintile for the no-gap task. (D) Reaction time histograms for confident and unconfident trials in the no-gap task. Median (mdn) reaction time is specified in each plot. (E-H) Same as (A-D) but for the gap task. For the gap task, reaction times are shown relative to the earliest allowed response time, i.e., 1.75s post-stimulus.
4.2.2 Confidence-related ERPs are stimulus-locked
We performed a grand-averaged ERP analysis, comparing confident trials with unconfident trials
for the gap and no-gap task, for both stimulus- and response-locked epochs (Figure 4.6A). In
the no-gap task, we found that confident and unconfident ERPs at channel Pz were significantly
different (p < .05, cluster-based permutation test) both around 500ms after the stimulus onset and
around the response time (Figure 4.6A, top row), thus making it ambiguous whether the difference
was stimulus-locked or response-locked. Topographies depicting ERPs across all channels at both
these times also had similar characteristics, with positive differences between conditions appearing
over the parietal regions (Figure 4.6B, topography plots 1 and 2).
Interestingly, unlike the no-gap task in which there was ambiguity as to whether the activity
was stimulus- or response-locked, in the gap task, this pattern of activity was only observed in
the stimulus-locked epoch, and not in the response-locked epoch (Figure 4.6A, bottom row; Figure 4.6B, topography plots 3 and 4). Importantly, in the gap task, this stimulus-locked difference
occurred long before the response, suggesting that there is no possibility of interference between
stimulus- and response-locked activity.
Parietal regions had large ERP differences during the stimulus-locked epoch in the no-gap task,
the response-locked epoch in the no-gap task, and the stimulus-locked epoch in the gap task, suggesting that shared brain processes may be involved in all three epochs (Figure 4.6B, topography
plots 1-3). This pattern of activity is consistent with the P300 ERP, which is known to be associated with task-related stimulus processing [176, 177]. Further, the latency of the peak difference
between conditions relative to the stimulus onset, around 500 ms, suggests that the observed neural activity corresponds to a cognitive, top-down process, and not to low-level stimulus processing,
which typically occurs much earlier post-stimulus (around 100ms) [177, 178].
Given that the response-locked activity follows a different pattern in presence of a post-stimulus
gap (Figure 4.6B, topography plot 4), these results suggest that confidence-related activity in our
task is stimulus-locked and not response-locked. They suggest that in the absence of a gap, it is the
stimulus-locked activity that appears as response-locked. When a gap is added, stimulus-locked
and response-locked activity are separated, thus removing the ambiguity.
Figure 4.6: EEG analysis results. (A) Grand-average stimulus-locked (left panels) and response-locked (right panels) ERPs at electrode Pz for confident and unconfident trials for the no-gap
task (top panels, n=8 subjects) and gap task (bottom panels, n=9 subjects). Solid lines show the
grand average ERP and the shaded areas indicate the standard error of the mean across subjects.
Horizontal lavender lines above each plot denote regions where ERPs are significantly different (p
< .05, cluster-based permutation test). (B) Topographic plots are shown for the difference between
confident and unconfident conditions at 500 ms post-stimulus (left panels, plots 1 and 3) and 100
ms pre-response (right panels, plots 2 and 4). In the no-gap task, differences between confident
and unconfident conditions can be seen over parietal channels in both the stimulus-locked epoch
(plot 1) and the response-locked epoch (plot 2). In the gap task, a clear difference is seen in the
stimulus-locked epoch, but not in the response-locked epoch. Further, the topographies in the
response-locked epoch of the no-gap task (plot 2) and the stimulus-locked epoch of the gap task
(plot 3) have similar characteristics. These results suggest that the response-locked confidence-related activity in the no-gap task is actually a stimulus-locked phenomenon that is confounded by
the lack of a gap.
4.2.3 Cortical sources of confidence-related activity
Having identified that confidence-related activity is stimulus-locked in the presence of a gap, we
then used eLORETA source-localization analysis to explore which brain regions contain sources
of confidence-related activity in the gap task. We used a template head model and template electrode coordinates in order to perform this analysis (Section 4.1.4.2). We first confirmed that using
a template head model gives sources that are comparable to those obtained using subject-specific
head models based on MRI data and co-registered EEG electrode locations. To this end, we used
the Localize-MI dataset, a publicly available dataset developed for the express purpose of validating source-localization methods [156]. The dataset features high-density EEG data that was
recorded while subjects received intracranial electrical stimulation. The locations of the stimulating electrodes serve as the ground-truth for source-localization methods. Our analysis revealed
that eLORETA with both subject-specific and template anatomy performed significantly better than
chance level in localizing the stimulating electrodes (p ≤ 1.2e-12, paired t-test). Further, performance when using subject anatomy was not significantly better than when using template anatomy
(p = .1, paired t-test). This result indicates that template anatomy can indeed be used to perform
reliable source analysis. The details of this validation analysis are shown in Appendix A.3.
Figure 4.7: EEG source localization analysis. Neural sources of confidence-related activity 500ms
post-stimulus in the gap task, computed using the eLORETA source localization algorithm. A
cluster-based permutation test revealed a significant difference between confident and unconfident
sources (p < .01). The p-values of below-threshold clusters are shown superimposed on the MNI
template brain. No significant difference was found in the response-locked epoch of the gap task.
Having validated our source analysis pipeline, we now discuss the results of our source analysis
on the gap task EEG data. EEG data was first down-sampled to 20 Hz and separated into confident
and unconfident trials. Next, we computed a confident trial average and an unconfident trial average
for each subject. We then used eLORETA to compute the neural sources for both the stimulus-locked epoch (500ms post-stimulus) and the response-locked epoch (100ms pre-response) for each subject and condition (confident, unconfident). This left us with one confident source map and one
unconfident source map for each subject. Lastly, we used a cluster-based permutation test across
subjects to test if there was a significant difference between confident and unconfident sources. We
found a significant difference (p < .01) in the stimulus-locked epoch of the gap task, but not in
the response-locked epoch, with below-threshold clusters in the right occipital and temporal lobes
(Figure 4.7). This result supports our previous finding that, in the presence of a post-stimulus gap,
confidence-related activity is stimulus-locked and not response-locked.
4.2.4 Confidence can be decoded from single trial stimulus-locked pre-response
EEG activity
Our results thus far suggest that the neural correlates of confidence are stimulus-locked, and that
these correlates appear prior to the subject’s response when a gap is present. We next sought to
85
Figure 4.8: Confidence classification results using just stimulus-locked pre-response activity in the
gap task. (A) The receiver operating characteristic (ROC) curve for the logistic classifier, constructed by pooling test trials across all subjects. (B) Average performance across all 11 subjects
and 5 cross-validation folds, for 6 classification methods. Error bars show the standard deviation.
All methods performed significantly above chance level (p < 1e-8). The logistic classifier performed significantly better than the other 5 methods (p < .01). We note that the logistic classifier
outperformed EEGNet despite using fewer samples as input. Classifier abbreviations are as follows: svm (support vector machine), log (logistic regression), mlp (multilayer perceptron neural
network), cnn (convolutional neural network), rg (Riemannian geometry). (C) Classifier performance for each subject, averaged across folds. For all subjects, at least 2 of the 6 classifiers could
decode confidence significantly above chance level (p < .05). Significant decoding (p < .05) is
marked with a star. All p-values represent the results of a Benjamini-Hochberg corrected t-test.
decode confidence from single-trial stimulus-locked data in the gap task. Note that while various
studies have looked at decoding of confidence, here by doing the decoding in the new gap task,
we could ensure that the decoding only used the pre-response activity and that activity related to
response execution was not helping the decoding results. Thus, our decoding results can inform whether a BCI that requires reliable pre-response decoding can be constructed.
We used a battery of classifiers, described in Section 4.1.5, to decode confidence from single-trial, stimulus-locked data from the gap task. All classifiers were evaluated via 5-fold stratified
cross-validation. Figure 4.8 shows the average AUC across all subjects (panel B) and across folds
for each subject (panel C). P-values for across-subject and per-subject results were corrected for
multiple comparisons via the Benjamini-Hochberg procedure [179]. For pooled results across
subjects, all classifiers were able to classify confidence with significantly above chance accuracy (p
≤ 1e-8, N=55, 1-sample t-test; Figure 4.8B), suggesting that confidence is robustly decodable from
single-trial stimulus-locked pre-response EEG activity. This shows the feasibility of a BCI that needs to decode confidence prior to the response. Further, for 8 out of 11 subjects, 5 out of 6 classifiers achieved significant
decoding, while for the remaining 3 subjects, at least 2 out of 6 classifiers were significant (p ≤
.05, N=5, 1-sample t-test; Figure 4.8C). Notably, of all the classification methods, the logistic
classifier performed best, with an average AUC of .75 across all subjects and folds. Also, logistic
classification was significant in 10 out of the 11 individual subjects (Figure 4.8C). It is likely that
the logistic classifier performed better than the more sophisticated deep learning methods because
it requires relatively few training samples for accurate model fitting.
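For concreteness, the evaluation procedure can be sketched as follows using scikit-learn and statsmodels. This is a simplified illustration rather than our exact implementation; the data dictionary of per-subject feature matrices and confidence labels is a hypothetical placeholder for the preprocessed features of Section 4.1.5.

# Sketch: 5-fold stratified CV of a logistic confidence classifier with AUC
# scoring, then Benjamini-Hochberg correction across subjects. `data` is an
# assumed dict mapping subject IDs to (trials x features array, label array).
import numpy as np
from scipy.stats import ttest_1samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from statsmodels.stats.multitest import multipletests

pvals, mean_aucs = [], {}
for subj, (X, y) in data.items():
    fold_aucs = []
    for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
        fold_aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
    mean_aucs[subj] = np.mean(fold_aucs)
    # 1-sample t-test of fold AUCs against chance level (0.5)
    pvals.append(ttest_1samp(fold_aucs, 0.5, alternative="greater").pvalue)

# Benjamini-Hochberg correction across subjects [179]
reject, p_corrected, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")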
4.2.5 Decoded confidence can be used to improve task performance in a
simulated BCI
We used the simulation framework described in Section 4.1.6 to investigate the effectiveness of our
confidence decoder in enabling a performance-improving BCI. We explored the advantage of such
a BCI as a function of task difficulty and error cost, which are quantified by u and k in equations
(4.1) and (4.2) respectively. The operating points for the BCI and control decoders are shown in
Figure 4.9: Simulation framework shows that confidence classifiers can be used in a BCI to improve decision making. (A) Operating points for the BCI and control classifiers, shown over the
receiver operating characteristic (ROC) curve of the logistic classifier applied to real EEG data.
The BCI operating point is the point on the ROC curve that maximizes decoder accuracy when
20% of trials are confident and 80% are unconfident. The control classifier operating point corresponds to declaring confident or unconfident randomly with 50% probability. (B) Simulation
results. Penalized bitrate for both BCI and Control conditions is plotted against the confidence update parameter and error penalty. For each parameter combination, 100,000 trials were simulated.
The results show that the BCI outperforms the control condition in regimes where the confidence
update parameter is low (high difficulty) and the error penalty is high (high cost of error).
Figure 4.9A. In Figure 4.9B, we plot the penalized bitrate against the parameters u and k for both
the BCI and control cases. We show only regions of the plot where the penalized bitrate is positive.
The qualitative interpretation of a negative bitrate is that the user is making so many errors that they
are better off not doing the task at all and achieving a bitrate of 0. As such, comparisons between
BCI and control conditions for negative bitrates are not meaningful.
Our results show that the BCI outperforms the control in the high-difficulty (low u), high-error-cost (high k) regime. In the low-difficulty, low-error-cost regime, however, the control has a
higher penalized bitrate. This is because control trials are designed to be shorter than BCI trials
(Section 4.1.6), and thus for low values of the error penalty k, this difference in trial duration
outweighs the better decision accuracy that the BCI enables – that is, it outweighs the difference
in the bitrate numerator $n_{\text{corr}} - k \cdot n_{\text{incorr}}$. In other words, for low values of the error penalty k, it is
better to favor speed over accuracy, and the faster control trials have a higher bitrate. For high error
penalties and high task difficulty, this is no longer the case; the longer trial duration of the BCI is
offset by the fact that the BCI prevents the user from making costly errors. When
the error penalty k is high, it is better to favor accuracy over speed, and thus BCI trials, which are
more accurate, achieve a higher bitrate.
Our simulation analysis thus shows that our pre-response decoder can feasibly be used in a BCI framework to improve task performance, specifically when the task is difficult and errors are costly.
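To make this speed-accuracy trade-off concrete, the comparison logic can be sketched as in the toy simulation below. The accuracy model in u, the trial durations, and the 80% unconfident-trial rate are hypothetical stand-ins for the full framework of Section 4.1.6.

# Toy sketch of the penalized-bitrate comparison between BCI and control.
import numpy as np

rng = np.random.default_rng(0)

def penalized_bitrate(u, k, bci, n_trials=100_000, p0=0.6,
                      t_trial=1.0, t_repeat=1.0):
    n_corr, total_time = 0, 0.0
    for _ in range(n_trials):
        p = min(1.0, p0 + u)            # toy accuracy model: easier task, higher p
        total_time += t_trial
        if bci and rng.random() < 0.8:  # trial flagged unconfident by the decoder
            total_time += t_repeat      # the BCI repeats the stimulus
            p = min(1.0, p + u)         # the second look adds evidence
        n_corr += rng.random() < p
    n_incorr = n_trials - n_corr
    # bitrate numerator n_corr - k*n_incorr, normalized by total task time
    return (n_corr - k * n_incorr) / total_time

for k in (1, 5, 10):  # increasing error penalty favors the BCI condition
    print(k, penalized_bitrate(0.05, k, bci=True),
          penalized_bitrate(0.05, k, bci=False))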
4.3 Discussion
In this work, we investigated the neural correlates of confidence in a task with realistic stimuli,
showed that these correlates are stimulus-locked rather than response-locked using a novel experimental design, found that confidence can be reliably decoded before a decision is made and without help from activity related to response execution, and developed a simulated BCI framework to
show how and in what task conditions decoded confidence can be used to improve performance
on a stimulus discrimination task. We incorporated a sufficiently long gap between stimulus and
response in our task to allow stimulus- and response-locked brain activity to fully separate. Given
this task design, our ERP and source localization analyses revealed that confidence-related activity
is stimulus-locked. In the absence of a gap, however, this same activity appeared in the response-locked epoch due to leakage of stimulus-locked activity. Using this gap task, we then showed that
purely using stimulus-locked pre-response activity, we could decode confidence and do so better
with a logistic classifier compared to a battery of other classifiers. Finally, we used a simulated BCI
framework to show that a classifier with this level of performance observed in real EEG data can
indeed be used in a BCI in order to improve task performance especially for difficult high-stakes
tasks.
4.3.1 The importance of a post-stimulus gap
To address the question of whether confidence-related activity is stimulus-locked or response-locked, we provided two novel comparisons. First, by comparing a gap and a no-gap task, we showed that the presence of a sufficiently long post-stimulus gap impacts whether confidence-related activity appears to be stimulus-locked or response-locked. Specifically, our comparison of
the gap and no-gap tasks confirmed the hypothesis that stimulus-locked ERP differences can appear to be response-locked in the absence of a sufficient response delay (gap). To the best of our
knowledge, such a comparison of confidence-related neural activity between two stimulus discrimination tasks that differ only in terms of the presence of a gap has not been done before. Second,
we compared stimulus-locked and response-locked activity for a post-stimulus gap task. This experimental design and the two novel comparisons allowed us to conclude that confidence-related
activity is stimulus-locked and not response-locked, and that confidence can be decoded purely
from stimulus-locked activity without help from activity related to response execution.
Our finding that confidence in our task does not have a response-locked element is also interesting in light of prior work on confidence formation. Prior studies have suggested different
models for confidence formation, some suggesting that it may precede decision making [137], and
some suggesting that it may be a secondary process that occurs after making a decision [180, 181].
The former is more aligned with our finding that the confidence representation in our gap task
is exclusively stimulus-locked. Exploring evidence for various theories of confidence formation
across different tasks is an interesting topic for future investigations. As mentioned above, one reason that a post-stimulus gap is important is that it prevents stimulus- and response-locked activity
from overlapping. For example, the response-locked difference in the no-gap task (Figure 4.6A,
top right) could be due to the difference in reaction times between confident and unconfident trials
[182]. The median reaction time for confident trials was 205ms faster than for unconfident trials
(Figure 4.5D), and as such, it is possible that the confident response-locked ERP captures the return to baseline following the stimulus-locked response, while the unconfident ERP has already
returned to baseline. The post-stimulus gap prevents reaction time differences from influencing the
response-locked ERP in this way.
This is, however, not the only reason a post-stimulus gap should be considered. For example,
it has been shown that an ERP known as the error-related negativity (ERN) occurs in the response-locked
epoch when stimulus processing continues after an incorrect response is made [183]. ERN amplitudes are typically stronger when a subject is aware that they made an error. It is possible that a
long post-stimulus gap prevents subjects from making hasty decisions that lead to ‘avoidable’ errors that they are aware of, thereby attenuating the ERN. This phenomenon could explain
the lack of a response-locked difference between confident and unconfident conditions in the gap
task. In studies where confidence, but not error monitoring, is the cognitive state of interest, a gap
may be necessary to dissociate confidence-related activity from error-related activity.
Another phenomenon to consider is that of temporal scaling. Prior work has shown that rather
than occurring at fixed latencies relative to events, certain neural responses can stretch or compress
to fill a stimulus-response gap [184–186]. When characterizing particular neural responses, it is
important to understand whether the response occurs at a fixed latency, or if it scales with the
stimulus-response gap. Among other reasons, this distinction is important because fixed-latency
responses may appear in the response-locked epoch with a small gap, while temporally-scaled
responses may not. One way of making this distinction is to compare activity between tasks where
the only difference is an enforced stimulus-response gap. In our study, we showed that in the
absence of a gap, stimulus-locked and response-locked sources of neural activity do indeed overlap,
suggesting that the neural activity we observed occurs at a fixed latency relative to the stimulus.
4.3.2 Neural sources of confidence
Our source analysis found below-threshold clusters in the right occipital and temporal lobes, suggesting that these brain regions are involved in the neural representation of confidence. The temporal and occipital lobes are part of the ventral visual pathway, which is known to be involved
in object recognition and visual memory [187]. Our results suggest that these regions may also be
involved in representing confidence following decisions in a visual object recognition task. Several
types of visual processing have also been shown to be lateralized – for example, evidence suggests
that language processing occurs primarily in the left hemisphere, while face recognition occurs
primarily in the right hemisphere [188–190]. Some aspects of object-based attention and categorization may also be lateralized to the right hemisphere [191]. These phenomena may contribute
to our observation that neural sources of confidence in our task appear to be right-side lateralized.
Recent work in [192] involved the simultaneous recording of EEG and fMRI from subjects
performing a perceptual decision-making task, during which subjective confidence reports were
collected. In this study, analysis of post-stimulus EEG data revealed confidence-related activity
in parietal-occipital and frontal-temporal regions, while fMRI analysis showed activity in the left
middle frontal gyrus, premotor cortex, superior parietal lobule, right caudate nucleus, and cerebellum. These EEG results generally match what we find in our source localization analysis in
Figure 4.7 (i.e., occipital and temporal sources of neural activity), while the fMRI analysis finds
different, although functionally related, brain regions. Specifically, the superior parietal lobule has
close links with the occipital lobe and is involved in visuospatial perception [193]. We note that
the primary goal of [192] was the development of a team-based decision-making BCI, and as such, that study
did not perform stimulus- and response-locked comparisons of gap and no-gap versions of their
task, which is our main focus.
4.3.3 Confidence classifier is viable for use in a real-time BCI
The use of the gap task allowed us to explore whether, using just stimulus-locked pre-response activity, we can decode confidence. Indeed, the gap duration was chosen to be long enough to prevent
activity related to response execution from interfering with and confounding the decoding results. Our
confidence classification analysis on this gap task showed that various classification algorithms
could classify confidence from single-trial stimulus-locked pre-response EEG activity alone. Interestingly, among the considered classifiers, the logistic classifier achieved the highest accuracy,
even outperforming the more sophisticated deep learning methods. To prevent overfitting
on our dataset, we had to keep the size and number of layers in our MLP and CNN classifiers
relatively small. It is likely that neural network models could use more complex architectures and
achieve higher performance if more data were collected from each subject. However, collecting more
training data would come at the cost of time and comfort for the subjects and future users of a BCI,
and as such may not be desirable. Here, with a manageable number of trials (640) per subject, we
were able to achieve a relatively high AUC of .75.
Beyond the classifier performance itself, an important point of consideration is whether this
classifier is viable for use within a BCI framework. Viability requires not only that the classifier’s
AUC is sufficient to allow the BCI to improve the user’s performance, but also that the classifier
can be run in real time. In Section 4.2.5, we performed an extensive simulation that verified the first
point, that our classifiers were accurate enough to enable a BCI for decision making. Regarding
the second point, we note that the time it takes for the trained classifiers to produce an output once
given input data is negligible compared to the other delays present in the BCI system, such as
reaction times and ERP latencies. For these reasons, these confidence classifiers are indeed viable
for use in a real-time BCI.
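As a simple illustration of the latter point, single-trial inference latency can be checked directly, as sketched below; here clf is a trained classifier and x_trial a single trial’s feature vector, both assumed to exist.

# Sketch: timing single-trial inference of a trained classifier.
import time

t0 = time.perf_counter()
p_confident = clf.predict_proba(x_trial.reshape(1, -1))[0, 1]  # single-trial output
print(f"inference took {(time.perf_counter() - t0) * 1e3:.2f} ms")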
4.3.4 BCI Applications
We have shown that within our simulation framework, a realistic, imperfect confidence decoder
can be used to improve performance on a stimulus discrimination task. This simulation assumes
that the BCI has the ability to repeat the stimulus, i.e., that the task takes place in a controlled
environment where the stimulus presentation can be controlled by a computer. Further, since the
decoder uses stimulus-locked data to estimate the user’s confidence, it must have access to the
stimulus onset times. It is important to consider whether it is reasonable to expect these conditions
to hold in realistic use cases, and what can be done in cases where they do not.
In some realistic situations, it is entirely possible that the stimulus can be controlled by the BCI.
For example, in assembly line quality control, a human inspector must look at samples pulled from
the assembly line and determine whether or not they are up to standard. We envision a use case
where the inspector is presented with images of products coming off the assembly line and must
determine whether they are defective or not. In such a case, our BCI setup is directly applicable, as
it is possible for the BCI to repeat or adjust the presentation of such images. Additionally, since the
BCI can control when the images appear, the decoder would have access to stimulus onset times.
There are, however, situations where the aforementioned conditions do not hold. If the user’s
task involves reacting to and making decisions regarding stimuli that occur naturally, at random, or
in some other way that cannot be controlled by a computer, then our BCI framework would require
some adjustments to be applicable. First, an event-detection method is needed so that stimulus-locked data can be collected. Prior work has developed algorithms that can detect behavioral or
stimulus events in real time from neural activity [55, 105]. Such algorithms can be used alongside the BCI to determine when stimuli appear. Once a stimulus is detected, neural data
following the stimulus can then be fed into a classifier as usual. Second, if the BCI cannot repeat
the stimulus, then some other feedback method is required to improve the user’s decision
performance. Possible alternatives to stimulus repetition include providing additional task-relevant
information to the user [29], providing input from AI or human agents performing the same task,
or simply informing the user that they are unconfident and should take more time before making a
decision. With the use of an event detection algorithm and an appropriate feedback signal, it should
be possible for a BCI to improve the user’s performance even when stimulus times are unknown
and cannot be controlled by a computer.
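A minimal sketch of this adjusted pipeline is given below; the detector interface is entirely hypothetical and stands in for the event-detection algorithms of chapters 2 and 3, and clf is an assumed trained confidence classifier.

# Sketch: confidence decoding when stimulus times must be detected, not given.
def run_bci_step(neural_buffer, detector, clf, epoch_len):
    onset = detector.detect(neural_buffer)  # hypothetical event-detector API
    if onset is None:
        return None                         # no stimulus detected yet
    # Extract the post-stimulus epoch and classify it as confident/unconfident
    epoch = neural_buffer[:, onset:onset + epoch_len]
    return clf.predict_proba(epoch.reshape(1, -1))[0, 1]  # P(confident)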
4.3.5 Future directions
We have shown with numerical simulations that our decoder can be used to improve task performance in a BCI framework. Specifically, the BCI improves performance when the cost of error is
high and the task is difficult. An important future direction would be to implement and test such a
BCI with human subjects. The end goal of this research would be to develop a BCI that can assist
human users in practical, real-world tasks. As stepping stones to this end goal, BCIs can be constructed for and tested with increasingly complex and more realistic tasks. One such complexity
would be a task in which stimuli cannot be controlled by the BCI, as discussed in Section 4.3.4.
This would require the use of event-detection algorithms, such as those developed in chapters 2
and 3, and the development and testing of feedback methods that do not involve controlling the
stimulus itself as discussed in Section 4.3.4. Future work can explore the use of dynamical modeling methods in the development of decoding algorithms to improve cognitive state decoding [21,
47, 92, 94, 102, 194–197]. Further, adaptively tracking the changes in EEG signals over time or
detecting switches in this activity can further improve the performance of such BCIs [41, 59, 198–
203].
Chapter 5
Conclusion
In this thesis, we developed several key components of a performance-improving BCI. First, we developed event detection and decoding methods for multimodal neural signals that can be used to
enable decoding when event times are unknown. We then showed that decision confidence can
be decoded from noninvasive neural activity and used to provide feedback to a user performing a
decision task.
In chapters 2 and 3, we constructed parametric models that describe how transient events are
encoded in multiple modalities of recorded neural activity, and derived maximum likelihood estimators of event times and classes. We then validated these estimators on both simulated and
real neural activity. We showed that the point-process matched filter can successfully detect and
classify saccades from primate neural activity, and showed that the multimodal event detector can
fuse information from both spike and LFP channels to improve performance over each individual
modality.
In chapter 4, we investigated the neural correlates of decision confidence in EEG activity. We
developed a visual stimulus discrimination task and found that neural activity is modulated by
confidence after the stimulus, but not after the response, in the presence of a post-stimulus gap. We
then showed that decision confidence can be decoded from single-trial EEG activity, and developed
a simulation to demonstrate that decoded confidence can be used to provide feedback and improve
decision accuracy in a simulated BCI.
Future work can use the methods presented here to develop a BCI system and test it with human
subjects. It would also be important to investigate additional ways in which decoded cognitive
states, such as confidence and attention, can be used to design feedback signals that improve a
user’s performance on a variety of tasks.
References
1. Moran, D. Evolution of Brain-Computer Interface: Action Potentials, Local Field Potentials
and Electrocorticograms. Current Opinion in Neurobiology 20, 741–745. doi:10.1016/j.conb.2010.09.010 (2010).
2. Shanechi, M. M. Brain–Machine Interface Control Algorithms. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25, 1725–1734 (2017).
3. Brandman, D. M., Cash, S. S. & Hochberg, L. R. Review: Human Intracortical Recording
and Neural Decoding for Brain Computer Interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering 25, 1687–1696. doi:10.1109/TNSRE.2017.2677443 (2017).
4. Lebedev, M. A. & Nicolelis, M. A. L. Brain–Machine Interfaces: Past, Present and Future.
Trends in Neurosciences 29, 536–546. doi:10.1016/j.tins.2006.07.004 (2006).
5. Donoghue, J. P. Bridging the Brain to the World: A Perspective on Neural Interface Systems.
Neuron 60, 511–521. doi:10.1016/j.neuron.2008.10.037 (2008).
6. Schwartz, A. B., Cui, X. T., Weber, D. J. & Moran, D. W. Brain-Controlled Interfaces:
Movement Restoration with Neural Prosthetics. Neuron 52, 205–220 (2006).
7. Nicolelis, M. A. L. & Lebedev, M. A. Principles of Neural Ensemble Physiology Underlying
the Operation of Brain-Machine Interfaces. Nature reviews. Neuroscience 10, 530 (2009).
8. Hatsopoulos, N. G. & Suminski, A. J. Sensing with the Motor Cortex. Neuron 72, 477–487
(2011).
9. Thakor, N. V. Translating the Brain-Machine Interface. Science translational medicine 5,
210–217 (2013).
10. Andersen, R. A., Kellis, S., Klaes, C. & Aflalo, T. Toward More Versatile and Intuitive
Cortical Brain–Machine Interfaces. Current Biology 24, R885–R897 (2014).
11. Shenoy, K. V. & Carmena, J. M. Combining Decoder Design and Neural Adaptation in
Brain-Machine Interfaces. Neuron 84, 665–680 (2014).
12. Sajda, P., Muller, K. & Shenoy, K. V. Brain-Computer Interfaces [from the Guest Editors].
IEEE Signal Processing Magazine 25, 16–17. doi:10.1109/MSP.2008.4408438 (2008).
13. Sadtler, P. T., Quick, K. M., Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C.,
et al. Neural Constraints on Learning. Nature 512, 423–426. doi:10.1038/nature13665
(2014).
14. Bolus, M. F., Willats, A. A., Whitmire, C. J., Rozell, C. J. & Stanley, G. B. Design Strategies
for Dynamic Closed-Loop Optogenetic Neurocontrol in Vivo. Journal of Neural Engineering 15, 026011. doi:10.1088/1741-2552/aaa506 (2018).
15. Parra, L. C., Christoforou, C., Gerson, A. C., Dyrholm, M., Luo, A., Wagner, M., et al.
Spatiotemporal Linear Decoding of Brain State. IEEE Signal Processing Magazine 25, 107–
115. doi:10.1109/MSP.2008.4408447 (2008).
16. Cinel, C., Valeriani, D. & Poli, R. Neurotechnologies for Human Cognitive Augmentation:
Current State of the Art and Future Prospects. Frontiers in Human Neuroscience 13. doi:10.3389/fnhum.2019.00013 (2019).
17. van Erp, J., Lotte, F. & Tangermann, M. Brain-Computer Interfaces: Beyond Medical Applications. Computer 45, 26–34. doi:10.1109/MC.2012.107 (2012).
18. Gramann, K., Fairclough, S. H., Zander, T. O. & Ayaz, H. Editorial: Trends in Neuroergonomics. Frontiers in Human Neuroscience 11. doi:10.3389/fnhum.2017.00165 (2017).
19. Naseer, N., Ayaz, H. & Dehais, F. Portable and Wearable Brain Technologies for Neuroenhancement and Neurorehabilitation. BioMed Research International 2018. doi:10.1155/2018/1806374 (2018).
20. Zander, T. O. & Kothe, C. Towards Passive Brain–Computer Interfaces: Applying Brain–Computer
Interface Technology to Human–Machine Systems in General. Journal of Neural Engineering 8, 025005. doi:10.1088/1741-2560/8/2/025005 (2011).
21. Sani, O. G., Yang, Y., Lee, M. B., Dawes, H. E., Chang, E. F. & Shanechi, M. M. Mood Variations Decoded from Multi-Site Intracranial Human Brain Activity. Nature Biotechnology
36, 954–961. doi:10.1038/nbt.4200 (2018).
22. Yang, Y., Connolly, A. T. & Shanechi, M. M. A Control-Theoretic System Identification
Framework and a Real-Time Closed-Loop Clinical Simulation Testbed for Electrical Brain
Stimulation. Journal of Neural Engineering 15, 066007 (2018).
23. Boldt, A. & Yeung, N. Shared Neural Markers of Decision Confidence and Error Detection.
Journal of Neuroscience 35, 3478–3484. doi:10.1523/JNEUROSCI.0797-14.2015 (2015).
24. Fleming, S. M., Huijgen, J. & Dolan, R. J. Prefrontal Contributions to Metacognition in Perceptual Decision Making. Journal of Neuroscience 32, 6117–6125. doi:10.1523/JNEUROSCI.6489-11.2012 (2012).
25. Kubanek, J., Hill, J., Snyder, L. H. & Schalk, G. Cortical Alpha Activity Predicts the Confidence in an Impending Action. Frontiers in Neuroscience 9. doi:10.3389/fnins.2015.00243 (2015).
26. Gherman, S. & Philiastides, M. G. Neural Representations of Confidence Emerge from
the Process of Decision Formation during Perceptual Choices. NeuroImage 106, 134–143.
doi:10.1016/j.neuroimage.2014.11.036 (2015).
27. Graziano, M., Parra, L. C. & Sigman, M. Neural Correlates of Perceived Confidence in a
Partial Report Paradigm. Journal of Cognitive Neuroscience 27, 1090–1103. doi:10.1162/jocn_a_00759 (2015).
28. Herding, J., Ludwig, S., von Lautz, A., Spitzer, B. & Blankenburg, F. Centro-Parietal EEG
Potentials Index Subjective Evidence and Confidence during Perceptual Decision Making.
NeuroImage 201, 116011. doi:10.1016/j.neuroimage.2019.116011 (2019).
29. Desender, K., Murphy, P., Boldt, A., Verguts, T. & Yeung, N. A Post-Decisional Neural
Marker of Confidence Predicts Information-Seeking in Decision-Making. Journal of Neuroscience, 2620–18. doi:10.1523/JNEUROSCI.2620-18.2019 (2019).
30. Krumpe, T., Gerjets, P., Rosenstiel, W. & Spüler, M. Decision Confidence: EEG Correlates of Confidence in Different Phases of an Old/New Recognition Task. Brain-Computer Interfaces 6, 162–177. doi:10.1080/2326263X.2019.1708539 (2019).
31. Moran, D. W. & Schwartz, A. B. Motor Cortical Representation of Speed and Direction
during Reaching. J. Neurophysiol. 82, 2676–2692. doi:10.1152/jn.1999.82.5.2676 (1999).
32. DeAngelis, G. C., Ohzawa, I. & Freeman, R. D. Receptive-Field Dynamics in the Central
Visual Pathways. Trends in Neurosciences 18, 451–458. doi:10.1016/0166-2236(95)94496-R (1995).
33. Ahrens, M. B., Paninski, L. & Sahani, M. Inferring Input Nonlinearities in Neural Encoding
Models. Network (Bristol, England) 19, 35–67. doi:10.1080/09548980701813936 (2008).
34. Aertsen, A. M. H. J. & Johannesma, P. I. M. The Spectro-Temporal Receptive Field. Biological Cybernetics 42, 133–143. doi:10.1007/BF00336731 (1981).
35. Depireux, D. A., Simon, J. Z., Klein, D. J. & Shamma, S. A. Spectro-Temporal Response
Field Characterization with Dynamic Ripples in Ferret Primary Auditory Cortex. Journal
of Neurophysiology 85, 1220–1234. doi:10.1152/jn.2001.85.3.1220 (2001).
36. Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A. & Kording, K. P.
Saliency and Saccade Encoding in the Frontal Eye Field During Natural Scene Search.
Cerebral Cortex (New York, NY) 24, 3232–3245. doi:10.1093/cercor/bht179 (2014).
37. Marrocco, R. T. & Li, R. H. Monkey Superior Colliculus: Properties of Single Cells and
Their Afferent Inputs. Journal of Neurophysiology 40, 844–860. doi:10.1152/jn.1977.40.4.844 (1977).
38. White, B. J., Berg, D. J., Kan, J. Y., Marino, R. A., Itti, L. & Munoz, D. P. Superior Colliculus Neurons Encode a Visual Saliency Map during Free Viewing of Natural Dynamic
Video. Nature Communications 8, 14263. doi:10.1038/ncomms14263 (2017).
39. Linden, J. F., Liu, R. C., Sahani, M., Schreiner, C. E. & Merzenich, M. M. Spectrotemporal
Structure of Receptive Fields in Areas AI and AAF of Mouse Auditory Cortex. Journal of
Neurophysiology 90, 2660–2675. doi:10.1152/jn.00751.2002 (2003).
40. Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P. & Black, M. J. Bayesian Population
Decoding of Motor Cortical Activity Using a Kalman Filter. Neural Computation 18, 80–
118. doi:10.1162/089976606774841585 (2006).
41. Gilja, V., Nuyujukian, P., Chestek, C. A., Cunningham, J. P., Yu, B. M., Fan, J. M., et
al. A High-Performance Neural Prosthesis Enabled by Control Algorithm Design. Nature
Neuroscience 15, 1752–1757. doi:10.1038/nn.3265 (2012).
42. Brown, E. N., Frank, L. M., Tang, D., Quirk, M. C. & Wilson, M. A. A Statistical Paradigm
for Neural Spike Train Decoding Applied to Position Prediction from Ensemble Firing Patterns of Rat Hippocampal Place Cells. Journal of Neuroscience 18, 7411–7425. doi:10.1523/JNEUROSCI.18-18-07411.1998 (1998).
43. Eden, U. T., Frank, L. M., Barbieri, R., Solo, V. & Brown, E. N. Dynamic Analysis of
Neural Encoding by Point Process Adaptive Filtering. Neural Computation 16, 971–998.
doi:10.1162/089976604773135069 (2004).
44. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P. & Brown, E. N. A Point Process
Framework for Relating Neural Spiking Activity to Spiking History, Neural Ensemble, and
Extrinsic Covariate Effects. Journal of Neurophysiology 93, 1074–1089. doi:10.1152/jn.00697.2004 (2005).
45. Shanechi, M. M., Hu, R. C., Powers, M., Wornell, G. W., Brown, E. N. & Williams, Z. M.
Neural Population Partitioning and a Concurrent Brain-Machine Interface for Sequential
Motor Function. Nature Neuroscience 15, 1715–1722. doi:10.1038/nn.3250 (2012).
46. Shanechi, M. M., Orsborn, A. L., Moorman, H. G., Gowda, S., Dangi, S. & Carmena, J. M.
Rapid Control and Feedback Rates Enhance Neuroprosthetic Control. Nature Communications 8, 13825. doi:10.1038/ncomms13825 (2017).
47. Hsieh, H.-L., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale Modeling and Decoding Algorithms for Spike-Field Activity. Journal of Neural Engineering. doi:10.1088/1741-2552/aaeb1a (2018).
48. Kass, R. E. & Ventura, V. A Spike-Train Probability Model. Neural Computation 13, 1713–
1720 (2001).
49. Citi, L., Ba, D., Brown, E. N. & Barbieri, R. Likelihood Methods for Point Processes with
Refractoriness. Neural computation 26, 237–263 (2014).
50. Banerjee, T., Choi, J., Pesaran, B., Ba, D. & Tarokh, V. Classification of Local Field Potentials Using Gaussian Sequence Model. 2018 IEEE Statistical Signal Processing Workshop
(SSP), 683–687 (2018).
51. Olson, B. P., Si, J., Hu, J. & He, J. Closed-Loop Cortical Control of Direction Using Support
Vector Machines. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13,
72–80. doi:10.1109/TNSRE.2004.843174 (2005).
52. Pesaran, B., Pezaris, J. S., Sahani, M., Mitra, P. P. & Andersen, R. A. Temporal Structure
in Neuronal Activity during Working Memory in Macaque Parietal Cortex. Nature Neuroscience 5, 805–811. doi:10.1038/nn890 (2002).
53. Kemere, C., Santhanam, G., Yu, B. M., Afshar, A., Ryu, S. I., Meng, T. H., et al. Detecting Neural-State Transitions Using Hidden Markov Models for Motor Cortical Prostheses.
Journal of Neurophysiology 100, 2441–2452. doi:10.1152/jn.00924.2007 (2008).
54. Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A High-Performance Neural Prosthesis Incorporating Discrete State Selection With Hidden Markov Models. IEEE Transactions
on Biomedical Engineering 64, 935–945. doi:10.1109/TBME.2016.2582691 (2017).
55. Bokil, H., Pesaran, B., Andersen, R. & Mitra, P. A Method for Detection and Classification
of Events in Neural Activity. IEEE Transactions on Biomedical Engineering 53, 1678–
1687. doi:10.1109/TBME.2006.877802 (2006).
56. Bulkin, D. A. & Groh, J. M. Distribution of Visual and Saccade Related Information in the
Monkey Inferior Colliculus. Frontiers in Neural Circuits 6. doi:10.3389/fncir.2012.00061 (2012).
57. Goldberg, M. E. & Wurtz, R. H. Activity of Superior Colliculus in Behaving Monkey.
I. Visual Receptive Fields of Single Neurons. Journal of Neurophysiology 35, 542–559.
doi:10.1152/jn.1972.35.4.542 (1972).
58. Daley, D. J. & Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume II:
General Theory and Structure (Springer Science & Business Media, 2007).
59. Hsieh, H.-L. & Shanechi, M. M. Optimizing the Learning Rate for Adaptive Estimation
of Neural Encoding Models. PLOS Computational Biology 14, e1006168. doi:10.1371/journal.pcbi.1006168 (2018).
60. Bar-David, I. Communication under the Poisson Regime. IEEE Transactions on Information
Theory 15, 31–37 (1969).
61. Itti, L. Visual Salience. Scholarpedia 2, 3327. doi:10.4249/scholarpedia.3327 (2007).
62. Smith, A. C. & Brown, E. N. Estimating a State-Space Model from Point Process Observations. Neural Comput. 15, 965–991 (2003).
63. Shanechi, M. M., Orsborn, A. L. & Carmena, J. M. Robust Brain-Machine Interface Design
Using Optimal Feedback Control Modeling and Adaptive Point Process Filtering. PLOS
Computational Biology 12, e1004730. doi:10.1371/journal.pcbi.1004730 (2016).
64. Shanechi, M. M., Williams, Z. M., Wornell, G. W., Hu, R., Powers, M. & Brown, E. N. A
Real-Time Brain-Machine Interface Combining Motor Target and Trajectory Intent Using
an Optimal Feedback Control Design. PLOS ONE 8, e59049 (2013).
65. Yang, Y. & Shanechi, M. M. An adaptive and generalizable closed-loop system for control
of medically induced coma and other states of anesthesia. Journal of Neural Engineering 13, 066019. doi:10.1088/1741-2560/13/6/066019 (2016).
66. Agarwal, R., Chen, Z., Kloosterman, F., Wilson, M. A. & Sarma, S. V. A Novel Nonparametric Approach for Neural Encoding and Decoding Models of Multimodal Receptive Fields.
Neural Computation 28, 1356–1387. doi:10.1162/NECO_a_00847 (2016).
67. Kang, X., Sarma, S. V., Santaniello, S., Schieber, M. & Thakor, N. V. Task-Independent
Cognitive State Transition Detection from Cortical Neurons during 3-D Reach-to-Grasp
Movements. IEEE Transactions on Neural Systems and Rehabilitation Engineering 23,
676–682 (2015).
68. Weisstein, E. W. Full Width at Half Maximum. From MathWorld–A Wolfram Web Resource
(2014).
69. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58, 267–288 (1996).
70. Marino, R. A., Levy, R., Boehnke, S., White, B. J., Itti, L. & Munoz, D. P. Linking Visual
Response Properties in the Superior Colliculus to Saccade Behavior. European Journal of
Neuroscience 35, 1738–1752. doi:10.1111/j.1460-9568.2012.08079.x (2012).
71. Dorrscheidt, G. H. The Statistical Significance of the Peristimulus Time Histogram (PSTH).
Brain Research 220, 397–401. doi:10.1016/0006-8993(81)91232-4 (1981).
72. Truccolo, W., Hochberg, L. R. & Donoghue, J. P. Collective Dynamics in Human and Monkey Sensorimotor Cortex: Predicting Single Neuron Spikes. Nature Neuroscience 13, 105–
111. doi:10.1038/nn.2455 (2010).
73. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognition Letters. ROC Analysis
in Pattern Recognition 27, 861–874. doi:10.1016/j.patrec.2005.10.010 (2006).
74. Rodgers, J. L. & Nicewander, W. A. Thirteen Ways to Look at the Correlation Coefficient.
The American Statistician 42, 59–66. doi:10.2307/2685263 (1988).
75. Rule, M. E., Vargas-Irwin, C., Donoghue, J. P. & Truccolo, W. Contribution of LFP Dynamics to Single-Neuron Spiking Variability in Motor Cortex during Movement Execution.
Frontiers in Systems Neuroscience 9. doi:10.3389/fnsys.2015.00089 (2015).
76. Itti, L., Koch, C. & Niebur, E. A Model of Saliency-Based Visual Attention for Rapid Scene
Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1254–1259.
doi:10.1109/34.730558 (1998).
77. Treue, S. Visual Attention: The Where, What, How and Why of Saliency. Current Opinion
in Neurobiology 13, 428–432. doi:10.1016/S0959-4388(03)00105-3 (2003).
78. Ptak, R. The Frontoparietal Attention Network of the Human Brain: Action, Saliency,
and a Priority Map of the Environment. The Neuroscientist 18, 502–515. doi:10.1177/1073858411409051 (2012).
79. Li, J., Levine, M. D., An, X., Xu, X. & He, H. Visual Saliency Based on Scale-Space
Analysis in the Frequency Domain. IEEE Transactions on Pattern Analysis and Machine
Intelligence 35, 996–1010. doi:10.1109/TPAMI.2012.147 (2013).
80. Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E. & Frank, L. M. The Time-Rescaling
Theorem and Its Application to Neural Spike Train Data Analysis. Neural computation 14,
325–346. doi:10.1162/08997660252741149 (2002).
81. Cynader, M. & Berman, N. Receptive-Field Organization of Monkey Superior Colliculus.
Journal of Neurophysiology 35, 187–201. doi:10.1152/jn.1972.35.2.187 (1972).
82. Fecteau, J. H. & Munoz, D. P. Correlates of Capture of Attention and Inhibition of Return across Stages of Visual Processing. Journal of Cognitive Neuroscience 17, 1714–1727.
doi:10.1162/089892905774589235 (2005).
83. McIlwain, J. T. Visual Receptive Fields and Their Images in Superior Colliculus of the Cat.
Journal of Neurophysiology 38, 219–230. doi:10.1152/jn.1975.38.2.219 (1975).
84. Markowitz, D. A., Curtis, C. E. & Pesaran, B. Multiple Component Networks Support
Working Memory in Prefrontal Cortex. Proceedings of the National Academy of Sciences
112, 11084–11089. doi:10.1073/pnas.1504172112 (2015).
85. Huang, Y., Brandon, M. P., Griffin, A. L., Hasselmo, M. E. & Eden, U. T. Decoding Movement Trajectories Through a T-Maze Using Point Process Filters Applied to Place Field
Data from Rat Hippocampal Region CA1. Neural Computation 21, 3305–3334. doi:10.1162/neco.2009.10-08-893 (2009).
86. Lawhern, V., Wu, W., Hatsopoulos, N. & Paninski, L. Population Decoding of Motor Cortical Activity Using a Generalized Linear Model with Hidden States. Journal of Neuroscience
Methods 189, 267–280. doi:10.1016/j.jneumeth.2010.03.024 (2010).
87. Bair, W. & Koch, C. Temporal Precision of Spike Trains in Extrastriate Cortex of the Behaving Macaque Monkey. Neural Computation 8, 1185–1202. doi:10.1162/neco.1996.8.6.1185 (1996).
88. Butts, D. A., Weng, C., Jin, J., Yeh, C.-I., Lesica, N. A., Alonso, J.-M., et al. Temporal
Precision in the Neural Code and the Timescales of Natural Vision. Nature 449, 92–95.
doi:10.1038/nature06105 (2007).
89. Reinagel, P. & Reid, R. C. Temporal Coding of Visual Information in the Thalamus. Journal
of Neuroscience 20, 5392–5400. doi:10.1523/JNEUROSCI.20-14-05392.2000 (2000).
90. Mehta, M. R., Lee, A. K. & Wilson, M. A. Role of Experience and Oscillations in Transforming a Rate Code into a Temporal Code. Nature 417, 741. doi:10.1038/nature00807
(2002).
91. Pesaran, B., Vinck, M., Einevoll, G. T., Sirota, A., Fries, P., Siegel, M., et al. Investigating
Large-Scale Brain Dynamics Using Field Potential Recordings: Analysis and Interpretation.
Nature Neuroscience 21, 903. doi:10.1038/s41593-018-0171-8 (2018).
92. Yang, Y., Sani, O. G., Chang, E. F. & Shanechi, M. M. Dynamic Network Modeling and
Dimensionality Reduction for Human ECoG Activity. Journal of Neural Engineering 16,
056014. doi:10.1088/1741-2552/ab2214 (2019).
93. Wang, C. & Shanechi, M. M. Estimating Multiscale Direct Causality Graphs in Neural
Spike-Field Networks. IEEE Transactions on Neural Systems and Rehabilitation Engineering 27, 857–866. doi:10.1109/TNSRE.2019.2908156 (2019).
94. Abbaspourazad, H., Hsieh, H. & Shanechi, M. M. A Multiscale Dynamical Modeling and
Identification Framework for Spike-Field Activity. IEEE Transactions on Neural Systems
and Rehabilitation Engineering 27, 1128–1138. doi:10.1109/TNSRE.2019.2913218 (2019).
95. Bighamian, R., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Sparse Model-Based Estimation of Functional Dependence in High-Dimensional Field and Spike Multiscale Networks.
Journal of Neural Engineering. doi:10.1088/1741-2552/ab225b (2019).
96. Stavisky, S. D., Kao, J. C., Nuyujukian, P., Ryu, S. I. & Shenoy, K. V. A High Performing Brain–Machine Interface Driven by Low-Frequency Local Field Potentials Alone and
Together with Spikes. Journal of neural engineering 12, 036009 (2015).
97. Bansal, A. K., Truccolo, W., Vargas-Irwin, C. E. & Donoghue, J. P. Decoding 3D Reach and
Grasp from Hybrid Signals in Motor and Premotor Cortices: Spikes, Multiunit Activity, and
Local Field Potentials. Journal of neurophysiology 107, 1337–1355 (2011).
98. Mehring, C., Rickert, J., Vaadia, E., de Oliveira, S. C., Aertsen, A. & Rotter, S. Inference of Hand
Movements from Local Field Potentials in Monkey Motor Cortex. Nature neuroscience 6,
1253 (2003).
99. Belitski, A., Panzeri, S., Magri, C., Logothetis, N. K. & Kayser, C. Sensory Information
in Local Field Potentials and Spikes from Visual and Auditory Cortices: Time Scales and
Frequency Bands. Journal of Computational Neuroscience 29, 533–545. doi:10.1007/s10827-010-0230-y (2010).
100. Perel, S., Sadtler, P. T., Oby, E. R., Ryu, S. I., Tyler-Kabara, E. C., Batista, A. P., et al.
Single-Unit Activity, Threshold Crossings, and Local Field Potentials in Motor Cortex Differentially Encode Reach Kinematics. Journal of Neurophysiology 114, 1500–1512. doi:10.1152/jn.00293.2014 (2015).
101. Eden, U. T., Frank, L. M. & Tao, L. in Dynamic Neuroscience: Statistics, Modeling, and
Control (eds Chen, Z. & Sarma, S. V.) 29–52 (Springer International Publishing, Cham,
2018). doi:10.1007/978-3-319-71976-4_2.
102. Abbaspourazad, H., Choudhury, M., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Multiscale Low-Dimensional Motor Cortical State Dynamics Predict Naturalistic Reach-and-Grasp Behavior. Nature Communications 12, 607. doi:10.1038/s41467-020-20197-x (2021).
103. Wang, C., Pesaran, B. & Shanechi, M. M. Modeling multiscale causal interactions between
spiking and field potential signals during behavior. Journal of Neural Engineering 19, 026001. doi:10.1088/1741-2552/ac4e1c (2022).
104. Giannakis, G. & Tsatsanis, M. Signal Detection and Classification Using Matched Filtering
and Higher Order Statistics. IEEE Transactions on Acoustics, Speech, and Signal Processing 38, 1284–1296. doi:10.1109/29.57557 (1990).
105. Sadras, N., Pesaran, B. & Shanechi, M. M. A Point-Process Matched Filter for Event Detection and Decoding from Population Spike Trains. Journal of Neural Engineering 16,
066016. doi:10.1088/1741-2552/ab3dbc (2019).
106. Buzsáki, G., Anastassiou, C. A. & Koch, C. The Origin of Extracellular Fields and Currents — EEG, ECoG, LFP and Spikes. Nature Reviews Neuroscience 13, 407–420. doi:10.1038/nrn3241 (2012).
107. Einevoll, G. T., Kayser, C., Logothetis, N. K. & Panzeri, S. Modelling and Analysis of
Local Field Potentials for Studying the Function of Cortical Circuits. Nature Reviews Neuroscience 14, 770–785. doi:10.1038/nrn3599 (2013).
108. Coleman, T. P. & Sarma, S. S. A Computationally Efficient Method for Nonparametric
Modeling of Neural Spiking Activity with Point Processes. Neural Computation 22, 2002–
2030. doi:10.1162/NECO_a_00001-Coleman (2010).
109. Chen, Z. An Overview of Bayesian Methods for Neural Spike Train Analysis. Computational Intelligence and Neuroscience 2013, 251905. doi:10.1155/2013/251905 (2013).
110. Boldt, A. & Yeung, N. Shared Neural Markers of Decision Confidence and Error Detection.
Journal of Neuroscience 35, 3478–3484. doi:10.1523/JNEUROSCI.0797-14.2015 (2015).
111. Fernandez-Vargas, J., Tremmel, C., Valeriani, D., Bhattacharyya, S., Cinel, C., Citi, L., et
al. Subject- and Task-Independent Neural Correlates and Prediction of Decision Confidence
in Perceptual Decision Making. Journal of Neural Engineering 18, 046055. doi:10.1088/1741-2552/abf2e4 (2021).
112. Sadras, N., Sani, O. G., Ahmadipour, P. & Shanechi, M. M. Post-Stimulus Encoding of
Decision Confidence in EEG: Toward a Brain–Computer Interface for Decision Making.
Journal of Neural Engineering 20, 056012. doi:10.1088/1741-2552/acec14 (2023).
113. Shanechi, M. M. Brain–Machine Interfaces from Motor to Mood. Nature Neuroscience 22,
1554–1564. doi:10.1038/s41593-019-0488-y (2019).
114. Allison, B. Z., Brunner, C., Altstätter, C., Wagner, I. C., Grissmann, S. & Neuper, C. A
Hybrid ERD/SSVEP BCI for Continuous Simultaneous Two Dimensional Cursor Control.
Journal of Neuroscience Methods 209, 299–307. doi:10.1016/j.jneumeth.2012.06.022
(2012).
115. Citi, L., Poli, R., Cinel, C. & Sepulveda, F. P300-Based BCI Mouse With Genetically-Optimized Analogue Control. IEEE Transactions on Neural Systems and Rehabilitation
Engineering 16, 51–61. doi:10.1109/TNSRE.2007.913184 (2008).
116. Galán, F., Nuttin, M., Lew, E., Ferrez, P. W., Vanacker, G., Philips, J., et al. A Brain-Actuated Wheelchair: Asynchronous and Non-Invasive Brain–Computer Interfaces for Continuous Control of Robots. Clinical Neurophysiology 119, 2159–2169. doi:10.1016/j.clinph.2008.06.001 (2008).
117. Krusienski, D. J., Sellers, E. W., Cabestaing, F., Bayoudh, S., McFarland, D. J., Vaughan,
T. M., et al. A Comparison of Classification Techniques for the P300 Speller. Journal of
Neural Engineering 3, 299–305. doi:10.1088/1741-2560/3/4/007 (2006).
118. Ma, R., Aghasadeghi, N., Jarzebowski, J., Bretl, T. & Coleman, T. P. A Stochastic Control
Approach to Optimally Designing Hierarchical Flash Sets in P300 Communication Prostheses. IEEE Transactions on Neural Systems and Rehabilitation Engineering 20, 102–112.
doi:10.1109/TNSRE.2011.2179560 (2012).
119. Mak, J. N., Arbel, Y., Minett, J. W., McCane, L. M., Yuksel, B., Ryan, D., et al. Optimizing
the P300-based Brain–Computer Interface: Current Status, Limitations and Future Directions. Journal of Neural Engineering 8, 025003. doi:10.1088/1741-2560/8/2/025003
(2011).
120. Murguialday, A. R., Aggarwal, V., Chatterjee, A., Cho, Y., Rasmussen, R., O’Rourke, B.,
et al. Brain-Computer Interface for a Prosthetic Hand Using Local Machine Control and
Haptic Feedback in 2007 IEEE 10th International Conference on Rehabilitation Robotics
(2007), 609–613. doi:10.1109/ICORR.2007.4428487.
121. Omar, C., Akce, A., Johnson, M., Bretl, T., Ma, R., Maclin, E., et al. A Feedback Information-Theoretic Approach to the Design of Brain–Computer Interfaces. International Journal of
Human-Computer Interaction 27, 5–23. doi:10.1080/10447318.2011.535749 (2010).
122. Ortner, R., Allison, B. Z., Korisek, G., Gaggl, H. & Pfurtscheller, G. An SSVEP BCI to Control a Hand Orthosis for Persons With Tetraplegia. IEEE Transactions on Neural Systems
and Rehabilitation Engineering 19, 1–5. doi:10.1109/TNSRE.2010.2076364 (2011).
123. Tonin, L. & Millán, J. d. R. Noninvasive Brain–Machine Interfaces for Robotic Devices. Annual Review of Control, Robotics, and Autonomous Systems 4, 191–214. doi:10.1146/annurev-control-012720-093904 (2021).
124. Ezzyat, Y. & Rizzuto, D. S. Direct Brain Stimulation during Episodic Memory. Current
Opinion in Biomedical Engineering 8, 78–83. doi:10.1016/j.cobme.2018.11.004 (2018).
125. Garcés Correa, A., Orosco, L. & Laciar, E. Automatic Detection of Drowsiness in EEG
Records Based on Multimodal Analysis. Medical Engineering & Physics 36, 244–249.
doi:10.1016/j.medengphy.2013.07.011 (2014).
126. Li, G., Lee, B. & Chung, W. Smartwatch-Based Wearable EEG System for Driver Drowsiness Detection. IEEE Sensors Journal 15, 7169–7180. doi:10.1109/JSEN.2015.2473679
(2015).
127. Pal, N. R., Chuang, C.-Y., Ko, L.-W., Chao, C.-F., Jung, T.-P., Liang, S.-F., et al. EEG-Based
Subject- and Session-independent Drowsiness Detection: An Unsupervised Approach. EURASIP
Journal on Advances in Signal Processing 2008, 519480. doi:10.1155/2008/519480 (2008).
128. Chavarriaga, R., Ušćumlić, M., Zhang, H., Khaliliardali, Z., Aydarkhanov, R., Saeedi, S., et al. Decoding Neural Correlates of Cognitive States to Enhance Driving Experience. IEEE Transactions on Emerging Topics in Computational Intelligence 2, 288–297. doi:10.1109/TETCI.2018.2848289 (2018).
129. Millán, J. R., Renkens, F., Mouriño, J. & Gerstner, W. Noninvasive Brain-Actuated Control
of a Mobile Robot by Human EEG. IEEE Transactions on Biomedical Engineering 51,
1026–1033. doi:10.1109/TBME.2004.827086 (2004).
130. Chatterjee, A., Aggarwal, V., Ramos, A., Acharya, S. & Thakor, N. V. A Brain-Computer
Interface with Vibrotactile Biofeedback for Haptic Information. Journal of NeuroEngineering and Rehabilitation 4, 40. doi:10.1186/1743-0003-4-40 (2007).
131. Seet, M., Harvy, J., Bose, R., Dragomir, A., Bezerianos, A. & Thakor, N. Differential Impact
of Autonomous Vehicle Malfunctions on Human Trust. IEEE Transactions on Intelligent
Transportation Systems 23, 548–557. doi:10.1109/TITS.2020.3013278 (2022).
132. Valeriani, D., Cinel, C. & Poli, R. Group Augmentation in Realistic Visual-Search Decisions
via a Hybrid Brain-Computer Interface. Scientific Reports 7, 7772. doi:10.1038/s41598-017-08265-7 (2017).
133. Parra, L., Spence, C., Gerson, A. & Sajda, P. Response Error Correction-a Demonstration of
Improved Human-Machine Performance Using Real-Time EEG Monitoring. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11, 173–177. doi:10.1109/TNSRE.2003.814446 (2003).
134. Faller, J., Cummings, J., Saproo, S. & Sajda, P. Regulation of Arousal via Online Neurofeedback Improves Human Performance in a Demanding Sensory-Motor Task. Proceedings
of the National Academy of Sciences 116, 6482–6490. doi:10.1073/pnas.1817207116
(2019).
135. Li, Y., Li, X., Ratcliffe, M., Liu, L., Qi, Y. & Liu, Q. A Real-Time EEG-based BCI System for Attention Recognition in Ubiquitous Environment in Proceedings of 2011 International Workshop on Ubiquitous Affective Awareness and Intelligent Interaction (Association
for Computing Machinery, New York, NY, USA, 2011), 33–40. doi:10.1145/2030092.2030099.
136. Fernandez-Vargas, J., Valeriani, D., Cinel, C., Sadras, N., Ahmadipour, P., Shanechi, M. M.,
et al. Confidence Prediction from EEG Recordings in a Multisensory Environment in Proceedings of the 2020 10th International Conference on Biomedical Engineering and Technology (Association for Computing Machinery, New York, NY, USA, 2020), 269–275.
doi:10.1145/3397391.3397426.
137. Yeung, N. & Summerfield, C. Metacognition in Human Decision-Making: Confidence and
Error Monitoring. Philosophical Transactions of the Royal Society B: Biological Sciences
367, 1310–1321. doi:10.1098/rstb.2011.0416 (2012).
138. Ehinger, B. V. & Dimigen, O. Unfold: An Integrated Toolbox for Overlap Correction, Non-Linear Modeling, and Regression-Based EEG Analysis. PeerJ 7, e7838. doi:10.7717/peerj.7838 (2019).
139. Jung, T.-P., Makeig, S., Westerfield, M., Townsend, J., Courchesne, E. & Sejnowski, T. J.
Analysis and Visualization of Single-Trial Event-Related Potentials. Human Brain Mapping
14, 166–185. doi:10.1002/hbm.1050 (2001).
140. Smith, N. J. & Kutas, M. Regression-Based Estimation of ERP Waveforms: II. Nonlinear
Effects, Overlap Correction, and Practical Considerations. Psychophysiology 52, 169–181.
doi:10.1111/psyp.12320 (2015).
141. Woldorff, M. G. Distortion of ERP Averages Due to Overlap from Temporally Adjacent
ERPs: Analysis and Correction. Psychophysiology 30, 98–119. doi:10.1111/j.1469-8986.1993.tb03209.x (1993).
142. Sanchez-Lopez, J., Fernandez, T., Silva-Pereyra, J., Mesa, J. A. M. & Russo, F. D. Differences in Visuo-Motor Control in Skilled vs. Novice Martial Arts Athletes during Sustained
and Transient Attention Tasks: A Motor-Related Cortical Potential Study. PLOS ONE 9,
e91112. doi:10.1371/journal.pone.0091112 (2014).
143. Peirce, J., Gray, J. R., Simpson, S., MacAskill, M., Höchenberger, R., Sogo, H., et al. PsychoPy2: Experiments in Behavior Made Easy. Behavior Research Methods 51, 195–203.
doi:10.3758/s13428-018-01193-y (2019).
144. Amari, S.-i., Cichocki, A. & Yang, H. A New Learning Algorithm for Blind Signal Separation. Adv. Neural. Inform. Proc. Sys. 8 (1999).
145. Bell, A. J. & Sejnowski, T. J. An Information-Maximization Approach to Blind Separation
and Blind Deconvolution. Neural Computation 7, 1129–1159. doi:10.1162/neco.1995.7.6.1129 (1995).
146. Oostenveld, R., Fries, P., Maris, E. & Schoffelen, J.-M. FieldTrip: Open Source Software for
Advanced Analysis of MEG, EEG, and Invasive Electrophysiological Data. Computational Intelligence and Neuroscience 2011, 156869. doi:10.1155/2011/156869 (2011).
147. Maris, E. & Oostenveld, R. Nonparametric Statistical Testing of EEG- and MEG-data. Journal of Neuroscience Methods 164, 177–190. doi:10.1016/j.jneumeth.2007.03.024
(2007).
148. Lantz, G., Grave de Peralta, R., Spinelli, L., Seeck, M. & Michel, C. M. Epileptic Source
Localization with High Density EEG: How Many Electrodes Are Needed? Clinical Neurophysiology 114, 63–69. doi:10.1016/S1388-2457(02)00337-1 (2003).
149. Song, J., Davey, C., Poulsen, C., Luu, P., Turovets, S., Anderson, E., et al. EEG Source
Localization: Sensor Density and Head Surface Coverage. Journal of Neuroscience Methods
256, 9–21. doi:10.1016/j.jneumeth.2015.08.015 (2015).
150. Koles, Z. J. Trends in EEG Source Localization. Electroencephalography and Clinical Neurophysiology 106, 127–137. doi:10.1016/S0013-4694(97)00115-6 (1998).
151. Michel, C. M., Murray, M. M., Lantz, G., Gonzalez, S., Spinelli, L. & Grave de Peralta,
R. EEG Source Imaging. Clinical Neurophysiology 115, 2195–2222. doi:10.1016/j.clinph.2004.06.001 (2004).
152. Michel, C. M. & Brunet, D. EEG Source Imaging: A Practical Review of the Analysis Steps.
Frontiers in Neurology 10. doi:10.3389/fneur.2019.00325 (2019).
153. Pascual-Marqui, R. D. Discrete, 3D Distributed, Linear Imaging Methods of Electric Neuronal Activity. Part 1: Exact, Zero Error Localization (2007).
154. Bénar, C.-G., Grova, C., Kobayashi, E., Bagshaw, A. P., Aghakhani, Y., Dubeau, F., et al. EEG–fMRI of Epileptic Spikes: Concordance with EEG Source Localization and Intracranial EEG. NeuroImage 30, 1161–1170. doi:10.1016/j.neuroimage.2005.11.008 (2006).
155. Koessler, L., Bénar, C., Maillard, L., Badier, J.-M., Vignal, J. P., Bartolomei, F., et al. Source
Localization of Ictal Epileptic Activity Investigated by High Resolution EEG and Validated
by SEEG. NeuroImage 51, 642–653. doi:10.1016/j.neuroimage.2010.02.067 (2010).
156. Mikulan, E., Russo, S., Parmigiani, S., Sarasso, S., Zauli, F. M., Rubino, A., et al. Simultaneous Human Intracerebral Stimulation and HD-EEG, Ground-Truth for Source Localization Methods. Scientific Data 7, 127. doi:10.1038/s41597-020-0467-x (2020).
157. Nakasato, N., Levesque, M. F., Barth, D. S., Baumgartner, C., Rogers, R. L. & Sutherling,
W. W. Comparisons of MEG, EEG, and ECoG Source Localization in Neocortical Partial
Epilepsy in Humans. Electroencephalography and Clinical Neurophysiology 91, 171–178.
doi:10.1016/0013-4694(94)90067-1 (1994).
158. Seeck, M., Lazeyras, F., Michel, C. M., Blanke, O., Gericke, C. A., Ives, J., et al. Non-Invasive Epileptic Focus Localization Using EEG-triggered Functional MRI and Electromagnetic Tomography. Electroencephalography and Clinical Neurophysiology 106, 508–
512. doi:10.1016/S0013-4694(98)00017-0 (1998).
159. Oostenveld, R., Stegeman, D. F., Praamstra, P. & van Oosterom, A. Brain Symmetry and
Topographic Analysis of Lateralized Event-Related Potentials. Clinical Neurophysiology:
Official Journal of the International Federation of Clinical Neurophysiology 114, 1194–
1202. doi:10.1016/s1388-2457(03)00059-2 (2003).
160. Congedo, M., Barachant, A. & Bhatia, R. Riemannian Geometry for EEG-based BrainComputer Interfaces; a Primer and a Review. Brain-Computer Interfaces 4, 1–20. doi:10.
1080/2326263X.2017.1297192 (2017).
161. Lotte, F., Bougrain, L., Cichocki, A., Clerc, M., Congedo, M., Rakotomamonjy, A., et al. A
Review of Classification Algorithms for EEG-based Brain–Computer Interfaces: A 10 Year
Update. Journal of Neural Engineering 15, 031005. doi:10 . 1088 / 1741 - 2552 / aab2f2
(2018).
162. Lawhern, V. J., Solon, A. J., Waytowich, N. R., Gordon, S. M., Hung, C. P. & Lance, B. J.
EEGNet: A Compact Convolutional Neural Network for EEG-based Brain–Computer Interfaces. Journal of Neural Engineering 15, 056013. doi:10.1088/1741- 2552/aace8c
(2018).
163. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. Scikit-Learn: Machine Learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011).
164. Barachant, A. & Congedo, M. A Plug&Play P300 BCI Using Information Geometry. doi:10.48550/arXiv.1409.0107 (2014).
165. Rivet, B., Souloumiac, A., Attina, V. & Gibert, G. xDAWN Algorithm to Enhance Evoked Potentials: Application to Brain-Computer Interface. IEEE Transactions on Biomedical Engineering 56, 2035–2043. doi:10.1109/TBME.2009.2012869 (2009).
166. Bengio, Y. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning 2, 1–127. doi:10.1561/2200000006 (2009).
167. Deng, L. & Yu, D. Deep Learning: Methods and Applications. Foundations and Trends in Signal Processing 7, 197–387. doi:10.1561/2000000039 (2014).
168. LeCun, Y., Bengio, Y. & Hinton, G. Deep Learning. Nature 521, 436–444. doi:10.1038/nature14539 (2015).
169. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Networks 61, 85–117. doi:10.1016/j.neunet.2014.09.003 (2015).
170. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM 60, 84–90. doi:10.1145/3065386 (2017).
171. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE 86, 2278–2324. doi:10.1109/5.726791 (1998).
172. O’Shea, K. & Nash, R. An Introduction to Convolutional Neural Networks (2015).
173. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. TensorFlow: A System for Large-Scale Machine Learning in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (2016), 265–283.
174. Provost, F. & Fawcett, T. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions in Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (1997).
175. Gneiting, T. & Raftery, A. E. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102, 359–378. doi:10.1198/016214506000001437 (2007).
176. Johnson, R. A Triarchic Model of P300 Amplitude. Psychophysiology 23, 367–384. doi:10.1111/j.1469-8986.1986.tb00649.x (1986).
177. Polich, J. Updating P300: An Integrative Theory of P3a and P3b. Clinical Neurophysiology 118, 2128–2148. doi:10.1016/j.clinph.2007.04.019 (2007).
178. Luck, S. J., Woodman, G. F. & Vogel, E. K. Event-Related Potential Studies of Attention. Trends in Cognitive Sciences 4, 432–440. doi:10.1016/S1364-6613(00)01545-X (2000).
179. Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B 57, 289–300. doi:10.2307/2346101 (1995).
180. Fleming, S. M. & Daw, N. D. Self-Evaluation of Decision-Making: A General Bayesian Framework for Metacognitive Computation. Psychological Review 124, 91. doi:10.1037/rev0000045 (2017).
181. Fleming, S. M. & Dolan, R. J. The Neural Basis of Metacognitive Ability. Philosophical Transactions of the Royal Society B: Biological Sciences 367, 1338–1349. doi:10.1098/rstb.2011.0417 (2012).
182. Poli, R., Cinel, C., Citi, L. & Sepulveda, F. Reaction-Time Binning: A Simple Method for Increasing the Resolving Power of ERP Averages. Psychophysiology 47, 467–485. doi:10.1111/j.1469-8986.2009.00959.x (2010).
183. Yeung, N., Botvinick, M. M. & Cohen, J. D. The Neural Basis of Error Detection: Conflict Monitoring and the Error-Related Negativity. Psychological Review 111, 931–959. doi:10.1037/0033-295X.111.4.931 (2004).
184. Hassall, C. D., Harley, J., Kolling, N. & Hunt, L. T. Temporal Scaling of Human Scalp-Recorded Potentials During Interval Estimation. bioRxiv, 2020.12.11.421180. doi:10.1101/2020.12.11.421180 (2021).
185. Wang, J., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible Timing by Temporal Scaling of Cortical Responses. Nature Neuroscience 21, 102–110. doi:10.1038/s41593-017-0028-6 (2018).
186. Williams, A. H., Poole, B., Maheswaranathan, N., Dhawale, A. K., Fisher, T., Wilson, C. D., et al. Discovering Precise Temporal Patterns in Large-Scale Neural Recordings through Robust and Interpretable Time Warping. Neuron 105, 246–259.e8. doi:10.1016/j.neuron.2019.10.020 (2020).
187. Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G. & Mishkin, M. The Ventral Visual Pathway: An Expanded Neural Framework for the Processing of Object Quality. Trends in Cognitive Sciences 17, 26–49. doi:10.1016/j.tics.2012.10.011 (2013).
188. Rossion, B., Joyce, C. A., Cottrell, G. W. & Tarr, M. J. Early Lateralization and Orientation Tuning for Face, Word, and Object Processing in the Visual Cortex. NeuroImage 20, 1609–1624. doi:10.1016/j.neuroimage.2003.07.010 (2003).
189. Stephan, K. E., Marshall, J. C., Penny, W. D., Friston, K. J. & Fink, G. R. Interhemispheric Integration of Visual Processing during Task-Driven Lateralization. Journal of Neuroscience 27, 3512–3522. doi:10.1523/JNEUROSCI.4766-06.2007 (2007).
190. Stephan, K. E., Marshall, J. C., Friston, K. J., Rowe, J. B., Ritzl, A., Zilles, K., et al. Lateralized Cognitive Processes and Lateralized Task Control in the Human Brain. Science 301, 384–386. doi:10.1126/science.1086025 (2003).
191. Ayzenberg, V. & Behrmann, M. The Dorsal Visual Pathway Represents Object-Centered Spatial Relations for Object Recognition. Journal of Neuroscience 42, 4693–4710. doi:10.1523/JNEUROSCI.2257-21.2022 (2022).
192. Valeriani, D., O’Flynn, L. C., Worthley, A., Sichani, A. H. & Simonyan, K. Multimodal Collaborative Brain-Computer Interfaces Aid Human-Machine Team Decision-Making in a Pandemic Scenario. Journal of Neural Engineering 19, 056036. doi:10.1088/1741-2552/ac96a5 (2022).
193. Johns, P. in Clinical Neuroscience (ed Johns, P.) 27–47 (Churchill Livingstone, 2014). doi:10.1016/B978-0-443-10321-6.00003-5.
194. Kao, J. C., Nuyujukian, P., Ryu, S. I., Churchland, M. M., Cunningham, J. P. & Shenoy, K. V. Single-trial dynamics of motor cortex and their applications to brain-machine interfaces. Nature Communications 6, 7759. doi:10.1038/ncomms8759 (2015).
195. Vyas, S., Golub, M. D., Sussillo, D. & Shenoy, K. V. Computation Through Neural Population Dynamics. Annual Review of Neuroscience 43, 249–275. doi:10.1146/annurev-neuro-092619-094115 (2020).
196. Sani, O. G., Abbaspourazad, H., Wong, Y. T., Pesaran, B. & Shanechi, M. M. Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification. Nature Neuroscience 24, 140–149. doi:10.1038/s41593-020-00733-0 (2021).
197. Sani, O. G., Pesaran, B. & Shanechi, M. M. Where is all the nonlinearity: flexible nonlinear modeling of behaviorally relevant neural dynamics using recurrent neural networks. bioRxiv, 2021.09.03.458628. doi:10.1101/2021.09.03.458628 (2021).
198. Ahmadipour, P., Yang, Y., Chang, E. F. & Shanechi, M. M. Adaptive tracking of human ECoG network dynamics. Journal of Neural Engineering 18, 016011. doi:10.1088/1741-2552/abae42 (2021).
199. Yang, Y., Ahmadipour, P. & Shanechi, M. M. Adaptive latent state modeling of brain network dynamics with real-time learning rate optimization. Journal of Neural Engineering 18, 036013. doi:10.1088/1741-2552/abcefd (2021).
200. Song, C. Y., Hsieh, H.-L., Pesaran, B. & Shanechi, M. M. Modeling and Inference Methods for Switching Regime-Dependent Dynamical Systems with Multiscale Neural Observations. Journal of Neural Engineering 19, 066019. doi:10.1088/1741-2552/ac9b94 (2022).
201. Shanechi, M. M., Orsborn, A., Moorman, H., Gowda, S. & Carmena, J. M. High-performance brain-machine interface enabled by an adaptive optimal feedback-controlled point process decoder in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2014), 6493–6496. doi:10.1109/EMBC.2014.6945115.
202. Orsborn, A. L., Moorman, H. G., Overduin, S. A., Shanechi, M. M., Dimitrov, D. F. & Carmena, J. M. Closed-Loop Decoder Adaptation Shapes Neural Plasticity for Skillful Neuroprosthetic Control. Neuron 82, 1380–1393. doi:10.1016/j.neuron.2014.04.048 (2014).
203. Linderman, S., Johnson, M., Miller, A., Adams, R., Blei, D. & Paninski, L. Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (eds Singh, A. & Zhu, J.) 54 (PMLR, 2017), 914–922.
Appendices
A The Neural Correlates of Decision Confidence: Additional Details and Control Analyses
A.1 Cluster-Based Permutation Test
We now briefly explain the procedure for the cluster-based permutation test [147]. We consider EEG data from a single channel for two experimental conditions, c1 and c2. For the purpose of this explanation, each trial is s samples long and has a label corresponding to its experimental condition. To perform the cluster-based permutation test, we first compute t-values at each of the s samples to compare conditions c1 and c2. We then select all samples with t-values above a specified threshold; t-values were thresholded at the 95th percentile for the ERP analysis and at the 99th percentile for the source localization analysis. After thresholding, selected samples that are temporally adjacent are grouped into clusters: if two samples are both above threshold and temporally adjacent, they belong to the same cluster. For each cluster, we compute a cluster statistic by taking the sum of t-values for all samples within the cluster. This clustering procedure (all steps up to this point) is then repeated for 1000 random permutations of the trial labels, and for each permutation the maximum cluster statistic is recorded. For each cluster computed using the true labels, the cluster-level p-value is the proportion of permutations whose maximum cluster statistic is greater than that cluster's statistic. Clusters were declared significant if their cluster-level p-values were less than .05 for the ERP analysis, and less than .01 for the source localization analysis.
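To make this procedure concrete, the following is a minimal single-channel sketch in Python (NumPy/SciPy), assuming trials stored as (trials x samples) arrays. Two details are our assumptions rather than a transcription of the analysis code: clustering is performed on |t| (two-sided), and the cluster-forming threshold is taken as a percentile of the t-distribution.

import numpy as np
from scipy import stats

def cluster_permutation_test(x1, x2, n_perm=1000, pct=95, seed=0):
    # x1, x2: (trials, samples) single-trial EEG for conditions c1 and c2.
    # pct: cluster-forming threshold percentile (95 for the ERP analysis,
    # 99 for the source localization analysis).
    rng = np.random.default_rng(seed)
    data = np.vstack([x1, x2])
    labels = np.arange(len(data)) >= len(x1)  # False: c1 trial, True: c2 trial
    thresh = stats.t.ppf(pct / 100, df=len(data) - 2)

    def clusters(lab):
        # t-value at each of the s samples, comparing the two conditions
        t = stats.ttest_ind(data[~lab], data[lab], axis=0).statistic
        padded = np.r_[False, np.abs(t) > thresh, False].astype(int)
        # group temporally adjacent supra-threshold samples into clusters
        starts = np.flatnonzero(np.diff(padded) == 1)
        ends = np.flatnonzero(np.diff(padded) == -1)
        spans = list(zip(starts, ends))
        # cluster statistic: sum of t-values over the samples in the cluster
        return spans, np.array([np.abs(t[a:b]).sum() for a, b in spans])

    spans, observed = clusters(labels)
    # null distribution of the maximum cluster statistic over permutations
    null_max = np.zeros(n_perm)
    for i in range(n_perm):
        _, s = clusters(rng.permutation(labels))
        null_max[i] = s.max() if s.size else 0.0
    # cluster-level p-value: proportion of permutations whose maximum
    # cluster statistic exceeds the observed cluster statistic
    pvals = [(null_max >= obs).mean() for obs in observed]
    return spans, observed, pvals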
A.2 Confidence Threshold Sensitivity Analysis
We repeated the ERP analysis described in Section 2.4.1 without excluding the middle-confidence trials, i.e., those with confidence reports below the 80th percentile and above the 20th percentile. We performed two versions of this analysis, setting the threshold for confident trials at the 80th percentile (Figure 5.1) and at the 50th percentile (Figure 5.2). We note that in both Figure 5.1 and Figure 5.2, there is no significant difference between conditions in the stimulus-locked epoch of the no-gap task, even though there was a significant difference in Figure 4.6. This is because the neural activity for middle-confident trials lies in between the activity for the least and most confident trials, and including them in the analysis brings the confident and unconfident averages closer together. In the main ERP analysis (Figure 4.6), we chose to exclude the middle 60% of trials, with the rationale that extreme values of reported confidence are more likely to be reported decisively and are thus more representative of the neural signals associated with confidence. We emphasize that our main results are not based on the stimulus-locked no-gap epoch. Instead, our main conclusions from comparisons of the no-gap and gap task ERPs are based on the response-locked epoch of the no-gap task and on both the response-locked and stimulus-locked epochs of the gap task. In all versions of the ERP analysis (Figures 4.6, 5.1, 5.2), we found significant ERP differences in similar time periods for all three relevant epochs. Thus, our main conclusions also hold under the alternative analyses of Figures 5.1 and 5.2, and are not affected by which version is used as the main figure.
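For concreteness, the three trial-grouping schemes compared across Figures 4.6, 5.1, and 5.2 can be summarized in a short NumPy sketch; the function and its interface are ours, for illustration only.

import numpy as np

def confidence_groups(conf, split_pct=80, exclude_middle=False, lo_pct=20):
    # conf: (n_trials,) reported confidence values for one subject.
    # Main analysis (Figure 4.6): exclude_middle=True, split_pct=80, lo_pct=20,
    # which drops the middle 60% of trials. Sensitivity analyses: set
    # exclude_middle=False with split_pct=80 (Figure 5.1) or 50 (Figure 5.2).
    confident = conf >= np.percentile(conf, split_pct)
    if exclude_middle:
        unconfident = conf <= np.percentile(conf, lo_pct)
    else:
        unconfident = ~confident
    return confident, unconfident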
A.3 Source Localization Control Analysis
We used data from the publicly available Localize-MI dataset in order to compare the performance
of eLORETA source localization while using subject-specific anatomy vs template anatomy [156].
The dataset features simultaneous EEG and intracranial stimulation, with the stimulation electrode positions serving as the ground truth for source localization analysis. Specifically, 7 subjects participated in 5-10 neural recording sessions with stimulation. Within each session, the stimulating electrode was kept at a fixed location, and 40-60 stimulation trials were performed, with one stimulation pulse per trial. The locations of the stimulating electrodes serve as the ground truth for source-localization methods and allow for their evaluation. This dataset also contained head models based on subject MRIs, and therefore allowed us to assess our eLORETA analysis pipeline with both subject-specific anatomy and template anatomy.

Figure 5.1: ERP analysis, using all trials split at the 80th percentile. We repeated the ERP analysis shown in Figure 4.6, but without excluding any trials. Instead, all trials at and above the 80th percentile of confidence reports were placed in the ‘confident’ group, while the remaining trials below the 80th percentile were placed in the ‘unconfident’ group (Figure 4.6 again considered trials above the 80th percentile as confident but, unlike here, only trials below the 20th percentile as unconfident). Results are qualitatively similar to the original analysis: the stimulus- and response-locked epochs of the no-gap task show differences over parietal channels, but when a gap is added, this pattern is revealed to be stimulus-locked.

Figure 5.2: ERP analysis, using all trials split at the 50th percentile. Identical to Figure 5.1, but with confident and unconfident trials split at the 50th percentile. Results are qualitatively similar to the original analysis, indicating that our findings are not sensitive to the specific choice of confidence threshold.

We performed this assessment by comparing the localization error of our method with a chance-level error, to determine whether localization was significantly better than chance. We applied eLORETA to trial-averaged EEG data from each session. The localization error was computed as the distance between the stimulating electrode and the source voxel with the highest current density. We computed the chance-level error as the average distance from the stimulating electrode to every voxel in the eLORETA source space; this represents the expected error if the voxel of maximum current density were chosen uniformly at random. We then compared the performance of template and subject anatomy after normalizing each session's error by its chance level, to account for the difference in chance level between anatomy types.
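The per-session error and chance-level computation amounts to a few lines of NumPy; in this sketch, the function name and the assumption of a precomputed eLORETA current-density vector are ours.

import numpy as np

def localization_error(current_density, voxel_xyz, electrode_xyz):
    # current_density: (n_voxels,) eLORETA output for the trial-averaged EEG
    # voxel_xyz: (n_voxels, 3) source-space voxel coordinates
    # electrode_xyz: (3,) stimulating electrode position (ground truth)
    dists = np.linalg.norm(voxel_xyz - electrode_xyz, axis=1)
    # error: distance from the electrode to the voxel of maximum current density
    error = dists[np.argmax(np.abs(current_density))]
    # chance level: mean distance from the electrode to every source voxel,
    # i.e., the expected error for a uniformly random voxel choice
    chance = dists.mean()
    return error, chance, error / chance  # normalized error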
Our analysis revealed that eLORETA with both subject-specific and template anatomy performed significantly better than chance level (Figure 5.3-A). The results using subject anatomy
were not significantly better than results using template anatomy (Figure 5.3-B). These results validate our source localization pipeline and indicate that template anatomy can indeed be used to
perform reliable source analysis.
A.4 Temporal Split for Classifier Cross-Validation
We repeated our classification analysis using a temporal split instead of 5-fold cross-validation for training and evaluation. For each subject, the first 80% of trials were used for training, and the final 20% of trials were used for testing. While k-fold cross-validation allows us to make the most of our data by testing on every single trial, it has the shortcoming that, for some folds, the test trials may have occurred prior to some of the training trials. If there are slow drifts in neural activity, then this non-causal training may give results that are not reflective of what would happen in a real BCI application, where training must be causal. With a temporal split, we mitigate this issue at the cost of not being able to test the classifier on all of our data. The results of the temporal-split classification are shown in Figure 5.4. The performance of the logistic classifier, which was the most performant classifier here, was not significantly different between k-fold cross-validation and the temporal split (p=0.48, 2-sample t-test). This indicates that non-causal k-fold cross-validation did not significantly change the results of our classification analysis.
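A minimal sketch of this causal train/test scheme follows, using a scikit-learn logistic classifier for concreteness; the interface, hyperparameters, and omission of subject-specific preprocessing are our simplifications.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def temporal_split_auc(X, y, train_frac=0.8):
    # X: (n_trials, n_features) single-trial features in chronological order
    # y: (n_trials,) binary confidence labels
    n_train = int(train_frac * len(y))  # first 80% of trials for training
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[:n_train], y[:n_train])
    # evaluate on the final 20% of trials, all of which occur after the
    # training trials, so the evaluation is causal as in a real-time BCI
    scores = clf.predict_proba(X[n_train:])[:, 1]
    return roc_auc_score(y[n_train:], scores)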
We performed an additional classification analysis in which the most and least blurred trials were excluded, in order to control for the effect of stimulus blur. We did not restrict the analysis to a single blur level because there were too few trials per blur level (around 128 per subject). When the most and least blurred trials were excluded, the logistic classifier again performed best, achieving an average AUC of .61, which was significantly better than chance level (p < .05, 1-sample t-test). Note that the peak difference between the confident and unconfident conditions occurs around 500 ms after stimulus onset, whereas low-level sensory processing typically occurs much earlier post-stimulus (on the order of 100 ms) [177, 178]. This control analysis, combined with the timing of the ERP, thus suggests that the observed ERP differences do not reflect low-level sensory processing.
Figure 5.3: eLORETA performance comparison for template vs subject-specific anatomical models. (A) Comparison to chance level. For both anatomy types, eLORETA performed significantly better than chance (template anatomy p = 1.2e-12, subject anatomy p = 7.9e-15, paired t-test). (B) Comparison between template and subject anatomy, normalized by chance level. Although the normalized eLORETA error with subject anatomy was slightly lower than with template anatomy, this difference was not significant (p = .1).
Figure 5.4: Classifier results using temporal cross-validation. We repeated our classification analysis using a temporal train-test split instead of 5-fold cross-validation. All methods performed significantly above chance level. The performance of the logistic classifier was not significantly different between cross-validation methods, indicating that our main analysis did not artificially benefit from non-causal training.
Abstract
Brain-computer interfaces (BCIs) consist of hardware and software that create a communication channel between a user's brain and external devices. BCI technology has typically been used to restore functionality to injured or impaired patients. Outside of clinical applications, there is also growing interest in developing BCIs that can improve a user's capabilities. In this work, we develop key components and show proof-of-concept of a BCI that can improve a user's performance on a given task by providing feedback based on decoded cognitive states.
First, we develop a method to detect stimulus- and behavior-related events from neural activity, addressing the challenge of multiple event types that have varying spatiotemporal signatures. This method enables neural decoding in cases where task-related event times, such as stimulus onsets, are unknown ahead of time. We develop event-detection algorithms for binary spiking data and for multimodal data consisting of both spike and local field potential (LFP) signals. Second, we investigate the neural correlates of decision confidence in electroencephalogram (EEG) activity. We discover that neural activity is modulated by confidence in the post-stimulus epoch, but not in the post-response epoch, and show that confidence can be decoded from single-trial post-stimulus EEG activity. We then design a simulated BCI framework to show that this confidence decoding is accurate enough to build a BCI that can improve performance on a decision-making task, especially when the difficulty and error cost are high.
The advancements made in this work can facilitate the development of cognitive BCIs that can be used in naturalistic settings without constrained tasks or prior knowledge of event times.