Reaching decisions in dynamic environments

by

Vincent Enachescu

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
NEUROSCIENCE

December 2021

Acknowledgements

I would like to extend heartfelt acknowledgements to all of the people in my life who supported me through this journey, but in particular:

To Nicolas Schweighofer, Stefan Schaal, and Vasileios Christopoulos, my academic advisors and mentors, for providing unceasing support and advice while allowing me the independence to learn from my mistakes.

To my dissertation committee, Carolee Winstein, Isabelle Brocas, and John Monterosso, for their patience, curiosity, support, and commitment to helping me find a truly interdisciplinary path.

To my academic family, the members of the Computational Learning and Motor Control laboratory and the Computational Neurorehabilitation laboratory at USC in Los Angeles, and the Autonomous Motion Department at MPI in Tübingen, Germany, for the friendship, the lessons, the inspiration, and all the conversations that have shaped the ideas and the person I am today.

And to my family, whose love makes it all possible.

Abstract

Choosing and coordinating movements in pursuit of behavioral goals is arguably the ultimate purpose of the nervous system. Historically, choice and coordination have been studied as distinct and separable processes, and this view prevails to this day. It is held that the cortical regions within the frontal lobe evaluate all possible outcomes in terms of total subjective utility, select the outcome with the greatest value, and then communicate this desired outcome to the motor system for planning and execution. However, this view is increasingly contradicted by neural recordings and behavioral experiments, and it is not consistent with the ethological origins of behavior.

A more recent set of theories argues that the processes of choosing and coordinating are inherently linked due to the evolutionary pressures of a competitive, dynamic environment. According to these theories, decisions are made between potential actions, which compete for selection through mutual inhibition; other cortical regions estimate some aspect of valuation and project excitation to influence this competition. Unlike the prevailing framework, these dynamic decision making theories are able to link quantifiable predictions about overt behavior, like movement, to predicted patterns of neural activity.

This thesis combines computational modeling and human psychophysics experiments to evaluate the predictions of these frameworks. We also introduce a novel variation of the action based model, a semi-hierarchical 'frontal' loop that is responsible for integrating action related costs into an ongoing decision. The predictions of these models are assessed in two studies on human reaching under uncertainty; in the first, we note reward-weighted mixing of movement plans and reaction time patterns that are consistent with action based decision making. Results from the second study suggest a wide heterogeneity of sensitivity to motor related costs among subjects, but also provide some evidence of time dependence in this action cost estimation, as predicted by the introduced model.

Contents

Acknowledgements
Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Background
    1.1.1 Movements first
    1.1.2 From the top down; utility theory to game theory
    1.1.3 Psychology meets homo economicus
    1.1.4 From the bottom up; reflexology to optimal feedback control
    1.1.5 The Serial model: a 'good' based view
    1.1.6 Limitations of the serial model
    1.1.7 Dynamic decision making

2 What if you are not certain?
  2.1 Introduction
  2.2 Results
    2.2.1 Behavioral paradigm
    2.2.2 Initial approach direction varies with target probability
    2.2.3 Reaction time varies with the target probability
    2.2.4 Action selection, reaction time and choice confidence emerge through action competition
  2.3 Discussion
    2.3.1 General
    2.3.2 The risk of conflating evidence accumulation with pre-decision confidence
    2.3.3 From signal-detection theory to evidence accumulation to desirability competition
    2.3.4 Motor averaging versus visual averaging hypothesis for action selection
  2.4 Materials and Methods
    2.4.1 Participants
    2.4.2 Experimental setup
    2.4.3 Experimental paradigm
    2.4.4 Behavioral data analysis
    2.4.5 Neurodynamical framework
    2.4.6 Data Availability

3 Reaching task under effort uncertainty
  3.1 Introduction
    3.1.1 The cost of effort
    3.1.2 Anticipating effort costs
    3.1.3 Reaching decisions about effort
  3.2 Methods
  3.3 Experiment
    3.3.1 Trial Structure
    3.3.2 Session Structure
    3.3.3 Data Analysis
    3.3.4 Subjects
  3.4 Results
    3.4.1 Logistic choice regression
    3.4.2 Functional density analysis
  3.5 Discussion

4 A dual neural field model of action selection
  4.1 Introduction
    4.1.1 Conflict dependent estimation of action costs
    4.1.2 Inhibition-only interventions
  4.2 Methods
    4.2.1 Design Motivations
    4.2.2 Model Architecture
    4.2.3 Simulations
    4.2.4 Dynamic neural fields
  4.3 Results
    4.3.1 Reaction time simulations
  4.4 Discussion

5 Conclusions
  5.1 Discussion
    5.1.1 Summary of experiments and results
    5.1.2 Dual and back again
    5.1.3 The need to unify choice and control

Bibliography

List of Figures

1.1 Sketch of the information flow and temporal ordering of the decision process as described by the 'good' based model (Padoa-Schioppa, 2011). The first phase is representation (1): incoming sensory information is used to form representations of potential outcomes. In the second phase, valuation (2), the prefrontal cortex assigns an estimated subjective utility value to each potential outcome. The outcome assessed to have the best subjective utility to cost trade-off is selected (3). And finally, the desired outcome is passed to the motor system, transformed into an action plan, and then executed (4).

1.2 Information flow in the serial, 'good' based model. External features are processed from incoming sensory information and combined with internal features and biases to form estimations of the expected utility of potential outcomes. The outcome associated with the greatest utility (or 'good') is selected, and projected to the cortical motor areas where a "goal to action" transformation is performed to turn the desired outcome into a movement plan.

1.3 Sketch of the information flow through neural structures as proposed in the affordance competition theory, adapted from Figure 1 in (Cisek, 2007). In this simplified view, incoming visual input is transformed into features for interaction (affordances) that are used to specify actions; the classic example being a handle on a mug, which enables interaction through an immediately obvious and stable grip. These affordances and other cues trigger the formation of plans for taking action that are distributed across the PPC, M1 and SMA, with reciprocal connections to the basal ganglia. Potential action plans compete for selection through reciprocal inhibition; other cortical regions participate in the action selection process by estimating some choice contingency (risk, delay, cost, etc.) and modulating the activity of the corresponding action plan accordingly.

1.4 A schematic diagram of the computational model based on the affordance competition theory of decision making. Each layer of the model is depicted by a set of grey circles, each representing a neuron or unit tuned to a specific parameter, i.e. a movement towards a particular direction. Visual information from V1 is projected along the ventral and dorsal streams, which extract spatial and semantic features.
This information drives the activation of movement plans in the PMd, and motor policies in M1.

2.1 A graphical representation of the experimental setup from two perspectives. Participants (a) were seated directly in front of a Phantom haptic robot (c), with their index fingers inserted in a finger-tip adaptor (b) and their midline aligned with the center of an LCD monitor (d). Reaching movements took place in the x-y plane, +y being towards the screen and +x being towards the right hand side of the screen. The distance from the head of the individuals to the finger starting position along the y axis was about d_subject = 0.30 m and slightly varied across participants. The distance from the finger starting position to the screen display was d_display = 0.35 m.

2.2 Task design and experimental paradigm. (A): A reaching trial started with a fixation cross presented on the center of the screen for about 1.5 s. Then, either a single or two unfilled cues were presented simultaneously in both visual fields. After 300 ms the central fixation cross was extinguished ("go-signal"), and the participants had to perform a rapid reaching movement towards the target(s) within 1 s. Once the reach trajectory crossed a trigger threshold (red discontinuous line), one of the cues (or the single cue) was filled in black, indicating the actual goal location. Responses before the go-signal or reaches that exceeded the maximum movement time (1 s) were aborted and not used for further analysis. (B): The color of the cues in the dual-target trials indicated the target probabilities: blue cues corresponded to equiprobable targets, whereas green and red cues corresponded to targets with 80% and 20% probability, respectively. Single cues always had blue color. (C): The distance between the origin and the midpoint of the two cues was d_reach = 0.2 m. The distance between the cue and the midpoint was d_separation = 0.15 m. The trigger threshold, i.e., the distance between the origin and the location at which the actual goal location was revealed, was set to d_threshold = 0.05 m.

2.3 Reach trajectories for different target probabilities. (A): Representative single-trial trajectories (thin traces) and the corresponding average trajectories (thick traces) from single- (black trace) and two-target trials with equal (blue trace) and unequal (green trace) probabilities, when the actual goal was located in the left hemifield. (B): Similar to A but for the actual goal located in the right hemifield. Target probability influences the reach trajectories. When people were certain about the goal location, reaches were aimed directly to the target. When they were uncertain, reaches were launched to an intermediary location between the targets and then corrected in-flight to the cued target location. The spatially averaged behavior was biased towards the likely target.

2.4 Approach direction and reaction time. (A): Approach direction and (B) reaction time across participants, number of targets and probabilities. Positive and negative approach directions correspond to reaches launched closer to the right and left target, respectively. Approach directions around 0° correspond to reaches aimed towards the intermediate location between the two targets.
(C): Reaction time as a function of the approach direction in equiprobable (blue trace) and unequiprobable (green trace) sessions. (D): Reaction time as a function of target separation computed from single-target trials across 3 participants. Error bars correspond to standard error (SE), solid lines show the polynomial regression fitting (linear in panels A and D, quadratic and cubic in panels B and C, respectively) and the colored shadow areas illustrate the confidence interval of the polynomial regression results. Target probability influences both the approach direction and the reaction time of the reaches. However, reaction time and approach direction are not fully mediated by the target probability. Instead, reaches with longer reaction times often launch to an intermediate location between the potential goals.

2.5 Model architecture of the "reach-before-you-know" task. The neural fields consist of 181 neurons and their spatial dimension spans the semi-circular space between 0° and 180°. Each neuron in the reach planning field is connected with a stochastic optimal control system. Once the activity of a neuron exceeds a threshold γ, the corresponding controller generates a sequence of reach actions towards the preferred direction of the neuron. The reach planning field receives excitatory inputs from the spatial sensory input field that encodes the angular representation of the potential targets, and the expected outcome field that encodes the expected outcome of the competing targets (blue, red and green Gaussian distributions correspond to cues with 0.5, 0.2 and 0.8 target probability, respectively). It also receives inhibitory inputs from the reach cost field that encodes the effort required to implement the available sequences of actions, i.e., to move to a particular direction from the current state. The normalized activity of the reach planning field encodes the "desirability" of the M available sequences of actions (i.e., neurons with activation level above the threshold γ) at a given time and state and acts as a weighting factor on each individual sequence of actions. Because the relative desirability is time- and state-dependent, a range of behavior from weighted averaging (i.e., spatial averaging trajectories) to winner-take-all (i.e., direct reaches to one of the cues) is generated.

2.6 Simulated neural activity and reach behavior. (A): A representative example of the simulated model activity as a function of time in the reach planning field for a dual-target trial with the actual goal located in the left visual field. The red discontinuous lines indicate the target onset, the movement onset, and the goal onset. The corresponding reach trajectory is shown in the upper inset. (B): Simulated activity of two planning neurons centered at the location of the cued (continuous traces) and the uncued (discontinuous traces) target, from a representative single-target trial (black trace) and two dual-target trials with equal (blue traces) and unequal (green trace) probabilities. A reach movement is initiated when the activity of one of the neurons exceeds the response threshold (gray discontinuous trace). When only a single target is presented, the neuronal activity ramps up quickly to the response threshold, resulting in faster reactions and direct reaches to the target.
However, when two targets are simultaneously presented, the neurons compete for selection through inhibitory interactions, often resulting in slower reaction times and spatially averaged movements. If one of the alternatives is assigned a higher probability, the competition is biased to the likely target, leading to faster responses.

2.7 Approach direction and reaction time of the simulated reaches. (A): Approach direction and (B) reaction time of the simulated reaches across number of targets and probabilities. (C): Reaction time as a function of the approach direction in the simulated equiprobable (blue trace) and unequiprobable (green trace) sessions. Error bars correspond to standard error (SE), solid lines show the polynomial regression fitting (linear in panel A, quadratic in panel B and cubic in panel C) and the colored shadow areas illustrate the confidence interval of the polynomial regression results. Consistent with the human findings, the model predicts that target probability influences both the approach direction and the reaction time of the movements. However, reaction time and approach direction are not fully mediated by the target probability. Instead, the longer it takes to resolve the action competition, the more likely it is that the losing population is still active at the movement onset, resulting in spatially averaged reaches.

3.1 Diagram of the sequence of events in each reaching trial (A); during (i) fixation the subject is fixated on a central cross, (ii) in presentation, one or two targets appear and their coloring indicates the relative level of difficulty. At the end of presentation, a cue is given to start movement (iii), and once the subject's finger has crossed the threshold line (dashed light red line in the adjoining figure), the targets change color to reveal the associated points (iv). Sketch of a top down view (B) of the experimental setup, (a) the positions of the virtual targets, black L and R dashed circles.

3.2 Plot of the logistic model coefficients found from fitting the subject data to the experimental parameters listed on top of each variable's plot. The model was fit four times using: the subject's successful choices, choices from all trials including time outs (final), and then the inferred selected target through the initial and threshold phases of the trials. The final two were computed from the movement launch angle and the lateral position at threshold, respectively.

3.3 The logistic regression model of the probability of choosing the right target, using the experimental conditions as parameters. Each β represents a fitted coefficient, the superscripts L and R refer to the left and right targets respectively, p and r indicate the number of points and target size, respectively. Finally, n_trial is the trial sequence index and t_present is the presentation time.

3.4 Plots of the coefficients found by fitting logistic regression models on subjects' choice of target, using trial conditions as predictive parameters. Since the models are fitted to the probability of choosing the right target, positive coefficients indicate that marginal increases of the associated variable result in an increased probability of selecting the right target.
In (a), the coefficients associated with right target points are plotted against the coefficients for left target points for each subject. The gray line shows a simple linear model fitted to the relationship between the left and right side coefficients. Plotted in (b) are the coefficients for the left and right target sizes. The dashed gray line indicates a hypothetical trend line of equal sensitivity to changes in the right and left target size.

3.5 Plot of the subjects' estimated lateral bias, determined using a binomial test on all trials with equal points and target sizes.

3.6 Plots of the cumulative distribution of reaction times across a number of trial conditions; the Vincentized distributions combined across all subjects are plotted in the thick lines, while the cumulative distribution curves for individual subjects are plotted behind in low opacity. Plot (a) shows the reaction times from single target (non-choice) trials across the different lengths of presentation time; plot (c) shows the same for two target trials. Plot (b) shows the reaction times in single target trials conditioned on the size of the presented target, while (d) shows two target trials conditioned on the difference in size between the two targets.

3.7 Visualization of the functional principal components decomposition of reaction time distributions; the dashed lines represent the shift in distribution associated with an increase in the associated variable, while the gray solid line represents the mean reaction time distribution. Plot (a) shows the results using the difference in target sizes, while (b) and (c) show the results from the decomposition conditioned on the size of the left and right target, respectively.

3.8 Visualization of the functional principal components decomposition of reaction time distributions; the dashed lines represent the shift in distribution associated with an increase in the associated variable, while the gray solid line represents the mean reaction time distribution. Plot (a) shows the results using the difference in target sizes, while (b) and (c) show the results from the decomposition conditioned on the size of the left and right target, respectively.

4.1 Sketch of the information flow through neural structures in the proposed 'dual field' model. The two neural circuits or 'loops' form the basis of this model, where interconnected patches of cortex through several regions are used to represent a continuum of features. For the motor control loop, these are features like the desired movement in planning space, represented in PPC and SMA, while the plan in motoneuron space is represented in M1. In the frontal loop, the regions in the prefrontal cortex represent features related to the valuation and anticipated costs, with the ACC.

4.2 Diagram of information flow in a 'good' theoretic model.

4.3 Plot and equation describing the shape of the model's inhibitory output projected from the frontal field to the motor field.

4.4 An example simulation result of the proposed model performing a simplified version of a choice task. In the simulated trial, targets are initially presented at -40 and 40; after the movement is initiated and crosses threshold, the target at 40 is extinguished.
The plots of motor field activity and frontal field activity show the neuronal activity over time (with yellow being the highest activity level), and the black points represent activity that exceeds the threshold. The plots of the fields' output (right-most plots) show the output activity; for the motor field the dashed line indicates which attached optimal controller is active, and for the frontal field the output indicates the center of the frontal inhibition. The frontal inhibition (bottom left) shows the early conflict in the motor field increasing activation of the inhibitory curve until the motor field stabilizes.

4.5 An example simulation of the proposed model performing a version of the experiment with only a single target (non-choice). Note that in this trial, the frontal bias is initially placed in the center; this is meant to recreate the effect of the instructed visual fixation that occurs in the related experiment.

4.6 Results from the reaction time simulations; plot (A) shows the observed distribution of a human subject in the related experiment from single target trials and (B) shows the distribution from single stimulus (non-choice) trials, while (B) shows simulations from choice trials with similar size stimuli, (C) shows the results from trials with unequal stimuli, and finally (D) shows the results from trials with similar stimuli but including a frontal bias term.

5.1 Heterogeneous clustering of subjects' movement strategies. Plotted are the results of a t-SNE manifold trained on the velocity profiles collected from single target trials in the experiment described in Chapter 3; the result is an embedding space which groups similar trajectories together. Each color represents a different subject, and just from inspection it is clear there are many subjects with similar overlapping strategies, and others who are quite idiosyncratic.

List of Tables

3.1 Logistic models and fitted coefficients for trial success prediction
3.2 Logistic choice models and fitted coefficients across trial phases
3.3 Logistic choice target size interaction models
3.4 Results from the 50-50 MANOVA test on the principal components of reaction time distributions

Chapter 1
Introduction

1.1 Background

"Nothing in biology makes sense except in the light of evolution"
— Theodosius Dobzhansky, 1973

1.1.1 Movements first

A common critique of neuroscience as a field is that it lacks a central unifying framework or dogma. There are a few foundational principles, like Ramon y Cajal's neuron doctrine (Bullock, 1959), Sherrington's theory of integrated action (Sherrington, 1906), or Mountcastle's more recent cortical microcolumn theory (Mountcastle, 1997), that span the many disciplines and branches of the field, but no overarching structure or paradigm. Unlike the theory of evolution in biology, or the central dogma of molecular biology, there is no central frame of reference or motivating idea. This is in large part because neuroscience, and especially decision or motor neuroscience, is an inherently interdisciplinary field that has grown from the intersection of many different lines of academic thought. Without this orienting principle, it becomes difficult to bridge the languages, concepts and ideas from the disparate fields of study and build on them.
However, within the field of motor control, there is increasing support for a movement-first dogma: all nervous systems originally evolved, and ultimately exist, to coordinate movement (Graziano and Webb, 2017; Cisek, 2019; Kaas, 2006). Of course, this is not a statement that can be directly tested or falsified, but a scientific paradigm (Kuhn, 2012) that outlines a direction for orienting study and making testable predictions. This is, no doubt, a somewhat self-interested view for motor theorists, but it provides a clear and useful framework to organize the study of biological movement and decisions. It is especially helpful in providing a clear rationale for determining which evidence and concepts are most relevant from the many overlapping influences on decision making.

Some of the best evidence supporting this viewpoint does not come from the field of motor control, but from the evolutionary record. Amidst the explosion of complex life in the late Ediacaran and early Cambrian, an ancestral form of the hydra diverged from the demospongiae (sponges) (Erwin et al., 2011; Budd, 2008) and marked the introduction of the first nervous 'net', a precursor to the nervous system. This proto-organ system was capable of using its photoreceptor and other sensory cells to modulate its motor cells (Bode et al., 1988). All evolutionary descendants of this common ancestor share a descendant of this early nervous net, and the capability of self-directed motion. The genetic lineage that starts the chordata family, comprising all vertebrates, diverges relatively soon after, and there is widespread evidence that the gross anatomical structure has been highly conserved over evolutionary history (Holland and Holland, 1999). To put it plainly, this paradigm argues that all nervous systems, including our own brain, exist to answer the question, "How to move next?" With this paradigm in mind, we will trace out the foundations of the current theory of neural decision making for movement.

1.1.2 From the top down; utility theory to game theory

It is possibly because evolution has shaped nervous systems to be so well adapted to answering this question of "How to move next?" that most of the history of thinking about decisions has focused on the question of "What to do next?". The process of creating movement comes so naturally and without thought that it was easy to overlook, especially in comparison to the unavoidable struggles of conscious deliberation. The relative difficulty of thinking about abstracted choices rather than physical, interactive ones created a comparative advantage for those who had the resources to focus on these more cognitive skills.

This meant that those who first had the time and space to carefully analyze decisions had the most to gain from the study of abstracted and hypothetical choices. And as such, the first formalizations of choices considered games of chance, beginning with Pascal and Fermat creating the outlines of statistics while trying to analyze hypothetical wagers (Ore, 1960). The first full mathematical analysis of choices that embraced randomness in outcomes was Bernoulli's revised answer to the same wager, which introduced the abstraction of probabilities to represent situations that were not individually predictable but had a consistent underlying distribution (Bernoulli, 1954). In addition to the mathematical concept of probabilities, Bernoulli introduced the important concept of 'utility', a measure of the desirability of possible gains that encapsulated the subjective rather than absolute value of an outcome.
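The distinction can be stated compactly; the following is the standard textbook notation rather than Bernoulli's original formulation. A gamble over outcomes x_i with probabilities p_i has an expected value and an expected utility,

\[
\mathbb{E}[X] = \sum_i p_i\, x_i
\qquad\text{versus}\qquad
\mathbb{E}[u(X)] = \sum_i p_i\, u(x_i),
\]

and Bernoulli's resolution of the wager was to evaluate the second quantity with a concave (he proposed logarithmic) utility function, so that each additional unit of gain adds less desirability than the one before. A gamble can then have an enormous expected value while carrying only a modest expected utility.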
These mathematical tools allowed for quantification of, and reasoning about, 'risk', and the mathematical methods were symbiotic with the newly developed mechanical and industrial processes. It is difficult to overstate how impactful the development of these early methods of quantifying decisions was; much of the foundation of modern economies, from insurance to financial and capital markets, is fundamentally dependent on the tools of probability and utility. But, in large part because of the explosion in scale engendered by industrialization, the problems arising from not having good estimates of the underlying probabilities started to become more apparent.

Frank H. Knight is often cited as the first to delineate this distinction between 'risk' and 'uncertainty'. Risky choices involve unpredictable events with known distributions; uncertain choices, on the other hand, involve "unknown unknowns", unpredictable outcomes from unknown distributions. Knight's treatment of uncertainty focused on questions from the perspective of a firm, but this concept of uncertainty was developed into a full logical calculus of choice, called "game theory", by Von Neumann and Morgenstern (von Neumann and Morgenstern, 1980). Game theory formalized the rules of utility maximization under uncertainty into a prescriptive theory: any situation where an agent makes choices about risky or unknown options could be mathematically and systematically analyzed.

A parallel type of decision making under uncertainty, which also developed many practical applications, is the study of sequential sampling. Many had noted the enormous potential value in statistical tests which could iteratively process incoming information to update a prediction or inference, essentially "mechanizing" a decision process; again, the motivations for this research were almost entirely industrial. During the Second World War, a Columbia research group led by A. Wald pioneered sequential hypothesis testing while attempting to improve reliability and quality assurance on parts for the war effort (Berger, 2017; Wald, 1945). The problem was of such importance that several projects in Britain developed similar methods at the same time, the extent of which was not fully declassified until the 1980s (Barnard, 1946).
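The core of Wald's sequential probability ratio test fits in a few lines; this is the standard textbook form, included only for concreteness. After each new observation x_n, the cumulative log-likelihood ratio between two hypotheses is updated and compared against two thresholds,

\[
\Lambda_n = \sum_{i=1}^{n} \log \frac{p(x_i \mid H_1)}{p(x_i \mid H_0)},
\qquad
\text{accept } H_1 \text{ if } \Lambda_n \ge A, \quad
\text{accept } H_0 \text{ if } \Lambda_n \le B,
\]

and sampling continues while the ratio stays between the bounds, with \(A \approx \log\frac{1-\beta}{\alpha}\) and \(B \approx \log\frac{\beta}{1-\alpha}\) for desired error rates α and β. The same accumulate-to-threshold structure reappears below in the psychological models of choice.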
It is not an exaggeration to say that the development of these theories of decisions paralleled, and partially facilitated, an enormous amount of the technological, economic and political change through the industrial age. But it is important to note that many, if not all, of the origins and motivations for the study of decision making revolved around industrial, mechanical or mathematical abstractions, and embody a "top-down" view of the world. Because of this legacy, much of the study of human decision making descends from this formulation of a decision-maker as a 'rational agent', which systematically seeks to maximize an internal form of subjective utility. This description, of human choices determined from utility estimates, was first fully given by Samuelson in 1937; while writing still purely in the realm of economics, he demonstrated that given a consistent set of indifference points, a person would make the same decisions "as if" they had explicit internal utility values (Samuelson, 1937) and acted to maximize the total experienced utility.

1.1.3 Psychology meets homo economicus

However, the historical roots of these ideas were all fundamentally normative or prescriptive; they tried to identify the best or most optimal choice in a decision, not to describe or predict what humans really do. The development of these models of decision making did not initially generate much interest in psychology, partly because of their economic focus, but also partly because psychologists were suspicious of many of the assumptions underlying the "economic man". It was Ward Edwards who introduced and translated these ideas into a psychological context, arguing for the value of these frameworks to guide quantitative structures for experiments (Edwards, 1954).

For psychologists and those that primarily studied human behavior, the idea that people acted as if they were optimizing some measure of utility was difficult to accept; but the challenge was to identify specific instances in which people systematically, and better yet knowingly, deviate from the predictions of utility maximization. Maurice Allais (Allais, 1953) was the first to identify a choice framing that consistently led to violations of utility maximization that could not be explained away by factors like people's misunderstanding or lack of time or information. It would not be sufficient to merely point out that people often make suboptimal choices with respect to utility; it was necessary to show that people did so deliberately, knowingly, and predictably.

After Allais and later Ellsberg (Ellsberg, 1961), researchers continued to identify diverse instances in which people systematically violated the predictions of utility theory, often simultaneously identifying a heuristic or mental short-cut that explained the violation. To Herbert Simon, these were examples of "bounded rationality", in which people's choices reflected maximizing utility within the bounds of mental capacities and processes (Simon, 1972). While some have continued to interpret bounded rationality as the shortcomings of a system attempting to emulate utility maximization, Simon himself, and many others, have argued that these heuristics are natural features of the underlying shape of the cognitive system. These systems deviated from utility maximization not because they were not capable, but because the heuristic-like system is only sometimes wrong, and otherwise cheaper, faster and usually better adapted to our needs as humans (and our animal ancestors). Among those who believed this were Kahneman and Tversky, who introduced 'prospect theory' (Kahneman and Tversky, 1979), a mathematical theory explaining many of the violations of expected utility theory in decisions under risk. Importantly, this theory allowed for quantitative predictions of choices and computational features that might be found in the underlying neural mechanisms.
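For reference, the functional form at the heart of prospect theory can be written compactly; this is the standard parameterization from Kahneman and Tversky (1979) and later work, given here only as a worked example. A prospect paying x with probability p is valued as

\[
V = w(p)\, v(x), \qquad
v(x) =
\begin{cases}
x^{\alpha} & x \ge 0\\
-\lambda\,(-x)^{\beta} & x < 0,
\end{cases}
\]

where the value function v is defined over gains and losses relative to a reference point rather than over final wealth, λ > 1 captures loss aversion, α, β < 1 capture diminishing sensitivity, and the probability weighting function w overweights small probabilities and underweights large ones. Each of these departures from expected utility corresponds to a documented, reproducible pattern of human choice.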
The introduction of sequential sampling analysis to psychology established yet another approach to mathematically modeling decisions. Initially, these models were used to model simple memory retrieval tasks (Ratcliff, 1978), but they have mainly been used to model perceptual choice tasks. The sequential sampling models were adapted into dynamical systems, which modeled the accumulation of evidence as a time-varying 'decision variable'; a 'decision' is reached when the decision variable exceeds one of the pre-specified thresholds. The earliest version (the "diffusion model") was capable of accurately modeling choices, errors and reaction times for a subject responding to a binary choice (Ratcliff and McKoon, 2008), but was not straightforward to extend beyond two choices.

A variation extended this model to any number of choices, representing each as a "leaky integrator" of evidence, and also borrowed the concept of mutual inhibition between all choices to model the winner-take-all dynamics exhibited in multi-alternative decisions (Usher and McClelland, 2001). Another relevant development building on the leaky integrators, multi-alternative decision field theory (MDFT) (Roe et al., 2001), introduced a layer of valences that are feedforward connected to the accumulators representing available choices. An 'attentional' mechanism is also introduced that transforms the incoming sensory input, amplifying the signal of the 'attended' dimension. Because of these modifications, the model is capable of recreating the "elimination by aspects" heuristic (Tversky, 1972) observed in multi-alternative and multi-attribute decision making by humans.

The development of mathematical models of decisions, from prospect theory to neural field theory, is particularly notable not only because it represents an incredibly productive cross-pollination of ideas in economics, psychology and neuroscience, but also because these simple models have been applied successfully to predict or describe phenomena across all three of Marr's levels of analysis (Marr and Poggio, 1976). Largely, these models were initially aimed at the 'computational' level; utility and prospect theory only attempt to simulate an abstraction of the underlying processes. Similarly with diffusion and accumulator models, which also began as totally abstract descriptions of processes (Ratcliff, 1978), but have been extended to test predictions about representations and mechanisms (Usher and McClelland, 2004). Remarkably, these models have even proven useful at the representation and hardware level, directly modeling neural activity during decision tasks (Ratcliff et al., 2003; Schall et al., 2011; Smith and Ratcliff, 2004).

Yet the applicability of these dynamic decision models, including decision field theory, to the analysis of movement choices is limited for a number of reasons. As we will revisit in detail, these models have largely been concerned with specific types of decisions: the options were discrete, categorical and usually about abstract concepts like currency or future events. Firstly, movement decisions and our internal states are continuous by nature, not discrete or categorical. Secondly, movements take place in noisy, dynamic environments with shifting task constraints that require feedback-driven systems to make smooth movements. And finally, there are significant costs that occur in the process of an action that can be independent of the costs or rewards of the outcome.
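Before turning to the motor side of the story, it is useful to write down the two accumulator dynamics referred to above in their common textbook form (illustrative notation, not the parameterization used in later chapters). The binary diffusion model integrates noisy evidence x with drift v until one of two bounds is reached,

\[
dx = v\,dt + \sigma\,dW, \qquad \text{respond "A" when } x \ge +a, \quad \text{"B" when } x \le -a,
\]

while the leaky competing accumulator of Usher and McClelland generalizes this to N options, each with its own non-negative activation x_i that is excited by its input, decays through a leak, and is suppressed by its competitors,

\[
\tau\,dx_i = \Big(I_i - k\,x_i - \beta \sum_{j \ne i} x_j\Big)dt + \text{noise}, \qquad x_i \ge 0 .
\]

The mutual inhibition term is what produces the winner-take-all behavior, and it is the same ingredient that reappears in the neural field models of action competition used throughout this thesis.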
1.1.4 From the bottom up; reflexology to optimal feedback control

The specific question of motor coordination, how humans and other mammals select and execute movements in order to achieve a desired outcome, has a shorter history than that of economic decision making. This is partly because the technology to properly study movements is relatively recent, and also, arguably, because outside of artistic expression and aesthetics there were not many practical applications for better understanding movement. It is an interesting historical parallel that the first application to bring interest in systematically studying human movement was Frederick Taylor's work on optimizing and rationalizing laborers in industry. Similarly, one of the first publications specifically on the subject of motor control focused on the accuracy of movements by even 'unskilled' laborers (Woodworth, 1899).

Arguably the founding narrative of the motor control field is Sherrington's theory of integrative action (Sherrington, 1906), which introduced a remarkable number of the field's concepts. A key aspect of this theory was his rejection of the prevailing 'reticular theory' of the nervous system and full embrace of the nascent neuron doctrine. Recognizing that nervous cells are individual units, differentiated into effectors, receptors and interneurons connected by synaptic communication junctions, makes an explanation for short one-way reflex loops simple and easily testable. The concept of reflex loops plays a central role in Sherrington's theory: "the main secret of nervous co-ordination lies evidently in the compounding of reflexes" (Sherrington, 1906). This was likely due to another insightful choice, focusing electrophysiological studies on the spinal cord, which was, in surgical terms, a more practical option. His discoveries of the wide behavioral repertoire of the spinal cord made it natural to describe the workings of the nervous system as a coordinated system of reflexes.

The central role of reflexes and feedback loops was again emphasized by Bernstein, who also developed his theories in close observation of industrial workers (Bernstein, 1967). He also noted a tendency to "freeze" degrees of freedom while engaging in a particular task. According to him, these were all in effort to solve the biggest problem faced by the neuromuscular system: the human skeleton contains many joints, each joint is actuated by many muscles, and each movement can be performed by the muscles in many ways, leading to an enormous amount of redundancy. The role of reflex loops and locking unused movement dimensions was to enable descending control from higher nervous centers in a simplified planning space, rather than joint or muscle space.

The concept of a reduced space for planning or control was further developed to address unexplained patterns of variability in movement tasks, as described in the "uncontrolled manifold hypothesis" (Scholz and Schöner, 1999). In repeated movements, subjects have repeatedly shown smaller amounts of variability in task relevant dimensions than the baseline noise level in the motor neuron signal would suggest; however, variability grows in any task irrelevant dimensions. This implies that a task defines a subspace or manifold, based on its success criteria, that exists within the larger controllable space; e.g. the task of swiping on a phone exists on a subspace that sits on the Cartesian plane where your finger intersects the screen.
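In the analysis usually associated with this hypothesis (sketched here from the general literature rather than from any derivation in this thesis), trial-to-trial variability of the joint configuration is partitioned into a component lying within the task-equivalent manifold and a component orthogonal to it,

\[
\sigma^2_{\text{total}} = \sigma^2_{\text{UCM}} + \sigma^2_{\text{ORT}},
\]

and the hypothesis predicts \(\sigma^2_{\text{UCM}} > \sigma^2_{\text{ORT}}\) (per dimension): variability is permitted to accumulate along directions that do not change the task outcome, while deviations that would change the outcome are corrected.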
The effects noted in the uncontrolled manifold hypothesis were explained through the lens of control theory in a framework called 'optimal feedback control theory' (Todorov and Jordan, 2002). In this framework, a task-specific set of constraints (a cost function) and a pre-specified behavioral goal are assumed; movements are shaped to minimize the cost function while achieving the goal. Whenever the movement starts to deviate from the optimal plan, due to motor noise or environmental changes, only errors in dimensions relevant to the goal are corrected. This 'minimal intervention' principle allows variability to pool in the task irrelevant dimensions, as noted in the uncontrolled manifold hypothesis and observed in the patterns of error corrections by human subjects (Cole and Abbs, 1987; Robertson and Miall, 1997).

One problematic limitation of this framework is the need to pre-specify a task dependent cost function. This is not a straightforward task because many different cost functions have been found to apply in some situations but not others. In point-to-point reaching movements, minimizing the squared jerk of the hand's position famously predicts the arm's trajectories well (Hogan, 2003). In others, minimizing torque about the joints predicted movement shapes (Uno et al., 1989), while others have argued that minimizing the endpoint variance is most predictive. Finally, some argue that given the signal dependent nature of noise in motor commands, a penalty term (or "regularizer") for the squared magnitude of the neural command signal is equivalent to the minimum variance cost (Harris and Wolpert, 1998; Diedrichsen et al., 2010).

There is a straightforward evolutionary argument that metabolic energy consumed by the muscles during a movement would be an appropriate cost to minimize, given the competitive pressures. A few studies have argued and found evidence supporting metabolic energy as an optimization criterion (Alexander, 1997; Nelson, 1983), or at least that movements trend toward the metabolically optimal (Huang et al., 2012). However, others have found strong evidence against the notion that metabolic energy is optimized (Kistemaker et al., 2010), and another found that the physical cost features minimized are not necessarily consistent for a particular task as the dynamics of the environment change (Berniker et al., 2013).

Despite the limitations imposed by requiring a 'goal' and associated 'cost function', optimal feedback control theory has been widely successful (Liu and Todorov, 2007; Diedrichsen et al., 2010), both in predicting human movement and as a practical framework for generating and simulating movement. While there is as yet no direct neural evidence that the nervous system is specifically estimating, and trying to minimize, a quantity like jerk or metabolic energy, the evidence does support the idea that in these situations the neural motor circuit acts "as if" it were optimizing in this way. This makes optimal feedback control theory very useful in creating models of behavior, but by design it is not concerned with motivational states or internal utilities; it is only concerned with the how of the movement itself and not the why.
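Two of the candidate costs surveyed above are simple enough to state directly; these are the standard forms from the cited literature, written out only for concreteness. The minimum-jerk account scores a hand path x(t) of duration T by

\[
J_{\text{jerk}} = \int_{0}^{T} \big\lVert \dddot{x}(t) \big\rVert^{2}\, dt,
\]

while effort- or noise-based accounts penalize the magnitude of the motor command u(t) itself,

\[
J_{\text{effort}} = \int_{0}^{T} \big\lVert u(t) \big\rVert^{2}\, dt,
\]

which, under signal-dependent noise, closely approximates minimizing endpoint variance (Harris and Wolpert, 1998). Optimal feedback control combines a cost of this kind with an accuracy term on the task-relevant state and minimizes their expected sum over a feedback policy, rather than over a single pre-planned trajectory.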
1.1.5 The Serial model: a 'good' based view

Given that the models of the "top-down" approach assume a decision results in specifying a desired outcome, and the "bottom-up" optimal feedback control theory of coordination presumes a behavioral goal, it would seem natural to simply connect the two models. This is roughly the approach underlying the current prevailing framework for the neural mechanisms involved in decision making. The canonical theory is called the 'good' based theory and is given in (Padoa-Schioppa, 2011); other reviews give similar descriptions, e.g. Rangel et al. (2008) and Kable and Glimcher (2009). In this theory, 'good' is a domain general neural representation of subjective utility that combines all internal and external contingencies into a "common currency".

[Figure 1.1: Sketch of the information flow and temporal ordering of the decision process as described by the 'good' based model (Padoa-Schioppa, 2011); the full caption is given in the List of Figures.]

The 'good' value is estimated for all possible outcomes, and the outcome with the greatest value is selected. After choosing an outcome and communicating it to the motor areas, a "goal-to-action" transform is computed and the desired plan is acted out by the neuromuscular system. The assertion that action planning occurs only after selection of a desired outcome is central to this theory, because it argues that decisions occur between potential outcomes.

The 'good' based model, and many other cognitive theories, strongly argue that the orbitofrontal and ventromedial prefrontal cortex are the regions responsible for handling abstract decision making and representing subjective utility values. The authors of the 'good' model were also the first to report neural recordings from the orbitofrontal cortex that appeared to directly encode subjective utility (Padoa-Schioppa and Assad, 2006). But there has been a very long history to the idea that the prefrontal cortex is responsible for higher cognitive processes like decision making or valuation, first informed by clinical reports of neurological damage (Bechara et al., 2000) and later explored systematically with lesion studies (Gallagher et al., 1999).

Further neurophysiological studies in primates identified several more characteristics of prefrontal cortical activity that support the argument that these regions encode subjective utility. One study demonstrated that these were "menu-invariant" representations, meaning that they were independent of the alternative choices (Padoa-Schioppa and Assad, 2007); this is a critical property to support transitive preferences. Another reported evidence for dynamic range adaptation of these values: activity in the OFC adjusted in sensitivity to optimally discriminate between values (Padoa-Schioppa, 2009).

Imaging studies of humans have also consistently reported that activity in the OFC and vmPFC correlates with subjective expected value, but there is a fair amount of disagreement about the specifics of the encoding. Across a number of imaging studies, the most medial portion of the OFC (overlapping with vmPFC) was most strongly associated with the hedonic utility of an outcome, the reward without costs subtracted (Kringelbach, 2005).
Activity in the remainder of the OFC reflected a mix of the intrinsic subjective value (immediate pleasure), computed value, and net value (minus the costs) (Peters and Buchel, 2010; Sescousse et al., 2010). All of these results are consistent with a representation of subjective utility value in the prefrontal cortex, but, as we will address, the framing and design of these experiments make it difficult to extend this certainty to decisions about movement.

1.1.6 Limitations of the serial model

In the previous section, we introduced the current prevailing description of how neural processes are organized to perform any kind of value based decision making, and discussed the wide range of evidence supporting the orbitofrontal cortex as the site of expected utility representation. The 'good' based decision framework looks like it describes well all of the economic behavior we might observe from a human lying still in a small-diameter tube, using buttons to give responses. And while it could be argued that, since these are economic choices about abstract entities, the physical setting should not matter, it is specifically argued that the same process applies to decisions that involve significant action costs (Padoa-Schioppa, 2011). Similarly, the authors of a related framework (Rangel et al., 2008) also highlight foraging and other movement-heavy interactive behaviors as examples of value based decision making.

[Figure 1.2: Information flow in the serial, 'good' based model; the full caption is given in the List of Figures.]

While Padoa-Schioppa and colleagues, authors of the 'good' based view, continue to argue that these neural decision making frameworks are suitable to explain both abstract economic decisions and physical movement choices, there are several fundamental problems in applying this theory to movements. In discussing these critiques of the model, we will step back into the movement-first paradigm, which will help to make the limitations of the framework clear. The first is that an increasing amount of neural and behavioral evidence directly contradicts the assertion that 'choosing' and 'coordinating' are separate processes. Second, the evidence in support of a single unified representation of 'good' or subjective utility is conflicted at best. And finally, the historical legacy of the study of decision making has imparted a serial, "top-down" bias that focuses on problems with discrete states and actions, which is incompatible with the dynamic demands imposed on the nervous system.
The serial model asserts that deciding on a desired outcome (choosing) is a cognitive process that is completely separate from the process of planning an action; after an outcome is selected, it is passed to the planning areas, which perform a "goal-to-action" transform (Padoa-Schioppa, 2011). But this serial and strictly sequential ordering is flatly contradicted by a number of studies in which neural activity reflecting motor and decision parameters for one or more potential actions was observed in cortical regions assumed to be strictly motor (Platt and Glimcher, 1999; Dorris and Glimcher, 2004; Janssen and Shadlen, 2005; Sugrue et al., 2004; Cisek and Kalaska, 2005). Additionally, pharmacological inactivation of some of the same motor planning areas in the parietal cortex has been observed to result in greater distortion of choice preferences than of motor function (Wilke et al., 2013; Christopoulos et al., 2015; Wardak et al., 2002).

These observations are not only inconsistent with the assertion that action selection and planning are separate; the mixing of motor and decision variables across a number of cortical regions also undermines the assertion that a single unified representation of subjective utility is used when forming a decision. Similar inactivation studies of the superior colliculus (McPeek and Keller, 2004) also resulted in greater disruptions in the choice of saccade target than in motor performance. A head orientation task found evidence that subcortical vestibular system neurons were better predictors of expected reward than cortical signals (Liu et al., 2012). And a wide array of studies have found reward activity in the striatum that is capable of encoding more than one choice, across multiple time scales, and even uncertainty (Hikosaka et al., 2008). Interestingly, while there are many neural imaging and electrophysiological studies showing that activity in the vmPFC or OFC represents the expected utility of the chosen option, a recent study in rats used recording and optogenetic inactivation to show evidence that the signals in the OFC were critical for learning but not for within-trial choice (Miller et al., 2018).

The final critique of the prevailing neuroeconomic view is that the framing and type of decisions that have been studied have imparted a "top-down" perspective that is difficult to reconcile with what we know about the motor system. It is worth returning to the historical origins and framing of these decision making problems: the 'good' based framework inherits a perspective on decisions that was originally formulated for the games and concerns of gentlemen; wagers, industrial and military strategy, or speculative finance.

These types of activities almost all share a number of distinct underlying properties. First, there are very explicitly defined objectives, and the value of an action can almost always be precisely quantified working back from the objectives. Another property of these games is that the underlying states are generally discrete, well defined and often even fully observable. These states also exist in a more or less closed system that implicitly assumes a relevant time window, either through discounting rates or truncating of probabilities. Finally, the actions or decisions are discrete, centrally dictated and executed in a quasi-static, turn based fashion; there is the world state before the action, then the agent selects and performs an action, directly resulting in a final world state.
In a wager, or at least in the mathematical representation of one, there is an initial state, such as the cards in your hand and the central pot, and an opportunity for action, immediately followed by the outcome of that action. This is unlike deciding how to walk across the room and sit at the poker table, which can be accomplished in innumerable ways under nearly as many constraints, and in which each movement may reveal information. A strategic, game-theoretic analysis of the scenario might conclude: "walk confidently but not briskly to the closest side of the chair and sit with relaxed posture." This might be optimal advice, and perhaps a sophisticated model could even predict a desired walking pace and gait, but does it capture the complexity and specificity of what the neuromuscular system needs to accomplish?

The bias towards studying these serial, semi-static (or turn-based) structured decisions has been compounded by the technological constraints of neural imaging and of most neural recordings. In studies that rely on neural imaging technologies like fMRI, the limited time resolution means that the time course of decisions cannot be precisely studied, and the movement constraints mean that the response modalities are limited. Even beyond neural imaging, the practical and technological considerations of collecting movement data in controlled settings have also discouraged a focus on movement in decision studies.

We argue that these implicit biases and assumptions are consequences of the "top-down", serial worldview that is embedded in the types of games and choices studied in theories like game theory. The strict serial ordering paints a picture of purely feedforward information processing, from senses, to outcomes, then action plans, and finally movement. But, from the very beginning of the field, motor theorists have recognized the central role of reflex loops, and later central pattern generators, in coordinating movement. Finally, movement decisions are by nature not discrete or categorical decisions: they are continuous in their neural control, which is problematic for multi-alternative decision theories like MDFT because these still usually assume that choice is exclusive; and they are continuous in time, which makes the assumption that action related costs are estimated and integrated into the representation of subjective value, if not implausible, certainly questionable. If there are many possible movements to achieve a desired outcome, how is this summarized? How are the many factors that influence this cost, like posture, fatigue, clothing, or other environmental considerations, incorporated in such rapid fashion? If conditions change, do we reconsider the entire decision or only the movement?

1.1.7 Dynamic decision making

The shortcomings of the serial, 'good' based paradigm, and specifically its inability to explain the neural activity leading up to a decision in primates, have motivated the development of alternative frameworks for decision making in the brain. These models are also influenced by the evolutionary and ethological arguments that support the movement-first dogma introduced earlier; the nervous system evolved under a continuous pressure for dynamic interaction (Cisek, 2019). And as we know from the evolutionary record, the original role of the nervous system was to coordinate movement.
Recognizing the primacy of movement also enforces the constraint that evolution by mutation and natural selection imposes: any new subsystem is "formed by numerous, successive, slight modifications" to a simpler, functioning system (Darwin, 1859). This is already an implicit assumption in the use of comparative neuroanatomy; but since it is now understood that "newer" brain regions are not distinct additions, and are instead formed through the differentiation of existing brain regions and the shifting of existing axonal projection patterns (Deacon, 1990), the constraints and conclusions we can draw from comparative neuroanatomy are strengthened. This means that any 'cognitive' or other higher level processes we observe in humans and other animals must have evolved from, and in parallel with, the processes for motor control, and that economic decisions are an extension of movement decisions, not the other way around.

A detailed example of such an 'action' based decision making framework is the affordance competition theory (Cisek, 2007), which argues that environmental (and sometimes internal) cues reflexively trigger the formation of plans for movement or interaction (Gibson, 2014). The decision process occurs as a 'competition' between potential action plans, through lateral inhibitory connections in circuits spread across the basal ganglia-thalamocortical "motor circuit" (Alexander and Crutcher, 1990). Other cortical areas participate in action selection by estimating a contingency of a potential action plan (its risk, physical effort, or emotional response) and projecting excitatory activity to the action plan representation.

Figure 1.3: Sketch of the information flow through neural structures as proposed in the affordance competition theory, adapted from Figure 1 in (Cisek, 2007). In this simplified view, incoming visual input is transformed into features for interaction (affordances) that are used to specify actions; the classic example being a handle on a mug, which enables interaction through an immediately obvious and stable grip. These affordances and other cues trigger the formation of plans for taking action that are distributed across the PPC, M1 and SMA, with reciprocal connections to the basal ganglia. Potential action plans compete for selection through reciprocal inhibition; other cortical regions participate in the action selection process by estimating some choice contingency (risk, delay, cost, etc.) and modulating the activity of the corresponding action plan accordingly.

The affordance competition theory was specifically proposed to explain neural activity that reflected both movement planning and reward valuation for two distinct action plans, recorded in the parietal cortex of macaque monkeys, a region which had been assumed to be strictly involved in movement and not decisions (Cisek and Kalaska, 2005). This evidence strengthened the case made by the earlier observations reviewed above; none of these observations are consistent with the 'good' based model's hard delineation between the cognitive processes of deciding and movement planning. In the original proposal of this theory, a dynamic computational model, inspired by and modeled on decision field theory, is presented that makes predictions about behavior and neural activity (Cisek, 2006).
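Before walking through the structure of that model in the following paragraphs, the core dynamic it shares with other action-based accounts can be sketched in a few lines: candidate action plans are excited by value-related inputs, inhibit one another, and the overt movement direction at any moment is the activity-weighted average of the still-active plans. The sketch below is a minimal, hypothetical illustration of that idea only; the gains, time constants, and noise level are ours and are not taken from Cisek (2006) or from the model used later in this thesis.

import numpy as np

def competing_action_plans(values, target_angles, dt=0.01, steps=600,
                           inhibition=2.0, leak=1.0, noise=0.05, threshold=0.8):
    """Two leaky action-plan units excited by value inputs and coupled by mutual
    inhibition; the initial reach direction is their activity-weighted average.
    All parameters are illustrative, not fitted to data."""
    rng = np.random.default_rng(0)
    a = np.zeros(2)                      # activation of the two action plans
    for step in range(steps):
        drive = np.asarray(values) - inhibition * a[::-1]   # excitation minus the rival's inhibition
        a += dt * (-leak * a + drive) + noise * np.sqrt(dt) * rng.standard_normal(2)
        a = np.clip(a, 0.0, None)
        if a.max() > threshold:          # competition resolved: movement onset
            break
    # If the competition is unresolved at the deadline, the reach still launches
    # as a weighted average of the active plans (spatial averaging).
    weights = a / a.sum() if a.sum() > 0 else np.array([0.5, 0.5])
    reach_angle = float(weights @ np.asarray(target_angles))
    reaction_time = (step + 1) * dt
    return reach_angle, reaction_time

# Targets at 60 and 120 degrees: equal values tend to give a slower, intermediate
# launch; unequal values tend to resolve faster and launch closer to the likely target.
print(competing_action_plans(values=[1.0, 1.0], target_angles=[60.0, 120.0]))
print(competing_action_plans(values=[1.4, 0.6], target_angles=[60.0, 120.0]))

Run as written, the equal-value case tends to produce a later "movement onset" and an initial direction between the two targets, while the unequal case resolves sooner and launches closer to the more valuable target; this is the qualitative pattern that the model below formalizes and that Chapter 2 examines experimentally.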
The choice nodes of decision field theory are reinterpreted as neurons within a small sensorimotor map that make up a population code for a specific behavior. Two layers are added to the model between the sensory input and the choice neurons; each loosely represents a cortical region and computes a specific decision feature that is passed to the choice layer. Finally, each choice node is associated with an optimal controller tuned to a specific variation of the output behavior; the output is determined by an activity-weighted average over all the choice nodes.

Figure 1.4: A schematic diagram of the computational model based on the affordance competition theory of decision making. Each layer of the model is depicted by a set of grey circles, each representing a neuron or unit tuned to a specific parameter, i.e. a movement towards a particular direction. Visual information from V1 is projected along the ventral and dorsal streams, which extract spatial and semantic features. This information drives the activation of movement plans in the PMd, and motor policies in M1.

This computational model is particularly powerful because it allows us to link behavioral results in humans to neural observations in other animals. It bridges all three of Marr's levels of analysis, linking the computational processes of action selection to theories about the representations (neural population coding of planning space) and potential neural mechanisms (winner-take-all through reciprocal inhibition). This enables approaches like virtual lesion studies, which have made predictions consistent with the observations in pharmacological inactivation experiments (Christopoulos et al., 2015).

Another important prediction made by this framework is that unresolved competition between action plans at the time of movement initiation results in a weighted average of the active plans. Given that these action plans are roughly similar, the resulting movement should reflect a "spatial averaging" of these action plans, a phenomenon that has been noted in saccade and reaching studies using distractors (Platt and Glimcher, 1997; Wardak et al., 2002; Welsh et al., 1999; Tipper et al., 1998; Sailer et al., 2002). Since these action plans are actively "in competition" and modulated by the valuations occurring in other cortical regions, the resulting movement should reflect the temporal integration of decision variables. A number of human psychophysics studies have developed experimental designs that force subjects into action while only incomplete information about reward or action cost is available. The dynamics of movement during this initial period of uncertainty provide a window to probe the effects caused by manipulating specific decision parameters. Combined with computational models, this experimental design enables us to safely and non-invasively probe the decision process in humans.

Chapter 2
What if you are not certain?
A common computation underlying action selection, reaction time, and confidence judgement

Abstract

From what to wear to a friend's party, to whether to stay in academia or pursue a career in industry, nearly all of our decisions are accompanied by a degree of confidence that provides an assessment of the expected outcome.
Although significant progress has been made in understanding the computations underlying confidence judgment, the preponderance of studies measures confidence at or after a decision. However, confidence has a time course and can influence not only the final choice, but also any action taken before the final decision. In the current study, we introduce pre-decision confidence as distinctive behavior in dynamic decisions that evolve while acting. In these types of problems, people cannot wait to accumulate information about the alternatives; instead, they have to decide while acting. Using a reaching task with goal-location uncertainty, we test the hypothesis that confidence about the current best action affects not only the final choice, but also any action taken before the final decision. By comparing experimental findings with model predictions, we provide direct evidence that action selection, reaction time and choice confidence all emerge from a common computation in which parallel prepared actions compete based on the overall desirability of targets and action plans.

2.1 Introduction

On January 15, 2009, US Airways flight 1549, a domestic flight from La Guardia airport in New York City to Seattle/Tacoma, experienced a complete loss of thrust in both engines after encountering a flock of Canada geese. As the aircraft lost altitude, air traffic control asked the pilot whether he could either return to La Guardia or land at the nearby Teterboro airport. Having less than 5 minutes after the bird strike to land the plane, the pilot rejected both options, because he was not confident that he could make any runway. Instead, he safely glided the plane to ditch in the Hudson river. Later investigation showed that the low altitude and the lack of power on both engines would not have allowed a successful landing at either airport. This incident describes a ubiquitous situation in which choice confidence - i.e., the subjective belief that a given action is more desirable than any alternative - has a key role in guiding behavior, especially in dynamic decisions that are made under pressure and while acting. Although confidence is an essential component of human behavior, only recently have we begun to decipher the computations underlying confidence. However, most of this understanding has been built on a fairly restrictive experimental paradigm involving simple decisions like perceptual judgments Foote and Crystal (2007); Hampton (2001); Kepecs et al. (2008); Kiani et al. (2014); Fetsch et al. (2014) and value-based decisions De Martino et al. (2012), where actions occur only after a decision is made. Importantly, most of these studies measure confidence at or after a decision is made - i.e., the post-hoc subjective probability that a made decision is correct given the evidence. They focus on cognitively reportable measures of confidence, which is viewed as an essential dimension of metacognition that influences future directions (meta-cognitive post-decision confidence). The core idea is that confidence is computed only after a decision is made and serves as a weighting factor to balance prior knowledge with new observations for making better decisions in the future Meyniel and Dehaene (2017). A recent theory argues against this hypothesis, suggesting that confidence is not computed in a post-hoc manner, but instead emerges within the decision-making process, influencing not only future decisions, but also any action taken before making a decision Meyniel et al. (2015); Dotan et al. (2018).
Although the mechanisms of post-decision confidence have been extensively studied, there is no strong consensus on whether pre-decision confidence exists and how it affects behavior before and during the decision-making process - e.g., before the pilot decided to ditch the plane in the Hudson river Pouget et al. (2016). This is mainly due to the lack of reliable measurements to monitor pre-decision confidence. Additionally, in most of the previous studies, individuals did not have to perform an action before selecting a choice, making it challenging (or even impossible) to monitor confidence prior to and during the decision-making process.

In the current study, we aim to elucidate whether pre-decision confidence exists in dynamic decisions that evolve while acting. Here, people cannot wait to accumulate information about the alternative options before selecting an action. Instead, they have to decide while acting. Therefore, we hypothesize that the confidence that an action will lead to a better set of outcomes than the alternatives emerges throughout the decision-making process, and influences not only the final choice, but also any action planned and taken before a decision is made. We designed a "reach-before-you-know" experiment that involved rapid reaches to two potential targets presented simultaneously in both hemifields Chapman et al. (2010); Gallivan et al. (2011). Critically, the actual goal location was not disclosed before the movement onset. Dual-target trials were interleaved with single-target trials in which one target was presented either in the left or the right hemifield. By varying the target probability to induce different levels of uncertainty, we tested how goal location uncertainty influences behavior. We found that when both targets had about the same probability of action, individuals did not pre-select one of the targets and correct their actions, if needed, after the goal onset. Instead, they delayed initiating an action and moved towards an intermediary location, waiting to collect more information before selecting one of the targets - a spatial averaging strategy reported in previous studies Hudson et al. (2007); Chapman et al. (2010); Gallivan and Chapman (2014). On the contrary, when one of the targets had a higher probability of action, reaches had faster responses and were launched closer to the likely target. These findings suggest that target certainty influences both the planning and the execution of actions in decisions with multiple competing options. Surprisingly, the relationship between approach direction and reaction time was not fully mediated by the target probability. Instead, when people waited longer to initiate an action, reaches were frequently launched towards an intermediary location between the potential goals, regardless of the target probability.

To better understand the relationships between goal uncertainty, reaction time and trajectories, we modeled the decision task within a recently proposed computational theory Christopoulos et al. (2015); Christopoulos and Schrater (2015). This theory is an extension of the evidence accumulation models and builds on the affordance competition hypothesis, in which multiple actions are formed concurrently and compete over time until one has sufficient evidence to win the competition Cisek (2007); Cisek and Kalaska (2010).
We replace evidence with desirability - a continuously accumulated quantity that integrates all sources of information about the relative value of an action with respect to the alternatives. Reaching movements are generated as a mixture of actions weighted by their relative desirability values. In analogy with the normative evidence accumulation models Beck et al. (2008); Pleskac and Busemeyer (2010); Kiani et al. (2014), we determine choice confidence through the desirability values. Ambiguous desirabilities indicate that the net evidence supporting one option over the others is weak, and therefore the confidence level about the current best action is low. On the contrary, when one action outperforms the alternatives, the net evidence is strong and choice confidence is high. Therefore, the "winning" action determines the selected target and the reaction time, whereas the "losing" actions contribute to the computation of confidence - i.e., the closer the desirability of the non-selected actions to the desirability of the selected one, the lower the choice confidence. This is similar to the "balance-of-evidence" idea used in evidence accumulation models to determine the degree of post-decision confidence in perceptual judgment tasks Vickers (1979). Because desirability is time- and state-dependent, and action competition often does not end after movement onset, selected actions can be changed or corrected in-flight (i.e., change of mind) when confidence is sufficiently low and/or in the presence of new incoming information. Hence, the model predicts that both movement direction and reaction time can be used as easy-to-measure proxies for confidence. When people are uncertain about the current best option, decisions are delayed both by moving towards an intermediary location and by having longer reaction times. In contrast, when they are certain, reaches are initiated faster and move directly to a target. Importantly, the model predicts that the association between approach direction and reaction time is not fully mediated by the goal uncertainty. Instead, action competition can diminish choice confidence, leading to slower responses regardless of target probability. Overall, model predictions are consistent with the human findings, providing direct evidence that action selection, reaction time and choice confidence emerge through a common mechanism of desirability-driven competition between parallel prepared actions.

2.2 Results

2.2.1 Behavioral paradigm

A schematic representation of the experimental setup is shown in Fig. 2.1. Participants were instructed to perform rapid reaches using a robotic manipulandum under a "reach-before-you-know" paradigm Chapman et al. (2010); Gallivan et al. (2011) in which either one (single-target trials) or two (dual-target trials) potential targets were presented simultaneously in opposite hemifields. For dual-target trials, the cues appeared symmetric around the vertical axis of the screen. By varying the number of potential targets and their probabilities, we induced different levels of uncertainty to study the computations underlying choice confidence in action decisions. Each participant ran two separate sessions. In the equiprobable session, a trial started with participants fixating on a central cross, followed by the presentation of one or two unfilled blue circles on the screen (Fig. 2.2A). When the fixation cue was extinguished, an auditory cue signaled the individuals to initiate their responses.
Once the reaching movement exceeded a threshold, one of the targets was filled in black, indicating the actual goal location. The unequiprobable session was similar to the equiprobable one except for the dual-target trials, in which one of the potential targets was always assigned a higher probability (0.8) than the alternative (0.2). The targets with the high and low probabilities were indicated by unfilled green and red cues, respectively. In single-target trials (i.e., target probability 1), which were randomly interleaved with the dual-target trials in both sessions, a single unfilled blue cue was presented in the left or the right hemifield. The set of target configurations is shown in Fig. 2.2B. Participants achieved an overall success rate around 93% and their performance was similar between the two sessions (93% and 90%, respectively).

Figure 2.1: A graphical representation of the experimental setup from two perspectives. Participants (a) were seated directly in front of a Phantom haptic robot (c), with their index fingers inserted in a finger-tip adaptor (b) and their midline aligned with the center of an LCD monitor (d). Reaching movements took place in the x-y plane, +y being towards the screen and +x being towards the right hand side of the screen. The distance from the head of the individuals to the finger starting position along the y axis was about d_subject = 0.30 m and varied slightly across participants. The distance from the finger starting position to the screen display was d_display = 0.35 m.

2.2.2 Initial approach direction varies with target probability

Goal location uncertainty is well known to have a strong effect on reach trajectories, where the initial movement trajectory is aimed between targets. This motor behavior, which has been extensively reported before Hudson et al. (2007); Chapman et al. (2010); Gallivan and Chapman (2014); Stewart et al. (2014), indicates that the approach direction of the initial reaches varies with the target probability, a finding we replicated. Representative single-trial trajectories (thin traces) from different target probabilities and the corresponding average trajectories (thick traces) for goals located in the left and right hemifields are illustrated in Figs. 2.3A and B, respectively. When there was no uncertainty, reaches were made directly to the goal target (black traces). However, when the goal location was unknown at movement onset but both targets had the same probability, reaches were aimed at an intermediary position between the potential goal locations (blue traces). These spatially averaged movements were reliably biased towards the side of space with the most likely target (green traces). Hence, individuals did not pre-select one of the potential targets prior to movement onset. Instead, they delayed their decisions by moving towards an intermediate location to collect more information before taking the final action. We compared the approach direction across participants, number of targets and probabilities and found that it is directly correlated with the target certainty (best fit linear regression model; R-square = 0.971, p-value = 0.00212 of the linear coefficient) (Fig. 2.4A). However, we also found that uncertainty has a big impact on reach timing; specifically, the participants moved slower when they were uncertain about the current best action (see supporting information I for more details).
Therefore, our findings suggest that when people are uncertain about the current best action, they both delay their decision and move slower towards an intermediary location between the targets, a strategy consistent with increasing the chances of collecting more information before making a choice.

Figure 2.2: Task design and experimental paradigm. (A): A reaching trial started with a fixation cross presented at the center of the screen for about 1.5 s. Then, either a single or two unfilled cues were presented simultaneously in both visual fields. After 300 ms the central fixation cross was extinguished ("go-signal"), and the participants had to perform a rapid reaching movement towards the target(s) within 1 s. Once the reach trajectory crossed a trigger threshold (red discontinuous line), one of the cues (or the single cue) was filled in black, indicating the actual goal location. Responses before the go-signal or reaches that exceeded the maximum movement time (1 s) were aborted and not used for further analysis. (B): The color of the cues in the dual-target trials indicated the target probabilities - blue cues corresponded to equiprobable targets, whereas green and red cues corresponded to targets with 80% and 20% probability, respectively. Single cues always had blue color. (C): The distance between the origin and the midpoint of the two cues was d_reach = 0.2 m. The distance between the cue and the midpoint was d_separation = 0.15 m. The trigger threshold - i.e., the distance between the origin and the location at which the actual goal location was revealed - was set to d_threshold = 0.05 m.

Figure 2.3: Reach trajectories for different target probabilities. (A): Representative single-trial trajectories (thin traces) and the corresponding average trajectories (thick traces) from single- (black trace) and two-target trials with equal (blue trace) and unequal (green trace) probabilities, when the actual goal was located in the left hemifield. (B): Similar to A but for the actual goal located in the right hemifield. Target probability influences the reach trajectories. When people were certain about the goal location, reaches were aimed directly at the target. When they were uncertain, reaches were launched to an intermediary location between the targets and then corrected in-flight to the cued target location. The spatially averaged behavior was biased towards the likely target.

2.2.3 Reaction time varies with the target probability

The dual effects of goal uncertainty on reach trajectory and timing suggest that target certainty is incorporated into both acting (trajectory generation) and planning processes. Intuitively, it is reasonable that target probability influences action planning to delay initiating an action when uncertain about the best option. This predicts that reaction times (RT) would be a direct function of target probability. On average, this prediction is validated, as illustrated in Fig. 2.4B for single-target trials, two-target trials with equal probability and two-target trials with unequal probability, with RT averaged across participants.
While RT is significantly correlated with the target certainty (best fit quadratic regression model; R-square = 0.994, p-value = 0.002 of the quadratic coefficient), a trial-by-trial analysis showed that the effect on initiation timing was indirect and actually mediated by a latent variable influencing both RT and the approach direction of a trajectory. By plotting RT vs. approach direction separately for the two sessions, we found that changes in RT are independent of target probability and accounted for by approach direction. Fig. 2.4C shows RT as a function of the initial approach direction across all participants and trials, separately for the two sessions. Importantly, RT increases for reaches to an intermediary location between the potential goal locations and peaks around 20° (possibly due to the biomechanical constraints of the reaching movements), regardless of the target probability (best fit cubic regression model; R-square > 0.95, p-value < 0.01 for the cubic coefficient in both sessions). To ensure that this effect was not due to some inherent constraints induced by the experimental setup - i.e., reaches launched to targets located at the center of the screen having longer RTs than reaches aimed at peripheral targets - we varied the target separation between 0.10 m and 0.20 m (which corresponds to a visual angle between 26.5 and 45 degrees) in the equiprobable session and computed the RT in the single-target trials. No significant association was found between target location and RT (p-value > 0.197 of the regression coefficients for linear and curvilinear regression analysis) (Fig. 2.4D). In this analysis we used 3 individuals who were not part of the main experiment and did not go through the training session before running the task. This could explain why RTs were slightly longer compared to single-target trials in the two main sessions. Overall, our findings suggest that approach direction and reaction time are driven by trial-by-trial variations in a latent variable, which we identify with decision confidence, as we describe in the following sections.

2.2.4 Action selection, reaction time and choice confidence emerge through action competition

Our results require a decision computation that would produce joint changes in trajectory and RT as a function of trial-by-trial fluctuations in a latent variable. A recently developed theory Christopoulos et al. (2015); Christopoulos and Schrater (2015) predicts exactly these effects using decision confidence as the latent variable. In the theory, action decisions are made through a continuous competition of parallel prepared actions that dynamically integrates all sources of information about the quality of the alternative options. The neurodynamic implementation of this theory for a dual-target trial is presented in Fig. 2.5. The framework consists of a set of dynamic neural fields (DNFs), which mimic the neural processes underlying spatial sensory input, expected outcome, reach cost (i.e., effort) and reach planning Christopoulos et al. (2015). Each DNF simulates the dynamic evolution of firing rate activity within a neuronal population. The functional properties of each DNF are determined by the lateral interactions within the field and the connections with other fields Erlhagen and Schöner (2002); Schöner (2008). The "reach planning" field employs a neuronal population code over 181 potential movement directions to plan motor actions towards these directions.
It receives one-to-one excitatory inputs from the "spatial sensory input" field, which encodes the angular representation of the targets, and from the "expected outcome" field, which represents the expected outcome of aiming to a particular direction. Each neuron in the reach planning field projects to a stochastic optimal control system. Once the activity of a reach neuron i exceeds a threshold γ at the current state x_t, the corresponding controller initiates an optimal sequence of actions (i.e., a policy π*) to move the "hand" towards the preferred direction of that neuron (see the materials and methods section for more details). The reach planning field also receives inhibitory inputs from the "reach cost" field, which encodes the effort required to implement each policy π* at the current state. The normalized activity of the reach planning field represents the desirability of the motor actions at any time and state, and acts as a weighting factor on them. It reflects how "desirable" it is to move to a particular direction with respect to the alternatives.

Figure 2.4: Approach direction and reaction time. (A): Approach direction and (B): reaction time across participants, number of targets and probabilities. Positive and negative approach directions correspond to reaches launched closer to the right and left target, respectively. Approach directions around 0° correspond to reaches aimed towards the intermediate location between the two targets. (C): Reaction time as a function of the approach direction in the equiprobable (blue trace) and unequiprobable (green trace) sessions. (D): Reaction time as a function of target separation computed from single-target trials across 3 participants. Error bars correspond to standard error (SE), solid lines show the polynomial regression fitting (linear in panels A and D, quadratic and cubic in panels B and C, respectively) and the colored shadow areas illustrate the confidence interval of the polynomial regression results. Target probability influences both the approach direction and the reaction time of the reaches. However, reaction time and approach direction are not fully mediated by the target probability. Instead, reaches with longer reaction times often launch to an intermediate location between the potential goals.

Figure 2.5: Model architecture of the "reach-before-you-know" task. The neural fields consist of 181 neurons and their spatial dimension spans the semi-circular space between 0° and 180°. Each neuron in the reach planning field is connected with a stochastic optimal control system. Once the activity of a neuron exceeds a threshold γ, the corresponding controller generates a sequence of reach actions towards the preferred direction of the neuron. The reach planning field receives excitatory inputs from the spatial sensory input field, which encodes the angular representation of the potential targets, and from the expected outcome field, which encodes the expected outcome of the competing targets (blue, red and green Gaussian distributions correspond to cues with 0.5, 0.2 and 0.8 target probability, respectively). It also receives inhibitory inputs from the reach cost field, which encodes the effort required to implement the available sequences of actions - i.e., to move to a particular direction from the current state.
The normalized activity of the reach planning field encodes the "desirability" of the M available sequences of actions (i.e., neurons with activation level above the threshold γ) at a given time and state, and acts as a weighting factor on each individual sequence of actions. Because the relative desirability is time- and state-dependent, a range of behavior from weighted averaging (i.e., spatially averaged trajectories) to winner-take-all (i.e., direct reaches to one of the cues) is generated.

Because desirability is time- and state-dependent, the weighted mixture of individual actions automatically produces a range of behavior, from direct reaching movements to weighted averaging. Fig. 2.6A illustrates the activity of the planning field as a function of time for a representative dual-target trial with equiprobable targets. Initially, the field activity is in the resting state. After target onset, two neuronal populations selective for the targets are formed and compete through mutual inhibitory interactions, while integrating information about the target certainty and action cost to bias the competition. Once the activity of one of them exceeds a response threshold, the corresponding target is selected and a reaching movement is initiated. Frequently, the neuronal activity of the unselected target is not suppressed before movement onset, resulting in reaches towards intermediary locations between the targets (top inset in Fig. 2.6A). After the movement onset, the two neuronal ensembles retain activity and compete against each other until the goal onset.

To gain better insight into the model computations, consider two neurons, one from each population, centered at the target locations. Fig. 2.6B depicts the activity of each neuron (which reflects its current desirability value) as a function of time for a dual-target trial with equal (blue traces) and unequal (green traces) target probability. The neuron that exceeds the response threshold first (continuous traces) dictates the reaction time and the selected target. Intuitively, if the race between the neurons is a close call (blue traces), it means that the net evidence supporting that the selected target is more desirable than the alternative is weak, and therefore individuals should be less confident about their choices. On the other hand, if the race was a landslide (green traces), it means that one alternative outperforms the other, and therefore individuals should be more confident about their choice. Going back to the population analysis, the "winning" population determines the reaction time and the selected target, whereas the "losing" one contributes to the computation of the confidence that the selected option is the best current alternative. Because the selected action is produced by the weighted average of the active individual motor plans at any time and state, the difference between the desirability values determines the momentary movement direction and speed - i.e., strong competition between the active neuronal populations results in slower movements towards intermediate locations between the two targets (see supporting information I for more details). Note that in the absence of action competition (i.e., single-target trials), the activity of the neuron exceeds the response threshold faster than when two actions compete for selection (black trace).
Hence, reaches have shorter RTs and aim directly at the goal location. Overall, the theory is analogous to the normative race models in perceptual decisions, in which two accumulators integrate sensory evidence in favor of two alternative options Vickers and Packer (1982); Kiani et al. (2014). The accumulator that reaches its upper bound faster dictates the reaction time and the choice, whereas the losing accumulator contributes to the computation of certainty that the choice is correct (the balance-of-evidence hypothesis, Vickers (2001)).

Figure 2.6: Simulated neural activity and reach behavior. (A): A representative example of the simulated model activity as a function of time in the reach planning field for a dual-target trial with the actual goal located in the left visual field. The red discontinuous lines indicate the target onset, the movement onset, and the goal onset. The corresponding reach trajectory is shown in the upper inset. (B): Simulated activity of two planning neurons centered at the location of the cued (continuous traces) and the uncued (discontinuous traces) target, from a representative single-target trial (black trace) and two dual-target trials with equal (blue traces) and unequal (green trace) probabilities. A reach movement is initiated when the activity of one of the neurons exceeds the response threshold (gray discontinuous trace). When only a single target is presented, the neuronal activity ramps up quickly to the response threshold, resulting in faster reactions and direct reaches to the target. However, when two targets are simultaneously presented, the neurons compete for selection through inhibitory interactions, often resulting in slower reaction times and spatially averaged movements. If one of the alternatives is assigned a higher probability, the competition is biased towards the likely target, leading to faster responses.

We simulated the equiprobable and unequiprobable sessions within the computational theory, using the parameter values presented in supporting information II. Consistent with the human behavior, we found that target probability is correlated with the approach direction (Fig. 2.7A; best fit linear regression model: R-square = 0.984, p-value = 0.0005 of the linear coefficient) and with the RT (Fig. 2.7B; best fit quadratic regression model: R-square = 0.984, p-value = 0.008 of the quadratic coefficient). We also tested the trial-by-trial association between RT and approach direction and found the same independence from target probability (Fig. 2.7C; best fit cubic regression model: R-square > 0.95, p-value < 0.007 for the cubic coefficient in both sessions). In particular, simulated reaches aimed towards an intermediary location between the potential targets had longer RTs than reaches launched closer to one of the competing options, regardless of the target probability. This is explained by the inhibitory competition between the neuronal ensembles, which slows down the reach onset and leads to spatially averaged movements if the population of the unselected action is not completely suppressed at movement initiation. Considering that the difference between the desirability values determines the confidence in the selected action, this suggests that approach direction and RT are not fully coupled, but that there is a third variable (i.e., confidence level) that influences the association between them.
That is, the longer it takes to initiate an action, the less confident one is about the selected action, because the unselected action is often not fully rejected. Overall, our findings provide direct evidence that action selection, reaction time and the confidence that the selected option is better than the alternatives emerge through a common mechanism of desirability-driven competition between parallel prepared actions.

Figure 2.7: Approach direction and reaction time of the simulated reaches. (A): Approach direction and (B): reaction time of the simulated reaches across number of targets and probabilities. (C): Reaction time as a function of the approach direction in the simulated equiprobable (blue trace) and unequiprobable (green trace) sessions. Error bars correspond to standard error (SE), solid lines show the polynomial regression fitting (linear in panel A, quadratic in panel B and cubic in panel C) and the colored shadow areas illustrate the confidence interval of the polynomial regression results. Consistent with the human findings, the model predicts that target probability influences both the approach direction and the reaction time of the movements. However, reaction time and approach direction are not fully mediated by the target probability. Instead, the longer it takes to resolve the action competition, the more likely it is that the losing population is still active at movement onset, resulting in spatially averaged reaches.

2.3 Discussion

2.3.1 General

Uncertainty is ubiquitous in our interactions with the external world, and decisions must regularly be made in the face of it. Even after a decision is made, there is residual uncertainty that persists in the form of subjective choice certainty, reflecting the strength of our belief that an option is better, in the sense that it is more likely correct or has a higher expected outcome than its alternatives. Over the past years, many studies have looked at how confidence emerges in decision making Ferrell (1995); Balakrishnan and Ratcliff (1996); Hampton (2001); Pleskac and Busemeyer (2007); Kepecs et al. (2008); Kiani and Shadlen (2009); Kiani et al. (2014); van den Berg et al. (2016); Dotan et al. (2018). The preponderance of these studies measured confidence by explicitly asking the participants to rate the subjective confidence in their choices Kiani et al. (2014); van den Berg et al. (2016). Post-decision wager methods have also been introduced to measure confidence in nonverbal animals - subjects can opt out of a decision for a secure but small reward ("sure bet") when they are not certain Hampton (2001); Kepecs et al. (2008); Kiani and Shadlen (2009). Additionally, normative models, which include drift diffusion, evidence-accumulation, and race models Vickers and Smith (1985); Usher and McClelland (2001); Gold and Shadlen (2002); Mazurek et al. (2003); Krajbich and Rangel (2011); Towal et al. (2013), have been extended to understand the computations underlying confidence judgment Kiani et al. (2014); van den Berg et al. (2016). Although parsimonious, most of the previous experimental studies are highly restricted and limited to perceptual choices made solely on the basis of the accumulation of sensory evidence and before individuals perform an action - i.e., an action is generated only after a decision is made.
In these studies, confidence is construed as reflecting the effective amount of sensory evidence at decision time, which is not adequate to account for subjective choice certainty in complex decisions. Most importantly, they measure confidence at or after a decision is made - i.e., post-decision confidence judgment. Although it is challenging to define and measure confidence prior to a decision, confidence has a time course that can affect behavior before a decision is made.

The current study focuses on what has been missing from previous research - "pre-decision confidence" and how it manifests as distinctive behavior in dynamic decisions that evolve while acting. These types of decisions are made in dynamic and complex environments, in which the value and the availability of the options can change with time and previous actions, entangling decision with action selection. Confidence should be state- and time-dependent and reflect all the factors that affect our belief that a given action is better than the alternatives. Here, we adopted this enriched view to explore how confidence emerges in decisions requiring reaching to targets with uncertainty. Confidence was modeled as reflecting the degree of subjective belief that a potential action is more desirable than its alternatives. We hypothesized that confidence about the current best action affects not only the final choice, but also any action taken before the final choice. To test this hypothesis, we designed a "reach-before-you-know" experiment in which individuals were instructed to perform rapid reaches to one or two potential targets presented simultaneously in both hemifields. To elucidate the computations underlying confidence, we modeled the task within a recently developed computational theory Christopoulos et al. (2015); Christopoulos and Schrater (2015). It is based on the idea that decisions are made through a continuous competition between neuronal populations that plan individual actions to the available goals, while dynamically integrating information into a common currency - named relative desirability - to bias the competition. The desirability reflects the belief about the quality of the action and acts as a weighting factor on each individual action. The neuronal population that first exceeds a response threshold dictates the reaction time and the selected target. The competing population that did not exceed the threshold contributes to the computation of the confidence; the closer the "losing" population is to the threshold, the lower the confidence about the selected option. When the activity of the losing population is not completely suppressed, reaches are aimed towards an intermediary location between the targets. Because desirability is time- and state-dependent, confidence can change in-flight in the presence of new incoming information.

The model predicts a direct association between target certainty and both the approach direction of the reaches and the reaction time. When both targets are equally probable, the competition between the two populations is frequently a close call, which means that the net evidence supporting the selected action is weak and we should be less confident about the current best action. This results in slower reaction times and spatially averaged movements to an intermediary location between the potential goals. On the contrary, when one of the targets is assigned a higher probability, the competition is biased towards the likely target.
In this case the net evidence supporting the selected action is strong, and therefore we should be more confident about the current best action. This results in faster reaction times and more direct reaches to the selected target. Therefore, the approach direction and the reaction time can be considered easy-to-measure proxies for choice confidence. The model also makes an interesting prediction about the association between reaction time, approach direction and goal uncertainty. In particular, it predicts that the longer it takes to initiate an action, the more likely it is that the losing population will still be active at the movement onset, resulting in lower confidence about the selected option and spatially averaged movements. Hence, reaction time and approach direction are not fully mediated by the target probability, but they are influenced by the confidence about the current best option.

Consistent with the model predictions, individuals adopted a spatial averaging behavior to compensate for the goal location uncertainty. Although this behavior has been reported before Hudson et al. (2007); Gallivan et al. (2011); Chapman et al. (2010), the pattern of compensation is better described as buying more time for decisions. When people are uncertain about the current best option, they delay the decision both by moving towards an intermediary location between the targets and by having a longer reaction time. In contrast, when they are certain about the best option, they initiate movements quickly and aim directly at the selected option. In line with the model predictions, trial-by-trial reaction time was correlated with the approach direction regardless of the target probability. Longer reaction times were often associated with weak accumulated information about the current best option (i.e., strong competition between the desirabilities of the actions). This might suggest that the brain learns to use decision time as a proxy for confidence judgment (see also Fetsch et al. (2014); Hanks et al. (2011); Kiani et al. (2014)).

2.3.2 The risk of conflating evidence accumulation with pre-decision confidence

A recent study, which was conducted in parallel with our work, explored whether pre-decision confidence exists in perceptual judgment tasks Dotan et al. (2018). The experimental paradigm was inspired by the classical Shadlen-Newsome motion direction detection task, in which sensory evidence is accumulated to a critical level to yield a perceptual decision Shadlen and Newsome (2001). The participants were asked to detect the direction of a number of arrows (i.e., left vs. right) that were presented sequentially while they were moving their fingers from the bottom of the screen to a response button on the top-right or top-left corner of the screen. According to this study, the direction of the movement trajectory captures the ongoing accumulation of evidence, whereas the movement speed continuously reflects the momentary degree of confidence. Despite the similarities between our work and this study, there are fundamental differences in the experimental procedures. The Dotan et al. study assessed the relationship between online evidence integration and measures of post-decision confidence through movement parameters, forcing subjects to move continuously during the presentation of evidence. Movement parameters that were correlated with both the post-decision confidence measure and the evidence were thought to reflect pre-decision confidence.
As a way to assess pre-decision confidence, this study has two major confounds. First, the most salient result is that the movement parameters close to decision time are more strongly correlated with post-decision confidence - which means the method is insensitive to pre-decision confidence at the beginning of movement, which is the focus of our approach. Second, this study is deeply related to the "change of mind" paradigm in perceptual decisions Resulaj et al. (2009); van den Berg et al. (2016), which is also an evidence integration paradigm that reveals the pre-decision evidence state through the motion parameters. The Dotan et al. study found that when evidence was inconsistent (reversed direction), participants followed the evidence by reversing directions more often, which reduces velocity by necessity. The problem is that trials with inconsistent evidence are also (a) more likely to be wrong, (b) slower, because participants track evidence with their trajectories, and (c) associated with lower post-decision confidence. Because trajectory changes are completely predictable from the evidence, which predicts both lower post-decision confidence and lower average velocity, the correlation between movement speed and confidence is likely a confound of the paradigm, which forces participants to move continuously during incoming evidence rather than complete a natural movement when ready. Both our approach and purpose are different - our goal was to assess the role of pre-decision confidence in action planning and action execution. The experimental paradigm used in the Dotan et al. study is not capable of predicting the effects of confidence on action planning, as no planning is allowed - i.e., no information about the direction of the arrows was revealed prior to movement initiation. We also intentionally excluded evidence so that trial-to-trial changes in trajectory parameters would be driven by natural fluctuations in confidence. We found that the relationship between reach parameters and pre-decision confidence is invariant to target probabilities, validating that our approach is not confounded by evidence. In addition, we interpreted our findings through a computational model of confidence in action planning that predicts our results. In contrast, the Dotan et al. study empirically conflates evidence and confidence, and does not provide the theoretical correction needed. Hence, we believe our study makes an important contribution while avoiding difficult confounds present in perceptual decision tasks.

2.3.3 From signal-detection theory to evidence accumulation to desirability competition

Over the past several years, two prevailing theories have been extensively used to study the mechanisms of confidence judgment in decision making: signal-detection theory (SDT) Green and Swets (1966); Macmillan and Creelman (2005) and the evidence accumulation models (EAMs) Stone (1960); Ratcliff (1978); Ratcliff and Rouder (1998); Pleskac and Busemeyer (2010); Donkin et al. (2011). According to SDT, a choice is made (e.g., "motion right" vs. "motion left" in a random dot motion task) by comparing a decision variable (DV) against a criterion. The choice confidence is determined by the distance of the DV from the criterion. When the evidence strongly supports one option over the other, the distance is larger and the degree of choice confidence is greater Gold and Shadlen (2001); Hebart et al. (2016).
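As a concrete (and intentionally simplified) illustration of the SDT notion just described, the snippet below draws a noisy decision variable, compares it against a criterion to produce a choice, and reads out confidence as the distance of the DV from the criterion; the distributions and the criterion value are hypothetical and chosen only for illustration.

import numpy as np

def sdt_choice_and_confidence(signal_mean, noise_sd=1.0, criterion=0.0,
                              n_trials=5, seed=0):
    """Toy signal-detection readout: choice = side of the criterion,
    confidence = |DV - criterion|. Parameters are illustrative only."""
    rng = np.random.default_rng(seed)
    dv = rng.normal(signal_mean, noise_sd, n_trials)   # decision variable samples
    choices = np.where(dv > criterion, "right", "left")
    confidence = np.abs(dv - criterion)                # distance-to-criterion readout
    return list(zip(choices, np.round(confidence, 2)))

# Strong rightward evidence yields DVs far from the criterion (high confidence);
# weak evidence yields DVs near the criterion (low confidence).
print(sdt_choice_and_confidence(signal_mean=1.5))
print(sdt_choice_and_confidence(signal_mean=0.2))

Note what this toy readout lacks: there is no notion of elapsed time, so it says nothing about reaction times or about how confidence might evolve before the choice; that limitation is taken up next.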
Despite the important contribution of SDT to understanding the mechanisms underlying decision making and confidence judgment, it can predict neither the time it takes to make a decision nor the effects of decision time on choice confidence reported by a series of experimental studies (including our work) Kiani and Shadlen (2009); Ratcliff and Starns (2009). To overcome this inherent limitation of SDT, EAMs were proposed to model a variety of decision tasks. EAMs conceive of decision making as a process of noisy accumulation of evidence in favor of the different available options. A decision is made when the evidence in favor of one option becomes sufficiently strong (for a review see Ratcliff et al. (2016)). The main advantage of EAMs over SDT is that they are capable of explaining the association between choice and RT across many domains, including perceptual decisions Ratcliff and Rouder (1998); Gold and Shadlen (2001); Smith and Ratcliff (2004); Tavares et al. (2017), value-based decisions Mormann et al. (2017); Krajbich and Rangel (2011); Philiastides and Ratcliff (2013), recognition memory tasks Ratcliff (1978); Ratcliff et al. (2004) and go/no-go tasks Gomez et al. (2007); Trueblood et al. (2011). To account for confidence in two-choice decisions, the evidence is accumulated by two independent counters LaBerge (1962). The counter that first reaches the amount of evidence required to make a decision determines the choice and the reaction time. The confidence judgment is determined by the balance-of-evidence - i.e., the difference in the accumulated evidence between the two counters (the smaller the difference, the lower the degree of confidence about the selected option) Vickers (1979, 2001); Beck et al. (2008); Pleskac and Busemeyer (2010). Although EAMs have been successfully used to model a variety of cognitive and perceptual decision tasks, they do not include a mechanism for generating motor behavior (e.g., reaches, saccades), with the exception of a recent study that augments the drift diffusion model with an action system Lepora and Pezzulo (2015). However, this embodied model is limited in that it involves only one accumulator and uses a simplified action model to generate trajectories with constant velocities. Because of that, it cannot make predictions on how pre-decision confidence emerges and how it is associated with behavioral measurements such as reaction time, approach direction and movement speed. Overall, while EAMs are sufficient for decisions in which individuals first make a choice and then generate an action to implement it, modifications are required for decisions made while acting.

Building on the evidence accumulation models, we designed a neurodynamical framework that includes circuitry for generating reaching movements during the decision-making process. It employs a mechanism to accumulate and integrate information from disparate sources (i.e., spatial location of the target, target probability, effort cost) dynamically and while acting. However, it is quite different from the traditional EAMs. The "accumulators" compete based on the relative desirability of the alternative actions, instead of the accumulated sensory evidence in favor of one option over the others. Desirability is related to the action and provides a more general measure for evaluating an alternative, since it includes information not only about the option itself, but also about the action required to achieve that goal.
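The two-counter race with a balance-of-evidence confidence readout described above can be written down in a few lines. The sketch below is a generic toy version with made-up drift rates, bound, and noise level; it is not the model used in this chapter, only an illustration of the EAM idea it builds on.

import numpy as np

def race_with_balance_of_evidence(drifts, bound=1.0, dt=0.001, noise=0.1,
                                  max_time=3.0, seed=1):
    """Two independent counters accumulate noisy evidence; the first to hit the
    bound fixes the choice and the reaction time, and confidence is read out as
    the gap between the counters at that moment (the balance of evidence)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                                # accumulated evidence per option
    t = 0.0
    while x.max() < bound and t < max_time:
        x += np.asarray(drifts) * dt + noise * np.sqrt(dt) * rng.standard_normal(2)
        x = np.maximum(x, 0.0)                     # keep the counters non-negative
        t += dt
    winner = int(np.argmax(x))
    confidence = float(x[winner] - x[1 - winner])  # balance of evidence at decision time
    return winner, round(t, 3), round(confidence, 3)

# A lopsided race ends quickly with a large balance of evidence (high confidence);
# a close race ends later with a small balance (low confidence).
print(race_with_balance_of_evidence(drifts=[0.9, 0.2]))
print(race_with_balance_of_evidence(drifts=[0.55, 0.50]))

The desirability-driven competition used here differs mainly in what the counters represent and in the fact that the competition continues, and keeps shaping the ongoing movement, after it has nominally been won.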
Analogous to the "balance-of-evidence" concept used in EAMs to determine post-decision confidence Vickers (1979), the momentary degree of pre-decision confidence is determined by the "balance-of-desirability" at any time and state - i.e., the difference between the relative desirability values of the alternative actions. Our theory does not assign populations of neurons to the alternative options. Instead, the alternative actions emerge within a distributed neuronal population by integrating information from multiple sources. Consequently, it can easily handle not only binary decisions, but also decisions between multiple competing goals. Traditional EAMs have also been extended to handle decisions with more than two choices in perceptual and value-based decisions Roe et al. (2001); Krajbich and Rangel (2011); Smith (2016), although there is an ongoing debate about the right way to generalize these models to multialternative decision tasks (different extensions of the sequential sampling models can lead to significantly different behavioral and neurobiological properties) Leite and Ratcliff (2010); Ditterich (2010); Krajbich and Rangel (2011). Importantly, the proposed neurodynamical theory is not limited by the serial order assumption that action planning begins only after a decision is made. Instead, the competing options are continuously evaluated before and after the movement onset, and a decision may be updated while acting in the presence of new information. Because of that, it can measure the momentary degree of confidence before and while an action is taken to implement a choice. On the other hand, EAMs measure confidence at or after a decision is made. Finally, the most important difference between the EAMs and our theory is that we can make predictions not only about the decision process, but also about the spatial and the temporal characteristics of the reaching movements. By integrating the neurodynamical framework with stochastic optimal control theory, we can simulate motor behavior (i.e., reaches) from the movement initiation to the final goal location. Hence, we can make predictions on how confidence influences not only reaction time (i.e., action planning), but also approach direction and speed of movement (i.e., action generation) at any given time and state.

2.3.4 Motor averaging versus visual averaging hypothesis for action selection

The key point of our theory is that the brain plans multiple actions in parallel that compete for selection, and this competition continues into execution. Although a growing body of experimental studies provides evidence in favor of parallel planning of competing actions Basso and Wurtz (1998); Glimcher et al. (2005b); Cisek (2007); Rangel and Hare (2010); Chapman et al. (2010); Gallivan et al. (2011); Cisek (2012); Gallivan et al. (2015, 2016), other studies have argued against this hypothesis, suggesting that decision and action are separate processes - i.e., planning and execution of action occur after a decision is made Friedman (1953); Tversky and Kahneman (1981); Fodor (1983); Pylyshyn (1984); Padoa-Schioppa and Assad (2006); Padoa-Schioppa (2011); Haith et al. (2015b).
According to this theory, the spatial averaging behavior observed in dual-target trials does not necessarily reflect "motor averaging" - i.e., simultaneous planning of multiple competing single-target actions - but it could be equivalently interpreted as evidence of "visual averaging" across the locations of the targets - i.e., planning and execution of a single action towards a weighted average target location Cisek and Kalaska (2010); Cisek (2012); Gallivan et al. (2015). The visual averaging hypothesis could explain the spatial averaging behavior and some aspects of action selection and reach timing. For instance, it could be argued that reaction time is shorter in the unequiprobable trials because individuals aim more often directly at the likely target, instead of estimating and then moving to a weighted average location between the targets. However, the visual averaging hypothesis is insufficient to explain the association between approach direction and RT - i.e., that RT increases with reaches aimed at an intermediary location regardless of the target probability. If decision and action were two separate cognitive processes, RT would be a function of the time required to estimate the average location between the potential targets and the time required to initiate an action towards this location. In this case, there is no mechanism to explain the effect of approach direction on RT. This effect can be modeled only with two competing modules that accumulate and integrate sources of information in favor of the two options (see an analogous case for perceptual decisions in Kiani et al. (2014)). In the supporting information I, we provide direct evidence that people not only plan in parallel multiple competing single-target actions, but also execute a weighted average of these individual actions. In particular, we found that the peak of the movement speed measured from the reach initiation to the goal onset is correlated with the approach direction of the movement regardless of the target probability. This suggests that movement speed can also be used as a proxy of confidence. When people are less confident about the best current action, they move slower towards an intermediate location between the potential targets (see supporting information I for more details).

The action competition hypothesis is also in apparent conflict with a recent study arguing that planning and initiation of an action are mechanistically independent Haith et al. (2016). According to this study, reaction time does not reflect the time at which the competition between the parallel planned actions is resolved - i.e., there is no causal relationship between planning and initiation of actions. Instead, reaction time is determined by an independent initiation process, such that action initiation likely occurs at a fixed delay after action planning begins. However, this study did not account for goal location uncertainty or multiple competing goals. Instead, the individuals had to perform center-out reaches to one of eight peripheral targets arranged in a circle, and therefore they did not need to generate multiple actions that compete for selection. Overall, our findings provide further evidence in favor of the affordance competition hypothesis, suggesting that the process of deliberating between different actions emerges via a continuous competition between these actions.
2.4 Materials and Methods

2.4.1 Participants

Seven right-handed individuals (20-30 years old, 4 men and 3 women) with normal or corrected-to-normal vision participated in this study. The appropriate institutional review board approved the study protocol and informed consent was obtained based on the Declaration of Helsinki.

2.4.2 Experimental setup

A rough sketch of the experimental setup used in this study is shown in Fig. 2.1. Participants were seated facing a Phantom Premium 1.5 Haptic Robot (Sensable Technologies, MA) and a computer display, aligned so that the midline of their body was in line with the center of the screen and robot. The workspace of the Phantom haptic robot forms a hemisphere approximately 30 cm in radius. The participants selected a comfortable position and inserted the right index finger into the endpoint tip of the robotic manipulandum. The distance $d_{subject}$ from the head of the participants to the finger starting position, measured along the y axis, was about 0.30 m. This distance varied slightly between participants, since we did not use a chin rest or any other restraining device. Hence, there was some movement of the head relative to the screen, but it was minimal since the participants were instructed to remain stationary throughout the experiment. The distance from the finger starting position to the screen display, $d_{display}$, was about 0.35 m and was calibrated at the beginning of each session.

The participants were trained to perform rapid reaching movements using the robotic manipulandum. The reaching movements were performed in the horizontal plane and translated into movements of a small cursor circle (1.5 cm diameter) in the vertical plane of the computer screen - i.e., reaches towards the screen moved the cursor to the top of the screen, while the left and right mapping was preserved. This experimental setup allowed for high temporal and spatial resolution of the hand and finger position, as well as a means to create haptic feedback or altered movement dynamics for future experiments. Control of the Phantom robot and the experiment were implemented using the OpenHaptics drivers provided by Sensable Technologies, the Simulation Laboratory (SL) and Real-Time Control Software Package Schaal (2007), as well as other custom psychophysics software. Control and recording of the Phantom state were performed at 500 Hz.

2.4.3 Experimental paradigm

At the start of each trial participants were required to move the cursor to the starting position, located at the origin of our coordinate system, Fig. 2.2A. A fixation cross was then presented at the center of the screen and the participants were instructed to fixate for a short period of time ($\bar{t} = 1500$ ms, $\sigma_t = 300$ ms). During the final 300 ms of fixation, either a single cue was presented on the upper-left or upper-right of the screen, or two cues were presented simultaneously on both sides of space. Cues were presented as unfilled circles 3 cm in radius on a white background. After the fixation offset (go-signal) the participants had to initiate a rapid reaching movement. Once the cursor exceeded a certain trigger threshold (i.e., a virtual wall in the $x$-$z$ plane; red discontinuous line in Fig. 2.2A), the single cue or one of the two cues was filled in black, indicating the actual location of the goal. If the participants brought the cursor to the cued target within 1.0 s the trial was considered successful.
Trials in which the participants responded before the go-signal or arrived at the cued target after the allowed movement time were aborted and were not used for further analysis. The distance between the origin and the midpoint of the two targets was $d_{reach} = 0.20$ m. The target separation distance - i.e., the distance between the target and the midpoint - was $d_{separation} = 0.15$ m. The trigger threshold distance - i.e., the distance of the virtual wall from the origin - was $d_{threshold} = 0.05$ m, Fig. 2.2C.

Individuals were familiarized with the task by running a set of training trials that included reaches to single and two targets. Once they felt ready and comfortable with the experimental setup, the actual experiment started. Each participant performed 3 reaching sessions (one training and two tests). The training session involved 40 trials, which were excluded from the analysis, followed by two test sessions with 80 trials each ($2 \times 80 = 160$ trials). The first test session involved reaches to one (40% of the trials) and two (60% of the trials) targets. In the single-target trials, the cue was shaded blue and was presented equiprobably in the left or right visual field (top row in Fig. 2.2A). In the two-target trials, the cues were also shaded blue and had equal probability of filling in after the movement onset (bottom row in Fig. 2.2A). The second test session was similar to the first one, with the only difference that one of the cues was always assigned a higher probability in the two-target trials. The "likely" cue was shaded green and had 80% probability of being the correct target, while the alternative cue was shaded red and had 20% probability. The set of target configurations is illustrated in Fig. 2.2B. Individuals were not informed what the coloration indicated and learned the association during the experiment.

2.4.4 Behavioral data analysis

Cubic interpolating splines were used to smooth the reach trajectories and compute the velocity of the movements. The initial approach direction was measured from the direction of the main axis of the covariance ellipse that describes the spatial variation of the cursor from the movement initiation to the goal onset. Reaction time was defined as the time at which the reach velocity exceeded 5% of the maximum velocity.

2.4.5 Neurodynamical framework

In the current section, we briefly describe the architecture of the computational framework used to model the reaching experiment. Readers can refer to Christopoulos et al. (2015); Christopoulos and Schrater (2015) for more details. The framework combines dynamic neural field (DNF) theory with stochastic optimal control (SOC) theory and includes circuitry for perception, expected outcome, selection bias, effort cost and decision making. Each DNF simulates the dynamic evolution of firing rate activity of a network of 181 neurons over a continuous space with local excitation and surround inhibition. The functional properties of each DNF are determined by the lateral inhibitions within the field and the connections with other fields in the architecture. The projections between the fields are topologically organized - i.e., each neuron $i$ in a field drives the activation of the corresponding neuron $i$ in the other field. The activity of a DNF evolves over time under the influence of external inputs, local excitation and lateral inhibition interactions, as described by Eq. (2.1).
$$\tau \dot{u}(\chi,t) = -u(\chi,t) + h + S(\chi,t) + \int w(\chi-\chi')\, f[u(\chi',t)]\, d\chi' \qquad (2.1)$$

where $u(\chi,t)$ is the local activity of the DNF at the position $\chi$ and time $t$, and $\dot{u}(\chi,t)$ is the rate of change of the activity over time scaled by a time constant $\tau$. If there is no external input $S(\chi,t)$, the field converges over time to the resting state $h$ from the current level of activation. The interactions between the simulated neurons in the DNF are given via the kernel function $w(\chi-\chi')$, which consists of both local excitatory and inhibitory components, Eq. (2.2):

$$w(\chi-\chi') = c_{exc}\, e^{-\frac{(\chi-\chi')^2}{2\sigma_{exc}^2}} - c_{inh}\, e^{-\frac{(\chi-\chi')^2}{2\sigma_{inh}^2}} \qquad (2.2)$$

where $c_{exc}$, $c_{inh}$, $\sigma_{exc}$, $\sigma_{inh}$ describe the amplitude and the width of the excitatory and the inhibitory components, respectively. We convolved the kernel function with a sigmoidal transformation of the field so that only neurons with activity above a threshold participate in the intrafield interactions, Eq. (2.3):

$$f(u(\chi)) = \frac{1}{1 + e^{-\beta\,(u(\chi) - u_0)}} \qquad (2.3)$$

The architectural organization of the framework is shown in Fig. 2.5. The "spatial sensory input" field encodes the angular representation of the competing goals in an egocentric reference frame. The expected outcome for reaching in a particular direction centered on the hand position is encoded by the "expected outcome" field (see Christopoulos et al. (2015) for more details). In trials with equiprobable targets, the neuronal activity of the populations selective for these targets is about the same (blue Gaussian distributions). However, in trials in which one of the targets is more likely than the alternative, the activity of the neuronal population selective for the "green" cue is higher than the activity of the population tuned to the "red" cue. The outputs of these two fields send excitatory projections (green arrows) to the "reach planning" field in a topological manner. The "reach cost" field encodes the effort cost required to implement a sequence of actions towards a particular direction at any time and state. The output of this field sends inhibitory projections (orange arrow) to the reach planning field to penalize high-effort actions. The activity of the reach planning field at a given state $x_t$ is the sum of the outputs of the fields encoding the location of the target $u_{loc}$, the expected outcome $u_{outcome}$ and the estimated reach cost $u_{cost}$, corrupted by additive noise $\xi$ which follows a normal distribution:

$$S_{action}(x_t) = \eta_{loc}\, u_{loc}(x_t) + \eta_{outcome}\, u_{outcome}(x_t) - \eta_{cost}\, u_{cost}(x_t) + \xi \qquad (2.4)$$

where $\eta_{loc}$, $\eta_{outcome}$ and $\eta_{cost}$ scale the influence of the spatial sensory input field, the expected outcome field and the reach cost field, respectively, on the activity of the action planning field. The values of the model parameters are given in the supporting information II. The normalized activity of the action planning field describes the "relative desirability" of each policy $\pi_i$ - i.e., it reflects how "desirable" it is to move towards a particular direction $\phi_i$ with respect to the alternative options. Each neuron in the reach planning field is linked with a stochastic optimal controller.
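For illustration only, the sketch below integrates the field dynamics of Eqs. (2.1)-(2.3) with a simple Euler step, using two Gaussian input bumps to stand in for competing cues. The numerical values are placeholders of our own choosing, not the fitted parameters reported in the supporting information; the optimal controllers attached to each neuron are described next.

```python
# Minimal numerical sketch of the dynamic neural field of Eqs. (2.1)-(2.3),
# integrated with a forward Euler step. All parameter values are illustrative.
import numpy as np

N = 181                                   # neurons covering the angular space
chi = np.linspace(0.0, 180.0, N)          # preferred directions (degrees)
dchi = chi[1] - chi[0]
tau, h = 10.0, -5.0                       # time constant and resting level (assumed)
c_exc, s_exc = 1.0, 5.0                   # excitation amplitude / width (assumed)
c_inh, s_inh = 0.5, 30.0                  # inhibition amplitude / width (assumed)
beta, u0 = 1.0, 0.0                       # sigmoid slope and threshold (assumed)

# Interaction kernel w(chi - chi') of Eq. (2.2): local excitation, surround inhibition.
d = chi[:, None] - chi[None, :]
W = c_exc * np.exp(-d**2 / (2 * s_exc**2)) - c_inh * np.exp(-d**2 / (2 * s_inh**2))

def f(u):
    """Sigmoidal output of Eq. (2.3): only sufficiently active neurons interact."""
    return 1.0 / (1.0 + np.exp(-beta * (u - u0)))

def euler_step(u, S, dt=1.0):
    """One Euler step of Eq. (2.1)."""
    lateral = (W @ f(u)) * dchi           # discretized convolution integral
    return u + dt * (-u + h + S + lateral) / tau

# Two Gaussian input bumps standing in for the competing cues; the more likely
# target receives the stronger drive (cf. the expected outcome field).
S = 9.0 * np.exp(-(chi - 55.0)**2 / 50.0) + 6.0 * np.exp(-(chi - 125.0)**2 / 50.0)
u = np.full(N, h)
for _ in range(300):
    u = euler_step(u, S)

desirability = f(u) / f(u).sum()          # normalized activity ~ relative desirability
print("peak direction:", chi[np.argmax(desirability)], "degrees")
```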
Once the activity of a neuron $i$ exceeds a threshold $\gamma$, the controller $i$ is triggered and generates an optimal policy $\pi^*_i$ - i.e., a sequence of actions towards the preferred direction of the neuron $i$ - which is given by minimizing the following cost function:

$$J_i(x_t, \pi_i) = (x_{T_i} - S p_i)^\top Q_{T_i} (x_{T_i} - S p_i) + \sum_{t=1}^{T_i - 1} \pi_i(x_t)^\top R\, \pi_i(x_t) \qquad (2.5)$$

where the policy $\pi_i(x_t)$ is a sequence of actions from $t = 1$ to $t = T_i$ to move towards the direction $\phi_i$; $T_i$ is the time required to arrive at the position $p_i$; $p_i$ is the goal position at the end of the movement and is given as $p_i = [r\cos(\phi_i),\, r\sin(\phi_i)]$, where $r$ is the distance between the current location of the hand and the location of the cue to which neuron $i$ is tuned. Additionally, $x_{T_i}$ is the state vector at the end of the movement, whereas the matrix $S$ picks out the actual position of the hand and the goal position $p_i$ at the end of the movement from the state vector. Finally, $Q_{T_i}$ and $R$ define the precision- and control-dependent costs, respectively. For more details about the optimal control model used in the framework see the supporting information in Christopoulos et al. (2015); Christopoulos and Schrater (2015).

The first term of Eq. (2.5) describes the current goal of the controller - i.e., move the hand a distance $r$ from the current location, towards the preferred direction $\phi_i$ of the neuron $i$. The second term describes the cost (i.e., effort) required for executing the policy $\pi_i(x_t)$. Let's now assume that $M$ neurons are active at a given time $t$ (i.e., the activity of $M$ neurons is above the threshold $\gamma$). The framework computes and executes a weighted average of the $M$ individual policies $\pi^*_i$ to move the hand from the current state $x_t$ to a new one, Eq. (2.6):

$$\pi_{min}(x_t) = \sum_{i}^{M} \nu_i(x_t)\, \pi^*_i(x_t) \qquad (2.6)$$

where $\nu_i(x_t)$ is the normalized activity of the neuron $i$ (i.e., the relative desirability value) at the state $x_t$. Because the desirability is time- and state-dependent, the weighted mixture of the individual policies produces a range of behavior, from winner-take-all (i.e., direct reaching to a target) to spatial averaging.

To handle contingencies, such as perturbations (e.g., changes in the number of targets, target probabilities, expected rewards, etc.) and effects of noise, the framework implements a widely used technique in stochastic optimal control known as "receding horizon" Mayne et al. (2000); Goodwin et al. (2005). In particular, the framework executes only the initial portion of the sequence of actions for a short period of time $k$ ($k = 10$ in our study) and then recomputes the individual optimal policies $\pi^*_i(x_{t+k})$ from time $t+k$ to $t+k+T_i$ and remixes them. This approach continues until the hand arrives at one of the targets.

2.4.6 Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Chapter 3

Reaching task under effort uncertainty

Abstract

Determining how the nervous system continuously chooses and shapes movement in a dynamic environment is a central question of neuroscience. While there is a growing consensus that prediction of an action's reward occurs in parallel with action planning and across many neural structures, the evidence on how action-related costs, like physical effort, influence decision-related activity is mixed.
The prevailing neuroeconomic framework asserts that costs are handled similarly to expected rewards and fully integrated into a decision value before the action is planned and executed. However, many recent behavioral results in humans and neural recordings in rats and monkeys undermine the assumption that action costs and expected rewards are combined into a single unified representation that is upstream of any planning or action.

Alternative action-based theories of decisions argue that planning and valuation, including estimation of action costs, occur concurrently. Because action planning is a time-consuming process, estimation of action costs is likely to be as well, and would result in different behavior than predicted by the prevailing framework. Here we present a human reaching choice task designed to probe how known action costs are dynamically integrated into movement choices under initial uncertainty. Logistic regression models of subjects' choices indicated that sensitivity to reward was relatively symmetric across targets, but sensitivity to effort was not. Additionally, in some cases, movements were initially biased towards targets with higher effort, later in the trial shifting their bias towards targets with lower effort.

Functional principal component analysis showed that reaction time distributions in trials with no choice were influenced by additional preparation time, but not by the expected effort. In choice trials, both preparation time and expected effort were observed to significantly influence reaction time distributions. For the right target, the interaction between its expected effort and preparation time is also significant, and opposite in impact to its main effect. This asymmetry in the effects of action costs across the targets was another consistent feature in our results. These results suggest a time- and choice-dependent process of integrating action costs into an ongoing decision.

3.1 Introduction

One of the underlying assumptions of neuroeconomic theory is that rewards gained from an action, and the costs incurred performing it, are opposite in sign, but otherwise comparable as quantities for the purposes of decision making (Kool and Botvinick, 2018). While it has been understood for some time now that there are significant asymmetries in sensitivities to costs and rewards (Kahneman and Tversky, 1979), it is still held that these quantities are estimated alongside all other relevant decision factors, and integrated into a single decision metric before an action is selected, planned and executed (Padoa-Schioppa, 2011). The final estimate of the anticipated expected reward minus cost for an action, its total subjective utility, is argued to be computed and represented in the prefrontal cortex (OFC, vmPFC) (Padoa-Schioppa, 2007).

The representation of a potential action's utility in the prefrontal cortex is assumed to capture the full behavioral desirability of an action, and is further assumed to be the authoritative, final integration of expected utility. All movement planning or behavior elicited by the selected action is assumed to be strictly downstream from this representation.

However, there are several lines of evidence that undermine a number of these assumptions. First, there is now ample evidence that movement planning occurs in parallel with valuation and selection of actions.
Second, the assumption of a central, unified representation of utility is not consistent with observations of neural activity reflecting valuation and decision variables throughout cortical and subcortical regions, often before decision onset. And finally, studies investigating physical effort or other action-related costs in decisions have noted many instances in which subjects' choices are not consistent with the utility maximization assumption.

The first and second of these objections led to the development of an alternative framework of neural decision making that argues choices occur between actions rather than outcomes (Gold and Shadlen, 2001). In this view, decisions arise through a 'competition' between the representations of potential action plans, mediated by mutual inhibition mechanisms, and biased by input from other cortical regions computing specific contingencies (Cisek, 2007). Many studies, including neural recordings, human psychophysics, pharmacological inactivation studies, and TMS-induced inactivation studies, have found evidence supporting this view. However, these action-based theories do not have an immediate explanation for how the processes involved in action cost estimation might give rise to the incongruous behavior observed in a number of studies.

3.1.1 The cost of effort

Part of the problem is that it is very difficult to define what "effort" means to the central nervous system. In the classic utility-based view of decisions, "effort" is a type of negative utility that is incurred in the act of pursuing a desired outcome. But this effort might be physical exertion, or cognitive effort like interpreting difficult language or waiting patiently through a delay. There is substantial evidence that at least one of these types of costs, the cost of delay, is handled by a distinct neural subsystem (Klein-Flügge et al., 2015). Additionally, while there are some non-linearities to the transformations, the costs of delay are integrated into decisions "as-if" they were negative utilities, according to behavioral and neural data.

Is the same true about the process of estimating the physical effort involved in an action? It is often implicitly assumed that the physical muscle exertion required to perform an action is an important factor in its perceived effort. This is not often argued to be the sole optimization criterion, but rather that when making choices about movements, there is some kind of inherent aversiveness to marginal increases in exertion. That is to say, if two actions achieve the same outcome, but have a significant difference in associated effort, people should strongly prefer the less effortful option.

This is an intuitively appealing rule, and to a first approximation, it is supported by studies of choice. In many conditions where outcomes are roughly equivalent, subjects will prefer the less effortful option (Schweighofer et al., 2015). It is also observed to be more specifically true in a number of situations, especially in the long term; people adapt their reaches (Huang et al., 2012; Izawa et al., 2008) and gait (Anderson and Pandy, 2001) into nearly optimal configurations with only some practice.

But studies with decision tasks that required more active and deliberative control over movements have not observed such neat metabolic optimality.
Carefully designed reaching experiments noted that even when forced to explicitly pre-practice a lower-effort, but less familiar option, subjects continued to reach directly through the most effortful region of a repelling force field (Kistemaker et al., 2010). Further perturbations to the dynamics and visual feedback indicated that subjects were sensitive to the kinematic parameters of their movements (like the curvature of their hand path), but were not sensitive to dynamic parameters (like the full inertia of their movement) (Kistemaker et al., 2014). Another set of experiments similarly observed that subjects consistently planned routes through a task that were explicitly and obviously more effortful, but conceptually simpler or more immediate (Rosenbaum et al., 2014).

Studies including uncertainty about effort have also found diverse and conflicting results. One study designed an economic task (lottery choice) that was matched in risk profile to a movement task, yet many subjects showed a reversal in risk preference between the two tasks (Berniker et al., 2013). Another study found that subjects' effort preferences were not consistent as the risk profile of a game was shifted (Berniker et al., 2013). Interestingly, in highly trained and skilled endurance athletes, deliberative focus on their movements has been shown to decrease energetic efficiency (Brick et al., 2013).

Taken together, these results strongly suggest that the impact of action-related costs, like physical effort, is not well described as a 'negative utility' that is fully integrated into action values before execution.

3.1.2 Anticipating effort costs

Neither the 'good'-based view nor the action-based view addresses the specific complications of how action costs are estimated and integrated into an ongoing decision. Ultimately, both of these views argue that some form of the action cost is associatively learned and stored for recall. But this leads to the objection: "How can action costs be estimated before an action is planned?"

For the action-based theories, this is a less critical objection, because it is argued that actions are planned in parallel with valuation. This would still suggest that action cost estimates change with time, since we know that movement planning is a time-dependent process and that precision of movement increases with preparation time (Churchland and Shenoy, 2007). Given that both decreases in preparation time and the presence of competing action plans increase variability in movement (Oostwoud Wijdenes et al., 2016), it seems similarly likely that cost estimation is also time-dependent and begins slightly after action planning is initiated.

But for the good-based theory, this is a somewhat serious problem, because it has been shown that a cached value which includes action costs is insufficient to explain observed behavior (Hollon et al., 2014). It is also problematic because any particular desired outcome might be achieved through any number of possible movements; how would all of these continuously changing possibilities be summarized in a single representation of cost?
In the original description, each potential outcome is said to be assessed in terms of all expected reward and cost, "independent of sensorimotor contingencies" (Padoa-Schioppa, 2011). This is argued to be a sort of sensorimotor equivalent of menu invariance, in that the spatial arrangement of irrelevant options should not affect the individual valuation of outcomes. But this is directly contradicted by a number of studies providing clear evidence that irrelevant distractors (Welsh et al., 1999; Sailer et al., 2002) or the proximity of compatible movements (Burk et al., 2014; Barton et al., 2015) influence decisions between actions.

3.1.3 Reaching decisions about effort

The disagreement between these two frameworks, the good-based view and the action-based view, is primarily about the dynamics of action selection. The serial nature of the good-based view implies that the estimation and integration of anticipated action costs must occur before the planning or initiation of movement.
The additional complication is that increased Reaching task under effort uncertainty 55 co-contraction leads to increased levels of noise in neuromotor commands, counter- acting some of the decrease in endpoint variability (Selen et al., 2007). This positive feedback loop of increasing co-contraction has been shown to lead to observable muscle fatigue even in tasks with relatively little movement or muscle force (Missenard et al., 2008) (Van Dieën et al., 2003). Furthermore, a number of studies have noted that subjects are able to overcome the speed-accuracy tradeoff, given higher incentives or greater anticipated reward (Manohar et al., 2015). We believe that these results indicate that manipulation of physical effort by modulating the size of targets is a well justified approach. Using this method of controlling effort also has a number of practical advantages; unlike a movement through a force field, the total amount of muscle force a subject can generate is much less important. This helps to reduce the difference in challenge of the task across subjects with different bodies and capabilities. Finally, we introduce another important manipulation in our experimental task. Atthebeginningofthetrial, thesubjectsarepresentedwiththepotentialreaching targets for a short period of time before they are permitted to initiate movement. In the following task we vary the amount of time given from almost no time, to approximately two seconds, enough to easily observe both targets. Varying the amount of preparation time, across trials with different initial configurations of target effort, allows us to examine how these action costs influence an ongoing decision. 3.2 Metho ds The primary purpose of this experiment is to study how costs incurred during movement are estimated and integrated into an ongoing decision between actions. To do so, we start with an adaptation of the “go before you know” experiment design used in a number of similar movement choice tasks Chapman et al. (2010); Gallivan et al. (2011); Christopoulos et al. (2017). In this design, a human subject is presented with one or more target choices which require slight variations of a reaching movement to complete. But, initially only partial information about target reward and difficulty is presented. The remaining information is provided to the subject only after they initiate movement toward one of the options, and a time limit on each trial means they must act and choose quickly. Reaching task under effort uncertainty 56 During the initial period of uncertainty, subjects must integrate whatever infor- mation is available, and initiate an action, but also prepare to update this action as full information is provided. The information available to subjects during the initial trial phase influences the latency, speed and direction of the subject’s move- ment. Since the focus of this study is expected effort, we designed a manipulation to the reaching targets designed to increase the perceived effort associated with suc- cessfully reaching a particular target. For most trials, a subject is successful is successful if they are able to reach a target, and stay within a small ‘hit radius’ for a short time while not moving quickly. In certain trials, we change the target hit conditions to either increase or decrease the size of target’s radius used to de- termine trial success. We additionally, increased or decreased (respectively) the maximum speed permitted while in the target to register a success. 
This means, in the case of the smaller radii targets, the subject must move as quickly as a normal trial, while increasing accuracy and The second aspect of the estimation of expected effort we aim to test is its time dependence. For this, we introduce a further manipulation, where the time given toasubjectaftertheinitiallyincompleteinformationisprovided, butbeforemove- ment is permitted, is varied across trials. This means that subjects will have some trials with no time to observe the targets before initiating a movement, and some trials subjects will have about 2 seconds to process the initial information before moving. 3.3 Exp erimen t In this experiment, subjects make repeated reaching movements towards virtual targets presented using a computer display. To capture the movements of their arms during reaching and provide physical feedback to the subjects, we use a Phantom haptic robot as a manipulandum (Sensable Technologies, Boston MA). The phantom is a small, two link robotic arm with three controlled joints and a roughly hemispherical workspace with a radius of approximately 25 cm. A small plastic piece on the endpoint provides an adaptor for the subject’s right index finger. Control of the robot and data recording is performed at 500 Hz, using specialized robotic control software (Schaal, 2007). We constructed a stand for the Phantom that held the robot vertically, maximizing the workspace available Reaching task under effort uncertainty 57 for reaching in the horizontal plane. A diagram of the layout of the subject and manipulandum may be found in figure 3.1b. To display the targets, feedback and other information to the subject we used an NVIS SX111 (NVIS Technologies, Reston VA) head mounted display, also called a virtual reality or VR headset. Using custom psychophysics software ??, we created a virtual video-game like environment, which displayed in stereoscopic 3D, the targets as illuminated spheres, another sphere representing the position of the subjects’ finger, and a background and shadows designed to provide ap- propriate depth cues. The helmet, an NVIS SX111 included a set of headphones (Sennheisser HD-25II) that we used to play audio cues and other feedback during the experiment. The full experiment is composed of a total of four sets (“sessions”) of approximately 150 reaching movements each, performed by each subject across two different days. i i ii ii iii iii iv v iv success v failure ‘go’ cue 100 - 20000 ms max. 1000 ms ~1500 ms i) fixate ii) present iii) initiate v) finish iv) threshold (a) L R (b) Figure 3.1: Diagram of the sequence of ev en ts in eac h reac hing trial (A); during (i) fixation the sub ject is fixated on a cen tral cross, (ii) in presen tation, one or t w o targets app ear and their coloring indicates the relativ e lev el of difficult y . A t the end of presen tation, a cue is giv en to start mo v emen t (iii), and once the sub ject’s finger has crossed the threshold line (dashed ligh t red line in adjoining figure), the targets c hange color to rev eal the asso ciated p oin ts (iv). Sk etc h of a top do wn view (B) of the exp erimen tal setup, (a) the p ositions of the virtual targets, blac k L and R dashed circles. Reaching task under effort uncertainty 58 3.3.1 T rial Structure Each trial is composed of four sections; the first is a ‘fixation’ period, in which subjects’ finger is at the starting position and they are instructed to fixate their gaze on white cross, displayed in the center of the visual field, for approximately 1.5 seconds. 
This time $t_{fixation}$ is randomly sampled from a normal distribution with $\sigma_{fixation} = 0.15$ s to prevent subjects from anticipating its length. In the second section, in addition to the white cross, one or two targets are displayed as darkened spheres (2.5 cm in diameter, 35 cm from the eyes); this is the 'presentation' section and varies in length between 150 milliseconds and 2 seconds. The start of the third section ('initiation') is marked by the disappearance of the white fixation cross and the short sound of a drum, indicating that the subject is free to initiate movement. The subject must then reach their arm forward towards the targets, and once their finger has moved 5 cm from the starting position, the final section ('movement') begins, and the spheres are illuminated with color to reveal the points associated with each target.

If the subject is able to bring their finger to one of the target spheres within 1 s, the screen displays the number of points earned and a pleasant auditory tone is played. Otherwise, if the subject has not reached one of the targets within 1 s, a falling auditory tone is played and a message is displayed indicating a 'time out' failure. If the subject moves from the starting position before the start of the initiation section, an auditory cue is given to indicate a 'false start' and the next trial is started without penalty to the subject.

The virtual targets are presented along a slightly curved line that is near the edge of the manipulandum's workspace. This curve stretches a distance $d_{separation}$ from the midline on each side, and we parameterize the target locations along the line by their relative angle $\theta$ (Figure 3.1b). In single target trials, targets may appear anywhere along the target line. But, in two target trials, targets only appear at one of $\theta \in \{35^\circ, 40^\circ, 45^\circ\}$.

3.3.2 Session Structure

The full experiment is structured to take place across two sets of two sessions on separate days. On the first day, the subjects perform protocols "A" and "B"; in "A" there are only single target trials, in which the position of the target is varied laterally and the number of points and the presentation time are also varied. The second session, "B", introduces two target trials, while retaining a small number of single target trials randomly mixed in. In the two target trials, the points are varied across the left and right targets, while the amount of presentation time is also still varied.

On the second day, subjects perform protocols "C" and "D"; "C" is almost identical to protocol "A" except that the variation in target precision is introduced. While in previous sessions the initial darkened color of the targets was a constant dark blue, targets are now sometimes displayed as a lighter green or a lighter blue (though still not illuminated). These variations in color represent a change in the required precision of the final movement needed to successfully "reach" these targets. In the earlier, 'normal' case, the subject was required to bring their finger within the target radius ($r_{size} = 2.5$ cm) and remain there with a velocity below 1.0 m s$^{-1}$ for at least 100 ms for the trial to be successful. The lighter green and blue targets are visually displayed as identical in size to the normal targets, but have a target 'hit radius' of $r_{size} = 6.0$ cm and 2.0 cm, respectively.
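As a concrete illustration of this success criterion, the sketch below checks whether a sampled fingertip trajectory dwells inside a target's hit radius, below the speed limit, for the required 100 ms. The function names and the synthetic trajectory are our own; this is not the experiment's actual control code, only a sketch under the stated assumptions.

```python
# Sketch of the target 'hit' criterion: the fingertip must stay within the hit
# radius, moving slower than the speed limit, for at least 100 ms.
import numpy as np

FS = 500.0                         # sampling rate (Hz)
DWELL_SAMPLES = int(0.100 * FS)    # 100 ms dwell requirement

def target_hit(positions, target, hit_radius=0.025, speed_limit=1.0, fs=FS):
    """Return True if the trajectory satisfies the hit criterion for `target`.

    positions : (T, 2) array of fingertip x/y positions in meters, sampled at fs.
    """
    speed = np.linalg.norm(np.gradient(positions, 1.0 / fs, axis=0), axis=1)
    inside = np.linalg.norm(positions - target, axis=1) <= hit_radius
    ok = inside & (speed <= speed_limit)
    # Look for any run of consecutive 'ok' samples at least DWELL_SAMPLES long.
    run = 0
    for flag in ok:
        run = run + 1 if flag else 0
        if run >= DWELL_SAMPLES:
            return True
    return False

# Synthetic example: a reach that ends resting on the right target.
t = np.linspace(0.0, 1.0, int(FS))
traj = np.column_stack([0.15 * np.minimum(t / 0.4, 1.0), 0.20 * np.minimum(t / 0.4, 1.0)])
print(target_hit(traj, target=np.array([0.15, 0.20])))
```

The smaller (2.0 cm) and larger (6.0 cm) hit radii simply swap in a different `hit_radius` (and, per the protocol, a different speed limit) without changing the visual size of the target.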
The subjects are informed at the beginning of the session that new types of targets will be introduced, but they are not told how the targets differ. Finally, in protocol "D", we combine the $r_{size}$ manipulation with two target trials. This means that subjects are presented with two targets which may have different initial colors (meaning different target hit radii), and which also become illuminated with a secondary color that indicates point value during the final movement phase.

As mentioned previously, this results in a large number of permutations of trial conditions. To reduce the number of trials necessary, we biased the randomization towards the combinations of trials which were most informative and eliminated certain comparisons. Trials with large differences in both points and target hit radii that favor the same target are not as informative.

Each of these four sessions required 150 successful actions (approximately 15 - 25 minutes) to be considered complete, meaning that subjects performed approximately 600 movements in total.

3.3.3 Data Analysis

The primary focus of this experiment is the integration of expected action-related effort in a movement task with uncertainty, and secondly, whether this is a time dependent process. This means there are two important questions that must be addressed in the analysis: first, which decision factors are reflected in subjects' choices, and second, whether there is an interaction between these factors and the presentation time ($t_{presentation}$). The former is studied through subjects' final choices, while the latter is studied through movement features from the initial trial phase.

3.3.3.1 Choice factors analysis

In every trial with two targets, the subject is free to reach for either the left or right target, even if it is the lower point value target. Subjects are encouraged to earn as many points as possible, but we do not expect subjects to choose the highest value target each time. Using the machinery of logistic regression (generalized linear models) and the observed choices, we can estimate the relative sensitivities to the experimental variables. Specifically, we model the probability of the subject choosing the right side target using a logistic function, where the fitted parameters are the number of points, target size, and other experimental conditions.

In addition to the modeling of subjects' final choices, the identity of the subject's selected target during the trial is also of interest. We applied the same logistic regression model from the final choices to each of these inferred selected targets during each of the trial phases. The direction of the launch angle (definition below) is used to infer the selected target in the initial phase of the trial. The lateral displacement of the subject's hand crossing the target reveal threshold is used to infer the selected target during the 'threshold' phase; if the subject's hand is left of the horizontal midline the selected target is left, and vice-versa for the right. The final position of the subject's hand during the trial, regardless of trial outcome, is used to infer the selected target for the final phase.

Finally, subjects' choices during trials that were not successful are also of interest. We again applied a logistic regression model, this time to model the probability of a successful trial, to infer which factors are most predictive of trial success.
Similarly, it is possible to succeed in these trials while picking the target with the lower point value; these types of choices are particularly of interest for determining which factors are 'traded off' in a rapid decision. To study this, we define a term called 'efficiency', which is the fraction of total points earned by a subject on each trial. Since this is no longer a binary variable, it was modeled with a linear mixed model rather than a logistic function.

3.3.3.2 Reaction times

One of the most important features extracted from the reaching trajectories is the amount of time after the 'go' signal is given before the subject has initiated movement. There are a number of different methods available, but we computed reaction times by finding the first time point at which the subject's finger speed exceeds 5% of the maximum speed for that trial. An alternative method of computing the reaction time, in which the earliest peak in the velocity profile is used in place of the maximum or tallest peak, was considered, but it did not change the overall results of the analysis.

As with any movement related feature, there may be a considerable amount of variability, even within subject. Furthermore, behavioral experiments and computational models have shown that changes in decision processes often result in skews or shifts in the tail of the distribution (Rousselet and Wilcox, 2019; Baayen and Milin, 2010). This means assuming a normal distribution and performing a test on the difference of sample means or even sample medians (Miller, 1988) would be inappropriate. One common approach is to assume an ex-Gaussian distribution (or similar) and use the fitted parameters in statistical analysis (Whelan, 2008). However, even if the reaction time distribution for a simple, known task is well summarized by a parametric model, decision processes are likely to distort and combine these simple distributions. Instead, there are two different non-parametric methods we used to compare reaction time distributions. For the first, we computed the empirical cumulative distribution function for each relevant grouping of trials (i.e. each condition for each subject) and then performed a two sample Kolmogorov-Smirnov test across the desired comparison.

The Kolmogorov-Smirnov statistic reports the probability that a set of samples has been drawn from a specified distribution, and used on an empirical cumulative distribution it makes almost no assumptions about the shape of the data. However, this test can only report that two distributions are different in some way, and cannot be effectively used to draw further conclusions about the shifts in distributions. The Kolmogorov statistic was computed for the data from all subjects individually, and then the reaction time distributions were combined using a technique called "Vincentizing" (Ewart and Ross, 1980) and a test performed on the combined distributions.
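The sketch below illustrates this non-parametric comparison with a two-sample Kolmogorov-Smirnov test and a simple quantile-averaging ("Vincentizing") step; the reaction time samples are simulated stand-ins, and the specific distributional parameters are assumptions made only for the example.

```python
# Sketch of the non-parametric reaction-time comparison: empirical distributions
# per condition, a two-sample KS test, and quantile averaging across subjects.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical RT samples (seconds) for two presentation-time conditions.
rt_short_prep = rng.lognormal(mean=np.log(0.35), sigma=0.25, size=150)
rt_long_prep = rng.lognormal(mean=np.log(0.30), sigma=0.25, size=150)

# Two-sample Kolmogorov-Smirnov test on the empirical distributions.
ks_stat, p_value = stats.ks_2samp(rt_short_prep, rt_long_prep)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.4f}")

def vincentize(per_subject_rts, probs=np.linspace(0.05, 0.95, 19)):
    """Average the per-subject RT quantile functions to form a group distribution."""
    quantiles = np.array([np.quantile(rts, probs) for rts in per_subject_rts])
    return quantiles.mean(axis=0)

group_rts = [rng.lognormal(np.log(0.33), 0.2, size=80) for _ in range(5)]
print(vincentize(group_rts)[:3])
```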
3.3.3.3 Functional density analysis of reaction times

The second approach we used to analyze reaction time distributions was a functional decomposition using the toolkit of functional data analysis. This second approach is necessary because non-parametric techniques like the Kolmogorov test have limited statistical and descriptive power. Combining data across subjects is possible using ad-hoc methods, but can compound the problem of noisy data. Since there are both constant and random differences across subjects, we want to look for modes of variation across subjects under different conditions. The approach we took is based on the case study in Ramsay and Silverman (2007), where the approach was used to study variation in response times of children with and without ADHD.

The first step of this approach is to group trials by condition and subject and then fit the distribution using a functional data model. Functional data models use a finite set of basis functions, here B-spline polynomials, and a set of linear parameters to approximate a set of continuous data while enforcing some functional constraints such as smoothness or periodicity. Here we chose a B-spline basis to represent the log-transformed reaction time distributions, which enforces a non-negative constraint on the fitted data Ramsay and Silverman (2007).

Then, we applied functional principal component analysis (FPCA) to the set of functional data models (Wang et al., 2015). Using the computed principal component scores, we performed multivariate analysis of variance using a modified Hotelling's $T^2$ test, with respect to the variables of interest. This procedure is also known as a 50-50 MANOVA and was designed to handle cases of highly collinear observations. Here it is used to test linear models of an experimental condition and its interaction with the presentation time ($t_{presentation}$). While the traditional approach to MANOVA assumes normality of the dependent variables, this is not a strict assumption of the 50-50 MANOVA method. However, it is still susceptible to outliers, so the principal component scores were tested for outliers using Rosner's test (Rosner, 1983).

While these MANOVA models may be used to test for the significance of an interaction, it is more difficult to interpret the effects indicated by the model. We adapted and extended some of the techniques outlined in Langsrud (2002), Wang et al. (2015) and Ramsay and Silverman (2007) to visualize the influence of the experimental condition on the reaction time (or other variable) distribution. Using the same principal component scores, we performed a reverse regression on the component scores to determine the magnitude of the coefficients of the experimental conditions. These coefficients were multiplied by the principal vectors and added to the mean distribution and plotted alongside the mean distribution.

3.3.3.4 Initial movement and launch angle

The reaching trajectories collected in this experiment were sampled at 500 Hz and are at most 1.25 seconds long. Velocity signals in these trajectories were filtered using a low-pass Butterworth filter with a 10 Hz cutoff. The most important phase of the movement from an experimental perspective is the period of time immediately after reaction until the hand has passed the threshold to reveal the target points. We used this definition to extract the initial phase, from the moment of reaction time (see earlier definition) until the frame after the subject's hand has crossed the threshold. Using this section of movement we computed the magnitude of the initial hand velocity, denoted as $v_{initial}$. Also from the same section of movement, we computed the launch angle $\phi_{launch}$ by finding the mean velocity over the segment and then computing the heading angle as the arctangent of the x and y velocity components: $\phi_{launch} = \arctan(v_x / v_y)$. The angles are measured relative to the vector pointing directly at the target plane, meaning that a movement directly to the center is 0 degrees and a movement directly to the left target is $-35^\circ$.
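The following sketch pulls these trajectory features together (low-pass filtering, reaction time from the 5%-of-peak-speed rule, and the launch angle from the mean velocity of the initial segment). The function names, threshold distance argument, and synthetic trajectory are illustrative assumptions, not the project's actual analysis code.

```python
# Sketch of trajectory feature extraction: filtered velocity, reaction time,
# initial speed, and launch angle. Names and the example trajectory are illustrative.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 500.0  # sampling rate (Hz)

def reach_features(positions, threshold_y=0.05, fs=FS):
    """Return (reaction_time, v_initial, phi_launch_deg) for one 2D trajectory."""
    vel = np.gradient(positions, 1.0 / fs, axis=0)
    b, a = butter(2, 10.0 / (fs / 2.0), btype="low")     # 10 Hz low-pass filter
    vel = filtfilt(b, a, vel, axis=0)
    speed = np.linalg.norm(vel, axis=1)

    # Reaction time: first sample exceeding 5% of the trial's maximum speed.
    rt_idx = int(np.argmax(speed > 0.05 * speed.max()))
    reaction_time = rt_idx / fs

    # Initial phase: from reaction time until the hand crosses the reveal threshold.
    crossed = int(np.argmax(positions[:, 1] > threshold_y))
    seg = vel[rt_idx:crossed + 1]
    vx, vy = seg.mean(axis=0)
    v_initial = np.hypot(vx, vy)
    phi_launch = np.degrees(np.arctan2(vx, vy))          # 0 deg = straight ahead
    return reaction_time, v_initial, phi_launch

# Synthetic reach: hold still for 200 ms, then curve toward the left target.
t = np.linspace(0.0, 0.8, int(0.8 * FS))
move = np.clip((t - 0.2) / 0.6, 0.0, 1.0)
traj = np.column_stack([-0.05 * move, 0.22 * move])
print(reach_features(traj))
```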
The functional density analysis approach described above for reaction times is repeated on the launch angle distributions. Grouping trials by subject, presentation time, and the sizes of the targets, we compute the principal components of variation across these distributions and perform the modified 50-50 MANOVA test on the resulting reduced-dimensionality set of scores. The visualization process to highlight the modes of variation in the distributions is also repeated.

3.3.3.5 Threshold and final movement

The position and velocity of the subject's hand immediately as they pass the threshold is also an important feature for studying the underlying decision. We computed the lateral deviation of the hand at threshold (the x coordinate of the trajectory at $t_{threshold}$), and used this to infer the target most likely to be selected at threshold. The mean magnitude of the subject's hand velocity was computed for the 100 ms following $t_{threshold}$ and for the remaining time left in the trial, $v_{final}$. Finally, we also computed the maximum hand speed across each trial.

3.3.4 Subjects

A total of 20 subjects, 6 females and 14 males, 23 - 33 years of age (mean 28.3), were recruited to participate in this experiment, and provided informed consent for their participation. Subjects were all able-bodied and had normal or corrected-to-normal vision, and all subjects were right handed as determined by the Edinburgh handedness survey (Oldfield, 1971). The protocols and design of the experiment were reviewed and approved by the IRB of the University of Southern California.

3.4 Results

There were 20 total subjects recruited to participate in this experiment; however, 1 subject performed below the performance threshold (less than 50% success rate) and was excluded from the analysis. The remaining 19 subjects performed 615 movements on average. During debriefing with subjects after the experiment was completed, all subjects correctly identified the target colors that indicated the higher and lower effort conditions. However, when asked to identify the specific change on those trials, only one subject correctly guessed the method of varying the target hit radius. In the following analysis and discussion, we will use p and r with subscripts to represent the points and target radius, respectively, associated with each target.

3.4.0.1 Trial success model

The first step in the analysis was to assess the overall performance of the subjects at the task, and identify the experimental parameters that influence subjects' success. We applied the technique of logistic regression to model the probability of success across all subjects. All non-false-start trials were used to fit a model using the target sizes, selected target, presentation time, trial sequence index, and previous outcome as parameters; the ranges of all these parameters were normalized before fitting. Nested random effects for the subject and session were included in the model, and we assessed the model fit using a likelihood ratio test against a reduced model containing only the random effects ($p < 10^{-3}$). However, the confusion matrix indicates slightly less confidence in the predictions of the model; overall the model correctly predicts success or failure in 86.3% of trials, and the model's specificity is 87.7%.
This means that the model is reasonably good at predicting success in these trials, but that there is still a fair amount of randomness unexplained by the model. In a logistic regression model, the coefficients represent the marginal change in the log odds ratio with respect to the parameter in question. Since the model is predicting success, any positive coefficient indicates that increases in that parameter result in increases to the probability of success, and likewise, negative coefficients indicate that increasing that parameter will decrease the probability of success. The value of the constant intercept means that the overall mean fitted accuracy is 79.93%.
Starting from the insignificant parameters, accuracy is not significantly impacted by the second target in two-target trials, nor by the difference in target radii. Accuracy is slightly higher (7.09%) in trials where the subject attempted to reach the right target. Similarly, the trial sequence index and a previous successful trial both increase predicted accuracy, suggesting subjects improved over time and were influenced by recent performance. While normalizing the data before fitting, we also converted increases or decreases in target size into separate binary variables. From the coefficients for these parameters, a smaller target radius led to a decrease in accuracy of 21.77% and a larger target led to an increase in accuracy of 17.16%.

Table 3.1: Logistic models and fitted coefficients for trial success prediction (standard errors in parentheses)

parameter       | targets: (1, 2)    | targets: 1          | targets: 2
p_maximum       | 0.005 (0.006)      | 0.007 (0.008)       | 0.004 (0.007)
T_final         | 0.260*** (0.061)   | 0.148** (0.071)     | 0.218** (0.086)
n_targets       | −0.149 (0.107)     |                     |
r_small         | −1.052*** (0.098)  | −0.911*** (0.106)   | −0.627*** (0.164)
r_large         | 2.127*** (0.136)   | 1.806*** (0.154)    | 1.833*** (0.163)
t_presentation  | −0.030 (0.082)     | −0.350*** (0.097)   | −0.261** (0.112)
n_trial         | 0.687*** (0.104)   | 0.694*** (0.124)    | 0.636*** (0.143)
θ_target        | −0.765*** (0.131)  | −0.680*** (0.124)   | −0.991** (0.458)
T_(n−1)         | 0.701*** (0.068)   | 0.876*** (0.080)    | 0.276*** (0.098)
Constant        | 1.382*** (0.311)   | 0.995*** (0.270)    | 1.602*** (0.512)
Observations    | 11,238             | 6,834               | 4,859
Log Likelihood  | −3,727.870         | −2,622.996          | −1,885.515

Note: * p<0.1; ** p<0.05; *** p<0.01

We also grouped the data into single-target and two-target trials, fitting the same logistic success model as before. There are several things to note; first, smaller targets have a larger impact on single-target trials (−20.9%) than on two-target trials (−10.63%). Yet the increase in target size resulted in no difference between the single-target and two-target trials.
3.4.0.2 Efficiency model
To determine the impact of trial conditions within successful choice trials, we computed the fraction of total possible points a subject earned in each trial. We termed this feature 'efficiency' and used a set of linear mixed models to predict the efficiency across all subjects and trials (a minimal sketch of such a model is given below). We found that far fewer of the trial conditions were reliable predictors of efficiency than of trial success. Consistent with the success model, but smaller in magnitude, we observed that increased presentation time (t_presentation) and a higher trial sequence index (later in the session) were both associated with increased efficiency, although only about a 1% and 2% increase respectively. Neither the difference in size between the two targets, nor the sizes of the individual targets in each trial, were reliable predictors of efficiency.
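A linear mixed model of efficiency with a random intercept per subject can be fit along the following lines. As before, the column names are placeholders and the actual analysis also included session-level effects.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_efficiency_model(successful_trials: pd.DataFrame):
    """Efficiency = fraction of available points earned on a successful trial."""
    model = smf.mixedlm(
        "efficiency ~ t_presentation + n_trial + r_left + r_right"
        " + p_left + p_right",
        data=successful_trials,
        groups=successful_trials["subject"],   # random intercept per subject
    )
    return model.fit()
```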
Similarly, the number of points associated with each target did not have a large impact; the number of points for the left target did have a significant, but minuscule (< 1%), impact on efficiency.
3.4.1 Logistic choice regression
In just under half of the trials each subject performs, two targets are presented and the subject is free to choose either target. The two targets are always laterally separated from the midline by the same distance, but the number of points and the hit radius of each target vary independently. We used the results of these trials to create a model of subject choice using logistic regression. For all of the models we will discuss, we modeled the probability of choosing the right target, so that positive coefficients are associated with increases in the probability of choosing right.
The first models presented include data from all subjects and include random intercepts to model subject and session variability. The results of this regression, and of several additional models using an inferred selected target, are displayed in figure 3.2 and table 3.2. The choice outcome model uses only successful trials, while the final choice model uses the position of the subject's hand on the final frame of the trial, success or failure, and takes the closest target as the 'selected' target. The threshold model uses the same approach to infer the selected target, but uses the subject's hand position immediately after crossing the threshold line. Finally, the initial phase model infers the selected target using the direction of the initial movement angle (φ_launch).

[Figure 3.2 plots the fitted logistic coefficients (x axis roughly −2 to 2) for θ_target, n_trial, t_presentation, p_left, p_right, r_left, and r_right, separately for the initial, threshold, final, and choice trial phases, with significance markers on each panel.]

Figure 3.2: Plot of the logistic model coefficients found by fitting the subject data to the experimental parameters listed above each variable's plot. The model was fit four times using: the subjects' successful choices, choices from all trials including time-outs (final), and the inferred selected target through the initial and threshold phases of the trials. The final two are computed from the movement launch angle and the lateral position at threshold, respectively.

f(x) = β_p^L·p_L + β_r^L·r_L + β_p^R·p_R + β_r^R·r_R + β_t·t_present + β_n·n_trial    (3.1)

Figure 3.3: The logistic regression model of the probability of choosing the right target, using the experimental conditions as parameters. Each β represents a fitted coefficient; the superscripts L and R refer to the left and right targets respectively, and p and r indicate the number of points and the target size, respectively. Finally, n_trial is the trial sequence index and t_present is the presentation time.

Figure 3.2 provides a good overview of the results. The coefficients for left and right points (p_left and p_right) are opposite in sign and similar in magnitude, though 10-20% greater for the right side. A trend that is interesting but only marginally significant appears in the target size coefficients; both the left and right target size coefficients increase through the trial phases, but start on the opposite side of zero. This means that a larger target initially makes the subject lean away from that option, rather than towards it.
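As a concrete illustration of equation 3.1, the probability of choosing the right target can be evaluated by passing the linear predictor through the logistic link. The coefficient values below are rough placeholders in the spirit of the 'choice' column of table 3.2, not the fitted model itself.

```python
import numpy as np

def p_choose_right(p_left, r_left, p_right, r_right, t_present, n_trial, beta):
    """Evaluate the logistic choice model of equation 3.1."""
    f = (beta["p_left"] * p_left + beta["r_left"] * r_left
         + beta["p_right"] * p_right + beta["r_right"] * r_right
         + beta["t"] * t_present + beta["n"] * n_trial + beta["const"])
    return 1.0 / (1.0 + np.exp(-f))   # logistic link

# Illustrative coefficients, roughly matching the signs reported for the
# 'choice' phase in table 3.2 (normalized predictors assumed).
beta = {"p_left": -2.7, "p_right": 3.4, "r_left": -1.0, "r_right": 1.2,
        "t": -0.02, "n": 0.02, "const": -1.3}
print(p_choose_right(p_left=0.5, r_left=0.5, p_right=0.5, r_right=0.5,
                     t_present=0.5, n_trial=0.5, beta=beta))
```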
Also important to note is that this lean-away effect disappears by the threshold phase of the experiment.

Table 3.2: Logistic choice models and fitted coefficients across trial phases (standard errors in parentheses)

trial phase     | initial          | threshold        | final             | choice
p_left          | −0.115 (0.108)   | 0.130 (0.110)    | −2.830*** (0.130) | −2.703*** (0.141)
p_right         | 0.050 (0.096)    | −0.0002 (0.098)  | 3.394*** (0.118)  | 3.362*** (0.129)
r_left          | 0.325 (0.203)    | −0.366* (0.213)  | −0.583*** (0.208) | −1.034*** (0.225)
r_right         | −0.408* (0.209)  | 0.263 (0.220)    | 0.745*** (0.217)  | 1.167*** (0.234)
t_presentation  | −0.045 (0.042)   | −0.052 (0.042)   | −0.026 (0.046)    | −0.016 (0.051)
n_trial         | −0.167 (0.112)   | 0.256** (0.113)  | 0.032 (0.126)     | 0.024 (0.141)
θ_target        | −0.657 (0.403)   | 0.791* (0.405)   | 1.129** (0.450)   | 1.061** (0.497)
Constant        | 0.708 (0.455)    | −0.886* (0.461)  | −1.392*** (0.457) | −1.335*** (0.503)
Observations    | 4,660            | 4,652            | 4,669             | 3,965
Log Likelihood  | −2,782.033       | −2,761.920       | −2,312.939        | −1,929.020

Note: * p<0.1; ** p<0.05; *** p<0.01

Since there is considerable inter-subject variability, both in movement and in choice preferences, we repeated the process of logistic choice modeling for each subject individually. The coefficients found from these models are plotted in figures 3.4a and 3.4b.

Table 3.3: Logistic choice target size interaction models (standard errors in parentheses)

interaction type         | none              | r_left * t        | r_left : t        | r_right * t       | r_right : t
p_left                   | −0.135*** (0.007) | −0.135*** (0.007) | −0.134*** (0.007) | −0.134*** (0.007) | −0.133*** (0.007)
p_right                  | 0.168*** (0.006)  | 0.168*** (0.007)  | 0.170*** (0.006)  | 0.170*** (0.007)  | 0.172*** (0.007)
r_left                   | −0.172*** (0.038) | −0.200*** (0.053) |                   | −0.162*** (0.038) | −0.182*** (0.036)
r_right                  | 0.195*** (0.039)  | 0.199*** (0.039)  | 0.239*** (0.040)  | 0.104* (0.055)    |
t_presentation           | 0.001 (0.055)     | −0.103 (0.156)    | 0.282** (0.118)   | −0.366** (0.163)  | −0.576*** (0.121)
n_trial                  | 0.028 (0.140)     | 0.022 (0.141)     | 0.045 (0.140)     | 0.026 (0.141)     | 0.022 (0.141)
θ_target                 | 0.021** (0.010)   | 0.021** (0.010)   | 0.020** (0.010)   | 0.021** (0.010)   | 0.021** (0.010)
r_left : t_presentation  |                   | 0.029 (0.040)     | −0.076*** (0.029) |                   |
r_right : t_presentation |                   |                   |                   | 0.102** (0.043)   | 0.159*** (0.031)
Constant                 | −1.348*** (0.505) | −1.247** (0.525)  | −2.156*** (0.468) | −1.095** (0.518)  | −0.657 (0.465)
Log Likelihood           | −1,929.066        | −1,928.808        | −1,935.750        | −1,926.196        | −1,928.006
Akaike Inf. Crit.        | 3,878.131         | 3,879.616         | 3,891.500         | 3,874.392         | 3,876.012
Bayesian Inf. Crit.      | 3,940.984         | 3,948.754         | 3,954.352         | 3,943.530         | 3,938.864

Note: * p<0.1; ** p<0.05; *** p<0.01

In the first, figure 3.4a, the coefficient for the right target points is plotted against the coefficient for the left target points for each subject. Again, since we are modeling the probability of selecting the right target, we expect the right points coefficient to be positive and the left points coefficient to be negative. As is evident from the plot, this is indeed the case; additionally, the subjects' points fall very close to the line of equivalence. Including all subjects, a linear regression finds a slope of -1.03. If we exclude the two subjects that are high-leverage points ('20cf' and 'c8e8'), the slope becomes -1.438. In figure 3.4b, the coefficient for the size of the right target is plotted against the size of the left target for each subject. In contrast to the points coefficients in figure 3.4a, there is no immediately obvious trend to the subjects here.
The dashed gray line is included to denote the hypothetical trend line that would result if subjects were equally sensitive to changes in target size on either side. However, while most subjects fall in the upper right quadrant (relative to the origin), a number of subjects' positions on the plot indicate that a smaller target actually increases the likelihood of that target being chosen.

[Figure 3.4 contains two panels of per-subject coefficients, with each subject labeled by an identifier (e.g. 0fed, 20cf, c8e8): panel (a) plots the right target points coefficients against the left target points coefficients, colored by model accuracy (0.70-0.90), and panel (b) plots the right target radius coefficients against the left target radius coefficients.]

Figure 3.4: Plots of the coefficients found by fitting logistic regression models on subjects' choice of target, using trial conditions as predictive parameters. Since the models are fitted to the probability of choosing the right target, positive coefficients indicate that marginal increases of the associated variable result in an increased probability of selecting the right target. In (a), the coefficients associated with right target points are plotted against the coefficients for left target points for each subject. The gray line shows a simple linear model fitted to the relationship between the left and right side coefficients. Plotted in (b) are the coefficients for the left and right target sizes. The dashed gray line indicates a hypothetical trend line of equal sensitivity to changes in the right and left target size.

Now the same process is applied to interpret the coefficients for r_left and r_right, which represent the hit radius of the left and right targets specifically. Larger targets are relatively easier to reach successfully than small targets, so we would expect the coefficients for r_right to be positive and the coefficients for r_left to be negative.
3.4.1.1 Subject lateral bias

[Figure 3.5 plots each significant subject's right-target preference (0 to 1), with point size encoding the significance level of the binomial test (p < 10^-x, for x = 3, 6, 9).]

Figure 3.5: Plot of the subjects' estimated lateral bias, determined using a binomial test on all trials with equal points and target sizes.

In a number of trials with similar experimental designs, a slight lateral bias in movement and choice towards the right side has been reported. The reasoning is that most people (and all subjects in the current experiment) are right-handed, and the movement to the right target is lower in inertia (involving largely the elbow and wrist, rather than the shoulder and elbow), so it will be chosen more often. To test for the presence of a bias in choice, we performed a binomial test against the null hypothesis that subjects have no preference for either target, i.e. each side has a 50% probability of being selected when points and size are equal. The results of this binomial test are plotted in figure 3.5, where only the subjects with a significant (p < 0.05) result are plotted. The x coordinate of each point is the proportion of trials in which the right target was selected, and the size of the point indicates the significance level of the result.
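The lateral-bias test itself is a standard exact binomial test. The sketch below shows the per-subject computation; the counts used in the example are made up.

```python
from scipy.stats import binomtest

def lateral_bias(n_right: int, n_trials: int):
    """Test H0: P(choose right) = 0.5 on equal-points, equal-size trials."""
    result = binomtest(n_right, n_trials, p=0.5, alternative="two-sided")
    return n_right / n_trials, result.pvalue

# Example with hypothetical counts: 70 right-target choices out of 100 matched trials.
print(lateral_bias(70, 100))
```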
Of the 11 subjects who had a significant lateral preference, we found that 5 actually chose the left target more frequently.

[Figure 3.6 plots cumulative distributions of t_reaction (roughly 0.2-0.6 s) in four panels: (a) single-target trials by presentation time (< 0.5 s, < 1.0 s, < 2.0 s); (b) single-target trials by target radius (6, 3, 2 cm); (c) two-target trials by presentation time; (d) two-target trials by the difference in target sizes (R>>L, R>L, R==L, R<L, R<<L).]

Figure 3.6: Plots of the cumulative distribution of reaction times across a number of trial conditions; the Vincentized distributions combined across all subjects are plotted as thick lines, while the cumulative distribution curves for individual subjects are plotted behind in low opacity. Plot (a) shows the reaction times from single target (non-choice) trials across the different lengths of presentation time; plot (c) shows the same for two target trials. Plot (b) shows the reaction times in single target trials conditioned on the size of the presented target, while (d) shows two target trials conditioned on the difference in size between the two targets.

3.4.2 Functional density analysis
3.4.2.1 Reaction times
As we discussed in the methods section, while reaction time distributions have been used widely and successfully, the shape of the distributions, particularly under manipulations, can make the statistical analysis of effects difficult. To attempt to isolate the effects of experimental conditions across subjects and through imperfect measurements, we borrowed and combined several methods from the toolbox of functional data analysis. The conditional reaction time distributions across subjects and trial conditions were fit using functional spline representations, then log-transformed, and the spline basis coefficients were decomposed into principal components. A specialized variation of the MANOVA model was fit to the component scores using the experimental conditions as variables. We applied this process to: all single target trials, conditioned by the size of the target in the trial, and all two target trials, first conditioned on the amount of presentation time and the difference in target sizes, then conditioned on the presentation time and the size of the left target, and finally on the presentation time and the right target size. The summarized results from this MANOVA model may be found in table 3.4. We used the first 5 principal components, which contained at least 99% of the total variation in each case.
The results from the single target model indicate that only the presentation time has a significant, consistent impact on subjects' reaction times; the size of the target does not, and similarly the interaction between the two variables is not significant. For two target trials, both the presentation time and the difference in size of the targets had a significant impact on reaction distributions. However, only the right target size showed an interaction with presentation time, which was absent in the left target size model.
In the second part of the functional analysis, we performed a reverse regression on the principal component scores, fitting for the trial conditions of interest. We visualized these results by using these regression coefficients as weights for the principal eigenvectors and combining the result with the mean distribution vector; a minimal numerical sketch of this step is given below.
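The sketch below illustrates the shape of this decomposition and reverse-regression step. It is a simplified stand-in: histogram densities and ordinary least squares replace the B-spline functional models and the 50-50 MANOVA described above, and all variable names are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA

def fpca_effect_curves(rt_groups, conditions, grid=np.linspace(0.1, 0.8, 64)):
    """Mean reaction-time density and condition 'effect' curves.

    rt_groups  : list of 1-D arrays of reaction times (one subject x condition)
    conditions : (n_groups, n_conditions) design matrix for those groups
    """
    # 1. Represent each group's reaction-time distribution on a common grid
    #    (histogram density of the log-transformed times, as a simple stand-in).
    densities = np.stack([
        np.histogram(np.log(rt), bins=len(grid),
                     range=(np.log(grid[0]), np.log(grid[-1])),
                     density=True)[0]
        for rt in rt_groups
    ])

    # 2. Functional PCA: principal components of the density curves.
    pca = PCA(n_components=5)          # the first 5 PCs held >= 99% of variation
    scores = pca.fit_transform(densities)

    # 3. Reverse regression of the PC scores on the experimental conditions.
    coef, *_ = np.linalg.lstsq(conditions, scores, rcond=None)

    # 4. Effect curves: mean density plus the coefficient-weighted components.
    mean_curve = pca.mean_
    effect_curves = mean_curve + coef @ pca.components_
    return mean_curve, effect_curves
```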
Plotted together with the mean distribution, these weighted components illustrate the marginal shift in the reaction time distribution caused by these variables. Figure 3.7 shows the result of this process on the reaction time distributions. The mean distribution is plotted in solid gray in the background, the dashed light blue line shows the impact of increased presentation time, and the dashed dark blue line shows the impact of an increase in target size. The dashed green line shows the impact of the interaction between the two variables; note that partially in Δr_targets (center left) and significantly in r_right (far right), the interaction effect is opposite in direction and comparable in magnitude to the main effect. This means any increase in reaction time caused by an increase in right target size is negated by an increase in presentation time.

[Figure 3.7 panels are titled r_target, Δr_targets, and r_right; the x axis is time (normalized) and the legend distinguishes the mean, presentation time, main effect, and interaction curves.]

Figure 3.7: Visualization of the functional principal components decomposition of reaction time distributions. The dashed lines represent the shift in distribution associated with an increase in the associated variable, while the gray solid line represents the mean reaction time distribution. Plot (a) shows the results using the difference in target sizes, while (b) and (c) show the results from the decomposition conditioned on the size of the left and right target, respectively.

Table 3.4: Results from the 50-50 MANOVA test on the principal components of reaction time distributions

Single targets               | Df  | Hotelling's T | approx F | Pr(>F)
t_presentation               | 1   | 0.266         | 10.9     | 7.23×10⁻⁸ ***
r_target                     | 1   | 0.0119        | 0.49     | 0.743
t_presentation : r_target    | 1   | 0.0165        | 0.677    | 0.609
residuals                    | 167 | (Df num 1, den 1)

Two target trials            | Df  | Hotelling's T | approx F | Pr(>F)
t_presentation               | 1   | 0.315         | 12.9     | 3.71×10⁻⁹ ***
Δr_targets                   | 1   | 0.145         | 5.93     | 0.000176 ***
t_presentation : Δr_targets  | 1   | 0.025         | 1.02     | 0.397
t_presentation               | 1   | 0.24          | 9.83     | 3.76×10⁻⁷ ***
r_left                       | 1   | 0.134         | 5.48     | 0.000365 ***
t_presentation : r_left      | 1   | 0.0251        | 1.03     | 0.395
t_presentation               | 1   | 0.203         | 8.31     | 3.95×10⁻⁶ ***
r_right                      | 1   | 0.0874        | 3.58     | 0.0079 **
t_presentation : r_right     | 1   | 0.0587        | 2.41     | 0.0516 .
residuals                    | 167 | (Df num 1, den 1)

3.4.2.2 Initial launch angle
The same process of functional principal component analysis was performed on the distributions of initial launch angles, and the results are presented in figure 3.8. Note that because single target trials are not fixed in location like the two target trials are, inspecting the distribution of launch angles across these trials is not meaningful. In two target trials, one of the most commonly observed trajectories is to move directly forward initially, toward the midpoint of the two targets. However, as noted in our previous study and a number of similar studies, the distribution of launch angles is not centered around 0 degrees (directly forward). Even when reaching toward the midpoint, the biomechanics of the arm favor a small curve to the right.
As with the reaction time distributions, the conditioning of launch angle distributions on the difference in target sizes and on the left and right target sizes shows a similar pattern. The difference-in-target-size plot shows roughly a combination of the left and right target size effects, with the right target size having the dominant effect. The pattern of effects in the right target size plot mirrors the pattern in the same reaction time plot, and suggests a similar interpretation.
Marginal increases in the presentation time or the right target size actually lead to a shift towards the left target, as indicated by the light and dark blue dashed lines. However, the interaction of presentation time and target size causes a shift back to the right.

[Figure 3.8 panels are titled Δr_targets, r_left, and r_right; the x axis is the initial movement angle and the legend distinguishes the mean, presentation time, main effect, and interaction curves.]

Figure 3.8: Visualization of the functional principal components decomposition of initial launch angle distributions. The dashed lines represent the shift in distribution associated with an increase in the associated variable, while the gray solid line represents the mean launch angle distribution. Plot (a) shows the results using the difference in target sizes, while (b) and (c) show the results from the decomposition conditioned on the size of the left and right target, respectively.

3.5 Discussion
In this experiment we examined how changes in action costs impact the movements of a subject completing a reaching choice task under uncertainty. The two most important manipulations in the experiment were the change in target hit radius, used to increase the effort associated with that target, and the variation of the time in which the subject is presented with the potential targets. During this period of time, the targets are displayed but the "go" cue has not been given; the subject can observe the target location(s) and the colorings that indicate the targets' hit radii, and they may begin planning and choosing an action but may not initiate one. In our analysis, we first assessed which experimental conditions influenced subjects' choices and overall success rate. Then, we modeled movement and decision features in the early phases of the trial under varying time constraints.
3.5.0.1 Regression against subjects' choice
Analysis of subjects' selected targets on choice trials indicated that subjects have a somewhat variable sensitivity to the magnitude of reward, but the sensitivity is largely symmetric. That is, marginal increases in left target points led to an increase in the probability of choosing left at almost the same rate that increases in right target points led to an increase in selecting the right target. We further confirmed this by performing a binomial test on only the trials with equal points and target size on both sides, finding no evidence for a significant bias toward either side.
While this is not entirely surprising, it provides evidence that any implicit lateral bias arising from biomechanics only marginally impacts choice. A number of studies using a similar design, including a related previous study (Christopoulos et al., 2017), have noted the effects of a biomechanical bias in the choices and spatial distributions of reaching trajectories (Gallivan and Chapman, 2014) (Schweighofer et al., 2015). This biomechanical bias arises from the smaller moment of inertia generated when reaching to the right with the right arm (and in general these studies have focused on right-handed subjects). But our results here suggest that this implicit bias does not cause the advantaged option to dominate.
This is also largely consistent with our findings regarding the impact of target sizes on subjects' choice. In figure 3.4 we plotted the fitted coefficients found by modeling the choices of each subject individually.
In contrast to the coefficients associated with target points (figure 3.4a), the coefficients relating to target sizes displayed no immediate pattern and were consistently much smaller in magnitude than the target points coefficients. The nearly straight line, entirely up and to the left of the origin in figure 3.4a, indicates clearly that a marginal increase of points for either target results in a consistent increase in the likelihood of selecting that target. And the wide spread of coefficients in figure 3.4b, into nearly every quadrant, indicates that subjects did not treat equally the marginal changes in difficulty associated with each target.
3.5.0.2 Evolution of preferences over time
The data from all subjects were combined and used to fit a single mixed effects logistic regression model, using random intercepts for both subject and session. We performed this regression using the choices in all successful trials, but also used the final position of the subject's hand to infer the intended target on failed trials. Inferring the selected target may be done in a number of different ways, but we found that a simple shortest-distance-to-target metric was sufficient, and other methods did not change the results of the analysis. This process was also repeated using the subject's hand position at the target reveal threshold, and using the initial launch angle, i.e. the heading of the initial velocity vector, to infer the selected target. Again, other approaches to inferring the intended subject choice during the trial are possible, and earlier in the trial there might be justifications for a more sophisticated approach. However, we found that methods like prediction by clustering trajectories did not significantly impact the analysis and added unnecessary complexity. Results from these models may be found in table 3.2.
The results of fitting the logistic regression models of choice to the different phases of the trial are also presented in figure 3.2. The most striking result highlighted by this plot is the trend across phases for the left and right target size coefficients. For both, the initial values indicate a reversal of preference, meaning that in the very beginning of the trial, subjects' movements bend towards, rather than away from, targets that require greater effort. This directly contrasts with the results from an earlier related study, where subjects' movements bent towards targets with higher expected reward (Christopoulos et al., 2017); if action costs were equivalent (but opposite in sign) to expected reward, we would expect subjects' movements to bend away from higher cost options.
While this effect is not especially large, and reverses to trend in the opposite direction for the remainder of the trial, the trend of the θ_target coefficient reflects a similar effect. In two target trials, there are three different values of θ_target; for larger values, the targets are spaced further apart (see figure 3.1b for a diagram). From the success model (table 3.1), we know that increased θ_target results in a slight decrease in trial success, and seems to exaggerate the biomechanical advantage of reaching right over left. Indeed, we see that for both successful and unsuccessful choices, the increased eccentricity (θ_target) of the targets resulted in an increased likelihood of choosing the right target. Yet, as with the r_left and r_right coefficients, it exhibits a reversal between the initiation of the movement and the crossing of the movement threshold.
Other studies have used similar dynamic reaching choice tasks to investigate the evolution of an ongoing decision. One particular study examined reaching choices between two options that differed in energetic demands; it observed that subjects rapidly integrated this information if the energetic difference was biomechanical in nature (i.e. a movement involving more or fewer active joints). However, the energetic cost was integrated more slowly if it depended on particular planning details (i.e. the length of the path in planning space) (Cos et al., 2014). Another study, which aimed to determine how these effort costs influence changes in action plans "mid-flight", reported that the level of effort changed the threshold for changing plans. However, the effort cost did not seem to influence the initial movement direction, but exerted influence on the processing of information after commitment to the initial decision (Burk et al., 2014). Additionally, the design of that experiment specifically restricted initial movements toward the target midpoint, forcing subjects to initiate towards a particular target and switch later.
Our results, and the results from these related experiments, all point to a delayed time course for the estimation of at least some components of action related costs. Another study even suggests that this might apply to any type of cost; using another variation of the reaching decision task, it was observed that losses were consistently estimated more slowly than potential gains (Chapman et al., 2015).
While the authors of the previously discussed experiments also conclude that these effects are due to competition between potential actions rather than outcomes, in one set of studies the authors reached a differing interpretation of subjects' behavior concerning the initial "spatial averaging" of movement plans. In a slight variation, subjects were forced to perform reaches at two movement speeds, and it was reported that the spatial averaging behavior only occurred for slower paced movements. This was argued to be consistent with normative (or 'optimal') models of choice, because the slower movement constraints increased the relative value of moving to the midpoint of the targets and "hedging your bets" (Wong and Haith; Haith et al., 2015a). But this explanation does not have a mechanism to explain the coupling between the initial movement direction, velocity, and reaction time. Additionally, there is strong evidence that movements of different velocities have distinct neural representations (Flash and Hochner, 2005).
3.5.0.3 Reaction times and velocity profiles
Many studies, including a related previous study using this apparatus, have found that shifts and other distortions of the reaction time distribution reflect parts of the ongoing decision process. In the present study, we applied a form of functional principal component analysis to the observed reaction time distributions. This approach was chosen because it provides a non-parametric method of isolating the common modes of variation in reaction time distributions across subjects. The decomposition into principal components was used for two methods of analysis: first, a MANOVA test against the trial conditions of interest (table 3.4), and second, a visualization using the results of a reverse regression on the trial conditions (figure 3.7). There are several important features within the FPCA results.
First, presentation time is found to be significant in each of the MANOVA tests, and from both the FPCA visualizations and the cumulative distribution plots it is clear that longer presentation times result in a forward shift and a decrease in variability. This is consistent with what a number of other studies (Churchland and Shenoy, 2007) have reported, and consistent with the description of movement preparation as a time consuming process.
Note that in the MANOVA results for the single-target target size comparison and for the two-target, left target size comparison, presentation time is found to be significant but appears to show no change in the visualization plots (figure 3.7, left and center-right panels). This is because the changes in presentation time result in differences in variability within subjects, but no strong, consistent trend across subjects. The cumulative distribution plots provide further evidence of this feature; while there are differences in these conditions within subjects, the combined means across subjects show no difference.
These results are important not only because they show the limitations of analyzing the shifts in these variables' distributions through general purpose statistical methods, but also because they show a surprising result. The lack of a consistent impact of presentation time or target size on the single target trial reaction time distributions is actually in line with the results of the other analyses here; for single target trials, i.e. trials with no choice, manipulations of the effort (target size) have no impact. This is also observed in fitting the velocity models, where we observe no significant impact of the target size on single target trials.
The asymmetry between the FPCA results when conditioned on the left target size versus the right target size is an unexpected result that appears throughout the analysis. There are additional hints of this asymmetry; in the logistic regression results (3.4), the right target has a slightly larger and significant impact in the initial trial phase. Additionally, in the linear model results fitted to hand velocities, only the right target size has a significant impact on the early movement velocity. And finally, the principal component decompositions of reaction time and initial launch angle showed starkly different patterns of activity.
From the results of the FPCA visualization, we can see an additional feature of this asymmetry. Not only does the right target size cause a different effect than the left target size (darker dashed line in figure 3.7), but the interaction between presentation time and the target size is statistically significant and opposite in direction to the main effect. This means that any shift caused by an increase in the right target size is negated by an increase in presentation time; a trial with a larger right target and a long presentation time has a slower reaction time, and bends back toward the right target. Similarly, in the plot of the launch angle principal components (figure 3.8), increases in the right target size shift the distribution of launch angles to the left, but increasing presentation time shifts the distribution back to the right.
3.5.0.4 "Lean right, look left?"
One possible explanation for these unexpected results is a strategy of "lean right, look left".
In this strategy, the subjects anticipate the presentation of the targets by biasing their preparation toward the right target, while priming their attention to the location of the left target. From the underlying mean launch angle distribution (figure 3.8) we can see that there is a slight underlying deviation or "lean" toward the right target. But, given additional presentation time, subjects have enough time to inspect both targets and initiate a movement that more accurately integrates the available information.
Since there are many indications that subjects ultimately find the left target more effortful to reach, initiating a movement in its direction while still waiting for full information is a way of "hedging your bets". The shift towards the left target associated with a larger right target size further supports this view; when the right target is an even more certain option, more resources (time, effort) may be spent investigating alternatives. The underlying rationale is similar to the principle behind "elimination-by-aspects" (Tversky, 1972); instead of accumulating costs and rewards for both options, subjects are looking to eliminate the left option based on a particular feature.
Throughout the analysis of single target trials, there are very few observable changes to the subjects' movement, even though these targets are known by the subject to be more "effortful" - and thus have a lower success rate. These results indicate that the lower success rate of an effortful option is perceived differently than an option that has an equivalent success rate determined only by chance and not by the subject's actions. If the probability of failure due to a lack of effort and the probability of failure due to a probabilistic outcome were perceived the same, both could be incorporated into a single expected reward value. In the previous related study focusing on probabilistic rewards, initial movements reflected the distribution of expected reward. But this was not observed in this experiment; initial movements were bent towards targets with effectively lower success rates, and the modulation of movement 'vigor', which is commonly noted in actions with varied rewards (Choi et al., 2014) (Mazzoni et al., 2007), is absent in this case.
3.5.0.5 Minimizing effort by minimizing motor noise?
Another possible interpretation of the results we observed here is given in (Harris and Wolpert, 1998), where the authors demonstrated that minimizing the magnitude of motor commands is equivalent to minimizing motor noise, given the signal-dependent scaling of noise levels in neural commands. This motor noise is also responsible for the variability in movement that must be controlled for subjects to be successful in the present experiment, meaning that the observed movements could be shaped by a cost function attempting to minimize noise while fulfilling the task success criteria. This possibility is particularly plausible at a neural level, since estimating the level of noise in a motor command requires only the motor command itself, while fully estimating energetic costs would require much more information about neuromuscular state that is not readily available to the central nervous system (Sparrow, 2000).
The delayed time course of action costs' impact on an ongoing decision, the observed strategy of "lean right, look left", and the results from single target trials all provide evidence of overlapping, semi-hierarchical decision making systems in the brain.
The lack of change to movements in single target (non-choice) trials, compared to the changes observed in choice trials, suggests that the estimation of action costs is either not necessary or simply not integrated into the movement decision. Only in the case of conflict, when there are multiple potential options with no dominant choice, does action cost estimation play an important role. A conflict-dependent hierarchy of decision systems, in which a fast, reactive and largely unconscious system is aided by a slower, deliberative and conscious system, has long been considered (and criticized) by theorists, but never seriously in the context of movements. This possibility offers a potential explanation for unifying the diverging descriptions of the serial 'good'-based model and the action-based models of decision making.

Chapter 4

A dual neural field model of action selection

Abstract

A common assertion in decision theory is that all dimensions of evidence are integrated into a higher level abstraction that represents the total predicted value or utility of a potential action. Further, it is argued that choices between actions are made at this level of abstraction, and only afterwards is the selected outcome transformed into a plan for movement. While this view has been consistent with observations of many types of choices in animals and humans, there is an increasing amount of evidence that contradicts the underlying assertions.
Neural activity associated with both movement preparation and action valuation has been observed throughout cortical and subcortical areas long before action selection or initiation; inactivation of regions assumed to be motor specific results in distortions of choices, but not deficits in motor performance. Other cortical areas, often associated with action cost estimation, have been observed to represent relative differences between actions and sometimes to act independently of choice.
Neurodynamic models based on alternative, action-based theories of decision making have been successful in reproducing neural and behavioral choice data in a number of different movement related tasks. These models make accurate predictions about rapid decisions with uncertain distributions of reward, but a number of recent studies have raised questions about how action related costs, like effort, are integrated into the decision process. Specifically, these observations raise the possibility that frontal cortical regions associated with action cost estimation only participate in ongoing decisions if necessary. In this paper we describe a novel modification of an action-based neurodynamic model that explores the possibility of a partially decoupled action cost estimation process. The predictions of the model are compared to a recent related human reaching study, and to the predictions of traditional neuroeconomic models.

4.1 Introduction
Historically, the study of decisions has been primarily concerned with choices that exist at high levels of abstraction. There are many underlying social, historical and practical reasons that the choices most studied as decisions involve lotteries, survey questions or hypothetical purchases. This has structurally embedded a world view in which there is a strictly feedforward flow of control from a decision making agent to the chosen movements.
Several important implicit assumptions made by this world view have become explicit doctrine in the prevailing descriptions of neural decision making. The first assertion is that a single unified representation of a potential action's expected gain is represented within the frontal cortices. For decisions that are structured in the form initial state → question → response → final state, these are reasonable assumptions that have not limited the current theories in making accurate predictions about neural behavior and human choices.
However, the assumptions of a serial structure of decisions are problematic when considering choices that are dynamic in nature. The serial structure essentially describes the world as "turn-based"; any transition period between an agent reaching a decision and the resulting outcome state is ignored. But for decisions about movement, we as humans (as well as other animals) are almost always in this transitionary period between selecting an action and experiencing the outcome. The continuously changing environment means that the movements needed to enact a desired goal may shift in the time it takes to reach a decision.
This leads to another problematic assertion made in the most prevalent descriptions of neural decision making: in these theories, the process of planning movements is said to occur only after a final decision on the desired outcome has been made (Padoa-Schioppa, 2011). If this decision requires a full accounting of expected rewards and costs, how can the action related costs, like the physical effort spent in motion, be estimated when no movement has yet been planned?
A common explanation for this conundrum is that action cost values are associatively learned through experience and are recalled during the action valuation process. This description is consistent with the evidence on how the expected reward of actions is rapidly estimated and integrated into behavior. There is even evidence to support this description in relation to costs associated with delay (Klein-Flügge et al., 2015). But a recent study has reported strong evidence that cached values describing action values and costs are insufficient to describe observed behavior in rodents (Hollon et al., 2014).
Moreover, studies investigating how human subjects and animals trade off physical effort for reward and task constraints have found evidence that does not easily conform with the predictions of utility maximization. Observational studies, on tasks that are naturalistic (for humans), have noted many situations in which people reliably and knowingly choose an effortful, but conceptually simple, route over a physically less demanding one (Rosenbaum et al., 2014). And while a number of studies on different types of movement have noted that subjects do trend toward optimality with respect to effort over time (Huang et al., 2012) (Taniai and Nishii, 2015) (Anderson and Pandy, 2001), other studies focusing on choice tasks with more active control have found distinctly non-optimal behavior (Kistemaker et al., 2010).
In a related previous study, human subjects were made to perform reaching movements towards targets with varying levels of associated effort and initial uncertainty about the reward distribution. Surprisingly, subjects were observed to move initially closer to targets with higher associated effort, but this effect was diminished by allowing subjects longer to deliberate before starting movement.
Taken together, these observations make a compelling argument that action related costs like physical effort are not well described as a "negative utility" that is directly combined with expected gains during decisions.
In this paper, we introduce a modification to recent action-based decision making models that attempts to explain these incongruous results. These action-based models are proposed alternatives (Cisek, 2007) to the most commonly cited 'good'-based view, and are heavily inspired by ecological and evolutionary understandings of nervous systems. Many behavioral studies in humans and neural recordings from primates and rodents have found strong support for these action-based views, but they do not easily explain the unique results of our previous study.
The mechanism we propose here is a secondary decision circuit, arranged semi-hierarchically, that is responsible for estimating action related costs, but is only incorporated into ongoing decisions when they cannot be resolved through expected gain alone. This circuit participates in action selection principally by inhibiting the activity associated with the less desirable action, rather than increasing the activity of the desired action. In the following sections we outline the various sources of evidence that inform these design choices, and then we describe the mathematical formulation and computational implementation details. We then present simulation results from the proposed modified model, as well as simulated predictions of the unmodified decision model and of the prevailing serial framework.
4.1.1 Conflict dependent estimation of action costs
There are several lines of evidence that inform the design and functional choices of the frontal cortex action-cost estimating field. While a majority of the research into decision making systems in the frontal cortex has focused on the contributions of the OFC and the vmPFC, the anterior cingulate cortex has also long been associated with decision related processes (Devinsky et al., 1995). Like many other cortical regions, the first clues about its function came from reports of neurological dysfunction and lesion studies.
However, since the advent of imaging techniques such as fMRI, the ACC has been specifically associated with two processes that are of particular interest. The first is the ACC's role in "conflict monitoring"; many imaging studies have noted that the ACC is particularly active when decisions or judgements are made between closely competing alternatives (Pardo et al., 1990) (Botvinick et al., 1999). The second putative role of the ACC is representing the trade-off between effort and reward. This is supported by evidence from rodent studies using lesions and pharmacological inactivations that have reported decreases in willingness to expend mental or physical effort to gain reward after inactivation of the ACC (Hosking et al., 2014) (Rudebeck et al., 2006). In humans, one fMRI based study found that regions of the ACC encoded the anticipated level of effort, as well as the net gain (reward minus effort), and also encoded the difference between the chosen and unchosen options in a trial (Klein-Flugge et al., 2016).

[Figure 4.1 depicts two loops spanning labeled regions (MT, OFC, vmPFC, PPC, M1, V1, ITS, SMA, BG, ACC): a motor control loop supporting action specification and a frontal inhibitory loop supporting action valuation.]

Figure 4.1: Sketch of the information flow through neural structures in the proposed 'dual field' model. The two neural circuits or 'loops' form the basis of this model, where interconnected patches of cortex through several regions are used to represent a continuum of features. For the motor control loop, these are features like the desired movement in planning space, represented in PPC and SMA, while the plan in motoneuron space is represented in M1. In the frontal loop, the regions in the prefrontal cortex represent features related to the valuation and anticipated costs, together with the ACC.
Another investigation using neural imaging found that "behavioral apathy" was tied to decreased connectivity between the ACC and the SMA (Bonnelle et al., 2016). And another particularly relevant neural imaging study reported that while activity in the ACC did correlate with anticipated effort, its estimate was not always integrated together with the other decision factors (Burke et al., 2013).
The observation, reported in the last study mentioned, that the action cost signal from the ACC is not necessarily integrated and reflected in the other observed decision signals raises an interesting possibility. Given its association with conflict monitoring, does the ACC only contribute its estimate of action related costs in decisions with conflict?
4.1.2 Inhibition-only interventions
Another distinct feature of the model we present here is the assertion that the 'frontal' circuit in our model only projects inhibitory connections to the decision circuits. The idea that the frontal cortex is responsible for inhibiting certain types of otherwise automatic behaviors is certainly not a recent one (Cummings, 1993) (Smith, 1992). As far back as the infamous case of Phineas Gage, scientists and physicians have found evidence that "self-control", the capacity to resist immediately attractive but ultimately detrimental actions, is critically dependent on the prefrontal cortex (Ridderinkhof et al., 2004).
In almost all descriptions of cortical functions, the frontal cortex has been continually associated with the suite of "executive" functions that include inhibiting undesirable reflexes. Impulsivity in adolescents, gamblers and neurological patients has been linked to underdevelopment of, or damage to, regions in the prefrontal cortex. And while there has been some evidence to dispute that the frontal cortex's role in cognitive control is specifically mediated by inhibitory mechanisms, motor control seems to be the exception in which the frontal cortex acts directly through mechanistic inhibition (Aron, 2016).
Recent studies have reported more evidence that directly points to a thalamo-cortical loop originating from the frontal cortex that is specifically tasked with inhibitory control (also referred to as "stopping control") (Aron et al., 2007). A series of studies used trans-cranial magnetic stimulation (TMS) to disrupt ongoing processes during choice tasks; disruption in the lateral prefrontal cortex was associated with deficits in choices that required competition resolution (Duque et al., 2012). And when TMS disruption was focused on the medial frontal cortex, the ability to suppress actions associated with less desirable options was markedly reduced (Duque et al., 2013).
Additionally, while some of these areas within the frontal cortex contain direct excitatory projections to motor neurons in the spinal cord (and these are uniquely prominent in humans), they generate a significantly smaller amount of activity in descending motor pathways, relative to the primary motor cortex (Boudrias et al., 2005) (Maier et al., 2002).
Finally, another source of evidence supporting the partial decoupling of the frontal cortical circuits from the active motor decision circuits comes from different types of lesion studies. When lesion damage in humans is focused on the ACC, the result is typically a global slowing and increased response variability rather than inflexibility of control (Holroyd and Yeung, 2012) (Stuss et al., 2005), which is consistent with an inability to inhibit competing alternatives. Damage focused on the orbitofrontal cortex, which is usually assumed to be the location of the final 'utility' calculation, was observed to disrupt the correct choice of action, but left the ability to choose the right outcome intact (Camille et al., 2011). In another recent, carefully designed study, rodents were trained in a two-phase choice task, and recordings from the OFC predicted the changes in learning from trial to trial but did not predict within-trial choice well. Using optogenetic inactivations, the researchers demonstrated that the OFC was not necessary for active choices, but was necessary for learning. Together, we believe these results support the idea that the frontal cortex's primary mode of participating in active movement decisions is through an inhibitory attentive spotlight that enables an already active action by silencing the alternatives.
4.2 Methods
4.2.1 Design Motivations
The most important philosophical question in designing explanatory or predictive models is deciding which level of abstraction to use, and the enormously complex nature of nervous systems makes this question especially difficult. Since the primary focus of study here is the dynamics of the neural processes that govern movement decisions, any proposed model must make specific predictions comparable to human or animal experimental data. This includes overt behavioral features such as choice ratios, reaction times, and spatial distributions of movements, as well as neural recordings; the need to make quantifiable comparisons to neurophysiological data sets a lower bound on the scale of abstraction for modeling.
While neurophysiological studies collect electrical signals from individual neurons, it is widely accepted that these neurons participate in a population based representation. These contiguous populations of neurons have activity correlated with specific behavioral tasks or computational functions, with each individual neuron's activity uniquely tuned to a specific variation of the task or function (Tehovnik and Lee, 1993; Graziano, 2016). This means it is sufficient to model the activity of a population of neurons as a single continuous variable, which is the approach used in dynamic neural fields. In addition to simulating the electrical activity of
4.2.2 Model Architecture
The basic architecture of this model is influenced by the long history of modeling the neuromuscular system as a control loop, using a collection of individual modules to represent the primary brain and nervous system regions involved in motor control. In some cases these modules are meant to directly represent specific brain regions, but usually modules representing part of the cortex are more loosely mapped to neuroanatomical regions. This is because, in comparing studies from primates, rodents and humans, the functional and anatomical regions do not correspond exactly, and also because the computational function a module represents may be spread in a continuum across cortical regions.
At the center of the model are two neural fields, which are sets of dynamical systems equations that reproduce the behavior of neuron populations recorded during sensorimotor tasks. One neural field represents an action planning field, where each neuron is tuned to a specific variation of a particular movement plan. This field is meant to represent the functions performed by the cortical regions in the thalamocortical motor circuit (Alexander and Crutcher, 1990), which connects planning areas in the parietal cortex (for reaching this would be the parietal reach region) and the supplementary motor area to the motor cortex and basal-ganglia circuitry. Functionally, the motor circuit has been observed to prepare and hold the patterns of activity related to potential actions, and here it serves the same function in our model.
The second neural field represents a secondary, interconnected decision circuit corresponding to the basal ganglia-thalamocortical loop through the frontal cortex. In the model, as well as anatomically, this loop is homologous to the motor circuit loop through the basal ganglia (Nambu, 2008). In the brain, this loop includes prefrontal regions of cortex like the OFC and vmPFC, which are associated with the hedonic, economic and net valuation of actions (Peters and Buchel, 2010) (Padoa-Schioppa, 2007). In our model, we focus on the contributions of the ACC, which is associated with the estimation of action costs and the comparison between potential actions with respect to action costs (Hosking et al., 2014).

[Figure 4.2 depicts a spatial sensory input field, an action planning field, an action cost field, and a goods value field defined over movement policies π_0 ... π_180; a weighted average of policies, w = [w_1 w_2 ... w_M], is passed to the controllers for action execution.]

Figure 4.2: Diagram of information flow in a 'good' theoretic model.

A sketch of the information flow through the whole brain as conceptualized by this model is given in figure 4.1, and a diagrammatic representation of the specifically modeled components is given in figure 4.2. The basic outline of the computational model is inspired by Christopoulos et al. (2015) and Christopoulos et al. (2017), which are descendants of the theory and computational model outlined in Cisek (2007); a structural sketch of the two coupled fields is given below.
Sensory information, such as the stream of visual information originating from the occipital cortex (V1), is projected through successive cortical regions, extracting increasingly abstract features. This information is processed along the dorsal pathway into spatial and proprioceptive representations of features that enable interaction, and along the ventral pathway into semantic features that inform valuation (Goodale and Milner, 1992).
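The skeleton below illustrates the dual-field architecture: a motor planning field driven by the target stimulus, and a frontal (ACC-like) field that additionally receives anticipated action costs and projects inhibition back to the motor field. This is a structural sketch written for this description, not the actual simulation code; the field size, the simple relaxation update, and all parameter names are placeholder assumptions (the field dynamics are specified in section 4.2.4).

```python
import numpy as np

class DualFieldModel:
    """Structural sketch of the dual-field architecture (illustrative only)."""

    def __init__(self, n_neurons: int = 181):
        # One neuron per candidate reach direction (e.g. 0..180 degrees).
        self.u_motor = np.zeros(n_neurons)     # motor planning field activity
        self.u_frontal = np.zeros(n_neurons)   # frontal (ACC-like) field activity

    def step(self, stimulus: np.ndarray, action_cost: np.ndarray, dt: float = 0.01):
        # Motor field: driven by the target stimulus and inhibited by the frontal field.
        self.u_motor += dt * (-self.u_motor + stimulus + self.frontal_output())

        # Frontal field: same stimulus, minus the anticipated action costs
        # (plus a small top-down "frontal bias", omitted here).
        self.u_frontal += dt * (-self.u_frontal + stimulus - action_cost)

    def frontal_output(self) -> np.ndarray:
        # Placeholder for the inhibitory kernel centered on the most active
        # frontal neuron (see the f_inh sketch in the next section).
        return np.zeros_like(self.u_frontal)
```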
Although the computations necessary to perform these sensorimotor transformations have been demonstrated using dynamical neural fields (Sandamirskaya and Conradt, 2013), we do not model them explicitly here. For simplicity, these transforms are performed implicitly in the simulation; the visual information (the centers of the targets) and the reward value indicated by the target color are passed directly as parameters to the stimulus arriving at the action planning field.

4.2.2.1 Motor planning field

The transformed features of the visual targets, i.e. their position, size and coloring, induce activity in the motor planning field. If a target is presented in visual space, the neuron representing a reach toward the target's position in space receives stimulation; and if the coloration of the target indicates a high reward value, this stimulation is increased in proportion to the expected reward. The output of this neural field is used to determine which action is executed; once a neuron's activity surpasses an internal threshold, the specific action parameters represented by that neuron are passed to the movement controller. If a number of neurons are activated, the sum of the active neurons' parameters, weighted by their activation levels, is used to determine the output movement.

4.2.2.2 Frontal control field

The same preprocessed sensorimotor features that are given as input to the motor planning field are also passed as input to the frontal field. However, two more sources of input are provided to the frontal field. One is an estimate of the anticipated action costs associated with each neuron's plan, which is a negative input. The second is a manually specified excitatory stimulus that we refer to as the "frontal bias"; this small amount of additional stimulation is used to simulate attentional bias driven by top-down processes.

In the same manner as the motor planning field, the output of the frontal field is determined by the neuron, or small number of neurons, with the highest activity. And as in the motor planning field, the properties of the neural field discussed earlier create a winner-take-all dynamic that ensures only a small number of neurons can maintain high levels of excitation. The most active neurons in the field are used to determine the shape of the inhibitory curve that is projected back to the motor planning field.

The inhibition that is generated by the frontal field and fed into the motor planning field is modeled by the formula (4.3) and plotted in figure 4.3:

f_{inh}(x) = \min\left( \frac{1}{\sinh^{2}\left( \omega (x - \phi) \right)} - 1,\; 0 \right)

Figure 4.3: Plot and equation describing the shape of the model's inhibitory output projected from the frontal field to the motor field. [The original plot shows the curve over neuron indices from -75 to 75 for \omega = 0.025, 0.05, 0.1 and 0.25.]

This function is centered on the neuron with the highest level of activity (u), where the value of the curve is zero, meaning that the highest-activity neurons are not inhibited. Neurons further away from the center of activation receive increasing amounts of inhibition, approaching a horizontal asymptote in either direction. The effect of this inhibition is analogous to an "attentional spotlight", which increases the relative gain of the attended features by decreasing the activation of everything that is not at the center of the curve.
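As a concrete illustration of the two readouts just described — the threshold-gated, activation-weighted output of the motor planning field and the inhibitory curve projected by the frontal field — the following is a minimal Python sketch. The function names, the field size, and all numerical values are illustrative assumptions, not details of the implementation used in this work.

```python
import numpy as np

def motor_readout(u, preferred_dirs, threshold=0.3):
    """Activation-weighted average of the preferred directions of all
    supra-threshold neurons in the motor planning field."""
    active = u > threshold
    if not active.any():
        return None  # no action plan has crossed threshold yet
    return np.average(preferred_dirs[active], weights=u[active])

def frontal_inhibition(x, center, omega):
    """Inhibition projected from the frontal field to the motor field:
    zero at the most active frontal neuron (center) and approaching -1
    far from it; omega sets the width of the uninhibited central region."""
    arg = np.sinh(omega * (x - center)) ** 2
    with np.errstate(divide="ignore"):  # the center neuron gives 1/0 -> inf
        curve = 1.0 / arg - 1.0
    return np.minimum(curve, 0.0)

# Example: 181 neurons spanning reach directions -90..90 degrees,
# with the frontal field's winner at +40 degrees.
preferred_dirs = np.linspace(-90.0, 90.0, 181)
inhibition = frontal_inhibition(preferred_dirs, center=40.0, omega=0.1)
```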
From the plotted curves in figure 4.3, we can see that the parameter \omega controls the width of the central zero-inhibition region of the function. Increasing the value of this parameter increases the overall level of inhibition to all but the central neurons. The resulting change in the inhibitory curve from a wide bandwidth to a narrow one may be thought of as increased attentional focus. Since the contributions of the frontal field and its inhibitory projections are conflict dependent in this model, we determine the value of the parameter \omega using the following formula:

\omega = \alpha \exp\left( -\frac{\sigma(u_{motor})}{\mu(u_{motor})} \right)

The term within the exponential function, the standard deviation of the signal in the motor field divided by its mean, \sigma(u)/\mu(u), is sometimes referred to as the Fano factor and has been used as an approximate measure of signal to noise in particle and neural applications (Eden and Kramer, 2010). In this formula \alpha is a free parameter that depends on the number of neurons in the field and the intended shape of the inhibition curve. We used the following method: we start by finding the greatest value of \omega for which \sum_{i}^{n} f_{inh}(i) < \epsilon, where \epsilon is some desired tolerance. This is the maximum value of \omega, the value at which there is effectively no inhibition. When there is a single patch of active neurons, variance in the motor field is relatively high, while relatively constant activity spread across many neurons results in lower variance. This means that \omega, and with it the inhibitory bandwidth, shrinks when a single movement plan dominates.

4.2.3 Simulations

In the following we consider both very simple perceptual and action decisions and a specific psychophysics experiment designed to probe the dynamics of the early phase of action. For a full explanation of the protocols of this experiment, see Chapter 3. In brief, this experiment has subjects reaching towards one of two visual targets presented to them on a screen. The initial appearance of the targets indicated their position, with a coloring indicating the accuracy required to "hit" the target, i.e. the "effort" of the action. A short period of time was provided with both targets in view before the subject was allowed to initiate movement, and the length of this time was varied from 0 to 2 seconds. Once the subject had initiated the movement, the reward points associated with each target were indicated; these points were earned if the subject reached that target before the allotted time expired. Essentially, the subject must start moving towards one of two targets, with some information about the "effort" of each action and no information about reward until in motion, then update their plans and finish their motion once the reward information is available.

4.2.4 Dynamic neural fields

Dynamic neural fields are a system of differential equations developed to model the neuronal activity that performs a number of computational functions necessary for decision making. In the formulation used here, there are two main dynamical variables modeled at a set number of points along a continuum; these discrete points are referred to as 'neurons' but actually represent small populations of neurons.

The first variable, neuronal activity, u(x), loosely corresponds to the level of depolarization of the neurons' cellular membrane.
The time evolution of this variable is given by the following formula:

\tau \dot{u}(\chi, t) = -u(\chi, t) + h + S(\chi, t) + \int w(\chi - \chi')\, f[u(\chi', t)]\, d\chi'

There are three important sources of input to each of these neurons. The first type of input is external input, which originates from another part of the model or is specified manually. The second type of input is recurrent self-excitation, an input proportional to the output firing of each neuron. This type of connectivity is termed 'recurrent' and allows sufficiently strong neural activity to sustain itself, albeit briefly, even when the original stimulus is removed. An important function of this recurrent excitation is to act as a type of short-term working memory for sensorimotor tasks (Johnson et al., 2009).

The final type of connectivity is a distance-dependent lateral inhibition, that is, inhibitory connections which connect each neuron to all others, but with stronger connections to the nearest ones. This type of lateral inhibition is a well known and well studied neural phenomenon and computational mechanism (Amari, 1977). A simpler form of lateral inhibition, uniform in strength across neurons, was introduced as a mechanism to model winner-take-all dynamics in leaky accumulator type models (Usher and McClelland, 2001). When the activation of one neuron, or cluster of neurons, becomes sufficiently greater than that of another active group, the increase in inhibition on the less active group lowers that group's output, decreasing the amount of inhibition to the more active group and causing it to become even more active, completing the positive feedback loop. This positive feedback loop is halted when the inhibition driven by the most active neurons has silenced the activity of all the others, reaching a steady state determined by the size of the input stimulus and the exact parameters of the lateral inhibitory connections.

The latter two types of input, the recurrent self-excitation and the lateral inhibition, have some compounding effects, in that both contribute to the winner-take-all dynamics. Their combined actions also smooth noisy input and enhance 'contrast': neurons with similarly activated neighbors restrain each others' activity, but an active neuron with active neighbors in only one direction receives less lateral inhibition, which naturally causes 'edge' neurons to have slightly higher activity.

In our model, the recurrent self-excitation and lateral inhibition between neurons are implemented by convolving the output of the neural field with a difference-of-Gaussians kernel, which combines both the excitatory and inhibitory functions into a single computation:

w(\chi - \chi') = c_{exc}\, e^{-\frac{(\chi - \chi')^{2}}{2\sigma_{exc}^{2}}} - c_{inh}\, e^{-\frac{(\chi - \chi')^{2}}{2\sigma_{inh}^{2}}}

The sensory input manually specified to the model is "cosine tuned", meaning that the activation of a neuron that responds most strongly to input at position \phi is given by \cos(\theta - \phi), where \theta is the current center of activity. This type of tuning has been observed in neural recordings of the motor cortex during graded movements (Amirikian et al., 2000), and has also been shown to be a computationally optimal scheme for reducing error (Todorov, 2002).
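To make these field dynamics concrete, here is a minimal Python sketch that combines the pieces above: Euler integration of the field equation, a difference-of-Gaussians interaction kernel, a cosine-tuned target stimulus, and the conflict-dependent inhibition bandwidth \omega described earlier. The function names, the sigmoid output nonlinearity, the half-wave rectification of the cosine input, and all numerical values are simplifying assumptions for this sketch rather than the tuned parameters used in the reported simulations.

```python
import numpy as np

N = 181                                     # neurons spanning -90..90 degrees
dirs = np.linspace(-90.0, 90.0, N)

def dog_kernel(c_exc=0.08, sigma_exc=5.0, c_inh=0.03, sigma_inh=20.0):
    """Difference-of-Gaussians interaction kernel w(chi - chi')."""
    d = dirs[:, None] - dirs[None, :]
    return (c_exc * np.exp(-d**2 / (2 * sigma_exc**2))
            - c_inh * np.exp(-d**2 / (2 * sigma_inh**2)))

def cosine_stimulus(target_deg, gain=1.0):
    """Cosine-tuned input for a target at target_deg, scaled by its reward value.
    Half-wave rectified here so that opposite directions receive no drive."""
    return gain * np.maximum(np.cos(np.deg2rad(dirs - target_deg)), 0.0)

def firing(u, beta=4.0):
    """Sigmoidal output nonlinearity f[u]."""
    return 1.0 / (1.0 + np.exp(-beta * u))

def field_step(u, stimulus, W, h=-0.2, tau=0.1, dt=0.01):
    """One Euler step of tau * du/dt = -u + h + S + sum(w * f[u])."""
    du = (-u + h + stimulus + W @ firing(u)) / tau
    return u + dt * du

def conflict_omega(u_motor, alpha=0.25):
    """Inhibition bandwidth omega = alpha * exp(-sigma/mu) of the motor field."""
    mu, sigma = np.mean(u_motor), np.std(u_motor)
    if mu <= 0:
        return alpha                        # degenerate case: field still at rest
    return alpha * np.exp(-sigma / mu)

# Example: two targets of unequal reward value competing in the motor field.
W = dog_kernel()
u = np.zeros(N)
S = cosine_stimulus(-40.0, gain=1.0) + cosine_stimulus(40.0, gain=0.7)
for _ in range(200):
    u = field_step(u, S, W)
omega = conflict_omega(u)
```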
4.3 Results

Figure 4.4 and figure 4.5 show diagnostic plots of all the dynamical variables included in the simulations of the dual field model. Each plot shows the evolution of the variables for each neuron, with the neuron's identity, or position in the neural field, on the y axis and the time steps on the x axis. In the upper right-hand corner, the plot of stimuli shows the cosine-tuned, manually specified inputs to the model that are used to represent the reaching targets in the original study. The motor field and frontal field activity plots (the middle column) show the activity (model variable u) over time, with the black dots indicating all the time points at which a neuron exceeds threshold activity. The output from each of these fields is shown in the rightmost column, with the dashed white line indicating the neuron that represents the active movement controller (for the motor field) or the center of the inhibitory curve (for the frontal field). Finally, the shape of the frontal inhibition curve is shown in the bottom left-hand corner.

Figure 4.4: An example simulation result of the proposed model performing a simplified version of a choice task. In the simulated trial, targets are initially presented at -40 and 40; after the movement is initiated and crosses threshold, the target at 40 is extinguished. The plots of motor field activity and frontal field activity show the neuronal activity over time (with yellow being the highest activity level), and the black points represent activity that exceeds the threshold. The plots of the fields' output (rightmost plots) show the output activity; for the motor field the dashed line indicates which attached optimal controller is active, and for the frontal field the output indicates the center of the frontal inhibition. The frontal inhibition (bottom left) shows the early conflict in the motor field increasing activation of the inhibitory curve until the motor field stabilizes. [Panels span neuron indices -90 to 90, with trial markers at fixation, target presentation (t = 0.0 s), threshold crossing, and t = 1.0 s.]

4.3.1 Reaction time simulations

We performed a series of simulations of the proposed model using scenarios based on the experiment described in Chapter 3; each of the following sets consisted of 1000 simulations. First, a series of trials was conducted using a single stimulus that represented one of the reaching targets in the aforementioned experiment. These trials were used to determine the baseline distribution of reaction times generated by the model.

Then a series of two-target trials was conducted using several variations of the experimental parameters. First, we simulated trials with two stimuli of nearly equal strength, which represented two reaching targets that had equal valuations of utility but still included some variation.
Second, we performed a series of trials in which the two stimuli were of unequal magnitude, again slightly varying the difference in magnitude and the identity (side) of the higher-valued target. In the experiment some targets had smaller hit radii and thus required slightly more effort to reach accurately; here the differences in effort are directly integrated into the valuation of the targets.

The final set of simulations we performed used two stimuli representing two targets, but introduced an additional frontal bias term. This frontal bias is intended to represent the attentive bias of subjects; in the results of the experiment we noted a strategy of preferentially attending to a particular target at the beginning of the trial, while preparing movements to both. To recreate this strategy in our model, we included a small stimulus input to the frontal field that was positioned at the target initially attended to. This meant that the side with the initial frontal bias would have a slight advantage, but only in trials in which there was enough early conflict to engage the frontal inhibition circuit.

Figure 4.5: An example simulation of the proposed model performing a version of the experiment with only a single target (non-choice). Note that in this trial, the frontal bias is initially placed in the center; this is meant to recreate the effect of the instructed visual fixation that occurs in the related experiment.

Figure 4.6: Results from the reaction time simulations. From left to right: (A) the observed reaction time distribution of a human subject from single-target trials in the related experiment; (B) the simulated distribution from single-stimulus (non-choice) trials; (C) simulated choice trials with stimuli of similar size; (D) trials with unequal stimuli; and (E) trials with similar stimuli but including a frontal bias term. [Five panels A–E, each with reaction time (s) on the horizontal axis.]

The results of the reaction time simulations shown in figure 4.6 are consistent with the subjects' behaviors noted in the previous experiment (see Chapter 3). The plot furthest to the left shows the reaction time distribution from a human subject in the experiment; this subject was among the top 5 subjects whose data were used to tune the free parameters of the model. The trials in which there was no choice (single-target trials) are shown in the second plot from the left and have the fastest and earliest-peaked distribution. This is also in line with the results from the human subjects, although the subjects' distributions of reaction times were generally heavier tailed. The remaining plots in figure 4.6 show the results from choice (two-target) trials: trials with equal-cost targets, trials with unequal costs, and finally trials with equal costs but including an initial lateral bias.

Again, these results were largely consistent with the distributions observed in the human subject study, although, again, the tails of the simulated distributions were slightly lighter. The differences between distributions were statistically evaluated using a Kolmogorov-Smirnov test (Massey, 1951) on the observed cumulative distributions; this tests against the null hypothesis that an observed distribution was drawn from a reference distribution (in this case the compared distribution). All the tests were found to be significant at the p < 0.05 level. This test was specifically chosen because mean and median tests are often not informative when applied to reaction time distributions (Baayen and Milin, 2010).
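As a reference for this kind of comparison, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov test using SciPy; the reaction time arrays here are synthetic placeholders standing in for the simulated and observed distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder reaction-time samples in seconds; in practice these would be
# the simulated reaction times and the corresponding observed distribution.
rt_single_target = rng.gamma(shape=6.0, scale=0.05, size=1000)
rt_two_targets = rng.gamma(shape=8.0, scale=0.05, size=1000)

# Two-sample KS test on the empirical cumulative distributions.
statistic, p_value = stats.ks_2samp(rt_single_target, rt_two_targets)
print(f"KS statistic = {statistic:.3f}, p = {p_value:.4g}")
```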
4.4 Discussion

In the current paper, we introduced a novel variation of a neurodynamic model of action-based decision making. There are two important distinguishing features of this model. The first is that there are two parallel systems performing nearly simultaneous decisions using similar processes but with distinct roles. Secondly, while the primary, 'motor circuit' system is involved in every decision, the secondary, 'frontal circuit' system is only integrated into the decision making process on-line in the case of conflict in the primary system, and even then it acts only through inhibition. The most important predictions this model makes concern the temporal dynamics of decisions, in particular how actions are selected and updated in the presence of uncertain information about potential gains or expected costs.

This model was designed with the results of a previous experiment in mind and was able to reproduce some of the unexpected features of behavior noted in the subjects' movements. Specifically, there appeared to be selective integration of action-related costs in the early phases of movement in the reaching task. In the reaction time simulations we performed based on this experiment, using the model proposed here, we observed several important features that were consistent with the human subject data.

First, the addition of a second stimulus, recreating the choice conditions in the experiment, resulted in a rightward shift and widening of the reaction time distribution. Second, when the stimuli were of unequal magnitude, representing targets of different value, we observed that the reaction time distribution narrowed, but remained wider and heavier tailed than the non-choice trial distributions. This indicates that even when one stimulus (target) is of significantly higher value, the other, unchosen target still influences the initial movement.

The final and most interesting feature that was reproduced in our model came in the final set of simulations. In this set of simulations we included an additional stimulus, provided only to the frontal field, which represented a kind of attentional bias. Since this stimulus was only provided to the frontal field, it did not directly impact the motor field at the beginning of the trial. Only once there was sufficient conflict, i.e. sustained activity from both stimuli in the motor field, does the inhibition generated by the frontal field become integrated into the motor field. The resulting reaction time distribution was distinct from the other two sets of simulations using two stimuli and, importantly, different from the simulations with unequal-valued targets. It also shows the shifts of parts of the distribution to the left and to the right that we observed in a functional PCA analysis of reaction times.
This is an important result because it is consistent with how we observed subjects to react to trials in which the targets were presented with different levels of associated effort. If subjects estimated the effort as an action cost and combined it with the expected reward of the target, the resulting reaction time distribution would look like the one from the simulations in which the stimuli representing the targets were explicitly modeled as different in magnitude (the unequal-stimulus condition in figure 4.6).

4.4.0.1 Limitations

One of the biggest limitations in building computational models of neural functions is the number of free parameters and the ballooning complexity. Very simple models like the drift diffusion or leaky accumulator models are able to recreate some very specific, constrained instances of behavior, but modeling experiments that allow for more diverse responses necessitates more complicated models. This is a distinct problem with neural field models, because there are a fair number of parameters which must be tuned to match experimental data, and also many decisions that are partly arbitrary and partly educated guess. For example, the number of neurons used to represent the continuum of parameters must be chosen, and there are no hard principles to inform this choice other than simulation and opinion. This also means that using these kinds of models for comparison against experimental data requires adaptation of some of the structural parameters.

This limitation is certainly one that affects our reaction time simulations. There are a number of time constant parameters in this model that are originally derived from fitting observed neural phenomena (Coombes et al., 2014), but are many steps removed from natural parameters. For reaction times specifically, we adjusted parameters to match the distributions from a combination of the top 5 subjects of the related experiments.

4.4.0.2 Testable predictions

The most important role of any computational model is to create falsifiable predictions that may be tested in experiments. There are two areas in which this model makes distinct predictions that may also be relatively easy to incorporate into a human subject experiment. The first involves the impact of having many actions available to achieve a desired goal, and the second lies in tying the frontal circuit loop to subjects' overt attention.

One of the implications of the connectivity in this model is that a desired outcome that is associated with many possible actions will trigger the formation of many competing action plans. If two targets or potential goals are presented with roughly equal reward, but one goal is associated with only one possible action while the other is associated (through training) with many similarly costly options, are they assigned different values? By traditional assessments of utility these options should be roughly equivalent, perhaps even slightly favoring the option with more associated actions. However, in our model, the decision is a competition between the actions and not the outcomes. This experiment may be extended further by considering different configurations of possible movements to the presented targets. If two roughly equally rewarded targets are presented, but with different sets of associated actions that have a distribution of different action costs, how do these many potentially irrelevant paths of action impact a decision?
An experiment with a similar design to the effort-based reaching study referenced in this paper reported that subjects rapidly integrated the relative effort between two options into a free choice (Cos et al., 2014). However, that experiment used movements that were practiced and familiar to subjects; if an experiment manipulated these movements such that the motions remained similar but the action costs did not, would these costs still be rapidly estimated?

This suggestion is similar to the idea of "choice paralysis" that has been studied in decision psychology, but has not been studied in the context of movements (Huber et al., 2012). In the usual conception, an abundance of potentially desirable outcomes results in increased delays and decreased confidence in forming a decision (Reutskaja et al., 2018) (Huber et al., 2012). However, in this case the number of outcomes to choose from remains small; only the number of possible actions to achieve them increases. Our model predicts that the increased number of relevant movements will lead to increases in the latency to perform actions and a shift in choice preference towards outcomes with only one or a few associated movements. This is a relatively counter-intuitive possibility which arises from the model and makes a strong, testable prediction.

An alternate line of experimentation might consider using eye-tracking to infer the subject's focus of attention and compare the resulting activity over time to that of the frontal neural field. This requires the assumption that the subject is attending to whatever is in the center of their visual field, but in a rapid choice task with mainly visual cues, this is likely a justified assumption. A possible experiment might force the subject to maintain different points of fixation during the early parts of the trial, and could test the "lean right, look left" explanation posed in the related human subject study.

4.4.0.3 Dual decision making

The model proposed here combines an action-based dynamic neural decision model with a secondary interacting loop that acts like an attentive spotlight to inhibit undesired actions. This dual-process structure is philosophically inspired by many descriptions of decision making that emphasize a fast, reactive, hedonic-focused system coupled with a slower, deliberative, and more economically rational system (Glimcher et al., 2005a). The habitual or reflexive system is the set of unconscious and mostly autonomic processes that are responsible for the majority of our basic coordinated behaviors, and is likely very similar across mammals. The results from the model here indicate that this conceptual delineation between decision systems is not only intuitive but also potentially a useful approach to building models that simulate human choices.

Chapter 5

Conclusions

5.1 Discussion

In the preceding chapters, we have explored the dynamics of reaching under uncertain conditions from several perspectives. First, in a human experiment and dynamical model that examined the interaction of action valuation and planning under uncertainty. Second, we extended the experimental design to manipulate the amount of perceived effort associated with an action. And finally, the results of this investigation into action cost led to findings unexplained by any of the current theories of neural decision making.
5.1.1 Summary of experiments and results

In Chapter 2, we introduced a variation of a "go-before-you-know" experimental protocol that was specifically designed to investigate the initiation of movements under reward uncertainty. This design forced subjects to initiate a reaching movement towards targets with only partial information about the distribution of reward. The most important aspect of the observed results was a deviation of the initial reaching trajectory that was consistent with a reward-probability-weighted combination of motor plans. Another critical feature of this deviation was that it coincided, on a trial-by-trial basis, with shifts in reaction time distributions. Both of these features were found to be consistent with the results of an underlying competition between actions described by theories of action-based decision making.

The experiment outlined in Chapter 3 extended this experimental protocol to include manipulations of the effort of the movements required to reach particular targets. By modulating the effective hit radius of the targets, subjects needed to perform movements at the same speed with higher accuracy, necessitating a significant change in the muscle contractions needed to increase endpoint accuracy while containing the noise generated by larger control signals. In contrast to the previous experiment, we noted that subjects' initial movements appeared to deviate towards, rather than away from, the more effortful options. Further analysis revealed evidence for an asymmetry and a time-dependence in these effects; across several of the features studied, shifts of the right target resulted in more significant changes, which were negated by the interaction with additional movement preparation time.

The results of the experiment in Chapter 3 suggested an explanation in terms of a "lean right, look left" strategy, in which subjects internally prepared both movements but slightly favored the right. At the moment the targets appeared, however, their visual attention focused on the left target, only switching over to the right target if the trial featured a longer preparation period.

This strategy inspired the dynamical model of decision making described in Chapter 4. Based on the framework of action-based decision making and dynamic neural field theory, we introduced a secondary neural field into the model which only participates in a decision when there is conflict between options. It is intended to represent a simplification of the frontal thalamocortical loop through the basal ganglia, and based on evidence from a number of different lines of research, we restrict the contribution of this field to be inhibition only. In short, the frontal field acts as a conflict-triggered attentional spotlight, suppressing competing movement plans through inhibition alone.

5.1.1.1 Limitations and criticisms

A constant difficulty in the design and analysis of human subject studies is the large amount of inter-subject variability. An experimental task must give subjects enough freedom to make natural, representative choices, but the wide range in ability and biomechanics across subjects makes it difficult to ensure the same conditions are being tested. In the design and piloting of the experiments in Chapters 2 and 3, many different factors were found to have subtle influences on subject performance. For some subjects, using the robotic manipulandum when reaching was reported to be "unnatural" and clearly impacted their normal reaching motions. Other subjects found no such difficulty and were able to quickly feel comfortable making accurate reaching movements.
Yet other subjects discovered unintentional features of the experimental setup ("hacks") that allowed them to improve their performance. One example of this was abusing the workspace limits of the robot; a subject found that making extreme movements toward the targets, being stopped by the workspace limits, and then making a short corrective movement to the target was an effective strategy. It should be noted that this is hard to describe as optimal by most formulations using kinematic or dynamic movement parameters, but it was nonetheless described as "the right strategy" by the subject during debriefing. While this is an extreme example, it was one of a number of stable strategies that appeared throughout the two experiments. A visualization of subjects' movement strategies from the experiment is shown in figure 5.1.

The existence of these common, stable, but relatively niche strategies in these experiments poses a problem, albeit a potentially informative one. In real world scenarios, it is common to find clusters of stable strategies. Examples of this may be found throughout the world of athletics, where even at the highest levels of achievement there are generally a number of archetypes, none of which completely dominates. After years and years of bent-elbow strokes being taught as the correct form of freestyle swimming, Vladimir Morozov, swimming for the University of Southern California, posted the fastest ever 50 meter split time using a straight-arm "windmill" technique which had been coached out of generations of younger swimmers. The same coaches, trainers and researchers had also been advising three strokes between breaths in freestyle for many years, only for Katie Ledecky to become the most decorated female swimmer in history while breathing on almost every stroke.

The conclusion that different bodies make different movement techniques or strategies preferable should not be surprising to researchers or anyone who has participated in physical activity. But it does pose a difficult problem for the study of the coordination of movements and strategies. There are certainly situations that may be designed to engineer a scenario in which every person would experience the same set of trade-offs and perceive the incentives in a similar enough manner to identify generic principles influencing choice. The many discoveries and successful theories of the movement sciences can attest to this.¹

¹ One only needs to browse a short section of the bibliography below to confirm this.

Figure 5.1: Heterogeneous clustering of subjects' movement strategies. Plotted are the results of a t-SNE manifold trained on the velocity profiles collected from single-target trials in the experiment described in Chapter 3; the result is an embedding space which groups similar trajectories together. Each color represents a different subject, and just from inspection it is clear that there are many subjects with similar, overlapping strategies, and others who are quite idiosyncratic.

However, many of these experimental situations are somewhat contrived and divorced from natural environments; in the majority of the reaching tasks investigated, including the ones considered here, subjects are usually restrained. Moving back towards naturalistic tasks with fewer constraints in the environment, it is inevitable that many different but internally consistent patterns will arise in subjects.
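The embedding in figure 5.1 can be reproduced in outline with scikit-learn's t-SNE, as in the sketch below; the array shapes, the perplexity, and the other settings are illustrative assumptions rather than the exact configuration used for the figure.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder data: one row per single-target trial, each row a resampled
# velocity profile (here 100 time points); subject_ids identify the subject.
velocity_profiles = rng.normal(size=(600, 100))
subject_ids = rng.integers(0, 20, size=600)

# Embed the profiles into two dimensions; nearby points correspond to
# similar velocity profiles, so clusters suggest shared movement strategies.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(velocity_profiles)

# embedding[:, 0] and embedding[:, 1] can then be plotted as a scatter
# colored by subject_ids, as in figure 5.1.
```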
The existence of this heterogeneity is itself a problem for theories of utility maximization or optimal control, since it strongly implies that people are not that concerned with global minima. It has even been argued that the ability to find and form stable, "good-enough" strategies is in fact one of the most important and overlooked features of biological systems of decision making (Loeb, 2012).

Yet the analytical techniques and experimental methodologies needed to reason about and draw conclusions from these kinds of strategy clusters are relatively underdeveloped. There are certainly many methods available to cluster these data and find patterns in an unsupervised manner, but all of them involve decisions that are at worst arbitrary and at best already defined by some principle. Meanwhile, the increasing prevalence and availability of cheap, personal, wearable sensors ensures that the data sets will only become larger and more heterogeneous.

A parallel problem, discussed in Chapter 4, is the growing complexity of the computational models needed to represent and simulate these behaviors. Many of these dynamical models also exhibit these kinds of heterogeneous, stable "strategies", though they are referred to as attractors or limit cycles. These different points of stability formed by the same equations mean that one model may be capable of producing several different types of diverging behavior (Izhikevich, 2004) (Izhikevich and Edelman, 2008).

Ultimately, this problem of heterogeneous or meta-stable behaviors may not be avoidable. It is one of the desired properties of dynamical systems that many different behaviors may be generated from simple systems using only slight changes in their parameters. Nervous systems are by their nature hugely parallel and complex systems, and while there will be pieces that are describable in closed form or in neatly compact ways, much of their behavior will not be intuitive or explainable in concrete terms. Going forward it will be necessary to embrace this complexity, but also to develop new tools and statistical methodologies that are better able to handle and draw conclusions from this kind of large, heterogeneous data.

5.1.2 Dual and back again

The model we introduced in Chapter 4 may be described as a kind of "dual process" model, and it is inspired by and philosophically connected to many historical descriptions of decisions and the mind. The most ancient roots of these 'dual' theories trace back to the mind-body dualism of classical philosophy, which centered around a sharp delineation between the mind or soul and the physical body. In this conception, the body exists within the material world, and the mind exists in the immaterial world; through a process mediated by the divine, the immaterial mind is able to direct the material body. Through the ages, the requirement of divine intervention has slowly been relaxed, and more agency has slowly been ascribed to the body itself. However, the fundamental idea that there is a singular, centralized form of agency that is ultimately 'in charge' of behavior persists to this day. While there was a time in which the behaviorists and materialists argued for a banishment of internal states and other immaterial elements, different forms of dualism have been a perpetually renewing source of ideas.
The constantly recurring theme of dual systems that appears in psychological, spiritual, philosophical, and colloquial explanations suggests that there is indeed something to this structure, or at the very least, that it is a productive model for practical purposes.

The most relevant recent renewal of dualist theories in psychology is the description of System 1 and System 2 outlined by Kahneman and Tversky: System 1 acts quickly, through heuristics, and requires little conscious effort, while System 2 acts more slowly and diverts mental resources to make explicit decisions. These two parallel systems also mirror the descriptions of the different valuation systems described in the economic literature: an automatic and associatively learned Pavlovian system, and a deliberative and rule-based goal-directed system (Kable and Glimcher, 2009). This delineation between an automatic, bounded subconscious process that performs most decisions and a secondary conscious decision process that performs high-level economic-type choices, framed by the automatic lower-level processes, is also reflected in the somatic marker hypothesis (Bechara et al., 2000).

A dual process structure has also been proposed in the learning of behaviors. It has been shown convincingly that in the process of motor learning for simple tasks there are two interacting systems at work; again, one acts quickly (but forgets quickly) and the other is slower and more deliberative but generates long-term memories (Lee and Schweighofer, 2009). More recent work has even identified patterns of activity in the prefrontal cortex and striatum that extend the current model of dopamine-gated TD-type learning, proposing that there are two interacting learning systems, with one 'training' the other (Wang et al., 2018). We believe that all of these examples reflect the dual arrangement of processing systems and make a clear case that intelligent systems are formed from tangled semi-hierarchies that produce complex emergent behavior.

5.1.3 The need to unify choice and control

As we have outlined, the study of "choices" that decision making has centered on and the study of movement coordination have existed in parallel worlds. Largely, it has been unnecessary or impossible to study the intersection of these processes; technologically, the capability to observe neuronal dynamics and movement in enough quantifiable detail is only a recent accomplishment. But, for a number of reasons, the conceptual division between our choices and our movements is becoming an impediment to addressing many important challenges.

The first and foremost of these concerns is our growing need to understand how to recognize dysfunctions in our movement and decision systems, and then to understand how to treat them effectively. From the discussion in the preceding chapters, it should come as no surprise that disorders of movement, disorders of motivation, and disorders of decisions have many overlapping underlying causes. This includes many of the most prominent neurological disorders, such as Parkinson's disease, Huntington's disease, and stroke-induced dysfunction. The same systems are also implicated in a number of less debilitating, but more prevalent, disorders that are not usually connected with movement.
Among the most common and under-treated issues of mental health are disorders of mood or apathy, and studies have identified potential mechanisms that propose malfunctions in assessing the effort necessary to take action as a root cause (Cléry-Melin et al., 2011). A recent imaging study has even reported poor connectivity between the SMA and the ACC as a distinct feature of patients experiencing behavioral apathy (Bonnelle et al., 2016).

All of the conditions mentioned are among the most challenging problems facing health professionals today, and treatments or interventions remain expensive and often unreliable. For the treatment of stroke, in particular, the potential for preventative and acute treatment is limited, and so the need for long term rehabilitation and assistive care is only likely to grow. Being able to determine what is effortful for patients, what is effective as treatment, and how robotic systems might be able to help in recovery all depend on computational models to move forward (Krebs et al., 2009) (Hogan et al., 2006).

Similarly, prosthetics and brain computer interfaces are increasingly becoming a practical possibility for restoring function for amputees and even, further into the future, for augmenting human performance. However, to be truly useful and effective they need to be seamless and "effortless" to control. There is still active debate over which signals from the brain are the most appropriate to drive external effectors (Andersen et al., 2004), and from the discussion in the previous chapters it should be clear that this is not an easy answer. Additionally, the recurring theme of hierarchical structures and predictive control in the brain and biological systems strongly implies that any external prosthetics or interfaces to the nervous system should follow these principles.

One final area of research that stands to benefit from closer study of the mechanisms and processes underlying movement decisions is the field of artificial intelligence and robotics. In the years intervening between the first experiments described here and the last, there has been a marked revival in interest in "artificial intelligence" (A.I.) and "machine learning" (M.L.), not just in academia but also throughout industry and popular culture. This is driven by the notable advancements across a number of fields achieved by a modern form of neural networks known as "deep learning" (Krizhevsky et al.). While, as many have pointed out, these systems in practice only bear a passing resemblance to actual neurons, the basics of their architecture and function are unquestionably inspired by the study of biological nervous systems.²

However, one of the areas of computer science in which the new techniques related to deep reinforcement learning have not made the significant breakthroughs some expected is robotics. Returning to the movement-first framing we discussed in the introduction, it should not be so surprising that a field with roots in code breaking and game theory has found more success in playing Chess, Go or Starcraft than in walking or simply grasping an object (Levine et al., 2017). As we have argued elsewhere here, the underlying reason is important; games like chess or computer strategy games have very explicitly defined objectives, discrete states that are often fully visible, and a closed-off world with little to no dynamics.
² The consistency with which ideas from neuroscience inspire groundbreaking approaches in computer science (perceptrons, convolutional nets, reinforcement learning, etc.) is arguably reason enough for any technologist to take the study of nervous systems seriously.

Current approaches to robotic control are very sensitive to changes in the shape of cost functions or the dynamics of the physical plant. This is especially problematic in the context of learning (or continuously updating) movement policies or control for the purpose of interactive perception. In both of these cases, uncertainty plays an important role, both in the processes of sensorimotor control and in the specifics of evaluating costs and rewards. This often means cost functions must be hand designed and tuned to the demands of specific tasks and hardware.

Of course, we are not the first to point out these structural problems with the expectations of such rule- and optimization-based autonomous systems. The stark disparity between the success of decision making systems tackling high-level problems like Chess or logical reasoning and the struggles with low-level perception and control problems was most famously articulated by Hans Moravec. According to Moravec and his contemporaries, the most important underlying discovery in A.I. and robotics research was that the "hard problems were easy, and the easy problems were hard". Skills assumed to be the peak of human intelligence, like abstract reasoning or mathematics, were the easiest to replicate (and improve upon) in artificial systems, while seemingly basic skills like retrieving a specific object were impossibly difficult.

Studying how these biological decision systems are able to handle variability and uncertainty in their environments and their movements, in the face of ambiguous costs and constraints, is critical to bringing these properties to robotic systems. But it is more than just robustness in dynamic environments or the appearance of natural movement that is necessary for progress in robotics. In workplaces around the world, robots and autonomous systems are working in closer and more direct coordination with humans, a trend that is widely agreed to continue for the foreseeable future. For these robots to be safe and effective collaborators, they must be able to anticipate and interpret the motions and intentions signaled by the movements of the people in their environment.

Bibliography

Alexander GE, Crutcher MD (1990) Neural representations of the target (goal) of visually guided arm movements in three motor areas of the monkey. Journal of neurophysiology 64:164–178.

Alexander RM (1997) A minimum energy cost hypothesis for human arm trajectories. Biological Cybernetics 76:97–105.

Allais M (1953) Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'école américaine. Econometrica: Journal of the Econometric Society pp. 503–546.

Amari S (1977) Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics 27:77–87.

Amirikian B, Georgopoulos AP, Georgopulos AP (2000) Directional tuning profiles of motor cortical cells. Neuroscience research 36:73–79.

Andersen RA, Musallam S, Pesaran B (2004) Selecting the signals for a brain–machine interface. Current Opinion in Neurobiology 14:720–726.

Anderson FC, Pandy MG (2001) Dynamic optimization of human walking. Journal of Biomechanical Engineering 123:381–390.
Aron AR, Durston S, Eagle DM, Logan GD, Stinear CM, Stuphorn V (2007) Converging Evidence for a Fronto-Basal-Ganglia Network for Inhibitory Control of Action and Cognition. The Journal of Neuroscience 27:11860–11864. Aron AR (2016) The Neural Basis of Inhibition in Cognitive Control. The Neuroscientist 13:214–228. Baayen RH, Milin P (2010) Analyzing reaction times. International Journal of Psychological Research 3:12–28. Bibliography 115 Balakrishnan J, Ratcliff R (1996) Testing models of decision making using confidence ratings in classification. J. Exp. Psychol. Hum. Percept. Perform. 22:615–633. Barnard GA (1946) Sequential Tests in Industrial Statistics . Supplement to the Journal of the Royal Statistical Society 8:1–26. Barton KS, Chapman CS, Wolpert DM, Gallivan JP, Flanagan JR (2015) Action plan co-optimization reveals the parallel encoding of competing reach movements. Nature Communications 6:1–9. Basso MA, Wurtz RH (1998) Modulation of neuronal activity in superior colliculus by changes in target probability. The Journal of Neuroscience 18:7519–7534. Bechara A, Damasio H, Damasio AR (2000) Emotion, decision making and the orbitofrontal cortex. Cerebral Cortex 10:295–307. Beck JM, Ma WJ, Kiani R, Hanks T, Churchland AK, Roitman J, Shadlen MN, Latham PE, Pouget A (2008) Probabilistic Population Codes for Bayesian Decision Making. Neuron 60:1142–1152. Berger JO (2017) Sequential Analysis In The New Palgrave Dictionary of Economics, pp. 1–3. Palgrave Macmillan UK, London. Berniker M, O’Brien MK, Körding KP, Ahmed AA (2013) An examination of the generalizability of motor costs. PLoS ONE 8:e53759. Bernoulli D (1954) Exposition of a New Theory on the Measurement of Risk. Bernstein N (1967) The Coordination and Regulation of Movements. Pergamon Press, Oxford . Bode HR, Heimfeld S, Koizumi O, Littlefield CL, Yaross MS (1988) Maintenance and regeneration of the nerve net in hydra. American Zoologist 28:1053–1063. Bonnelle V, Manohar S, Behrens T, Husain M (2016) Individual Differences in Premotor Brain Systems Underlie Behavioral Apathy. Cerebral Cortex 26:807–819. Botvinick M, Nystrom LE, Fissell K, Carter CS, Cohen JD (1999) Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature 402:179–181. Bibliography 116 Boudrias MH, Belhaj-Saïf A, Park MC, Cheney PD (2005) Contrasting Properties of Motor Output from the Supplementary Motor Area and Primary Motor Cortex in Rhesus Macaques. Cerebral Cortex 16:632–638. Brick N, MacIntyre T, Sport MCIRo, 2014 (2013) Attentional focus in endurance activity: new paradigms and future directions. Journal of the American Statistical Association 7:106–134. Budd GE (2008) The earliest fossil record of the animals and its significance. Philosophical Transactions of the Royal Society B: Biological Sciences 363:1425–1434. Bullock TH (1959) Neuron Doctrine and Electrophysiology . Science 129:997–1002. Burk D, Ingram JN, Franklin DW, Shadlen MN, Wolpert DM (2014) Motor Effort Alters Changes of Mind in Sensorimotor Decision Making. PLoS ONE 9:e92681–10. Burke CJ, Brünger C, Kahnt T, Park SQ, Tobler PN (2013) Neural integration of risk and effort costs by the frontal pole: only upon request. Journal of Neuroscience 33:1706–13a. Camille N, Tsuchida A, Fellows LK (2011) Double Dissociation of Stimulus-Value and Action-Value Learning in Humans with Orbitofrontal or Anterior Cingulate Cortex Damage. Journal of Neuroscience 31:15048–15052. 
Asset Metadata
Creator: Enachescu, Vincent Arie (author)
Core Title: Reaching decisions in dynamic environments
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Neuroscience
Degree Conferral Date: 2021-12
Publication Date: 12/21/2021
Defense Date: 12/21/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: behavioral economics, Decision making, dynamic neural fields, dynamical systems models, motor control, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Schweighofer, Nicolas (committee chair), Brocas, Isabelle (committee member), Monterosso, John (committee member), Winstein, Carolee (committee member)
Creator Email: enachesc@usc.edu, vincent@enachescu.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC18807272
Unique Identifier: UC18807272
Legacy Identifier: etd-EnachescuV-10320
Document Type: Dissertation
Rights: Enachescu, Vincent Arie
Type: texts
Source: 20211223-wayne-usctheses-batch-906-nissen (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu