University of Southern California
Faculty of the USC Graduate School
Doctor of Philosophy (Psychology)

Bayesian hierarchical and joint modeling of the reversal learning task

Author: Benjamin Smith
Advisor: Dr. Stephen J. Read
May 2019

Abstract

In this thesis, I used recently developed Bayesian joint modeling methods to estimate learning in a reversal learning dataset, in order to explore the relationship between neural computation and brain activity. The aim was both to test Differential Evolution MCMC and joint modeling in this context, and to explore the properties of the reversal learning task and differences between subject groups. I used linear modeling, Bayesian joint modeling, and hierarchical Bayesian modeling using both NUTS and Differential Evolution MCMC (Brandon M. Turner, Forstmann, Wagenmakers, et al., 2013; Ter Braak, 2006) to complete these tasks. Using linear modeling, I confirmed that a neural pain signature previously developed on a separate dataset (Wager et al., 2013) correlated with an incorrect response in the punishment condition of the task. I implemented a three-level behavioral model of reversal learning and found evidence that learning rates for meth-using subjects are likely lower than for non-meth-using subjects. In a hybrid reinforcement learning linear ballistic accumulator model, two-level DE-MCMC Bayesian hierarchical modeling of learning rates for each group suggested that learning rates for meth users during the task are likely to be below learning rates for non-meth users. A Bayesian joint model suggested reward prediction error was related to activity in the putamen and nucleus accumbens. Having completed these analyses, I explored directions for further work.
Acknowledgements

Nate Haines for his help building the reinforcement learning model; Brandon Turner for his assistance in implementing his DE-MCMC hierarchical joint model; Feng Xue for his work preprocessing the data used in this thesis, and the rest of SAND Lab for their participation in the project; Jonas Kaplan for his assistance with processing of the brain signal; and most importantly, my advisor, Stephen J. Read, for all his supervision, suggestions, and advice over the last few years.

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Outline
  1.2 Why use DE-MCMC Bayesian joint modeling?
    1.2.1 Why use joint modeling?
      Joint modeling across neural and behavioral data
      Implementing reversal learning in DE-MCMC
    1.2.2 Why use DE-MCMC over other forms of MCMC?
  1.3 Reversal learning
    1.3.1 Wider study
    1.3.2 Basic design
    1.3.3 Behavioral models of reversal learning
      Existing review articles
  1.4 Reversal learning dataset
    1.4.1 Subjects
    1.4.2 Runs
    1.4.3 Hierarchy
    1.4.4 Basic data structure
  1.5 Neuroanatomy of the decision-making system
    1.5.1 Reversal learning task
  1.6 Hypotheses
  1.7 Summary of complete models I have tested
2 Data preparation
  2.1 Extracting Pain Signature
  2.2 Choosing measures of neural activity
    2.2.1 Extracting data from images
  2.3 Extracting ROIs from Freesurfer
  2.4 Neural data cleaning
  2.5 Behavioral data cleaning
    2.5.1 Removing runs with evidence of no learning
    2.5.2 Short reaction times
3 Model design
  3.1 Overview of the double-update reversal learning model
    3.1.1 Reinforcement learning model
      Softmax
      Step by step notes
    3.1.2 Incorporating reaction time
      Reaction time expectations
      Exclusion of abnormally short reaction times
      Racing diffusion implementation
    3.1.3 Mapping the expected value to the drift rate
  3.2 Extending to a hierarchical model
    3.2.1 Configuring priors
    3.2.2 Conjugate priors
    3.2.3 Non-centered parameters when using stan
  3.3 Reversal learning model extensions
    3.3.1 Discriminability model
    3.3.2 OpAL model
  3.4 Neural model
  3.5 Joint modeling
4 Linear modeling
  4.1 Linear model
    4.1.1 Method
      Formula
      Result
    4.1.2 Discussion
  4.2 Pain signature correlates with behavioral performance
      Neural pain signature
      NPS in this task
      Predictions
    4.2.1 Method
    4.2.2 Results
    4.2.3 Discussion
  4.3 General Discussion
5 Simple reinforcement learning model
  5.1 Comparing Variational Bayes and MCMC
    5.1.1 Method
    5.1.2 Results
      Speed
      MCMC and Variational Bayes Reliability
      MCMC Representativeness, accuracy, and efficiency
    5.1.3 Discussion
  5.2 Three level model
    5.2.1 Method
      Reaction time model
    5.2.2 Results
      Diagnostics
      Group differences
    5.2.3 Discussion
  5.3 Neural-behavioral joint model attempt
    5.3.1 Joint modeling Method
    5.3.2 Results
      Diagnostics
    5.3.3 Joint Model Discussion
  5.4 General Discussion
6 Differential Evolution MCMC Bayesian model
  6.1 Model design
    6.1.1 DE-MCMC algorithm
      Gamma tuning parameter
      Migration step
      Key functions
    6.1.2 RL-Drift Diffusion model design
      Starting values for hyper-parameters
  6.2 Two-level model
    6.2.1 Method
      Initial values optimization and implementation
      Running
    6.2.2 Results
      Diagnostics
      Comparing groups
    6.2.3 Discussion
  6.3 Three level model
    6.3.1 Method
      Optimization and initialization: Initial values
      Gamma tuning parameter
      Migration step
    6.3.2 Results
      Discussion
  6.4 General Discussion
7 Linear ballistic accumulator Bayesian model
  7.1 RL LBA Model
  7.2 Joint model implementation
  7.3 Pre-testing
    7.3.1 Experimenting with initialization
    7.3.2 Prior specification
      Calculating empirical priors
      Comparing empirical priors to weakly-determined priors
      Efficiency
      Representativeness
      Testing the joint model design
  7.4 Single-level joint model
    7.4.1 Joint model design
    7.4.2 Method
      Initial values
      Regions
    7.4.3 Results: 25-region analysis
      Diagnostics
      Neural-behavioral Correlations
    7.4.4 Results: 4-region analysis
      Diagnostic statistics
      Neural-behavioral Correlations
    7.4.5 Model Discussion
  7.5 General Discussion
8 Discussion
  8.1 Findings
    8.1.1 Group differences
    8.1.2 Neural-behavioral results
  8.2 Further work
    8.2.1 Alternative Models
      Modeling recognition: the discriminability model
      Opponent-Actor models
      Separate parameterization of pre- and post-reversal
      Chance performance mixture model
    8.2.2 Joint modeling across tasks
    8.2.3 Pain signature data
    8.2.4 Testing Meth use and sex risk as continuous
    8.2.5 Optimization
  8.3 Conclusion
Glossary
Acronyms
List of Models
List of Figures
List of Tables
References

Chapter 1

Introduction

I wrote this thesis to explore the potential of Bayesian neural-behavioral joint modeling to yield more information about motivational tasks than can be found using other computational modeling methods. The most common approaches to model-based neuroscience are either almost entirely data-driven or, at the other extreme, seek to model behavioral and computational processes first and then measure the fit of pre-determined parameters to brain data. Joint modeling allows behavioral, computational, and neural parameters to be estimated simultaneously. This could provide more accurate estimates of how the brain processes information related to motivational tasks. In this thesis I will describe my findings examining a deterministic reversal learning task using several different kinds of models.
I apply hierarchical linear models, joint models, and Bayesian hierarchical models using two distinct techniques: Differential Evolution Monte Carlo Markov Chain (DE-MCMC) (Brandon M. Turner, Sederberg, Brown, & Steyvers, 2013; Ter Braak, 2006) and the popular No-U-Turn Sampler (NUTS) implemented in stan (Stan Development Team et al., 2016). The deterministic reversal learning task requires subjects to learn an association between an abstract stimulus and a response using S-R-O reinforcement learning. After a set number of trials, or after the response has been learned, the R-O contingency 'flips', or reverses, and the subjects must learn a different response. The task helps us to better understand reinforcement learning, cognitive control, and cognitive flexibility (see Section 1.3 for a more detailed description). This thesis will not only demonstrate the utility of joint modeling techniques for the models presented, but also investigate the psychological and neurological properties of reversal learning, and will gather data about differences in learning between Meth and No Meth groups.

1.1 Outline

The first chapter describes the dataset, the theory behind the psychological models I used, and the concept of joint modeling of neural and psychological data. The next chapter describes data extraction and preparation. The third chapter describes implementation details of the psychological models themselves. The following chapters each outline the method and results derived from applying a particular methodology to this dataset. Each section reports the method used and the results found. The results varied for each section, but for each of the Bayesian analyses, I divide the results into two parts: first I report the diagnostic statistics for the analysis, and then I report the primary findings.
In the first results chapter, I first describe the use of hierarchical linear models to explore the dataset and gain a basic understanding of its structure before attacking it with more complex methods. I also explore a model of pain in order to verify dataset reliability, and to take the opportunity to explore the dataset's rare fMRI electric shock paradigm.

Then, I describe a three-level hierarchical behavioral analysis of the dataset, written to use the stan NUTS MCMC algorithm. I survey comparisons between variational Bayes and MCMC, and then describe the three-level implementation and the issues involved in building that hierarchy. I then describe the findings, in particular, possible group differences in learning parameters.

In the following chapter, I describe the use of a Differential Evolution Monte Carlo method to implement a two-level model. This was an interesting problem to work on because it implements and adapts a bespoke solution designed by Brandon Turner as its MCMC algorithm, rather than applying the off-the-shelf stan package. Once again, in the results section, I describe the reliability of the MCMC method, and then the findings. I survey attempts to create a three-level DE-MCMC model and offer possible explanations for why my three-level DE-MCMC implementation may have failed.

The fourth and final results chapter describes the implementation of a linear ballistic accumulator joint Bayesian model in stan. I outline the methods used to create this model, report on the reliability of the method, and then present the data obtained from the method. This model includes a joint model in which I measured correlations between neural and behavioral data. In the discussion, I explore some of the challenges in extending an MCMC model into three levels. I also interpret the group difference and neural correlation findings.
I explore the shortcomings of the psychological models used and propose some future directions for models that might explore a deterministic reversal learning task.

1.2 Why use DE-MCMC Bayesian joint modeling?

This thesis was motivated by the desire to demonstrate the utility of introducing a new statistical technique, Bayesian joint modeling, to estimating computational models of motivation. I have made several choices that differ from the most common approaches used, and so I want to justify those choices in this section. First, we have decided to jointly model parameters across behavioral and neural measures. The motivation for this was primarily to demonstrate that we could get a better estimate of parameters using this method than by fitting to behavioral parameters alone. Second, I used a Bayesian approach over a parametric hierarchical modeling approach, largely because it allows for a better representation of the data. A Bayesian approach allows us to treat both neural data and behavioral parameters as prior information in a principled way, even if our prior knowledge about how they are connected comprises only a weak theory about how they could be linked. More basically, when trying to estimate unknown parameters in models more complex than a simple hierarchical model, estimation can be intractable using other methods, and so a Bayesian approach is the natural one (Gelman & Hill, 2006). Third, I used a DE-MCMC approach because it allows for faster estimation of highly correlated variables (Brandon M. Turner, Sederberg, et al., 2013); I expand on this in greater detail below in Section 1.2.2.

1.2.1 Why use joint modeling? [1]

We can illustrate the benefit of using joint modeling with an example. Cognitive models of behavior in decision-making tasks typically include parameters that encapsulate individual differences in psychological constructs.
For instance, the hyperbolic model of temporal discounting (Green, Myerson, & Mcfadden, 1997),

V = 1 / (1 + kt),

calculates the rate of devaluation of a particular reward given its time delay t. In this model, k represents a particular discounting factor; the greater the value of k, the more heavily the model discounts as a function of time. An experimenter can calculate the value of k based on an individual's behavior in a task: by systematically varying the delay time t and observing its effect on the reward value V, we can estimate the value of k.

A line of research initiated by McClure, Laibson, Loewenstein, and Cohen (2004) attempts to understand the neural correlates of delay discounting and the parameters of models that describe it. Take k from the hyperbolic model described above as an example. We can estimate the best fit for k based on the behavioral data, and then use the delay discounting model to make predictions about an observed activity contrast. For instance, we might use the hyperbolic model with a calculated k to predict the subjective value perceived when a reward is presented, and then search the brain for regions with neural activity that correlates with that prediction. Brain regions correlated with k are presumed to be involved in the processing of subjective value during the task. Brandon M. Turner, Forstmann, Love, Palmeri, and Van Maanen (2016) describe this kind of modeling as a form of behavioral model predicting neural data through either latent input (where behavioral parameters are supposed to directly predict model values) or two-stage neural modeling (where behavioral parameters predict some parameter derived from the neural data itself). This method has the advantage that it is very easy to construct a model and statistically test for significant correlations in neural data without a specific, a priori hypothesis.
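The estimation step described above can be made concrete with a small sketch. The code below is illustrative only, not from the thesis: it simulates a subject's valuations under the hyperbolic model V = 1/(1 + kt) with a known k, then recovers k by a simple grid search. A real analysis would fit noisy choice data by maximum likelihood or Bayesian estimation rather than least squares on noiseless values.

```python
def hyperbolic_value(t, k, amount=1.0):
    """Discounted value V = amount / (1 + k * t) for delay t."""
    return amount / (1.0 + k * t)

def fit_k(delays, observed_values, grid_steps=5000):
    """Recover the discount rate k by grid search, minimising squared
    error between predicted and observed subjective values."""
    best_k, best_err = None, float("inf")
    for i in range(grid_steps):
        k = 0.001 + i * (0.5 - 0.001) / (grid_steps - 1)
        err = sum((hyperbolic_value(t, k) - v) ** 2
                  for t, v in zip(delays, observed_values))
        if err < best_err:
            best_k, best_err = k, err
    return best_k

# Simulate valuations from a subject with true k = 0.05, then recover it.
delays = [0, 7, 30, 90, 365]
observed = [hyperbolic_value(t, 0.05) for t in delays]
print(round(fit_k(delays, observed), 3))  # → 0.05
```

Because the simulated data are noiseless, the grid search recovers k up to the grid resolution; with real behavioral data, the recovered k carries estimation error, which is the point developed next.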
However, like any model, a model's prediction often contains considerable error, and thus the prediction made for expected brain activity can only be approximate. Error would be introduced both in the calculation of the parameter k from observed trials and in the prediction of subjective value in each new trial using k. Thus, any new information relevant to calculating true values of k, and true values of subjective value given k, would improve the overall estimation of the task. If error arises because a purely behavior-based model lacks constraints that neural data might provide, then including neural data as well as behavioral data in model estimation would better constrain the model; this is what we call joint modeling.

[1] This section was copied and adapted from my dissertation proposal.

Joint modeling could come in two forms: simultaneous modeling and hierarchical modeling. In simultaneous modeling (Brandon M. Turner, Forstmann, Love, et al., 2016), parameters for a model are estimated using multiple sources simultaneously. The simplest version of simultaneous modeling is one in which a single model predicts both behavioral activity and neural data. To use our temporal discounting example: if we expect a particular brain region to be related to an individual's discount-rate parameter k, then we could estimate the k parameter of our model using activity measured from that region (alongside the behavioral data). However, this can be difficult when we're unsure how the neural system relates to the behavioral model. In that case, we can estimate parameters for a model of the neural data, which may just be a set of regions of interest that are plausibly related to the task, and then estimate a set of hyper-parameters that capture the relationship between the two models. Brandon M. Turner, Forstmann, Love, et al.
(2016) call this hierarchical modeling, because we estimate parameters for more than one model, as well as hyper-parameters that capture the relationship between those disparate parameters.

Joint modeling across neural and behavioral data

The example above describes why joint modeling might be useful for combining neural data and behavioral data. Joint modeling has been used successfully to model neuropsychological systems in the past, including by Brandon M. Turner, Van Maanen, and Forstmann (2015) and van Maanen, Turner, and Forstmann (2015). There are well-known computational models for the reversal learning task, although it is less clear which models might fit a deterministic reversal learning task like the one used in the current study. Some work does connect psychological models of these tasks to neural data (e.g., Xue et al., 2013); however, usually the predicted neural data is generated from behavior and is then fit to real neural data. By using joint modeling, we can fit neural data and behavioral data simultaneously. This should improve the model's fit to the neural data. Most importantly, from a theoretical perspective, it is appropriate to model the behavioral and neural data together because, as long as we have some prior belief about the brain's role in processing, observation of neural data tells us something useful about the computation occurring. It is therefore informative in actually constraining the Bayesian estimate of the computation itself.

Implementing reversal learning in DE-MCMC

We can implement a joint model in reversal learning by examining properties of the reversal learning task and how they relate to brain data. I survey particular models of reversal learning tasks below, but in this work I primarily examine reversal learning tasks in the context of a reinforcement learning model.
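Before turning to specific models, the constraint argument above can be illustrated numerically. The sketch below is a toy demonstration, not an analysis from the thesis: it simulates behavioral valuations and a hypothetical ROI signal that both depend on a discount rate k, then compares the grid-approximated posterior spread of k with and without the neural likelihood term. The linking coefficient (1.5 here) is assumed known for simplicity; in a real joint model it would itself be estimated as a hyper-parameter.

```python
import math, random

random.seed(1)

def value(t, k):
    """Hyperbolic subjective value, V = 1 / (1 + k * t)."""
    return 1.0 / (1.0 + k * t)

# Simulate one subject with true k = 0.05: noisy behavioral valuations,
# plus a hypothetical ROI signal scaling with subjective value.
true_k = 0.05
delays = [0, 7, 30, 90, 365] * 4
behav = [value(t, true_k) + random.gauss(0, 0.05) for t in delays]
neural = [1.5 * value(t, true_k) + random.gauss(0, 0.10) for t in delays]

def log_lik(data, preds, sd):
    """Gaussian log-likelihood (up to an additive constant)."""
    return sum(-0.5 * ((d - p) / sd) ** 2 for d, p in zip(data, preds))

def posterior_sd(use_neural):
    """Grid-approximate the posterior of k (flat prior); return its sd."""
    ks = [0.001 + i * 0.199 / 999 for i in range(1000)]
    logp = []
    for k in ks:
        preds = [value(t, k) for t in delays]
        lp = log_lik(behav, preds, 0.05)
        if use_neural:  # add the neural likelihood term to the joint model
            lp += log_lik(neural, [1.5 * p for p in preds], 0.10)
        logp.append(lp)
    m = max(logp)
    w = [math.exp(l - m) for l in logp]
    z = sum(w)
    mean = sum(k * wi for k, wi in zip(ks, w)) / z
    var = sum((k - mean) ** 2 * wi for k, wi in zip(ks, w)) / z
    return math.sqrt(var)

# The joint (behavioral + neural) posterior is tighter than behavior alone.
print(posterior_sd(use_neural=True) < posterior_sd(use_neural=False))  # → True
```

Adding the neural likelihood term sharpens the posterior on k, which is exactly the sense in which neural data "constrains" the behavioral model in a joint estimate.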
Any part of a model, including parameters being estimated and values that are not directly estimated in the model, can be linked to brain properties. In this model, we can examine the covariance between expected value, reward prediction error, and various regions of the brain's decision-making system. Although we don't examine links between task-level parameters such as the learning rate here, these could theoretically be linked to subject-level properties such as brain structure, also using a joint modeling technique.

1.2.2 Why use DE-MCMC over other forms of MCMC?

There are many possible algorithms for obtaining Bayesian posteriors using Monte Carlo Markov Chains (MCMC). Two popular R packages for implementing Bayesian models with MCMC are Jags ('Just Another Gibbs Sampler') and Stan. As the name suggests, Jags uses Gibbs sampling. By default, Stan uses NUTS, and models can also be estimated using classic Hamiltonian MCMC. Brandon M. Turner, Sederberg, et al. (2013) argue for the use of DE-MCMC over other forms of MCMC. Compared to the popular Metropolis-Hastings algorithm, DE-MCMC allows for much more efficient convergence when model parameters are highly correlated. Gibbs sampling and Hamiltonian Markov chains are other possible options for spaces in which parameters are highly correlated, but Brandon M. Turner, Sederberg, et al. (2013) claim that DE-MCMC can be a more efficient method for sampling from highly correlated parameter spaces. DE-MCMC can draw samples for highly correlated parameters without evaluating the likelihood function multiple times per iteration. Gibbs and related 'blocked sampling' methods sample from a whole block of parameters per iteration and are thus less computationally efficient than a method 'involving a single evaluation of likelihood per iteration' (Brandon M. Turner, Sederberg, et al., 2013).
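The core DE-MCMC proposal (Ter Braak, 2006) is simple: each chain proposes a move along the scaled difference between two other randomly chosen chains, plus a small jitter, and accepts it by the usual Metropolis rule; because the population of chains spreads out along the posterior, proposals automatically align with correlated parameter directions. The sketch below is a minimal illustration of that scheme (not Turner's implementation), applied to a strongly correlated two-dimensional Gaussian, with the conventional tuning value gamma = 2.38 / sqrt(2d):

```python
import math, random

random.seed(0)

# Target: a strongly correlated 2-D Gaussian (rho = 0.95), the kind of
# posterior where a random-walk Metropolis sampler mixes poorly.
RHO = 0.95
def log_target(x):
    a, b = x
    return -(a * a - 2 * RHO * a * b + b * b) / (2 * (1 - RHO ** 2))

def de_mcmc(n_iter=2000, n_chains=10, gamma=2.38 / math.sqrt(2 * 2), eps=1e-4):
    """Minimal DE-MCMC: each chain jumps along the difference of two other
    chains; one likelihood evaluation per proposal."""
    chains = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n_chains)]
    samples = []
    for it in range(n_iter):
        for i in range(n_chains):
            r1, r2 = random.sample([j for j in range(n_chains) if j != i], 2)
            prop = [chains[i][d]
                    + gamma * (chains[r1][d] - chains[r2][d])
                    + random.gauss(0, eps)
                    for d in range(2)]
            if math.log(random.random()) < log_target(prop) - log_target(chains[i]):
                chains[i] = prop
            if it > n_iter // 2:  # keep post-burn-in draws only
                samples.append(tuple(chains[i]))
    return samples

draws = de_mcmc()
xs, ys = [d[0] for d in draws], [d[1] for d in draws]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
corr = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
        math.sqrt(sum((x - mx) ** 2 for x in xs) *
                  sum((y - my) ** 2 for y in ys)))
print(round(corr, 2))  # the sampled correlation approaches the target's 0.95
```

Note that each accepted move costs a single evaluation of the (log) likelihood, which is the efficiency property Turner and colleagues emphasize; the full algorithm, including the migration step, is described in Section 6.1.1.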
When running the algorithms using currently available implementations, DE-MCMC's efficiency advantage may not translate into speed of analysis in practice, because it is implemented in native R, in contrast to stan's compiled C++ scripts. The DE-MCMC algorithm is described briefly in Section 6.1.1, and in more detail in Brandon M. Turner, Sederberg, et al. (2013).

1.3 Reversal learning

This section describes the reversal learning task. It then surveys existing models of reversal learning in order to identify which models may be most useful for modeling the reversal learning task we aim to model.

1.3.1 Wider study

The data in this thesis were collected in our larger study examining three distinct groups of sexually active and non-monogamous young (18-31) men who have sex with men. One group reported consistent safe-sex practice over the prior 90 days; one group reported sometimes having unsafe sex; and one group reported having unsafe sex and some use of methamphetamine at any time. Thus, the methamphetamine use group ranged from people who tried methamphetamine years prior to the study all the way through to people who regularly use methamphetamine. The subjects performed a number of tasks in an fMRI scanner, including the reversal learning task described at length here. Scans were performed over two days. There were two runs of the Reward task and two runs of the Punishment task. The Punishment task involved an electric shock paradigm, and due to the time limitations of setting up the necessary hardware on both days, the runs were always grouped by motivation, so that the Punishment runs and the Reward runs each happened on a single day. There was no consistent order: some subjects did the Punishment runs on the first day and the Reward runs on the second day, while others did the Reward runs on the first day and the Punishment runs on the second day.
We were interested in exploring possible cognitive differences between men who used meth and men who did not, and also cognitive differences between men who were sexually risky (had unprotected anal sex in the previous 90 days) and men who were not sexually risky (had protected but not unprotected anal sex in the previous 90 days). In this context, reversal learning was interesting because it can tell us about cognitive flexibility (Izquierdo & Jentsch, 2012) and about reinforcement learning capacity in general. Differences in sexual risk-taking are related to differences in personality, including anxiety and negative and positive urgency (Smith et al., 2017). One possible explanation for this is differences in cognitive processing. Methamphetamine use is also believed to disrupt reinforcement learning (Groman, Rich, Smith, Lee, & Taylor, 2018), and is associated with lower inhibitory control (Monterosso, Aron, Cordova, Xu, & London, 2005). For these reasons, we wanted to explore relationships between meth use, sexual risk-taking, and the reversal learning task, in particular learning rates within the task, and to see whether neural activity during the task was associated with particular features of computational models of learning.

1.3.2 Basic design

In a reversal learning task, subjects are trained to learn a particular response to a stimulus, and then trained to learn a competing response, effectively requiring extinction of the first learned response. Many reversal learning tasks are probabilistic. For instance, Remijnse, Nielen, Uylings, and Veltman (2005) required subjects to learn an S-R response to a stimulus; positive reinforcement for a correct response was given with an 80:20 probability.
In this design, stimuli are easy to distinguish from their alternatives, but because feedback is only given probabilistically, subjects cannot conclude with certainty that a particular reinforcement indicates a correct response. The Remijnse et al. (2005) task, in common with many reversal learning tasks, exposes subjects to repeated trials using the same stimuli, and each learning task is completed consecutively. All these features characterize the most common form of reversal learning task, pioneered in fMRI by O'Doherty, Kringelbach, Rolls, Hornak, and Andrews (2001) and Cools, Clark, Owen, and Robbins (2002). The task discussed in this thesis is a deterministic reversal learning task, which has previously been implemented in studies such as Ghahremani, Monterosso, Jentsch, Bilder, and Poldrack (2009) and Robinson, Cools, Carlisi, Sahakian, and Drevets (2012). Deterministic reversal learning tasks in human studies are rare (Izquierdo, Brigman, Radke, Rudebeck, & Holmes, 2017); some include Ghahremani et al. (2009), Robinson et al. (2012), and Xue et al. (2013). In a deterministic reversal learning task, feedback on the response (either reward or punishment, as the case may be) is given consistently rather than probabilistically. Thus, when a reversal occurs, it may be easier to detect, and subjects may be able to respond to it more quickly. However, in the variant described in more detail below, stimuli are abstract and hard to distinguish. Additionally, they are interspersed with several (typically 3) other reversal learning tasks, meaning that subjects must work on multiple reversal learning problems simultaneously. This introduces additional complexity and challenge into the task.
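The practical difference between the two feedback schedules can be illustrated with a short sketch. This is a toy illustration only; the function name is my own, and the 80:20 default follows the Remijnse et al. example above rather than the task used in this thesis:

```python
import random

def feedback(response_correct, deterministic=True, p_valid=0.8, rng=random):
    """Return True if positive reinforcement is delivered on this trial.

    In a deterministic task, feedback always matches the response.
    In a probabilistic task (e.g., an 80:20 schedule), feedback matches
    the response only with probability p_valid, so a single
    reinforcement never identifies the correct response with certainty.
    """
    if deterministic:
        return response_correct
    # with probability 1 - p_valid the feedback is misleading
    valid = rng.random() < p_valid
    return response_correct if valid else not response_correct
```

Under the deterministic schedule a single reinforcement is fully diagnostic, which is why reversals can in principle be detected from a single trial; under the probabilistic schedule, subjects must integrate over several trials before concluding that a reversal has occurred.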
In the deterministic reversal learning task described in this thesis, subjects must recognize and discriminate each visual cue from other visual cues; hold in memory the reinforced response for each cue; and do this for a set of around four tasks simultaneously. They must also make decisions quickly, in less than one second, in order to receive a reward or to avoid punishment.

1.3.3 Behavioral models of reversal learning

Reversal learning is a fairly simple task. Particularly in the form we have implemented here (see Section 1.4), it may be that complex models do not achieve additional power over simple reinforcement-learning models. Existing models for reversal learning can be roughly divided into three categories: Hidden Markov Models, reinforcement-learning models, and neural network models. Hampton and O'Doherty (2007) describe a Hidden Markov Model (HMM) of reversal learning. However, Gläscher, Hampton, and O'Doherty (2009) suggested that HMMs do not exhibit superior performance to reinforcement learning models. They compared the performance of Hampton and O'Doherty (2007)'s HMM to a fictitious update model, a form of reinforcement learning model. Thus, I did not see the benefit of an HMM over a simpler reinforcement-learning model, and opted to use a reinforcement learning model. A neural network model such as the OpAL model (Collins & Frank, 2014) or related models (Frank & Claus, 2006) may offer superior performance and, in particular, a theoretical basis for a neural-behavioral joint model. Thus, if possible, these should be compared to the simpler reinforcement learning model.

Existing review articles

Computational models One useful source is Samson, Frank, and Fellous (2010), which reviews computational models of reinforcement learning, all the way from neurochemical-level models to behavioral-level models.
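A minimal behavioral-level model of the kind catalogued in such reviews can be sketched as follows. This is a generic delta-rule model with a softmax choice rule, shown only for illustration; the names are mine, and this is not the exact parameterization fit later in the thesis:

```python
import numpy as np

def softmax(q, beta):
    """Choice probabilities over action values q, with inverse temperature beta."""
    z = beta * (q - np.max(q))          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def delta_rule_update(q, action, reward, alpha):
    """Rescorla-Wagner / delta-rule update of the chosen action's value.

    Returns the updated value array and the reward prediction error."""
    rpe = reward - q[action]            # reward prediction error
    q = q.copy()
    q[action] += alpha * rpe
    return q, rpe

# one illustrative trial: two responses, starting at indifference
q = np.zeros(2)
p = softmax(q, beta=3.0)                # both responses equally likely at first
q, rpe = delta_rule_update(q, action=0, reward=1.0, alpha=0.2)
```

With only two free parameters (learning rate alpha and inverse temperature beta), this family of models is the baseline against which the more elaborate HMM, OpAL, and accumulator models discussed here must justify their extra complexity.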
Temporal-difference learning, the multiple model-based RL framework, and the actor-critic model (as in the OpAL model described below) could all be useful. This area is well-developed, and it should be possible to identify both behavioral and neural models of reinforcement learning.

Linking from reversal learning to other models Bari and Robbins (2013) describe how GoNoGo, SST, delay discounting, and reversal learning tasks might relate to one another, particularly considering the kind of response inhibition that they each measure. This suggests that models for each of those tasks could be combined. If we have multiple tasks from the same subject, and each task has some parameters in common, then there is no reason we cannot combine those tasks to create a new overall analysis. For instance, we can get a measure of response inhibition from each task using some behavioral model and measure the correlation; this might describe some common response inhibition function, and we can identify neural parameters that fit those common responses.

OpAL model The Collins and Frank (2014) Opponent Actor Learning (OpAL) model contains two separate reinforcement learning functions, separating expected value updating (critic learning) from Go and NoGo learning (actor learning). The latter mimics a neural network design. The two actor components can be linked to specific neural processes (respectively, separate dopaminergic pathways). The actor learning applies a 'three-factor Hebbian rule' in which pre-synaptic activation, post-synaptic activation, and dopamine are all represented separately. The direct representation of multiple neural processes makes the model a powerful tool for investigating psychological-neural links, which is particularly useful for behavioral-neural joint modeling. In Section 3.3.2 (p. 51), I describe exactly how this is implemented mathematically.

A neuroanatomical model Peterson et al.
(2009) used a reinforcement learning model to model the reversal learning task. The model includes parameters for prediction error and a term representing bias toward exploration versus exploitation, separated out according to phases of the experiment (initial and reversal). The reversal learning task includes metrics for instrumental learning, including reward and punishment susceptibility. Just about any behavioral-level reinforcement learning model could be appropriate for modeling reversal learning, and the field is too vast to sum up here. Using the Collins and Frank (2014) model as an example, we would estimate parameters for prediction error at each trial where correction was needed, and estimate exploration and exploitation biases at the subject level. The neural-level model would include features thought to be involved in each of these processes: conflict monitoring areas like the ACC during error periods, subcortical reward systems during reward periods, and other neural markers hypothesized to be associated in some way with exploration vs. exploitation biases. These need not all enter the same form of analysis; for instance, characteristic properties of resting-state data could be extracted and measured for their explanatory power over parameters in the model.

Reiter review Reiter, Heinze, Schlagenhauf, and Deserno (2017) reviewed a number of reinforcement learning model variants. For their application, they found that an individually-weighted double-update model represented the data better than a single-update model. In a single-update model, only the value of the chosen stimulus is updated. In contrast, a double-update model takes into account the anti-correlation between the chosen response and the competing response, and the values associated with them are updated together. A dynamic learning rate model did not improve performance.
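The single- versus double-update distinction can be made concrete with a toy sketch. This is my own illustration, and it assumes a symmetric +1/-1 outcome coding, which is one common convention rather than the coding used in this thesis:

```python
def single_update(v, chosen, reward, alpha):
    """Update only the chosen option's value."""
    v = list(v)
    v[chosen] += alpha * (reward - v[chosen])
    return v

def double_update(v, chosen, reward, alpha):
    """Also update the unchosen option, exploiting the anti-correlation
    between the two alternatives: under a +1/-1 reward coding, the
    unchosen option's counterfactual outcome is -reward."""
    v = list(v)
    other = 1 - chosen          # assumes exactly two options
    v[chosen] += alpha * (reward - v[chosen])
    v[other] += alpha * (-reward - v[other])
    return v
```

With exactly two options whose outcomes are perfectly anti-correlated, the double update adds no information beyond the single update, which is why the two rules are not distinguishable in our task (as noted in the next section).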
In our task, single-update and double-update models are not distinguishable, because there are always exactly two alternatives, and the response to one alternative is always exactly opposite to the response to the other.

1.4 Reversal learning dataset

This section describes the deterministic reversal learning dataset implemented in the larger study of sexually active non-monogamous young men who have sex with men.

1.4.1 Subjects

There were two runs of each of two Motivation conditions, the Reward Motivation Condition and the Punishment Motivation Condition. In the reward condition, when a subject responded correctly to a trial, he was shown a symbol on screen indicating that a small amount of money would be added to a total he would receive after the task. In the punishment condition, when a subject responded incorrectly to a trial, he was given a small electric shock via an electrode attached to the inside of the ankle. The electric shock was calibrated to be uncomfortable, but not painful. A few runs were not completed correctly because some subjects did not attend their second session, or data was lost, or for other reasons. Table 1.1 describes the number of subjects whose data we had for each condition. These numbers emerged after the data cleaning process described in Section 2.5. Following the data cleaning steps, there were 584 runs over 159 subjects: 53 in the Safe Group, 63 in the Risky Sex No Meth Group, and 39 in the Risky Sex Meth Group. Analyses that included neural data and behavioral data also occasionally had neural data missing for a particular subject or run; in these cases, some analyses proceeded with as few as 144 subjects available.

1.4.2 Runs

Within each of the two runs, for either reward or punishment, subjects saw 218 image presentations in total: 18 unique images per subject, per run, plus additional 'control' images at the start and end of the task which were never reversed.
There were 5-8 presentations of each image before reversal and 5 presentations of each image after reversal. There were 4 control images, two at the start of the run and two at the end, presented 5 times each without any reversals. These were excluded from the data, leaving 198 presentations. Table 1.2 below shows the count of image presentations across one run at each time point relative to that image's reversal. All subjects were given the same basic design as shown here.

Table 1.1: Number of subjects in each condition by run.

Motivation   runid   Risky Meth   Risky No Meth   Safe No Meth   All
punishment   1       34           62              47             143
punishment   2       34           59              48             141
reward       1       38           61              52             151
reward       2       37           60              52             149

Table 1.2: Task design. Counts of image presentations at each time point relative to reversal (R); the final column counts presentations of the never-reversed control images.

Motivation   runid   -8  -7  -6  -5  -4  -3  -2  -1  R=0  1   2   3   4   Control
punishment   1       3   6   9   18  18  18  18  18  18   18  18  18  18  20
punishment   2       3   6   9   18  18  18  18  18  18   18  18  18  18  20
reward       1       3   6   9   18  18  18  18  18  18   18  18  18  18  20
reward       2       3   6   9   18  18  18  18  18  18   18  18  18  18  20

Subjects received the same set of images for each of the motivations and runs. The first 40 image presentations are presented below. Each image is presented between 5 and 8 times pre-reversal, and 5 times post-reversal. Image presentations are interleaved, with two new images introduced at a time. For each presentation, subjects must press either the LEFT key or the RIGHT key. Either left or right could be the correct response, and subjects can only learn by trial and error. There are about 3 seconds between presentations. At the beginning and end of the run, there are a few "Control" trials using images whose reinforcement contingencies are never reversed. Left and right key presses are counterbalanced across subjects. Table 1.3 shows data from one particular subject (Sub 146). The order is exactly the same for all subjects, but the onsets will be slightly different, and the responses and response times will of course be different.

Table 1.3: A sample of the beginning and end of the task.
Images were shown in the order presented for every subject, for this particular run. Each of the four runs had a similar design with the same number of presentations and reversals, although different image cues were used for each run.

Image  Onset (s)  Presentation Type  Presentation #  Response     RT (ms)
4      2.7        PreReversal        1               Nonresponse
2      5.2        Control            1               Correct      522
3      7.9        PreReversal        1               Incorrect    347
1      11.0       Control            1               Incorrect    595
2      14.1       Control            2               Correct      720
1      16.9       Control            2               Incorrect    580
3      19.6       PreReversal        2               Incorrect    471
4      22.4       PreReversal        2               Incorrect    407
1      25.7       Control            3               Correct      641
3      28.3       PreReversal        3               Incorrect    737
4      31.0       PreReversal        3               Incorrect    478
2      34.3       Control            3               Incorrect    778
3      37.0       PreReversal        4               Nonresponse
1      40.7       Control            4               Incorrect    563
4      43.7       PreReversal        4               Correct      398
2      46.6       Control            4               Correct      650
4      49.8       PreReversal        5               Correct      754
2      54.6       Control            5               Correct      595
3      57.2       PreReversal        5               Incorrect    680
5      59.7       PreReversal        1               Incorrect    737
1      63.6       Control            5               Incorrect    534
3      66.2       PostReversal       1               Correct      818
4      69.5       PreReversal        6               Correct      858
6      72.9       PreReversal        1               Correct      723
3      75.9       PostReversal       2               Incorrect    965
Image  Onset (s)  Presentation Type  Presentation #  Response     RT (ms)
4      79.4       PostReversal       1               Correct      590
6      82.2       PreReversal        2               Incorrect    802
4      85.5       PostReversal       2               Correct      934
5      88.6       PreReversal        2               Nonresponse
6      92.1       PreReversal        3               Correct      800
4      95.3       PostReversal       3               Incorrect    513
5      99.2       PreReversal        3               Correct      567
3      104.4      PostReversal       3               Incorrect    471
4      107.7      PostReversal       4               Correct      536
6      110.3      PreReversal        4               Correct      644
5      113.3      PreReversal        4               Correct      389
3      116.3      PostReversal       4               Correct      377
5      119.6      PreReversal        5               Correct      463
3      122.5      PostReversal       5               Correct      369
4      125.1      PostReversal       5               Correct      549

Last 20:

Image  Onset (s)  Presentation Type  Presentation #  Response     RT (ms)
20     650.8      PreReversal        6               Correct      446
21     656.4      Control            1               Incorrect    406
22     663.5      Control            2               Incorrect    404
20     666.7      PreReversal        7               Correct      720
21     669.7      Control            2               Incorrect    671
19     672.7      PostReversal       2               Incorrect    357
20     675.6      PostReversal       1               Incorrect    554
19     678.8      PostReversal       3               Incorrect    436
21     681.3      Control            3               Incorrect    380
20     684.2      PostReversal       2               Incorrect    227
22     687.4      Control            3               Incorrect    267
20     690.0      PostReversal       3               Correct      301
21     692.7      Control            4               Correct      411
19     695.6      PostReversal       4               Incorrect    379
20     699.1      PostReversal       4               Incorrect    356
22     701.9      Control            4               Incorrect    451
20     705.1      PostReversal       5               Correct      480
22     708.1      Control            5               Incorrect    411
19     713.8      PostReversal       5               Incorrect    595
21     716.9      Control            5               Incorrect    488

1.4.3 Hierarchy

In order to model the RL task hierarchically, it is important to grasp the levels in the data we need to model. These are:

• Subjects are organized into three groups, with around 50 subjects per group. The groups are Safe Sex, Risky Sex No Meth, and Risky Sex Meth. Between-group comparisons may have interesting implications for sexual health or drug abuse research.

• There are two types of sessions, Reward and Punishment. Most subjects have done two sessions of each, i.e., four sessions per subject.
However, there’s quite a lot of missing data. Overall, 133 subjects have four sessions, 8 have three sessions (all of which include both a punishment and a reward session), and 14 subjects have only two sessions (4 of which are subjects with only reward sessions; 10 are subjects with only punishment sessions). Data levels Thus, we can describe Level 1 as the run-level parameters, describing parameters for a subject’s performance across a particular run. Level 2 parameters are subject-level parameters 1.4. REVERSALLEARNINGDATASET 17 Figure 1.1: Proportion of correct judgments at each time point in the task, relative to reversal. Each colored line represents the aggregate of one subject’s data. Black lines represent the averages across the group, and are reproduced with standard deviations in Figure 1.2. Dashed lines represent the mean across subjects at the last time point pre- and post-reversal. and describe performance for all of a subject’s runs, using a hyper-parameter which specifies the distribution of the Level 1 (run-level) parameters. Group level hyper-parameters are at level 3, and they describe the distribution of the subject-level hyper-parameters within each group. 1.4.4 Basic data structure Figure 1.1 describes the average performance of each subject. It can be seen that many subjects scored worse than chance overall, and many performed better than chance, though across subjects, performance immediately preceding reversal (0:76 0:17) and at the end of the task (0:61:16) were above chance (bothp< 0:001). The group mean and standard deviations across subjects are illustrated in Figure 1.2. 18 CHAPTER1. INTRODUCTION Figure 1.2: Proportion of correct judgments at each time point in the task, relative to reversal,by group, with error bars indicating standard deviations. 1.5 Neuroanatomy of the decision-making system To properly choose candidate regions to include in our model it is useful to briefly map out the structure of the decision-making system. 
Broadly speaking, the prefrontal cortex, striatum, and insula are generally held to be brain regions important for decision-making (e.g., Bechara, 2005). Within the Somatic Marker framework (Reimann & Bechara, 2010), the dorsolateral prefrontal cortex is responsible for cognitive control; the ventromedial prefrontal cortex for valuation of alternatives; the striatum, and the nucleus accumbens in particular, for processing reward and punishment signals; and the insula for transmission of bodily states and their integration as information to be processed by the rest of the decision-making system. Recent research has shed more light on the functioning of these regions. Luke J. Chang, Yarkoni, Khaw, and Sanfey (2012), Droutman, Bechara, and Read (2015), and Deen, Pitskel, and Pelphrey (2010) have all divided the insula into three regions: the posterior insula, and the dorsal and ventral anterior insula. The posterior insula is responsible for sensorimotor processing (Droutman, Read, & Bechara, 2015; Wager & Barrett, 2017), while the dorsal anterior insula has been implicated in risky decision-making (Droutman, Read, & Bechara, 2015) as well as salience (Menon & Uddin, 2010), task set maintenance (Dosenbach et al., 2007), and cognitive control (Cole & Schneider, 2007).

1.5.1 Reversal learning task

Bari and Robbins (2013) wrote a review on inhibition that includes a section comparing choice impulsivity and reversal learning with inhibition. They trace the history of inhibition and the different senses in which it has been used, from behavioral definitions to neuropsychological, physiological, cognitive, and personality-based ones. Their article contains a useful diagram showing different kinds of inhibition and how one might break inhibition down into different types. It continues with a discussion of reversal learning and delay discounting. Of reversal learning, they say that OFC lesions, but not dorsomedial or lateral lesions, cause impairments in reversal learning.
With delay discounting, they associate a wider network: (1) a ventral cortico-striatal network including medial OFC and ventral striatum; (2) a lateral prefrontal cingulate network comprising lateral OFC, cingulate cortex, and dorsolateral and ventrolateral PFC, implicated in conflict detection and behavioral inhibition; and (3) a medial temporal-hippocampal network for prospective evaluation of future outcomes. More generally, reversal learning research has often focused on the ventrolateral and orbitofrontal cortex (O'Doherty et al., 2001). Cools et al. (2002) focused on the 'frontostriatal circuitry', emphasizing the ventral striatum and various parts of the prefrontal cortex, particularly the ventrolateral PFC. In a broad review, Izquierdo et al. (2017) have also pointed to the distinct roles of the medial PFC and OFC. The OFC seems to play a role in stimulus discrimination, declining in importance when stimuli are easier to discriminate; in contrast, while the mPFC has a role, it has been less clear exactly what that role is.

1.6 Hypotheses

Here we want to investigate the neural correlates of the reversal learning task. In particular, we want to follow up on the role claimed for the OFC by Fellows and Farah (2003) and investigate the ongoing ambiguity in the role of the OFC described by Izquierdo et al. (2017). By examining the links between computational processes in reinforcement learning and OFC activity, we can provide some more clarity on the role of the OFC. Which areas of the orbitofrontal cortex interact with the task? More importantly, what kind of roles do they play? In the joint model, I will investigate two key features of a reversal learning model: Reward Prediction Error (RPE) and Expected Value (EV). If we are able to link either of these to the OFC, we can help to clarify the role of the OFC during reversal learning. I aim to test several distinct areas in this thesis.
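For concreteness, these two model quantities can be written in the standard delta-rule form. This is the generic formulation; the exact parameterization used in the models fit later in the thesis may differ:

```latex
\delta_t = r_t - EV_t(s, a), \qquad EV_{t+1}(s, a) = EV_t(s, a) + \alpha\,\delta_t
```

where EV_t(s, a) is the expected value of response a to stimulus s on trial t, r_t is the outcome actually obtained, delta_t is the reward prediction error, and alpha is the learning rate.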
Methamphetamine use and sex risk First, I want to demonstrate that meth-using subjects are slower learners than non-meth-using subjects. Two hypotheses could suggest this pattern. First, methamphetamine users may be less responsive to reward, and thus driven to seek out highly rewarding stimuli such as methamphetamine. Second, methamphetamine users may be impaired by their methamphetamine use itself, their striatal reward learning systems having been damaged by overuse during methamphetamine consumption, with consequently impaired learning. We will also compare sexually risky and sexually safe non-meth users. However, I do not have an expectation that we will find differences in learning rates or other decision-making capacities between these two groups.

Brain system Second, I want to demonstrate neural underpinnings of decision-making during the reversal learning task, thus mapping the neural circuitry of decision-making for this task. Most psychological models of reinforcement learning include a learning rate and an 'inverse temperature', or exploration/exploitation measure. Alternatively, in the linear ballistic accumulation models presented here, rather than an inverse temperature, we measure non-decision time and relative drift rate. These models yield trial-level values, particularly expected value and reward prediction error. In this context, expected value is the value that the subject predicts a particular response to a stimulus will yield. Reward prediction error is the difference between the expected value and the value actually obtained. In line with established research, I predict that reward prediction error ought to correlate with dopaminergic systems and striatal systems associated with reward, in particular the accumbens and putamen.

DE-MCMC and NUTS joint modeling In this thesis I test several different methods for Bayesian estimation of reinforcement learning parameters (an overview is provided in Section 1.7). First, I examine
First, I examine linear models of reinforcement learning. Second, I examine 1.7. SUMMARYOFCOMPLETEMODELSIHAVETESTED 21 reinforcement learning models implemented in stan using a No-U-Turn Sampler, and those implemented in Differential Evolution Markov Chain Monte Carlo. I predict that the two Bayesian methods should yield roughly similar results, with each method having various advantages over the other, while both will be more sensitive to the distributions within the data and thus more powerful at measurement and at detecting group differences than the hierarchical linear model. 1.7 Summary of complete models I have tested This is a summary of the complete models I tested in preparation for this thesis. Each model has different properties. This section outlines each of the major advantages and disadvantages I found with each model. At least all of these models were potentially the best among all the models I’ve tested at addressing a particular specific problem or approach. • Hierarchical NUTS Reinforcement Learning models (Chapter 5). Based on my collabo- ration with Nathaniel Haines. They applied a simple Reinforcement Learning (RL) model to the reversal learning data. I did experiment with incorporating pain data, but this wasn’t found to be helpful in the end. All models were built in stan, and I used them to compare and contrast two estimation methods, variational Bayes and MCMC NUTS (described more in Chapter 5). I was able to extend this to a full, three-level behavioral model, as detailed in the chapter, and run it on a limited number of subjects, but was not able to generate a working three level model that could estimate group-level statistics for an entire subject group. These were not joint models. • NUTS models with Linear Ballistic Accumulator (Chapter 7). 
These were built from scratch, incorporating the linear ballistic accumulator model described in Annis, Miller, and Palmeri (2017) with a basic RL model, estimated using stan's built-in MCMC NUTS algorithm. Several useful models were generated:

– Single-level NUTS joint models, second generation. These were built from scratch based on Annis et al. (2017).

* Single-level joint model with linear ballistic accumulator. This model operates at a single level only, fundamentally following Annis et al. (2017). It jointly models reward prediction error and expected value with a number of neural markers.

* Single-level joint model with linear ballistic accumulator and the OpAL actor-critic framework described in Collins and Frank (2014).

* Single-level joint model with linear ballistic accumulator and a discriminability model, modeling the discriminability of cues from one another.

– Multi-level NUTS models, second generation. I tried a number of ways to get a three-level LBA-RL model working in stan through June 2018. I did not try a multi-level joint model. Some 3-level models with 10 subjects were able to initialize, but the model failed to initialize with a full group of subjects. I tried a number of different initial conditions and prior values, but they did not work consistently. Further information is given in Chapter 7.

• Differential Evolution MCMC (DE-MCMC) (Chapter 6). This was implemented in R using Brandon Turner's DE-MCMC modeling library, originally written for a GoNoGo task. I implemented an RL model with a drift diffusion model, and focused on implementing a behavioral model for the DE-MCMC. These were not joint models.

– Single-level Differential Evolution MCMC models. These were implemented using Brandon Turner's DE-MCMC code, modified to suit this particular task.

– Two-level Differential Evolution MCMC models.
This two-level model produced run-level estimates and then estimated group effects independently for each run, so we had independent estimates of group effects for each of the four runs.

– Hierarchical Differential Evolution MCMC models. I ran these on smaller subsets of the data as well as on models encompassing all subjects. In the all-subjects model we generally did not see sufficient convergence, so I tested this more extensively with a 15-subject model. This may be the first time a three-level hierarchical DE-MCMC model has been attempted. I was not able to get it to converge, but the chapter explores some of the reasons why that may have been.

A brief list of all the models described in detail in this thesis is included in the index on Page 181.

Chapter 2

Data preparation

I used several freely-available neuroimaging analysis tools to process neural data. First, FSL (Woolrich et al., 2009) was used to do preprocessing [1] and to extract alternative region-of-interest (ROI) timeseries as described in Section 7.4.2. Freesurfer (available at https://surfer.nmr.mgh.harvard.edu/; Dale, Fischl, and Sereno, 1999) was used to extract the primary ROIs, described in Section 2.1. Freesurfer is noted for its cortical segmentation, in which brain activity occurring in cortical regions is mapped along the surface of the cortex. ROIs can be derived from parcellations respecting anatomical boundaries of the cortical surface (Destrieux, Fischl, Dale, & Halgren, 2010). This method is likely to produce anatomical ROIs that better represent distinct functional regions, because the brain's neural network connects across and through the inner part of the cortex rather than across sulci.
Because of the cortex's folded shape in the skull, regions in different parts of the cortex that touch across a sulcus (a 'valley' in the brain surface's landscape) may be erroneously considered to make up a single activity cluster if a cortical parcellation such as freesurfer's is not used. Modules and some raw and modified code from nltools (available at https://github.com/ljchang/nltools) and nilearn (Abraham et al., 2014) were also used to perform regression of raw data, including both the Freesurfer ROIs and the pain signature time series. This involves controlling for global blood flow effects, applying a high-pass filter, and convolving with the HRF in order to infer neural firing from BOLD activity. These are all standard steps in processing brain data.

[1] Thanks to Feng Xue, who did much of the preprocessing for this dataset.

Because I was primarily working with time series, it was easier to perform these steps in a bespoke implementation based on nltools and nilearn code rather than to use FSL or another tool for the task.

2.1 Extracting Pain Signature

I extracted pain signatures using Chang and colleagues' nltools toolkit and Wager and colleagues' Neurologic Pain Signature (NPS) tool (Wager et al., 2013). Finding the pain signature in the punishment dataset can act as a way to confirm the data is being processed correctly, can test interesting hypotheses about the pain signature and reinforcement learning, and can act as a neural marker to be integrated into a joint model. The rationale is explained further in Section 4.2. The nltools toolkit contains features for model convolution, and I convolved an HRF signal with the trial events described in the behavioral data. For simplicity, each trial was assumed to last for precisely 2 seconds.
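That convolution step can be sketched in plain numpy. This is an illustration of the standard approach rather than the nltools call actually used; the TR and the SPM-style double-gamma parameters are conventional defaults, not the study's exact settings:

```python
import math
import numpy as np

def _gamma_pdf(t, shape):
    """Gamma density with unit scale, used to build the canonical HRF."""
    t = np.asarray(t, dtype=float)
    return np.where(t > 0, t ** (shape - 1) * np.exp(-t) / math.gamma(shape), 0.0)

def double_gamma_hrf(tr, duration=32.0):
    """Canonical double-gamma HRF sampled every `tr` seconds:
    a response peaking near 5 s minus a smaller undershoot near 15 s,
    normalized to sum to 1."""
    t = np.arange(0.0, duration, tr)
    hrf = _gamma_pdf(t, 6) - _gamma_pdf(t, 16) / 6.0
    return hrf / hrf.sum()

def trial_regressor(onsets, tr, n_scans, trial_dur=2.0):
    """Boxcar of fixed-duration trials convolved with the HRF,
    returning one expected-BOLD value per scan."""
    boxcar = np.zeros(n_scans)
    for onset in onsets:
        start = int(round(onset / tr))
        stop = max(start + 1, int(round((onset + trial_dur) / tr)))
        boxcar[start:min(stop, n_scans)] = 1.0
    return np.convolve(boxcar, double_gamma_hrf(tr))[:n_scans]
```

The convolved regressor lags the stimulus by several seconds and is smeared over roughly 15-20 seconds, which is why the convolution must be verified carefully (as in the FSL comparison described next) before regressing brain data onto it.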
Because convolution had already been done separately in FSL for a task unrelated to this thesis, I was able to compare the FSL and nltools output side by side to ensure that they were similar. Figure 2.1 shows that, when aligned correctly, FSL and nltools both return very similar time courses, confirming that the time series convolution was properly implemented in nltools. To extract the pain signature, I created a design matrix with one regressor for every trial in each run. The preprocessed data from each run was regressed onto the design matrix using neurolearn's Brain_Data.regress function [2]. As well as a regressor for every trial, the design matrix also included regressors representing intercept, linear, quadratic, and cubic functions to act as high-pass filters, and 6 motion regressors produced by FSL during the pre-processing stage. These estimate motion, and by adding them to the design matrix, we can reduce the error in the image due to subject movement in the scanner. The NPS is simply an fMRI template image representing regions associated with a pain signal. I used FSL's flirt function to transform the NPS into the space appropriate for each run. Then, I could combine the NPS template with the 3D brain image representing each trial, using an appropriate mathematical function to represent the similarity of that image to the NPS.

[2] In the reversallearning project repository on GitHub, see the negative affect/pain regression allsubjs.py folder for the raw code used for this procedure.

Figure 2.1: FSL and nltools convolution superimposed on each other. Each series shows both an FSL and an nltools timeseries; they can be seen following distinct paths at each event, where one time series has a systematically higher magnitude than the other. Magnitude is unimportant here because the signal is subsequently normalized.
The middle rows (the error rows) are different because, for the FSL comparison, only the first error trial was recognized, while for nltools, we plotted every error trial.

I calculated a similarity score using the dot product of the NPS and the activity for each trial, to get a measurement of pain activity during that trial. Other similarity measures include correlation and cosine similarity. The similarity score is a pain intensity measure for each trial individually.

2.2 Choosing measures of neural activity

2.2.1 Extracting data from images

Here I review options for extracting data from images and explain the choices we made. When testing the fit of brain activity to the hyper-parameters in our model, there are several sources we can draw on to define ways of extracting neural activity from the brain:

• Parcellations defined previously from independent subject groups. These include:

– Anatomically-defined parcellations. These are very useful because the information is highly interpretable in terms of demonstrating a particular area's association with a particular function.

– Functionally-defined parcellations. These are probably not our preferred option, since we are not looking at resting-state data.

– Meta-analysis parcellations, principally Luke J. Chang and colleagues (2016)'s unpublished 50-, 100-, and 200-cluster parcellation available at neurosynth.org. Like anatomically-defined parcellations, these are reasonably interpretable, and have the advantage of being somewhat theory-driven, in the sense that they arise from individual clusters detected across thousands of studies, many of which have made decisions about cluster borders based on their interpretability.

• Decoders trained on other datasets, such as Wager et al. (2013)'s pain dataset.
Decoders pick up much more signal (Woo, Chang, Lindquist, & Wager, 2017); their interpretability comes primarily from the definition of the task from which they were extracted rather than from a particular region. However, it is also possible to identify them with a particular area, either by building a decoder confined to a particular area, or by examining the regions that are most strongly 'active' within the decoder.
• Activity defined from the same subject group. These have the advantage that we may be able to define them separately for each subject, thus increasing our power to detect effects. However, when using these, we may need to justify their generalizability. These might include:
– Regions of interest related to other tasks from our same subject group. For instance, in preliminary testing during another investigation on this dataset examining a GoNoGo task, activity associated with a 'stop' response was detected in the right dlPFC. This activity can be used as a functional ROI to detect similar stop responses apparent in other tasks.
– Decoders trained on other tasks from our same subject group. As above, this could be useful for detecting specific processes, e.g., stop responses.

Earlier testing on our GoNoGo dataset indicated that the Harvard-Oxford atlas may not have the specificity required to capture signals during decision-making processes in our tasks. Parts of the prefrontal cortex are covered by relatively large region groupings, including the frontal pole, medial and orbitofrontal cortex, and dorsolateral prefrontal cortex. Thus, I investigated other parcellations. There are a number of parcellation atlases available; I aimed to find an atlas with fine-grained (>100) parcellations, at least in the prefrontal cortex and striatum, that was not primarily resting-state based. Luke J.
Chang and colleagues (2016)'s may be the only automated meta-analysis-based parcellation atlas available, and contains as many as 200 separate clusters. Destrieux et al. (2010) describe an atlas of 144 bilateralized regions as well as an automated method for assigning regions within a subject's data to the atlas; the tool to implement this method is available in the Freesurfer brain imaging software (Moreau & colleagues, 2016), and a standardized map of the regions is available in nilearn (Abraham et al., 2014; Abraham et al., 2017). Freesurfer is based on registration of grey matter surface space, which is held to offer greater accuracy than volume registration, for several reasons. First, it uses cortical folds to align between subjects in surface space (Fischl et al., 2007). Second, in volume registration, 5 mm spatial smoothing can blur adjacent grey matter surfaces across sulci, when those surfaces are actually very separate parts of the brain's neural network and have no direct neuronal contact; in some cases, surface registration allows patterns across the grey matter surface to become more apparent. Third, even small alignment error in 3D space can result in overlaps between distinct sulci. If surface mapping is accurately performed before alignment, then these kinds of overlap issues can be avoided. Perhaps for these reasons, there is some evidence that surface registration yields more statistical (Anticevic et al., 2008) and predictive power (Oosterhof, Wiestler, Downing, & Diedrichsen, 2011).

2.3 Extracting ROIs from Freesurfer

I used Freesurfer segmentation and parcellation to extract cortical parcellations and subcortical segmentations from raw fMRI data. First, Freesurfer's recon-all -all command was used to extract a structural model of each individual subject's cortex. Then, I took functional data pre-processed in FSL and used the reg-feat2anat command to create a Freesurfer registration of the functional data. FSL preprocessing included slice timing correction, but not spatial smoothing.
Spatial smoothing was specifically excluded because Freesurfer aims to map brain activity onto a spherical representation of the cortex. In this way, activity that occurs near other activity within the cortical surface space is considered together, but activity far apart in surface space yet near in 3D space (e.g., where the cortex folds against itself) will not be considered as a single cluster. This contrasts with more traditional GLM cluster analysis in fMRI. When we apply those methods, we typically cluster voxels close to one another that are simultaneously active. This can be misleading if, for instance, voxels represent separate gyri that happen to be folded in very close to one another, and may result in spurious cluster representation. By mapping voxels onto a cortical surface 'flattened' out into a sphere before spatial smoothing, we gain a more accurate view of clusters that are significantly active together.

Following this, mri_segstats was used to segment the image into the subcortical segmentations described in Fischl et al. (2002) and the cortical parcellations described in Destrieux et al. (2010). A script was used to automate this process. Following the functional data registration, I followed a process devised with help from Grieve (personal communication, 14 May 2018, at https://www.mail-archive.com/freesurfer@nmr.mgh.harvard.edu/msg58082.html). First, temporary functional images registered to structural space were created with Freesurfer's mri_vol2vol. Because these were very large, only one functional image in structural space was created at a time. Once registered to structural space, functional data could be extracted from the area encompassed by each cortical parcellation and subcortical segmentation. The large, structural-space functional images were each deleted as soon as they were used to create ROI timeseries.
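The per-ROI extraction step can be sketched as follows. This is an illustrative reconstruction, not the thesis code: it assumes the structural-space functional image and the label volume have already been loaded as numpy arrays (e.g., via nibabel's nib.load(...).get_fdata()), and the function name extract_roi_timeseries is hypothetical.

```python
import numpy as np

def extract_roi_timeseries(func_4d, label_vol, roi_id):
    """Average the functional signal over all voxels carrying a given label.

    func_4d   : (x, y, z, t) array of functional data in structural space
    label_vol : (x, y, z) integer array of parcellation/segmentation labels
    roi_id    : integer label of the ROI to extract
    Returns a length-t mean timeseries for the ROI.
    """
    mask = label_vol == roi_id
    if not mask.any():
        raise ValueError(f"label {roi_id} not present in volume")
    # boolean indexing collapses the three spatial axes -> (n_voxels, t)
    return func_4d[mask].mean(axis=0)

# toy example: a 2x2x2 volume over 5 timepoints, with one voxel labeled 17
rng = np.random.default_rng(0)
func = rng.normal(size=(2, 2, 2, 5))
labels = np.zeros((2, 2, 2), dtype=int)
labels[0, 0, 0] = 17
ts = extract_roi_timeseries(func, labels, 17)
```

Looping this function over every label in the Destrieux parcellation and the subcortical segmentation yields one timeseries per ROI, after which the structural-space image can be deleted.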
2.4 Neural data cleaning

After the ROI timeseries were extracted, the HRF convolution script written for the pain signature was modified to instead convolve the time series generated from Freesurfer. The convolution script included the same noise-cleaning regressors that were included during pain signature extraction. Following convolution, I added one more cleaning step to the model. Freesurfer generates ROIs for left and right cerebral white matter, 4 different ventricles, and cerebrospinal fluid, representing Blood Oxygen Level Dependent (BOLD) activity that influences signal all over the brain but does not represent neural activity in any particular area. I controlled for each of these signals by regressing each of these ROIs onto the ROIs of interest in a simple linear model, and taking the residual error to represent the cleaned signal timeseries for each ROI of interest.

2.5 Behavioral data cleaning

During early Markov Chain Monte Carlo (MCMC) single-subject analyses, I often found the model could not easily converge on sensible starting values for some subjects. There was a wide range of performance accuracy between subjects. One subject in particular performed at chance level with long strings of the same response, suggesting a 'button-mashing' behavior that ignored any computation of reward or punishment. Including subjects like these in a reinforcement learning model defined by a learning rate parameter is misleading, because the learning rate then describes not the subject's ability to learn but simply their lack of actual learning. This complicates the interpretation of a group-level learning rate parameter. I tried two different models to deal with this. One used a binomial parameter indicating whether a subject seemed to be randomly button-mashing, but it was difficult to get the model adapted to modeling a binomial parameter.
I also experimented with a probability model that calculated a joint probability of button mashing or linear ballistic accumulation, but I was not able to get this model to initialize. In order to deal with the button-mashing problem, I instead eliminated subjects below a threshold, described below. Unless otherwise stated, data cleaning followed the process described here. A total of 164 subjects were recorded in the dataset with complete data across the three groups: 54 in the Safe No Meth group (Group 1), 68 in the Risky No Meth group (Group 2), and 42 in the Risky Meth group (Group 3).

Risky classification correction. Several subjects were eliminated because, during analysis, it became clear that some subjects classified as "sexually risky" could not reliably be determined to have had unprotected casual sex. In order to be included in the study, subjects had to have had anal sex with a man other than their primary partner in the prior 90 days. In order to be included in the risky group, subjects had to have had unprotected anal sex with a man in the prior 90 days, but this could have been their primary partner. It is unclear that unprotected sex with a primary sexual partner such as a husband really represents risky behavior. We did collect data about the total number of men with whom participants had had unprotected anal sex, but not about whether that group included their primary partner. Thus, if a man had had unprotected anal sex with precisely one other person, and had a primary partner, we could not determine whether he had had unprotected anal sex with a non-primary partner. Because they could not be reliably assigned to a particular group, these men were removed from the study altogether. Following this step, I was left with a total of 161 subjects: 54 in the Safe No Meth group, 66 in the Risky No Meth group, and 41 in the Risky Meth group.
2.5.1 Removing runs with evidence of no learning

Following this stage, I further removed runs where subjects performed so poorly that we had clear evidence they were not learning at all. No learning at all might indicate that a subject was not trying to complete the task. Because we hope to estimate actual learning ability, and not simply willingness to complete the task, these runs were removed. Two metrics were used to estimate overall performance: a button-mash test and a worse-than-chance performance test.

Button-mash test. The first test eliminated runs in which subjects had made fewer than the expected number of switches between buttons. In the task, subjects must repeatedly press a 1 or 2. Seldom switching between 1 and 2 may indicate subjects are simply repeatedly pressing a single button, rather than making any judgment about the task. For each run, I counted the number of switches between consecutively pressing 1 and consecutively pressing 2. I then used a binomial probability distribution to estimate the likelihood of switching as few times as the subject did, assuming that the probability of switching at each time point between a 1 and a 2 was 0.5 and given that each run has 218 trials. This would be the case if subjects really were making independent judgments about each trial. This was determined by:

B(x; N, P) = B(N_changes; N_trials, 0.5) = B(X ≤ k; 218, 0.5)   (2.1)

Based on that binomial probability, for each run, I calculated the expected number of runs that would have at most that many button switches by multiplying B(x; N, P) by the total number of runs in the dataset, ranked the runs in order of their number of switches, and then removed all runs for which the probability of observing a run with that many switches was less than a cutoff level of 1/646 = 0.001548, the probability that any single run would be excluded as a false positive given an arbitrary cutoff.
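The switch-count screen can be sketched as follows. This is an illustrative reconstruction, not the analysis code: it applies the binomial cutoff directly to a single run, and treats a 218-trial run as having 217 switch opportunities.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def flag_button_mash(responses, n_runs=646):
    """Flag a run whose count of 1<->2 switches is improbably low.

    responses: sequence of 1s and 2s for one run. Under independent
    responding, each of the len(responses) - 1 consecutive pairs switches
    with probability 0.5.
    """
    switches = sum(a != b for a, b in zip(responses, responses[1:]))
    p_low = binom_cdf(switches, len(responses) - 1)
    return p_low < 1 / n_runs  # cutoff of 1/646 ~ 0.001548

mashing = [1] * 200 + [2] * 18   # almost no switching: should be flagged
alternating = [1, 2] * 109       # switches every trial: should survive
```

The ranked, FDR-inspired version described in the text additionally compares each run's probability against its rank among all 646 runs rather than flagging runs one at a time.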
In this way, we control for the number of falsely rejected runs. This rank-based method was inspired by the False Discovery Rate method of correcting for multiple comparisons (Benjamini & Hochberg, 1995), which ranks p-values in order and rejects the null hypothesis for all tests in which the p-value is less than or equal to the expected p-value at that rank under a null distribution. Describing this more concretely, out of 646 total runs, there were 10 218-trial runs with 90 or fewer switches, compared to an expected count of just 3.9 runs with 90 or fewer switches. We expected less than 1 run with 87 or fewer changes, whereas 8 were observed, so those 8 runs were eliminated from the sample (on a per-run basis). Four were from 1 subject and three other runs were from one subject each.

Worse-than-chance performance test. The second test eliminated subjects who seemed to be performing significantly below chance level. Run performance was simply the number of correct responses divided by the total number of trials in each run; subject performance was the mean of run performance across all runs for each subject. The logic is similar to that in the previous test. If we have a tolerance for precisely one false positive, i.e., we falsely exclude precisely one run as worse than chance which is not in fact worse than chance, then with 646 runs the probability that any single run will be excluded as a false positive is 1/646 = 0.001548. I then solved to find the number of correct trials necessary to approximate this probability in a binomial distribution representing the probability of getting a particular number of trials in a run correct:

B(x; N, P) = B(N_correct; N_trials, 0.5) = B(N_correct; 218, 0.5) ≤ 0.001548

The closest integer solution for this was N_correct ≤ 87, which represents an approximately 40% proportion of correct responses. Thus, if all subjects with a mean performance across their runs of less than 40% were excluded, we would expect precisely one false positive exclusion. Importantly, subjects had to consistently underperform across several runs in order to have a mean of 40% correct responses, making this a more conservative exclusion rule than it would otherwise be. Subjects (i.e., on a per-subject basis) whose average performance across runs was at 40% or less were eliminated (6 subjects).

2.5.2 Short reaction times

The distribution of response times is bimodal (Figure 2.2).

Figure 2.2: Log histogram of reaction times across all subjects in the study, showing the cutoff point used by Woods, Wyma, Yund, Herron, & Reed (2015) and the higher cutoff point used in this study.

Due to a technical error, there were 7 trials where reaction times between 1.1 and 1.5 were recorded. These were not excluded, as they were likely due to a technical error in code or a temporary latency problem rather than an erroneous response. In a simple reaction time task, Woods, Wyma, Yund, Herron, and Reed (2015) found subjects' simple response time had a mean of 213 ms, including movement initiation. In their analysis, they coded responses less than 110 ms as false alarms. However, considering the distribution observed in this study, along with the relatively high cost of including a false response compared to the relatively low cost of excluding a true response, I used a slightly higher threshold of 140 ms. Trials with responses under this threshold were removed from the dataset entirely. Not only did subjects lack time to make a decision, they would also not have been exposed to stimuli for long enough to learn anything from the trial.
Thus, these trials are not relevant to an RL model of reversal learning and can safely be eliminated.

Chapter 3: Model design

In this chapter, I review the technical details of the model implementations. These follow on from Chapter 1, which describes the theoretical background for each of the models implemented. This chapter does not describe results from the implementation of any of the models; rather, it simply describes the mathematical properties and structure of each of the models implemented in subsequent chapters. The reader may prefer to move straight on to Chapters 4-7, which describe the models as they were implemented in various versions along with the results found, and refer back to this chapter when necessary.

3.1 Overview of the double-update reversal learning model

As the review in Chapter 1 describes, reviews comparing the more complex Hidden Markov Models of reversal learning described in the literature with simple double-update models have not shown the more complex models to be superior. Although the double-update model and single-update model are equivalent in this application, I implemented a double-update model because of the easily accessible example provided by Nate Haines in his hBayesDM package (Ahn & Haines, 2017).

3.1.1 Reinforcement learning model

The double-update model implemented in this subsection is similar to that described in Reiter et al. (2017), and was adapted from code by Ahn and Haines (2017). It is assumed that at each trial the stimulus is recognized perfectly, but the response to that stimulus is imperfectly recalled, and prediction error reflects this imperfection. This is unlikely to be completely true.
The stimuli in the task are actually quite difficult to distinguish from one another under time pressure, but it is unclear how recognition might be represented in this model, or how modeling of recognition might change model outcomes (for an attempt to model recognition, see Section 8.2.1). At each trial, based on the particular stimulus being presented, the model looks up the expected value of a response v_1 and its alternative v_2. The model describes prediction error e as the difference between outcome and expected outcome, based on the choice that was made. In the double-update model, the prediction error for its opposite is also calculated:

e_t = v − v_e   (3.1)
e′_t = v′ − v′_e   (3.2)

The expected values are then updated using the prediction errors, weighted by the learning rate α:

v_e = v_{t−1} + α e_t   (3.3)
v′_e = v′_{t−1} + α e′_t   (3.4)

A single-update model would be made up of only Equation 3.1 and Equation 3.3. In that case, e_t would describe the prediction error of the choice made, and v_e describes the expected value of the first alternative relative to the second. A double-update model adds Equation 3.2 and Equation 3.4, where each alternative is represented separately. (Credit is due to Nathaniel Haines for coding up a two-level, multiple-subject, single-run double-update model at the MIND 2017 summer program.) Because in the current design the subject knows that exactly one of the alternatives will be rewarded and the other will not be, v′_e will always be the simple negative of v_e. If there were a possibility that both alternatives would be rewarded, or neither, or if reward magnitude varied, then this relationship would break down. Thus, in this particular application, the double-update model is functionally equivalent to a single-update model; in other applications, this would not necessarily be the case.
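The double update in Equations 3.1-3.4 can be sketched as a minimal Python step, assuming outcomes coded +1 for the rewarded response and −1 for its alternative (variable names are mine, not the Stan model's):

```python
def double_update(v, choice, reward, alpha):
    """One double-update step for a two-alternative trial.

    v      : [v1, v2], current expected values of the two responses
    choice : 0 or 1, the response made
    reward : 1.0 if the choice was rewarded, -1.0 otherwise
    alpha  : learning rate in [0, 1]
    Returns the updated pair of expected values.
    """
    other = 1 - choice
    # Eqs 3.1 / 3.2: prediction error for the choice and its counterfactual.
    # Exactly one response is rewarded, so the alternative's outcome is -reward.
    e_choice = reward - v[choice]
    e_other = -reward - v[other]
    out = list(v)
    # Eqs 3.3 / 3.4: update both values, weighted by the learning rate
    out[choice] = v[choice] + alpha * e_choice
    out[other] = v[other] + alpha * e_other
    return out

v = [0.0, 0.0]
v = double_update(v, choice=1, reward=1.0, alpha=0.2)  # response 2 rewarded
# the two values now mirror each other, as noted in the text
```

Note that the two updated values are always exact negatives of each other, which is the functional equivalence to a single-update model described above.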
Softmax

The inverse logit of the expected value, which determines the likelihood of responding in each particular way on the current trial, is recorded in a 2×1 choice vector using a 2×1 expected value vector:

Pr(c_t) = logit^{−1}(β v)   (3.5)

where β describes the odds of the subject choosing the response with the highest expected value, thus effectively modeling an exploration vs. exploitation tradeoff. This is a softmax algorithm and can also quantify subjects' error in recognizing and responding to a stimulus, distinct from their ability to learn particular stimuli.

Step-by-step notes

In this section, I walk through an illustration of a particular Stan iteration of the RL model described in the previous section. At a specific iteration, β = 0.40. Because learning for each image is assumed to be independent (not a perfectly true assumption, but a simplifying one), we can deal with reversal learning for one image at a time. At the first presentation of cue 53, the expected value for each response is the vector [0, 0], and Response 2 is rewarded. The categorical_logit Stan function estimates the likelihood of a response of 1 or 2; the actual response was 1 (Equation 3.5). The prediction error is calculated as the difference between the actual outcomes and the expected values (Equations 3.1-3.2), and then the expected value is updated with the prediction error weighted by α (Equations 3.3-3.4). Expected values have now changed, with E[t, ·] = [−0.01, 0.01]. Note that the same update occurs regardless of the subject's choice, because regardless of their choice, the result revealed whether their choice was correct and thus, by implication, whether the alternative counterfactual choice they could have made would have been correct. The expected values are multiplied by the value β, and this now yields an input to the categorical_logit function of [−0.004, 0.004].
This is important because the higher the categorical logit input, the more likely the choice will correspond to the expected value. With these low values, a choice against the expectation is almost as likely as a choice consistent with it. In the next round, E[t, ·] = [−0.03, 0.03], and the input is β v = 0.40 × [−0.03, 0.03] ≈ [−0.01, 0.01]. In contrast, with a larger beta value of β = 2.78, selection likelihoods more closely track expected value. At the second round of presentation of cue 53, the expected value has reached [−0.01, 0.01] as before; but because β = 2.78, the input to the categorical_logit function is now [−0.02, 0.02]. As a result, the likelihood of a response selection of 2 is higher than previously. A larger beta value will weight both competing expected values more strongly and thus yield choices that more closely follow rational expectation, while a lower beta value will yield choices closer to random chance.

3.1.2 Incorporating reaction time

The double-update model is a simple reinforcement learning model and does not incorporate reaction time. That simple RL model is described in Chapter 5. However, our task collects subject reaction time during trials, and behavioral testing shows that reaction time is a strong predictor of outcome (Chapter 4). One way to incorporate reaction time into the RL model is to blend in an LBA or drift diffusion model. However, in this thesis I also investigated simpler ways to incorporate reaction time, as described below.

Reaction time expectations

Two hypotheses for reaction time are as follows:

1. Holding perceptual and retrieval time equal, slower reaction times would allow for more accurate perceptual judgments and memory. In this case, we'd expect slower reaction times to lead to superior performance in a given trial.

2. On the other hand, slower reaction times may reflect slower perceptual and retrieval time driven by experienced task difficulty.
In this case, we'd expect that slower reaction times are correlated with inferior performance in a given trial, and we'd also expect higher non-response rates in subjects with overall slower performance.

Data from the linear analysis (Chapter 4) suggests that, if anything, the first of these was true. Conditional on reaction time being greater than 0.1 seconds, there was a significant relationship between longer reaction times and performing the task correctly. I implemented a modification to the double-update model to try to incorporate reaction time into the model. We extend Equation 3.5 (p. 37), repeated below, to add a weighting for reaction time:

Pr(c_t) = logit^{−1}(β v)   (3.6)
Pr(c_t) = logit^{−1}((β_1 + β_2 RT_{s,t}) v)   (3.7)

In this way, we separately estimate two β values instead of one. The first effectively estimates an exploration/exploitation tradeoff independent of reaction time; the second estimates the effect of waiting, i.e., a slower reaction time, on the overall decision.

Exclusion of abnormally short reaction times

It was found that in the linear ballistic accumulator model, extremely short reaction times can result in an infinite likelihood if they are treated as a genuine response to the stimulus. For sufficiently short response times, it is impossible for humans to react, let alone decide between competing responses; thus, as responses to the stimulus, these response times have zero likelihood and must be something other than an actual response, e.g., an accidental button press that coincides with the presentation of a stimulus. To deal with this, and because these trials have no use from a theoretical perspective, I excluded trials with abnormally short reaction times. See Chapter 2, Section 2.5.2 for a further description of this process.

Racing diffusion implementation

A racing diffusion process model (Brandon M. Turner, submitted) is helpful in integrating these concepts.
I worked with Turner to implement a racing diffusion process model in Brandon M. Turner (submitted), which, for just a single alternative, can be represented as

f(t | α, A_c, τ) = α / √(2π(t − τ)³) · exp(−[α − A_c(t − τ)]² / (2(t − τ)))   (3.8)

which is the probability density function of observed reaction times t, where α represents the level of evidence that must be accumulated to make a choice, τ represents the time taken independent of evidence accumulation (for processes such as visual recognition and movement to indicate the decision), and A_c, set using the RL value v_t, represents the strength of evidence accumulation in that trial. In the context of the LBA model in this thesis, α is often referred to as thresh; in drift diffusion models it is usually referred to as k. In either case, it represents a threshold of evidence that must be accumulated to make a choice. This function is a parameterization of the Wald distribution (Wald, 1947), and describes the probability that a given level of evidence is obtained. In order to get the probability density for multiple competing alternatives, we need a PDF function and a CDF function, characterized as

pdf_wald(t | choice, α, A_j, τ) = α / √(2π(t − τ)³) · exp(−[α − A_j(t − τ)]² / (2(t − τ)))
cdf_wald(t | choice, α, A_j, τ) = Φ(√(α² / (t − τ)) · (A_j(t − τ)/α − 1)) + exp(2 α A_j) · Φ(−√(α² / (t − τ)) · (A_j(t − τ)/α + 1))
pdf_2c(t | choice, α, v, τ) = pdf_wald(t | choice, α, v[1], τ) · [1 − cdf_wald(t | choice, α, v[2], τ)]   (3.9)

A_c, or v_t, is the same variable as v_e in the double-update model described above. Thus, holding the response time t constant at the observed data, if the estimated learning signal is stronger, then the threshold will decrease and the drift rate will increase proportionately. Together these parameters make our previous inverse temperature parameter β redundant, and it can be removed from the model. Under Hypothesis 1 above, we don't expect a higher learning rate, but we do expect that the threshold will increase in slower subjects, reflecting slower response times.
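The shifted Wald density, its CDF, and the resulting race likelihood in Equations 3.8-3.9 can be sketched as follows. This is my own reconstruction using the standard parameterization (threshold α, drift A, shift τ), not Turner's code; the race density multiplies the winner's PDF by the loser's survivor function.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_pdf(t, alpha, drift, tau):
    """Shifted Wald density: first passage of threshold alpha at the given drift."""
    s = t - tau
    if s <= 0:
        return 0.0
    return alpha / math.sqrt(2 * math.pi * s**3) * math.exp(-(alpha - drift * s) ** 2 / (2 * s))

def wald_cdf(t, alpha, drift, tau):
    """Shifted Wald CDF, via the inverse-Gaussian distribution function."""
    s = t - tau
    if s <= 0:
        return 0.0
    a = math.sqrt(alpha**2 / s)
    return (phi(a * (drift * s / alpha - 1))
            + math.exp(2 * alpha * drift) * phi(-a * (drift * s / alpha + 1)))

def race_density(t, alpha, drifts, tau):
    """Density that accumulator 1 finishes first at time t:
    pdf(winner) * (1 - cdf(loser))."""
    return wald_pdf(t, alpha, drifts[0], tau) * (1 - wald_cdf(t, alpha, drifts[1], tau))
```

With the two drifts set from the expected values of the two responses, race_density gives the likelihood of the observed choice and reaction time on a single trial.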
Under Hypothesis 2, we'd expect a similar or perhaps increased threshold among both high- and low-performance subjects, and we might expect a larger τ value, indicating that tasks like stimulus recognition, rather than response recall, could be responsible for lower performance. In both instances, the threshold varies in proportion to reaction time, but only under Hypothesis 2 will the speed of evidence accumulation v_t be lower in poorer-performing subjects. v_t is set not only by the constraints of the diffusion process; it also represents the value of the evidence accumulated for a particular trial. S. D. Brown and Heathcote (2008), Annis et al. (2017), and Brandon M. Turner (submitted) all describe racing accumulator models, in which evidence for each alternative accumulates over time. In Chapter 7, we use a linear ballistic accumulator model rather than a drift diffusion model. The principles are similar to those of the drift diffusion model described above, but the calculations are slightly simpler. The implemented LBA model is described, line by line, in Annis et al. (2017).

3.1.3 Mapping the expected value to the drift rate

In Chapter 6, I combine a racing diffusion model with an RL model, as described in Brandon M. Turner (submitted). As has been mentioned several times, the EV from the RL model is used to define the drift rate, but we need to consider a transformation here because a straight unit function is not appropriate. In the double-update model, during each learning phase, both the stimulus and its non-reinforced alternative are updated. The value of a stimulus is represented as 1. Thus, the maximum expected value, which the Rescorla-Wagner learning algorithm approaches, is 1, while the minimum expected value is −1. This is not an appropriate range for the drift rate, which theoretically has bounds from zero to positive infinity. The models are combined by assuming that drift rate is proportional to expected value.
Because, in the double-update model, an expected value may be negative, we must use a mapping function to map the expected value, v ∈ [−1, 1], to the drift rate ∈ [0, ∞). The most straightforward function to map the expected value to the drift rate in this way is

f(v) = logit((v + 3) / 4)   (3.10)

One problem with this equation is that where EV = 0, the drift rate is f(0) = 1.098612, which seems like an arbitrary drift rate for zero expected value. However, it seems better than any alternative, and other parameters should find equilibria fitting the drift rate. In practice, what will provide variance in the model is the difference between drift rates for each item. This difference will always be in the appropriate direction. A unit mapping,

f(v) = v   (3.11)

cannot be used because it doesn't allow us to sample parameters from a normal distribution and use them to estimate values that vary between −1 and 1 without an artificial truncation at those values. Another mapping that could be used is

f(v) = v + 1   (3.12)

which would simply set a zero drift rate towards an item with an expected value of −1, and drift rates approaching 2 for an expected value approaching 1. Comparing the expected value (EV) to drift rate functions of Equation 3.10 and Equation 3.11 graphically, they look like Figure 3.1. Applying a model, we would see expected value proceeding along a curve as in Figure 3.1 as the subject learns. With a learning rate of α = 0.2 and steady learning, the change in expected value gets progressively smaller over learning iterations (Figure 3.2). Thus, the two functions would proceed as the subject learns as in Figure 3.3. As long as the subject steadily learns, the drift rate will increase linearly with learning in the first model, whereas it reaches an asymptote with the other models.
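The mapping in Equation 3.10 can be checked numerically. This small sketch is my own; f1 denotes the logit mapping of Equation 3.10 and f2 the shifted unit mapping of Equation 3.12.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def f1(v):
    """Equation 3.10: map v in [-1, 1] to a drift rate in [0, inf)."""
    return logit((v + 3) / 4)

def f2(v):
    """Equation 3.12: shifted unit mapping onto [0, 2]."""
    return v + 1

f1(0)  # = log(3), approximately 1.098612, the value quoted for EV = 0
```

Note that f1(−1) = logit(0.5) = 0, so the lower bound of the drift rate is respected, and f1 grows without bound as v approaches 1, which is the linear-in-iterations growth discussed above.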
A linearly, infinitely increasing drift rate seems implausible; however, considering that the drift rate parameter itself does not linearly increase the actual response rate, this may be a plausible relationship to use.

Figure 3.1: Drift rate by expected value with either Equation 3.10 or Equation 3.11.

Figure 3.2: Expected value by learning iteration with a learning rate of 0.2.

Figure 3.3: Expected value and drift rate functions by learning iteration with a learning rate of 0.2. 'f_1' and 'f_2' denote drift rate functions, while 'Expected Value' denotes expected value.

3.2 Extending to a hierarchical model

From the base reversal learning model created by Ahn and Haines (2017), I worked with Nate Haines to extend the model to include:

• Multiple runs
• Multiple run types (Reward and Punishment)
• Multiple groups

Ultimately, I modeled the reversal learning task over three levels as described in Section 1.4.2 (p. 16). To recap, these were:

• Three experimental groups: Safe subjects, Risky No Meth subjects, and Risky Meth-using subjects
• Each group contains approximately 30 to 60 subjects (see Section 1.4 for the exact description)
• Each subject performed the task typically 4 times (some performed the task fewer times than that; some runs were dropped during data cleaning as described in Section 2.5)

I estimated learning rate and other parameters across each run. For each of the three levels, it is important to ensure that the model can correctly estimate variability. To consider how to incorporate them into the model, it is useful to consider the type of variability we expect.
How many separate samples are there - are there enough to consider them drawn from a random distribution, or do we simply estimate them separately from the level below? For a given dimension, do we need to estimate the effect of a variable by modeling it as a fixed effect (as for Reward vs. Punishment, or between the three groups), or do we simply need to model the variability across that dimension (e.g., modeling multiple runs)?

Subjects have up to four runs (a few have three), and runs can be of two types: reward or punishment. Variability across runs is expected to be random; there are four runs, and so parameters for runs should be modeled as drawn from a random distribution of runs for each subject. Run type is fixed, and there are only two run types, so these are simply estimated using separate parameters (or, equivalently, with one parameter representing both and a second parameter estimating the difference between them).

As described above, the three subject groups, modeled at the highest level of a hierarchical model, are Safe, Risky No Meth, and Risky Meth subjects. Groups will be the last set of parameters I try to modify. Because there are dozens of subjects within each group, it may be that we should take advantage of the shrinkage that comes from combining each group into a single analysis. Given that we have around 50 subjects in each group, as a first pass, it seems that estimating groups separately and comparing their distributions is an acceptable way to determine whether group differences exist. Pooling variance across groups by modeling each group parameter as arising from a distribution across groups would not do much to enhance estimation, because with only three groups it would be near-impossible to estimate variance across the groups.

3.2.1 Configuring priors

By definition, the learning rate is between 0 and 1 (Reiter et al., 2017, Supplementary Material), while the inverse temperature has a larger range.
In a similar model (Reiter et al., 2017, Figure 2b), healthy subjects had an inverse temperature between 6 and 7. The hBayesDM package (Ahn & Haines, 2017) used a half-Cauchy(0, 5) distribution, but considering the empirical findings of Reiter et al. (2017), the prior should extend well beyond the empirically found level of 6-7 - possibly up to twice this level. I therefore designed priors keeping these basic factors in mind. However, there was also work based on the empirical data in the study to optimize prior selection; see also Section 7.3.2.

3.2.2 Conjugate priors

Throughout the design of the hierarchical models, I have attempted to maintain the use of conjugate distributions to aid interpretability of the model parameters. A prior distribution is said to be conjugate if it generates a posterior distribution in the same family. For instance, a normal prior is conjugate for a normal likelihood (with known variance): if we start with a prior probability distribution with mean μ_prior and standard deviation σ_prior, the posterior distribution is also normal, with some mean μ_posterior and standard deviation σ_posterior. This is useful because we can nest levels of a particular parameter without having to change the distribution used to generate it. For instance, we can estimate the Reinforcement Learning (RL) model learning rate for a particular run using a normal prior. In turn, the prior for the run can be estimated using the subject prior, which can in turn be estimated using the group prior, all using the same distribution. In interpreting the parameters, a group posterior distribution will therefore imply a mean of subject parameters equal to the group mean.

I have estimated all parameters in hierarchical models from a normal distribution, and then transformed them using a transformation function when modeling them at the run level as appropriate.
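The nesting described above can be illustrated with a small generative sketch (Python, purely illustrative; the thesis's actual models are written in Stan, and the hyperparameter values here are arbitrary):

```python
import math
import random

random.seed(0)

# Arbitrary illustrative hyperparameters.
group_mu, group_sigma = 0.0, 1.0   # distribution of subject-level means
run_sigma = 0.25                   # spread of run-level values within a subject

# Because every level uses a normal distribution, the same sampling step
# nests cleanly: group -> subject -> run.
subject_means = [random.gauss(group_mu, group_sigma) for _ in range(50)]
run_values = [[random.gauss(mu, run_sigma) for _ in range(4)]
              for mu in subject_means]

# At the run level, the unconstrained normal draw is transformed to the
# parameter's natural range, e.g. an inverse logit for a learning rate in (0, 1).
def inv_logit(x):
    return 1 / (1 + math.exp(-x))

learning_rates = [[inv_logit(x) for x in runs] for runs in run_values]
```

The group-level posterior is then directly interpretable as the distribution of the (untransformed) subject parameters, as described above.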
3.2.3 Non-centered parameters when using Stan

The code below demonstrates a centered parameterization which generates α and β parameters, samples them from a group distribution, and then samples that group distribution from a prior.

for (s in 1:N) {
  alpha[s] ~ normal(alpha_s_mu[s], alpha_s_sigma[s]);
  beta[s] ~ normal(beta_s_mu[s], beta_s_sigma[s]);
}
//sample the subject means
alpha_s_mu ~ normal(s_mu_g_mu[1], s_mu_g_sigma[1]);
beta_s_mu ~ normal(s_mu_g_mu[2], s_mu_g_sigma[2]);
//sample the subject standard deviations
alpha_s_sigma ~ cauchy(s_sigma_g_mu[1], s_sigma_g_sigma[1]);
beta_s_sigma ~ cauchy(s_sigma_g_mu[2], s_sigma_g_sigma[2]);
//mean and variance of the subject mean, i.e., the group-level mean and SD
s_mu_g_mu ~ normal(0, 1);
s_mu_g_sigma ~ cauchy(0, 5);
//mean and variance of the subject-level variance
s_sigma_g_mu ~ normal(0, 1);
s_sigma_g_sigma ~ cauchy(0, 5); //maybe this should be lower...

The Stan reference manual (p. 347) says that group-level effects often do not constrain the hierarchical distribution tightly when there are too few groups or when inter-group variability is high. In our case, we do have several levels with very few groups. In this instance, one solution is to move from a ‘centered parameterization’ to a non-centered parameterization. Rather than drawing samples from a distribution directly specified by hyperparameters, we build a model to draw from a standard unit normal distribution, but constrain the model with an additional parameter. The example given in the Stan reference manual is to replace the sampling of a parameter x from mu and sigma parameters:

x ~ normal(mu_x, sigma_x);

with

x_raw ~ normal(0, 1);
x = mu_x + sigma_x * x_raw;

Because x, mu_x, and sigma_x are highly correlated, it might be hard to sample them in the first, ‘centered’ parameterization. In the non-centered parameterization, the sampling occurs from a standard unit normal distribution, irrespective of the final values of mu_x and sigma_x.
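The equivalence of the two parameterizations is easy to check numerically outside of Stan (a Python sketch; shifting and scaling standard-normal draws recovers the target mean and standard deviation):

```python
import random
import statistics

random.seed(1)
mu_x, sigma_x = 2.0, 3.0  # arbitrary target hyperparameters

# Non-centered: sample from a standard unit normal, then shift and scale.
x_raw = [random.gauss(0, 1) for _ in range(100_000)]
x = [mu_x + sigma_x * z for z in x_raw]

# The transformed draws follow normal(mu_x, sigma_x), so the two
# parameterizations describe the same distribution; only the geometry
# the sampler must explore differs.
print(round(statistics.mean(x), 1), round(statistics.stdev(x), 1))  # 2.0 3.0
```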
I have generally implemented Stan models using non-centered parameterizations where possible. In testing, I have not found them to solve other problems I observe, but they may make estimation of parameters faster and easier.

3.3 Reversal learning model extensions

In this section, I describe alternatives or extensions to the simple reinforcement learning model used to model the behavioral data. I have not included thorough descriptions of my implementations within this thesis because preliminary analyses indicated that they did not improve on model fit relative to the simple reinforcement learning model. However, I describe them here because there are good reasons to believe some variant of these models may be useful in ongoing work to better model our reversal learning task. For further discussion, see Section 8.2.1.

3.3.1 Discriminability model

The discriminability model models the subjects' ability to discriminate between images, and thus correctly pick out the expected value associated with a particular image. To implement the discriminability model, I assumed a linear relationship between observation of a cue and familiarity with that cue: a cue's familiarity f_c is equal to the number of times that cue has been previously seen. I modeled a 3-back comparison task. I also assumed that subjects would not confuse a target cue with the immediately previous cue, because subjects can quickly learn that the same cue never repeats in our task design. Thus, I assumed that when viewing the ith cue, subjects would try to remember which cue c_j in the set of j ∈ {i−4, i−3, i−2} matched the ith cue. Then, I calculate the discriminability score d of each cue.
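Anticipating the definitions that follow, the whole computation can be sketched in miniature (an illustrative Python sketch; the function names and the β_d value are mine, and V is reduced to a single option per candidate cue for brevity):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def discriminability_ev(f_i, f_js, t, V, beta_d):
    """Blend expected values over candidate matches for cue i.

    f_i:    times cue i has been seen before
    f_js:   times each candidate cue j (the 3-back set) has been seen
    t:      +1 if candidate j truly matches cue i, else -1
    V:      expected value carried by each candidate cue
    beta_d: discriminability rate
    """
    d = [f_i + f_j for f_j in f_js]            # d(i, j) = f_i + f_j
    m = [inv_logit(beta_d * t_j * d_j)         # per-candidate match score
         for t_j, d_j in zip(t, d)]
    m_s = [x / sum(m) for x in m]              # scaled match scores; sum to 1
    return sum(w * v for w, v in zip(m_s, V))  # EV weighted by match belief

# Worked example from the text: cue i seen once, candidates seen 3, 4, and 5
# times, so d = [4, 5, 6]; here only the last candidate truly matches.
ev = discriminability_ev(f_i=1, f_js=[3, 4, 5], t=[-1, -1, 1],
                         V=[0.0, 0.0, 1.0], beta_d=0.5)
print(round(ev, 2))  # 0.83: most, but not all, belief lands on the true match
```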
For each prior cue c_j, the discriminability between c_i and c_j is a simple additive function of the familiarity scores:

d(i, j) = f_i + f_j

Thus, if cue i has been viewed once and is being compared to three cues that have appeared 3, 4, and 5 times before respectively, then the discriminability scores will be the vector [4, 5, 6]. The true value of whether each c_j actually matches c_i can be a vector t with a value of -1 if they do not match or 1 if they match. Then the match score, which calculates competing relative likelihoods of matching each c_j to c_i, is

m = logit⁻¹(β_d (t ⊙ d))

where β_d represents the discriminability learning rate and ⊙ denotes element-wise multiplication. Then the scaled match score, which sums to 1, is

m_s = m / Σ m

Then the expected value based on the relative expectations that each c_j matches c_i, as expressed by m_s, is:

v_t = Σ_j m_{s,j} V_j

where V is a matrix of expected values for each cue and each option a subject can take in response to the cue.

3.3.2 OpAL model

As previously described briefly in Section 1.3.3, the Opponent Actor Learning (OpAL) model (Collins & Frank, 2014) is designed to separate processes for action and for learning itself (it is also sometimes described as an ‘actor-critic’ architecture because the ‘actor’ and ‘critic’ are separated). Additionally, the model separates Go and NoGo learning. This makes it useful for neural modeling because it means separate parameters are generated for Go and NoGo learning, respectively, and these can be linked to distinct brain regions and mechanisms. For each choice, we calculate prediction error and update expected value as in Equations 3.1-3.4 (p. 36), but rather than simply using this to calculate the probability of a particular action as a probabilistic function of expected value as in Equation 3.5 (p.
37), separate Go (G_t) and NoGo (N_t) actor weights are calculated for timestep t:

G_t = G_{t−1} + [α_G · G_{t−1} · (e_t)],  G_0 = 1   (3.13)
N_t = N_{t−1} + [α_N · N_{t−1} · (−e_t)],  N_0 = 1   (3.14)

each from the corresponding weight at the previous timestep t−1, and the product of the applicable learning rate, the previous timestep's weight again, and the reward prediction error e_t. The difference between G_t and N_t defines the action taken, according to the softmax equation:

A = β_G G_t − β_N N_t   (3.15)
Pr(c_t) = logit⁻¹(A)   (3.16)

where β_G, β_N describe the relative weight that subjects give to Go relative to NoGo signals in general. As in the previously described reinforcement learning model, larger values in general mean a higher likelihood that the subject will select the response with the highest expected value.

3.4 Neural model

I was not able to find any existing literature related to reversal learning that proposed systematic models for neural activities. One of the closest was the OpAL model (Section 1.3.3), but even the OpAL model was mostly a model of psychological constructs, with a few proposed roles for specific dopaminergic pathways. Thus, I opted for a basic neural model for the neural-behavioral joint model. I started with the psychological processes associated with each element of the reversal learning task, and then sought neural ROIs that might correlate with those psychological processes, testing for correlation and covariance relationships with them. The reversal learning task posits the following psychological elements:

• a learning mechanism, the speed of which is captured by the learning rate
• EV, the value associated with a currently observed stimulus
• an error checking mechanism, which determines whether the response was correct or not and produces the Reward Prediction Error (RPE) used in learning.
The reward prediction error is scaled by the learning rate.
• visual input processing of stimuli, and matching of visual stimuli to some kind of symbolic representation of the stimulus and to an expected reward value. Our reversal learning model does not model this directly, but it is captured somewhat indirectly through the strength of evidence accumulation during the drift rate process. The strength of visual input processing may also be related to non-decision time at the beginning of the task.
• a response inhibition mechanism. Subjects must be ready to respond to a cue as quickly as possible in this task. It is possible there is a general response inhibition mechanism inhibiting a response until a subject makes a decision, at which time the response inhibition is removed and movement is performed. This may be related to the posited non-decision time at the beginning of each trial.
• movement itself. During the moment of choice, we will expect to see activity related to the physical selection of an option.

Having specified the psychological phenomena expected to be active, I can hypothesize about neural features associated with each of these, using the Neurosynth automated meta-analysis tool to explore regions associated with each function (Yarkoni, Poldrack, Nichols, Van Essen, & Wager, 2011):

learning: may involve various portions of the striatum, particularly the ventral striatum (Figure 3.4)
valuation: will likely involve the ventral striatum and medial OFC (Figure 3.5)
error checking: this could include the ACC, ventromedial prefrontal cortex, and the striatum (Figures 3.6, 3.7)
visual input processing: each portion of the visual processing system can be considered

Figure 3.4: learning; an automated meta-analysis of 1144 studies from neurosynth.org
response inhibition: regions associated with response inhibition include the superior frontal gyrus, precentral gyrus and middle frontal gyrus, the angular gyrus and supramarginal gyrus, the frontal pole, the inferior frontal gyrus pars opercularis, and the bilateral anterior insula (Figure 3.8)

These can be obtained from the cortical Destrieux atlas (Destrieux et al., 2010) and the subcortical atlas from Fischl et al. (2002); see Section 2.3.

Figure 3.5: value; an automated meta-analysis of 470 studies from neurosynth.org
Figure 3.6: error; an automated meta-analysis of 464 studies from neurosynth.org
Figure 3.7: prediction error; an automated meta-analysis of 93 studies from neurosynth.org
Figure 3.8: response inhibition; an automated meta-analysis of 218 studies from neurosynth.org

3.5 Joint modeling

Neural-behavioral joint modeling refers to any modeling method where we seek to find the best fit for a set of parameters, and seek to optimize both neural and behavioral parameters simultaneously. Typically, as described in Brandon M. Turner, Forstmann, Love, et al. (2016), we model a set of neural parameters or simply neural data, a parallel set of behavioral data and behavioral parameters, and use a link function, which may be a simple covariance or correlation matrix, to link the two together. In this thesis, I have taken that approach. By convention, I describe the linked neural parameters as δ parameters and the linked behavioral parameters as θ parameters; these link parameters can collectively be described as Ω parameters. I use neural data values, collected from regions chosen with the specific task in mind (as in Section 3.4), and test for their covariance with behavioral model properties by estimating a covariance matrix Σ which describes the covariance between each of the linked parameters. Because I have applied this on an individual run basis, the most useful model parameters are EV and RPE.
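The link can be made concrete with a small sketch (pure Python with hypothetical per-trial data and my own helper names; the thesis estimates this inside the Bayesian sampler rather than directly): once each link parameter is z-scored, the covariance between any two of them equals their correlation, which is why the estimated covariance matrix can be read directly as a correlation matrix.

```python
import random
from statistics import fmean, pstdev

random.seed(0)

# Hypothetical per-trial signals for one run: a behavioral model signal (RPE)
# and a regional neural signal sharing some variance with it.
n = 500
rpe = [random.gauss(0, 1) for _ in range(n)]
putamen = [0.5 * r + random.gauss(0, 0.8) for r in rpe]

def zscore(xs):
    mu, sd = fmean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

def cov(xs, ys):
    mx, my = fmean(xs), fmean(ys)
    return fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    return cov(xs, ys) / (pstdev(xs) * pstdev(ys))

# After z-scoring, covariance and correlation coincide exactly.
print(abs(cov(zscore(rpe), zscore(putamen)) - corr(rpe, putamen)) < 1e-9)  # True
```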
Because the scale of all of these variables is not important, in all cases, to simplify analysis and interpretation, link parameters were normalized, and thus the covariance matrix is in fact equivalent to a simple correlation matrix. We use an estimation method such as Markov chain Monte Carlo (MCMC) estimation to estimate the covariance matrix alongside all other model parameters such as α, k, and τ. This yields a posterior estimate of the covariance matrix representing the likely true values of the covariance between the neural values and the behavioral parameters.

Chapter 4

Linear modeling

In previous chapters, I described the psychological and statistical theory and framework that have motivated this work. I have also described the main techniques I will use to analyze the data. In this chapter, I want to characterize the dataset by exploring it using simple linear modeling techniques. What can we find out from the data without assuming a Reinforcement Learning (RL) model or any other particular psychological model? I hope this gives the reader a better understanding of the structure of the data and of what we might be able to dig into with a reinforcement learning model.

In order to first explore the data, I wanted to examine some basic behavioral measures to aid in construction of the Bayesian model. I explored these using hierarchical linear modeling. Although I did use the Bayesian modeling R package rstanarm, because hierarchical modeling this complex is difficult to do using GLM (Gelman & Hill, 2006, Ch. 16), I have assumed parametric distributions, and the model itself is a simple hierarchical linear model; all its parameters are regressors in the linear model.

The chapter is arranged into two main sections. The first section attempts to identify predictors of correct performance in each trial given the properties of each trial and each subject.
The second section shows that the electric shock punisher was effective at activating regions previously associated with pain.

Model 4.1: (Section 4.1) Correct scoring hierarchical linear model
  Estimation method: NUTS MCMC
  Software: rstanarm
  Data type: Behavioral only
  Model: Multi-level linear
  Levels: Trial, run, subject

4.1 Linear model

In this section, I take a look at a hierarchical linear model of task performance and try to predict task behavior using only a hierarchical linear model. This is a sensible baseline against which to measure the performance of more complex hierarchical and joint models. For hierarchical linear models of increasing complexity, standard parametric estimation techniques can fail to produce accurate estimates (Gelman & Hill, 2006). In these cases, it is advantageous to use a Bayesian approach to estimation. The rstanarm package allows for the estimation of a linear model using the Bayesian MCMC approach built into the Stan package.

4.1.1 Method

I thus specified a hierarchical logistic regression model to test the ability to predict correct vs. incorrect responses C from both fixed and mixed effects predictors. The fixed effects I specified are reaction time RT, Motivation (reward vs. punishment) M, meth use METH, sex risk SR, and the position P of a presentation in a pre- or post-reversal segment of cue presentations. I also specified individual intercepts and slopes on the presentation position for each subject S and for each individual run R. I also specified a separate mixed-effect intercept on each image cue, CUE, to represent the ease of recognizing any particular image.

Formula

Using notation described in Cohen, Cohen, West, and Aiken (2003), the hierarchical logistic regression model can be written as Equation 4.1.
logit(Pr(c_irsc)) = β₁ RT_irsc + β₂ SHORTRT_irsc + β₃ PRESENTATION_irsc + β₄ MOTIV_rsc + β₅ METH_s + β₆ SR_s + SUBJECT_s + RUN_r + CUE_c + ε_irs
= (γ_001 SR_s + γ_002 METH_s + γ_010 M_rs + γ_0000 + u_0rs + u_0c + v_00s) + (γ_1000 + u_1rs + v_11s) pres_irsc + (γ_2000) rt_irsc + (γ_3000) rts_irsc + ε_irsc   (4.1)

In the final form presented, among the first set of parentheses, we see the trial-level intercept γ_0000; random effects at the run level u_0rs, subject level v_00s, and across image cues u_0c; and intercepts for each level of Motivation (Reward vs. Punishment) γ_010 M_rs, Sex Risk γ_001 SR_s, and Meth Use γ_002 METH_s. The second set of parentheses describes slopes associated with presentation: the fixed-effect slope of presentation (γ_1000), the within-run slopes u_1rs, and the within-subject slopes v_11s. Finally, there are also fixed effects for reaction time (γ_2000) rt_irsc and for reaction time < 0.1 (γ_3000) rts_irsc.

In R, this can be expressed as

correct ~ Motivation + reaction_time + (reaction_time < 0.1) +
    presentation_n_in_segment + MethUse + SexRisk +
    (1 + presentation_n_in_segment | subid/runmotiveid) + (1 | image)

Reaction times recorded at less than 0.1 were treated as a separate regressor because we assume that a genuine reaction time of less than 0.1 is not possible (see Section 2.5.2); thus, responses recorded at less than 0.1 must be button presses unrelated to the actual stimulus, or perhaps a subject anticipating the appearance of a stimulus and responding to the anticipation rather than the stimulus itself. The specific cue presented could not be anticipated, and thus subjects could not actually perform better than chance, but they may be able to anticipate the appearance of some cue and respond to that.

Results

MCMC reliability: The effective sample size and R̂ values for all parameters were generally within the desired ranges (Table 4.1), with the exception of the Motivation parameter, which appeared to show good reliability from its R̂ value but may have had poor efficiency.
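The "[7%, 9%]" per-trial figure quoted in the discussion below can be recovered from the presentation-position coefficient's credible interval in Table 4.1 via the divide-by-4 rule (Gelman & Hill, 2006): on the logistic scale, a one-unit change in a predictor shifts the predicted probability by at most β/4, with the maximum attained near p = 0.5. A quick Python check (helper names are mine):

```python
import math

def inv_logit(x):
    """Probability from log-odds."""
    return 1 / (1 + math.exp(-x))

def prob_change(beta, baseline_logit):
    """Exact change in probability for a one-unit change in a predictor."""
    return inv_logit(baseline_logit + beta) - inv_logit(baseline_logit)

# 95% interval for the presentation-position slope, from Table 4.1.
lo, hi = 0.29, 0.36

# Divide-by-4 rule: per-trial change in P(correct) is at most beta / 4.
print(round(lo / 4, 4), round(hi / 4, 4))  # 0.0725 0.09 -> the ~[7%, 9%] range
# Near p = 0.5 the exact change is close to that bound:
print(round(prob_change(0.32, 0.0), 3))    # 0.079
```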
Table 4.1: Fixed effects with Stan diagnostic measures

                               mean   mcse   sd     2.5%   50%    97.5%  n_eff  R̂
(Intercept)                   -0.99   0.00   0.05  -1.10  -0.99  -0.89   1315   1.00
Motivationreward               0.02   0.00   0.05  -0.07   0.02   0.11    575   1.01
reaction_time                  0.61   0.00   0.04   0.53   0.61   0.70   4000   1.00
reaction_time < 0.1 TRUE       0.22   0.00   0.16  -0.09   0.22   0.53   4000   1.00
presentation_n_in_segment      0.32   0.00   0.02   0.29   0.32   0.36   1078   1.00
MethUseTRUE                   -0.01   0.00   0.03  -0.07  -0.01   0.06   2825   1.00
SexRiskTRUE                    0.02   0.00   0.03  -0.04   0.02   0.08   2547   1.00

Fixed effects

Fixed effects from the regression are shown in Table 4.1. Of the parameters, reaction time and position in segment showed strong predictive effects. I was not able to detect predictive effects of Motivation type, Meth use, or Sex Risk in this model.

4.1.2 Discussion

Although the logistic regression model predicting correct response did not show clear group differences, we can see within-run effects in expected and interesting directions. Subjects with slower reaction times tend to be more likely to give a correct response (95% CI b_RT = [0.53, 0.70]), suggesting that subjects who respond quickly may be too hasty. With each presentation of a stimulus within the pre-reversal or post-reversal segments, subjects improve their performance, with a roughly [7%, 9%] performance increase with each trial. This finding is useful in confirming that the data is structured in pretty much the way we'd expect from this task, with subjects gradually gaining in accuracy as they perform the task. It is also useful in helping to characterize the relationship between reaction times and correct responses. This may be useful in modeling the role of reaction time in the reinforcement learning model.
4.2 Pain signature correlates with behavioral performance

In the literature, ‘negative reinforcer’ typically refers to an aversive stimulus that is removed when a correct response is performed; ‘positive punishment’ typically refers to an aversive stimulus that is applied when an incorrect response is performed; and ‘negative punishment’ refers to a reinforcer that is removed when an incorrect response is performed. ‘Negative punishment’ might include taking away money a subject has been given following an incorrect response, while ‘negative reinforcement’ might include continually applying an unpleasant sensation or loud noise until the subject performs the response being reinforced. In our dataset, subjects received positive reinforcement when they made the correct response to a trial in the reward condition, and received an unconditioned positive punisher in the form of an electric shock to the ankle following an incorrect response to a trial in the punishment condition.

As far as I am aware, there is no published work examining the effect of an unconditioned primary positive pain punisher on shaping human behavior in fMRI.¹ This is interesting because understanding this can help us to understand the differing neural circuitries involved in positive pain punishment compared to positive reinforcement. An fMRI punishment approach is particularly helpful because it helps us to understand the differences in a controlled experimental setting. In our joint modeling approach, a natural way to examine the differences is to examine differences in parameters for positive reinforcement compared to positive punishment runs. We may examine model parameters such as the learning rate, as well as other parameters such as the covariance of expected value and reward prediction error with punishment cues. In this section, I describe my findings extracting the neural pain signature signal from the reversal learning task data.
This pain signal was later used in an attempt to constrain the model by testing the influence of pain signature intensity on subjects' performance while selecting correct responses. Although that attempt wasn't successful, the pain signature results are interesting in their own right because of the novelty of testing the pain signature intensity signal in this context, and they are also helpful because there may be other ways in the future for me to integrate the pain signature with a Bayesian computational model of reversal learning, including in a joint model.

¹ Loud noises have certainly been used in fMRI studies as an aversive stimulus, but I am unaware of whether they have been used in a behavioral task to shape behavior through positive punishment. Their status as an unconditioned aversive stimulus is also unclear, whereas an electric shock is indisputably an unconditioned aversive stimulus.

Neural pain signature

The Neurologic Pain Signature (NPS) is a brain image from Tor Wager's 2013 paper on brain activity correlated with pain intensity. Rather than attempting to identify clusters, Wager et al. (2013) identified a whole-brain measure, examining the extent to which each individual voxel correlates with pain intensity. In doing so, they were able to obtain a very fine-grained brain image that expresses, in standard space, the extent to which each voxel correlates with pain intensity.

NPS in this task

If the NPS signature is generalizable to our subjects, then we should see that when subjects were in the punishment condition, their NPS activity should be elevated when they receive positive punishment. We can predict pain using a hierarchical linear model in which responding correctly to a trial and punishment condition are included as independent variables. A main effect of punishment condition would suggest an extended pain feeling associated with the condition.
This might be variously interpreted as residual physical pain felt after the electric shock and carrying over into subsequent trials, or as anticipatory pain or anxiety. A main effect of incorrect response would be interesting and would suggest that the pain intensity signal identified by Wager et al. (2013) can also capture what might be interpreted as ‘non-physical pain’ associated with failing to respond correctly. An interaction effect would suggest that our subjects really felt pain when experiencing the aversive stimulus, and that this was not associated with disappointment from failing to respond correctly. Any of the above would suggest that the NPS is generalizable to our subjects in some form. A failure to detect an effect could variously suggest that the NPS isn't generalizable to our subjects, or that our electric shock was not substantially painful, or could suggest a simple analysis error.

Model 4.2: (Section 4.2) Pain activity hierarchical logistic regression model
  Estimation method: NUTS MCMC
  Software: rstanarm
  Data type: Behavioral and neural
  Model: Multi-level logistic regression
  Levels: Trial, run, subject; image

Predictions

If the NPS similarity reliably measures pain, then we should see that the NPS score is higher in trials where subjects receive an electric shock compared to trials where they do not. More specifically, we can test the hypotheses that:

1. In the punishment runs, NPS scores should be higher in those trials where subjects receive an electric shock, i.e., where they made an incorrect response or made no response, compared to when they made a correct response.

2. The difference between incorrect or non-responses and correct responses should be substantially stronger in the punishment runs than in the reward runs, where subjects were being rewarded for correct responses but not punished for incorrect responses.
(a) If this is indeed the case, then we have evidence that the NPS is measuring the brain response to physical pain.
(b) If there is no difference, then the NPS might be detecting non-physical pain, or a negative signal which is not physical pain.

4.2.1 Method

After the NPS scores were obtained (Section 2.1), I tested these two postulates using a hierarchical linear model. 607 runs across 161 subjects in the reversal learning dataset were entered into a hierarchical linear model. As in Section 4.1, I used rstanarm to do this analysis in a Bayesian model in order to improve performance. We represent subject as S, run as R, and image cue as c. We also include Response (correct or incorrect) and Motivation. Using slightly modified notation described by Cohen et al. (2003), Raudenbush and Bryk (2002), and Snijders and Bosker (1999), this can be written as:

V_irsc = β₁ RESPONSE_irsc + β₂ PRESENTATION_irsc + β₃ MOTIV_rsc + β₄ MOTIV_rs × RESPONSE_irs + SUBJECT_s + CUE_c + ε_irs
= (γ_010 M_rs + γ_0000 + u_0rs + u_0c + v_00s) + (γ_1000 + u_1rs + v_11s) pres_irsc + (γ_211 M_rs + γ_2000) resp_irsc + ε_irsc   (4.2)

In the second part of the equation, the regressors in the first set of parentheses represent intercepts at the trial level (γ_0000), run level (u_0rs), subject level (v_00s), for each image cue (u_0c), and for each value of Motivation (Reward, Punishment), a run-level variable, γ_010 M_rs. The regressors in the second set of parentheses describe the slopes of stimulus presentation (from the first presentation to the last, within each pre- and post-reversal segment): a fixed-effect slope at the trial level (γ_1000), and random effects within runs (u_1rs) and subjects (v_11s). The third set of parentheses contains an intercept (γ_2000) and a Motivation interaction (γ_211 M_rs) for the trial-level variable resp_irsc, indicating whether or not the subject's response was correct.
In R, we specify this model using the formulation

ValueScaled ~ (ResponseCorrect == FALSE) + Motivation +
    presentation_n_in_segment +
    (1 + presentation_n_in_segment | subid/runmotiveid) + (1 | image)

This model will predict Value using the fixed effects Response, Motivation, and Presentation, controlling for both the intercept and presentation slope of subjects and of runs nested within subjects, and also controlling for the intercept of each specific image used as a cue. Neural pain signature values were mean-centered for each subject across all runs.

4.2.2 Results

The fixed effects of ResponseCorrect and presentation in segment are shown in Table 4.2. In this model, there was a main effect of Punishment, indicating that even when controlling for the specific experience of pain during the incorrect trials in the punishment condition, there was evidence overall of slightly less pain signal in the Punishment condition than in the Reward condition. With an estimate of b = 0.102 (CI = [0.079, 0.124]), the interaction of Incorrect Response and Motivation is strong and indicates that there was a brain response to an incorrect outcome in the electric shock punishment condition but not in the reward condition. The interaction is particularly clear in the subject aggregates in Figure 4.1, which show a large difference in NPS activity between incorrect and correct trials in the Punishment condition but not in the Reward condition.

Table 4.2: Fixed effects for the hierarchical NPS Pain model described in Equation 4.2.
                               mean    mcse   sd     2.5%    50%     97.5%  n_eff  R̂
(Intercept)                   -0.017   0.001  0.034  -0.084  -0.018   0.049   678  1.002
ResponseIncorrect             -0.002   0.000  0.009  -0.019  -0.002   0.015  4000  0.999
PunishmentMotivation          -0.075   0.001  0.024  -0.123  -0.074  -0.029  1440  1.002
presentation_n_in_segment      0.011   0.000  0.002   0.007   0.012   0.015  4000  1.000
ResponseIncorrect ×
  PunishmentMotivation         0.102   0.000  0.011   0.079   0.102   0.124  4000  0.999

The main effect of Motivation actually turns out to be in the reverse of the predicted direction: the magnitude of the NPS signal is lower for the Punishment condition than the Reward condition (b = −0.075, CI = [−0.123, −0.029]). It should be remembered that when we combine this with the interaction term, b_{Motiv=Punish} + b_{Motiv=Punish × Response=Incorrect}, we see that the predicted deviation from baseline is 0.102 × 0 − 0.075 = −0.075 during the Punishment condition when there is no punishment signal (i.e., the response is correct), but 0.102 × 1 − 0.075 = 0.027 during the Punishment condition in trials that do contain a punishment signal. This effect may arise from normalization of the NPS signal of the Reward and Punishment conditions separately.

The graphs in Figures 4.1-4.2 below show subjects' mean pain responses in the reward and punishment conditions, when they got a response correct or incorrect. Subjects who did not have at least one run in each of the motivation conditions were excluded. The 95% HDI for the beta value for the ResponseCorrect × Punishment interaction was [-0.11, -0.04], reflecting the high probability that an incorrect response would lead to a pain signal in the Punishment condition only.

4.2.3 Discussion

This may be the first time that a generalized NPS has been connected to an electric shock stimulus. The data suggest that although there was a difference in NPS visible within the punishment condition between correct and incorrect responses, the level of pain observed during the punishment
Figure 4.1: Subject mean NPS value by Motivation condition and subject response. In the punishment condition, pain signature was higher with incorrect responses than correct responses. However, the NPS is not notably higher in the Punishment condition, likely because Reward and Punishment runs were normalized independently.

Figure 4.2: Difference in subject mean NPS value between correct and incorrect responses, by Motivation condition. In the punishment condition, NPS was higher for incorrect responses than correct responses. However, there was very little difference in the reward condition.

condition was comparable to the level of pain experienced in the reward condition. This may be an artefact of the analysis process: because they were separate runs, each was mean-centered separately. Overall, the result confirms the initial hypothesis. There was a substantial difference between wrong and correct trials in the punishment condition. There was no evidence for residual pain in the punishment condition: correct responses in the punishment condition had, if anything, lower pain responses than correct responses in the reward condition (Figure 4.1). There was also no evidence for a difference in experienced pain between correct and incorrect responses in the reward condition (Figure 4.2), suggesting that only physical pain through a positive punisher triggered the NPS response, while negative reinforcement (losing the opportunity for a reward after an incorrect response in the reward condition) did not yield the same result.

4.3 General Discussion

In this chapter, I have shown that a hierarchical logistic regression model can predict correct responses based on subject reaction times and the number of trials they've had to learn to recognize a stimulus.
I have also shown that we can predict pain experienced by subjects, as measured by the NPS, by examining whether they got a response correct or not, but only in the Motivation trials where they were actually punished with an electric shock. In other words, as expected, the NPS uniquely picks out a pain punisher in the Punishment condition, and not simply the absence of a reward when the subject gets a trial wrong in the Reward condition.

Chapter 5

Simple reinforcement learning model

This chapter describes the first class of reinforcement learning model attempted for this thesis. This model is a simple Bayesian No U-Turn Sampler (NUTS) Markov chain Monte Carlo (MCMC) Reinforcement Learning (RL) model. I modeled behavior through a reinforcement learning model as discussed in Section 3.1.1. Rather than a racing diffusion model, I used a simple softmax algorithm to determine subjects' likelihood of performing the response most likely to be reinforced based on prior learning.

The model extends Nate Haines' Double Update Model for Reversal Learning (Ahn & Haines, 2017). Nate modified the version available in his package hBayesDM to work with our dataset, which is a deterministic Reversal Learning task. I have since worked with him to incrementally extend it to handle two different tasks (reward and punishment learning) and repeated runs.

In the first section of this chapter, I explore the difference between applying Variational Bayes and an MCMC method as means for calculating Bayesian posteriors. In that section, I describe the concepts of 'reliability', 'representativeness', and 'accuracy', which are useful diagnostic metrics for assessing the success of a Bayesian model. Then, I describe a three-level behavioral model using the same paradigm and what it tells us about Risky Meth vs. Non-Meth subjects. In this process I explore an attempt at integrating reversal learning into this simple model.
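The double-update rule and softmax choice rule described above can be sketched as follows. This is an illustrative Python sketch with my own naming, not the hBayesDM Stan implementation: for a two-choice task, the chosen option is updated toward the outcome and the unchosen option toward its opposite.

```python
import math

def softmax_choice_probs(evs, beta):
    """Probability of choosing each option, given expected values and
    inverse temperature beta: P(i) = exp(beta*EV_i) / sum_j exp(beta*EV_j)."""
    exps = [math.exp(beta * v) for v in evs]
    total = sum(exps)
    return [e / total for e in exps]

def double_update(evs, choice, outcome, alpha):
    """Double-update rule for a 2-choice task: move the chosen option's
    expected value toward the outcome, and the unchosen option's expected
    value toward the opposite outcome, both at learning rate alpha."""
    new_evs = list(evs)
    unchosen = 1 - choice
    new_evs[choice] += alpha * (outcome - evs[choice])
    new_evs[unchosen] += alpha * (-outcome - evs[unchosen])
    return new_evs
```

For example, starting from expected values [0, 0], a rewarded choice of option 0 with α = 0.3 moves the values to [0.3, -0.3], and with β = 2 the softmax then favors option 0 with probability of roughly 0.77.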
Finally, I describe my attempt at a Bayesian joint model with neural data using this behavioral model, and speculate on reasons why the joint model doesn't work.

Model 5.1: (Section 5.1) Variational Bayes and MCMC Model
Estimation Method: Comparing Variational Bayes vs. NUTS MCMC
Software: rstan
Data type: Behavioral only
Model: Reinforcement learning only
Levels: 2-level: single-run, multiple-subject, with group estimate
Parameters: α, β, both transformed from unit normal distributions

5.1 Comparing Variational Bayes and MCMC

Variational Bayes and NUTS MCMC are two methods provided in stan for estimating a Bayesian model. Both methods are iterative. In this section I make a brief comparison of the two estimation methods using the simple RL model. Variational Bayes is known to be less reliable than MCMC, but it is possible that it is still useful for running analyses on this dataset. In order to test its performance against a baseline level of replicability and against MCMC, in this section I run a simple RL model with both MCMC and Variational Bayes. I want to compare the methods in terms of their speed and reliability. I also want to examine how each of the methods performs in terms of MCMC diagnostics: representativeness, accuracy, and efficiency. Together, these three diagnostic measures allow us to understand whether a model is functioning well (Kruschke, 2014).

5.1.1 Method

Here, I compare using the model double_update.stan, which processes only one run of reward or punishment data. I completed these analyses before applying the non-centered parameterization methods described in Chapter 3, Section 3.2.3; without those optimizations, preliminary testing showed the full model was too slow to run on our system using MCMC. I run each model three times, using both MCMC and Variational Bayes, for both Group 2 and Group 3. In this example I focus on double_update.stan, which will analyze reward or
punishment data separately. Each time the model is run another time, the seed value for running the model changes; however, within each test group, each time the model is run (i.e., for separate runs or separate groups) the set seed remains a constant value.

5.1.2 Results

MCMC has clear benefits over the quick Variational Bayes method. But how do we determine reliability, accuracy, and efficiency? Kruschke (2014) describes reliability, accuracy, and efficiency as important metrics for measuring the performance of a hierarchical model. In this section I review these measures and how they can be applied to test the performance of a particular model estimated with MCMC.

Speed

Across three iterations and two subject groups, calculated independently, the basic double update model was consistently calculated much faster using Variational Bayes, and generally faster when not calculating trial posteriors (Table 5.1). If Variational Bayes is just as reliable as MCMC for this problem, there would seem to be no reason to use MCMC, given Variational Bayes' speed. Considering the difference between calculating without trial posteriors and calculating with them, using Variational Bayes appears to be much faster.

MCMC and Variational Bayes Reliability

Estimates for the Punishment data appeared to differ somewhat when repeated runs and reward data were added, as we'd expect. Is Variational Bayes reliable? At least using the current number of iterations and other modeling parameters, Figure 5.1 shows that Variational Bayes is not at all reliable across repeated analyses. How does MCMC do? Our test is of how consistently MCMC performs across repeated analyses. In contrast to Variational Bayes, MCMC seems to perform very consistently across runs
Figure 5.1: α_μ and β_μ values for the punishment rounds using Variational Bayes within the double update model. Each time the analysis is repeated, we yield a somewhat different result.

(Figure 5.2). Thus, in subsequent estimates there is no need to look at more than one analysis repetition.

Table 5.1: This table shows the vast difference in speed between MCMC and Variational Bayes. This data was pulled from an earlier version of the comparison, so it's not directly descriptive of the rest of the data presented in this section, but the relative speed differences between MCMC and Variational Bayes were consistent across all runs.

Model                                          Estimation Method   Mean      Range
Basic double update with trial posteriors      vb                    116 s   98 to 131 s
Basic double update with trial posteriors      MCMC                13092 s   1613 to 20055 s
Basic double update without trial posteriors   vb                     13 s   8 to 19 s
Basic double update without trial posteriors   MCMC                 2517 s   451 to 10095 s

Figure 5.2: α_μ and β_μ values for the punishment rounds using MCMC within the double update model. In contrast to Variational Bayes as displayed in Figure 5.1, results are very consistent each time the analysis is completed.

MCMC Representativeness, accuracy, and efficiency

Considering the wide HDIs produced by our MCMC estimates, I want to take a closer look to see if the MCMC estimates for this task are representative, accurate, and efficient (Kruschke, 2014).
Representativeness

To assess representativeness for an MCMC algorithm, we need to see whether chains have converged such that initial randomly-chosen priors are not related to final values. We can visually examine trace plots, examine the density intervals (Figure 5.3) or density plots (Figure 5.4), and examine the Gelman-Rubin statistic (Brooks & Gelman, 1998; Kruschke, 2014), also known as the "potential scale reduction factor" or "shrink factor". In rstan, this is available as the statistic R̂. The R̂ values should be close to 1, and definitely within 10% of 1 (0.9 to 1.1). It appears that one MCMC run failed to fully converge, but the others converged. Visually, the chains exhibit good convergence (Figure 5.4), except for some minor differences between chains for Group 3, No Trial Posteriors. We can also examine stan output for these, as in the output in Figure 5.6.

Accuracy

To assess the accuracy of the chains, we need to take into account the effective sample size (ESS): how much independent data actually exists in our analysis. To do this, we need a measure of autocorrelation. From ESS, we can get a measure of Monte Carlo Standard Error. We can view and measure autocorrelation in a number of ways, but the most straightforward for stan is to measure the effective sample size. Effective sample size is provided in the summary of any stan model output, listed separately for each parameter, and we can simply read it off the list of parameters. The "n_eff" columns in Figure 5.6 describe the effective sample size. The effective sample size is a posterior distribution sample size, adjusted down to account for samples that are non-independent due to autocorrelation in the MCMC chains. Although Kruschke (2014) recommends an effective sample size of 10,000 for estimating an HDI, in order to make processing more tractable, in this work I have sometimes chosen smaller sample sizes.
In the example above, increasing the number of chains from 6 to 12 would double the number of effective iterations, and a further 443% increase in post-warmup iterations per chain, i.e., from 1000 to 4430, would enable us to

Figure 5.3: MCMC parameter estimate density intervals (one run). Panels (a) and (b) show Group 2 and Group 3 for double_update_notrialpost.

Figure 5.3: MCMC parameter estimate density intervals (repeated runs). Panels (c) and (d) show Group 2 and Group 3 for double_update_rpo_repeated_runs_notrialpost.

Figure 5.4: MCMC parameter estimate density intervals (one run), chains drawn separately. Panels (a) and (b) show Group 2 and Group 3 for double_update_notrialpost.
Figure 5.4: MCMC parameter estimate density intervals (repeated runs), chains drawn separately. Panels (c) and (d) show Group 2 and Group 3 for double_update_rpo_repeated_runs_notrialpost.

## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
##           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu_p[1]  -0.57       0 0.17 -0.90 -0.68 -0.57 -0.45 -0.22  1639    1
## mu_p[2]  -1.10       0 0.05 -1.20 -1.13 -1.10 -1.06 -1.00  2626    1
## sigma[1]  1.16       0 0.16  0.89  1.05  1.15  1.26  1.52  2999    1
## sigma[2]  0.25       0 0.05  0.16  0.21  0.24  0.27  0.35  1809    1
##
## Samples were drawn using NUTS(diag_e) at Thu Oct 5 18:09:05 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## [1] "Increase in sample size required to get an ESS of 10^5 at this level of efficiency:"
##  mu_p[1]  mu_p[2] sigma[1] sigma[2]
##      6.1      3.8      3.3      5.5
##
## Inference for Stan model: double_update_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
##           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu_p[1]  -0.89       0 0.17 -1.23 -1.00 -0.90 -0.78 -0.56  1779 1.00
## mu_p[2]  -1.11       0 0.07 -1.25 -1.15 -1.11 -1.07 -0.99  2664 1.00
## sigma[1]  0.86       0 0.15  0.60  0.75  0.84  0.95  1.20  2426 1.00
## sigma[2]  0.18       0 0.07  0.06  0.14  0.18  0.23  0.33  1517 1.01
##
## Samples were drawn using NUTS(diag_e) at Thu Oct 5 18:16:08 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## [1] "Increase in sample size required to get an ESS of 10^5 at this level of efficiency:"
##  mu_p[1]  mu_p[2] sigma[1] sigma[2]
##      5.6      3.8      4.1      6.6

Figure 5.5: rstan raw output for the double update model examining just one run at a time.

## Inference for Stan model: double_update_rpo_repeated_runs_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu_p[1,1]  -0.54       0 0.13 -0.79 -0.63 -0.55 -0.46 -0.28  1552    1
## mu_p[1,2]  -0.96       0 0.05 -1.06 -0.99 -0.96 -0.93 -0.87  2259    1
## mu_p[2,1]  -0.55       0 0.17 -0.87 -0.67 -0.56 -0.45 -0.22  1186    1
## mu_p[2,2]  -1.10       0 0.05 -1.20 -1.14 -1.10 -1.07 -1.01  2365    1
## sigma[1,1]  0.97       0 0.11  0.78  0.89  0.96  1.03  1.20  2450    1
## sigma[1,2]  0.26       0 0.04  0.19  0.23  0.26  0.29  0.35  2310    1
## sigma[2,1]  1.17       0 0.14  0.92  1.06  1.15  1.25  1.48  2661    1
## sigma[2,2]  0.25       0 0.05  0.17  0.22  0.25  0.28  0.35  1904    1
##
## Samples were drawn using NUTS(diag_e) at Thu Oct 5 20:11:15 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## [1] "Increase in sample size required to get an ESS of 10^5 at this level of efficiency:"
##  mu_p[1,1]  mu_p[1,2]  mu_p[2,1]  mu_p[2,2] sigma[1,1] sigma[1,2]
##        6.4        4.4        8.4        4.2        4.1        4.3
## sigma[2,1] sigma[2,2]
##        3.8        5.3
##
## Inference for Stan model: double_update_rpo_repeated_runs_notrialpost.
## 6 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=6000.
##
##             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu_p[1,1]  -0.70    0.01 0.20 -1.10 -0.83 -0.70 -0.57 -0.32  1190    1
## mu_p[1,2]  -1.14    0.00 0.08 -1.30 -1.19 -1.14 -1.08 -0.99  1662    1
## mu_p[2,1]  -0.88    0.00 0.17 -1.22 -0.99 -0.88 -0.77 -0.55  1944    1
## mu_p[2,2]  -1.12    0.00 0.07 -1.25 -1.16 -1.11 -1.07 -1.00  3318    1
## sigma[1,1]  0.99    0.00 0.17  0.70  0.87  0.97  1.10  1.37  2560    1
## sigma[1,2]  0.30    0.00 0.08  0.16  0.25  0.30  0.35  0.48  1128    1
## sigma[2,1]  0.86    0.00 0.14  0.63  0.76  0.85  0.95  1.18  2995    1
## sigma[2,2]  0.19    0.00 0.07  0.06  0.14  0.18  0.23  0.34  1305    1
##
## Samples were drawn using NUTS(diag_e) at Thu Oct 5 21:18:58 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## [1] "Increase in sample size required to get an ESS of 10^5 at this level of efficiency:"
##  mu_p[1,1]  mu_p[1,2]  mu_p[2,1]  mu_p[2,2] sigma[1,1] sigma[1,2]
##        8.4        6.0        5.1        3.0        3.9        8.9
## sigma[2,1] sigma[2,2]
##        3.3        7.7

Figure 5.6: rstan raw output for the double update model examining all reward and punishment runs simultaneously.

reach 10,000. However, if the posterior distribution appears to show a regular and smooth unimodal distribution, then the sample size is probably adequate. If we were to aim for the 10,000 ESS that Kruschke advocates for 95% HDIs of posteriors, we would need a very long iterative process.
Monte Carlo Standard Error is calculated as MCSE = SD / √ESS, the standard formulation for standard error, but with effective sample size used in place of actual sample size. We can view Monte Carlo Standard Error using the stan function stan_mcse.

MCMC Efficiency

Efficiency simply describes how efficient the estimation algorithm is at calculating our model's result. This includes not only measures like autocorrelation but also other model performance features that could potentially slow it down. Efficiency is one of Brandon M. Turner, Forstmann, Wagenmakers, et al.'s (2013) key arguments for the use of DE-MCMC over other forms of MCMC, such as the Metropolis-Hastings algorithm, NUTS, or Hamiltonian Monte Carlo (personal communication) as implemented in Stan.

5.1.3 Discussion

MCMC proved to yield stable results that, after a 1000-iteration warm-up and with one notable exception, were representative, accurate, and somewhat efficient. Compared to Variational Bayes, MCMC yielded posteriors that were reliable, but wider and thus less informative. Variational Bayes yielded posteriors that seemed informative, but across repetitions could be shown to be completely unreliable. The efficiency wasn't too bad; an ESS/SS ratio of between 0.2 and 0.4 seems acceptable, but falls well short of Kruschke's recommendation for a sample size of around 10,000 to estimate an HDI. However, if we get stable, convergent estimates, then it may be less important to reach a sample size of this magnitude; we can focus on estimating the parameters themselves. Because of Variational Bayes' unreliability in this task, I move forward to run estimates in MCMC rather than Variational Bayes.
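As a sketch of the accuracy diagnostics discussed in this section, the following Python code estimates the effective sample size from a chain's autocorrelations and derives the MCSE from it. This is a simplified single-chain illustration with my own naming; rstan's n_eff calculation additionally combines information across split chains.

```python
import math

def effective_sample_size(chain, max_lag=None):
    """Estimate ESS from one chain: n / (1 + 2 * sum of positive lag
    autocorrelations), truncating at the first non-positive autocorrelation."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0:
        return float(n)
    max_lag = max_lag or n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag):
        cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # stop once autocorrelation is no longer positive
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def mcse(chain):
    """Monte Carlo Standard Error: SD / sqrt(ESS)."""
    n = len(chain)
    mean = sum(chain) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in chain) / (n - 1))
    return sd / math.sqrt(effective_sample_size(chain))
```

An independent chain yields an ESS close to its actual length, while a strongly autocorrelated chain yields a much smaller ESS, and hence a larger MCSE.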
Model 5.2: (Section 5.2) Three-level RL Model
Estimation Method: NUTS MCMC
Software: rstan
Data type: Behavioral only
Model: Reinforcement learning only
Levels: 3-level: multiple-run, multiple-subject, with group estimate
Parameters: α, β, both transformed from unit normal distributions

5.2 Three level model

Having found a firm advantage for MCMC over Variational Bayes, I proceeded to analyze the data using stan MCMC, and built a three-level model in which we can analyze multiple runs and subjects and measure the difference between reward and punishment runs. The purpose of this section is to examine whether, taking all runs into account, we can observe differences in learning rate and inverse temperature among the different groups and conditions. The analysis is over three levels, and includes all trials across all valid subjects, runs, and conditions from all of the behavioral data we have collected, although it does not use the reaction time data.

5.2.1 Method

The model presented in this section has been changed and modified several times, but the version presented here is my "Revision 8". The model includes one group of subjects in each MCMC execution (i.e., Safe No Meth, Risky No Meth, or Risky Meth), but in contrast to the model in the previous section, allows us to model all runs per subject. Runs can be either reward, punishment, or unspecified. Each subject has an individual parameter specifying the difference between reward and punishment runs, and these are drawn from a group-level distribution of runs. I ran Revision 8 using rstan MCMC. The model was run separately on each of the three risk groups. Like the model in the previous section, this model is a reinforcement-learning model that estimates a learning rate α and inverse temperature β, as described in Section 3.1.1. α is an estimate of the subject's speed of learning S-R associations, and β quantifies the likelihood on each
run that subjects will in fact select the stimulus with the highest expected value.

Reaction time model

As an extension, I also tested the reaction time modifier described in Chapter 3, Section 3.1.2, in which reaction time modifies the overall analysis. However, these models did not improve on the simple softmax model. That analysis was separate from the one described below, so the results below do not reflect the reaction time model, but the two analyses have otherwise identical methods.

5.2.2 Results

This section describes the results of the analysis, first presenting some diagnostic statistics and then examining group differences.

Diagnostics

To assess representativeness for an MCMC algorithm, we can examine R̂ (see Tables 5.2-5.4). For Group 3, specifically for the group beta parameter, the effective sample size is just 189, indicating poor sampling for that value. This can result from poor starting values, or from a more complex distribution of values to model. Because the same method was used for setting starting values and priors for all groups, it is likely that there was a more complex underlying distribution for the Risky Meth Group, and this has caused the difficulty in estimating. Recall that β estimates the odds that the subject chooses the option with the highest expected value. There were no detected overall differences in β value for the Risky Meth Group, whose median actually sat between the medians of the other two groups. However, if β is hard to estimate, it may be because the degree to which subjects in the Risky Meth group in particular prioritize Expected Value has a wide or non-normal distribution. For all other variables of interest, and for all other groups, R̂ was in the appropriate range.

Group differences

Figures 5.7-5.8 show that there is substantial overlap in estimated alpha and beta parameters within each model.
Table 5.2: Double Update Model (rev8) Safe No Meth Group (vars = 931)

Parameter            mean   se_mean   sd     n_eff     R̂
group_mu_alpha       0.33      0     0.05   2055.30   1.01
group_mu_beta        0.55      0     0.05    754.39   1.01
group_rew_mu_alpha   0.38      0     0.08   2537.63   1.00
group_pun_mu_alpha   0.29      0     0.07   2437.59   1.01

Table 5.3: Double Update Model (rev8) Risky No Meth Group (vars = 1199)

Parameter            mean   se_mean   sd     n_eff     R̂
group_mu_alpha       0.34      0     0.05   6833.29   1
group_mu_beta        0.73      0     0.05   4741.86   1
group_rew_mu_alpha   0.30      0     0.06   6511.53   1
group_pun_mu_alpha   0.38      0     0.07   5993.03   1

Table 5.4: Double Update Model (rev8) Risky Meth Group (vars = 687)

Parameter            mean   se_mean   sd     n_eff     R̂
group_mu_alpha       0.26    0.00   0.05   1919.88   1.01
group_mu_beta        0.65    0.01   0.08    188.82   1.05
group_rew_mu_alpha   0.31    0.00   0.08    659.24   1.02
group_pun_mu_alpha   0.21    0.00   0.06   2377.73   1.01

We can compare Motivational conditions by subtracting the posteriors derived for one condition from the posteriors derived in the other condition. Comparing reward and punishment estimates (Figure 5.8a), there is not a clear estimate of a difference in overall performance between

Figure 5.7: α and β posterior estimates by group, (a) across Motivational conditions and (b) separated by motivational condition

the two motivational conditions. We can also compare the groups, looking at group differences overall, and also specifically at group differences within each of the reward and punishment conditions (Figure 5.8b). We use the same method as before: subtracting the posteriors derived in one group from the posteriors derived in another group. In the reward condition, there's no difference apparent in learning rate between the meth-using group and the other groups. However, in the punishment condition, there may be some evidence of a difference, with the Highest Density Interval of the difference just crossing zero. Comparing Group 2 with Group 3, there may be a difference between alpha learning rates in the punishment condition.
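The comparison procedure used here, subtracting matched posterior draws and summarizing the difference with a Highest Density Interval, can be sketched as follows. This is an illustrative Python sketch using the simple sorted-samples HDI estimator, not the exact routine used for the figures in this section.

```python
def hdi(samples, mass=0.95):
    """Highest density interval: the narrowest interval containing `mass`
    of the posterior samples."""
    s = sorted(samples)
    n = len(s)
    k = int(mass * n)  # number of samples inside the interval
    widths = [s[i + k] - s[i] for i in range(n - k)]
    i_min = widths.index(min(widths))
    return s[i_min], s[i_min + k]

def posterior_difference_hdi(samples_a, samples_b, mass=0.95):
    """Compare two groups (or conditions) by subtracting matched posterior
    draws and summarizing the difference with an HDI."""
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    return hdi(diffs, mass)
```

If the resulting interval excludes zero, which the Group 3 minus Group 2 comparison above narrowly fails to do, we would have evidence for a group difference.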
The highest density interval for the difference Group 3 minus Group 2 still encompassed zero. We can also look at a multivariate 95% highest density interval. This uses the R function MASS::cov.mve to estimate the ellipsoid with the minimum volume encompassing 95% of the points. The bivariate confidence interval does not confirm a difference between the means of Group 2 and Group 3 in the punishment condition.

Besides α, we can also consider σ, which quantifies the distribution of subject learning rates within the sample and within each group. Figure 5.8 describes the differences in σ values between Group 2 and Group 3; no difference in their posterior estimates was apparent.

5.2.3 Discussion

This analysis demonstrates that it is in principle possible to run a three-level behavioral model in MCMC on our reversal learning dataset. By examining behavioral data we are able to glean insight into behavioral patterns. In this case, we found some evidence for a lower learning rate, at least in the punishment condition, for the Risky Meth subjects compared to the Risky No Meth subjects. The behavioral model also serves as a useful baseline for testing the efficacy of joint modeling. It may be that some insights are available using a joint model that cannot be gleaned

Figure 5.8: Group differences in α and β posterior estimates: (a) differences in Motivational condition (Reward-Punishment); (b) differences in Group (Group 3 - Group 2, i.e., Sexually Risky Meth - Sexually Risky No Meth). α and β estimates across Motivational conditions, and α estimates within condition

Figure 5.9: Bivariate posterior distribution of the Group 3 minus Group 2 posterior estimate. Blue shaded dots are estimates falling within the 95% minimum volume ellipsoid for that particular statistic.

using the method presented here. The next section will examine an attempt to jointly model the same simple RL model.
Model 5.3: (Section 5.3) Three-level RL Model
Estimation Method: MCMC
Software: rstan
Levels: 3-level: multiple-run, multiple-subject, with group estimate
Parameters: α, β, both transformed from unit normal distributions
Data type: Joint model
Behavioral Model: Reinforcement learning only
Neural Model: Activity during trials in L & R Accumbens, L & R suborbital gyrus

Table 5.5: Means, SDs, and HDIs for the estimates shown in Figure 5.8b

Statistic   Mean         SD          HDI
mu          -0.0848639   0.0699501   [-0.22, 0.06]
sigma        0.0220197   0.0813884   [-0.13, 0.18]
rew_mu       0.0126976   0.1022954   [-0.18, 0.21]
pun_mu      -0.1761860   0.0943302   [-0.37, 0.00]

5.3 Neural-behavioral joint model attempt

I ran Revision 8 using rstan MCMC, running separately on each of the three risk groups. This time, the behavioral RL model is modeled jointly with related neural data.

5.3.1 Joint modeling Method

The model was the same as in the previous section, except that a correlation matrix was added in which the correlation between four brain regions was estimated. The implementation of the joint model covariance matrix is discussed in Section 7.2.

Table 5.6: Double update joint model Safe No Meth Group (vars = 967)

Parameter            mean   se_mean    sd    n_eff      R̂
group_mu_alpha       0.76     0.11   0.26       6    148.40
group_mu_beta        3.46     1.71   4.18       6    145.12
group_rew_mu_alpha   0.80     0.09   0.22       6    125.53
group_pun_mu_alpha   0.72     0.12   0.30       6    202.09

5.3.2 Results

Diagnostics

As in the previous section, I assess representativeness for an MCMC algorithm by looking at the Gelman-Rubin (Gelman & Rubin, 1992) chain convergence statistic, R̂. As can be seen in the tables, R̂ values are substantially outside the acceptable range of R̂ < 1.05. The n_eff column offers a hint as to why these R̂ values are so poor: for some reason, this model was not able to produce acceptable samples.
Most likely, the prior distributions used do not represent the space inhabited by the data closely enough, and so the sample space cannot be explored effectively. Without a clearly stable estimate, we cannot sensibly interpret the posterior estimates for this dataset.

5.3.3 Joint Model Discussion

In this model, I attempted to extend the working three-level model into a three-level joint model. Unfortunately, the model could not be stably estimated. There may be several ways to improve on this implementation. First, it may be that another approach to modeling across the runs is needed. In this implementation, I simply modeled the theta-delta correlation using the same matrix for all runs. In practice, this will vary between runs. It may be more suitable to generate the matrix using a hyperparameter and generate a posterior sample from that hyperparameter for each run, as we do for the α and β parameters. Alternatively, theta-deltas across all runs could be inserted into a single large matrix and the correlation of each could be measured in that matrix. This might complicate the interpretation, because it would include both within- and between-run variance. The current design strictly measures the within-run correlation of theta and delta values in a way that is generalizable to other runs. However, we could overcome some of the problem by normalizing the data from each run separately. It is also possible that simply running the model for a larger number of iterations may yield a stable solution. I have seen from the single-level models (Section 7.4) that it is unlikely there are strong correlations between the neural and behavioral parameters. This should not be a problem for the joint model presented here, because the prior for the correlation matrix allows for zero correlations.
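To make the theta-delta idea concrete, the following generative sketch shows how a joint model ties behavioral and neural parameters together through a shared covariance matrix, in the spirit of Brandon M. Turner, Forstmann, Wagenmakers, et al. (2013). This is an illustrative Python sketch with hypothetical names and values, not the Stan implementation attempted here.

```python
import random, math

def sample_joint_params(mu, chol, n_subjects, rng=random):
    """Generative sketch of a joint model: each subject's behavioral
    parameter (theta) and neural parameter(s) (delta) are drawn together
    from a multivariate normal, so the covariance Sigma = chol * chol'
    links behavior and brain activity. `chol` is a lower-triangular
    Cholesky factor of Sigma; `mu` holds the group-level means."""
    dim = len(mu)
    subjects = []
    for _ in range(n_subjects):
        z = [rng.gauss(0, 1) for _ in range(dim)]
        draw = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
                for i in range(dim)]
        subjects.append(draw)
    return subjects
```

Because theta and delta share an off-diagonal covariance term, subjects with higher behavioral parameters also tend to show higher simulated neural activity; a zero off-diagonal entry recovers the independent case that the correlation-matrix prior permits.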
5.4 General Discussion

In this chapter, I modeled our reversal learning data using a simple reinforcement learning model: a two-level model to compare VB and MCMC estimation methods, a three-level behavioral model to test for group differences, and then a three-level joint model to test for correlations between neural and behavioral data. I found some evidence for a greater learning rate among Group 2 (Risky Sex No Meth) compared to Group 3 (Risky Sex Meth) subjects. I also found that I could not rely on Variational Bayes for reliable estimates in this dataset, so despite its superior speed, I have stuck to MCMC analyses for interpretation.

I also was not able to generate a joint model here. One thing that might complicate the data is that we are only applying a simple reinforcement learning model. As we saw in Chapter 4, reaction time is in fact related to performance in this task. Therefore, using a model that incorporates reaction time might help to better understand the data. For that reason, the next chapter will examine a model that incorporates reaction time using the drift diffusion model.

Chapter 6
Differential Evolution MCMC Bayesian model

In the previous chapter, I examined the reversal learning data using a simple reinforcement learning model, and found evidence of a difference in learning rates between Group 2 (Non-Meth Users) and Group 3 (Meth Users). However, I was not able to create a joint model using this framework. Additionally, our group estimates were quite tight. As a result, I wanted to build a better model of the underlying data. To do that, in this chapter, I implemented Brandon Turner's Reinforcement-Learning Drift Diffusion Model. At the same time, I shifted to a Differential Evolution Monte Carlo Markov Chain (DE-MCMC) implementation.
DE-MCMC is known from Brandon Turner's work to produce good joint models, and it had always been the aim to demonstrate estimation using DE-MCMC once a basic behavioral model had been implemented.

I'll first outline new model design aspects which haven't been introduced previously, primarily the DE-MCMC algorithm and the RL-Drift Diffusion model design. Then, I'll describe my two-level model and the results I was able to obtain from it. Following the two-level model, I'll describe my three-level model design, the work I did to adapt the two-level model to a three-level model, and then the results of the model I attempted.

6.1 Model design

This section outlines two model design features that have not been implemented in this thesis yet. The first is the implementation of the DE-MCMC algorithm. The second is the reinforcement learning drift diffusion model design.

6.1.1 DE-MCMC algorithm

DE-MCMC is an implementation of Monte Carlo Markov Chain estimation pioneered in psychology by Ter Braak (2006) and Brandon M. Turner, Forstmann, Wagenmakers, et al. (2013). The iterative algorithm consists of two key steps. In the first step, two chains are randomly selected, and a uniformly random proportion of the parameters within the chain is selected for modification. For those parameters, a proportion of the difference between the two chains is taken and added to another chain. A key tuning parameter for DE-MCMC is the γ parameter, which specifies the proportion of the difference to take between chains. Following this, the new proposal for each chain is probabilistically accepted. If the likelihood for the new, proposed value is greater than the current likelihood, then it is automatically accepted. Otherwise, if the difference between the proposed and current log likelihood is log(p₀), then the proposal is accepted with probability p₀.
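The crossover-and-accept step just described can be sketched as follows. This is an illustrative Python reconstruction under simplifying assumptions (uniformly random chain pairs and parameter subsets), not the thesis's R code; `crossover_sweep` and `log_post` are hypothetical names, with `log_post` standing in for the combined prior and data log density.

```python
import math
import random

def crossover_sweep(chains, log_post, gamma=None):
    """One DE-MCMC crossover sweep over all chains (illustrative sketch).

    chains: list of parameter vectors (lists of floats).
    log_post: function mapping a parameter vector to its log density."""
    d = len(chains[0])
    if gamma is None:
        gamma = 2.38 / math.sqrt(2 * d)  # conventional tuning value
    for i in range(len(chains)):
        # pick two other chains at random
        r1, r2 = random.sample([j for j in range(len(chains)) if j != i], 2)
        proposal = list(chains[i])
        # perturb a uniformly random subset of parameters by
        # gamma times the difference between the two selected chains
        subset = random.sample(range(d), random.randint(1, d))
        for k in subset:
            proposal[k] += gamma * (chains[r1][k] - chains[r2][k])
        # Metropolis acceptance: always accept an improvement; otherwise
        # accept with probability equal to the density ratio
        log_ratio = log_post(proposal) - log_post(chains[i])
        if log_ratio >= 0 or math.log(random.random()) < log_ratio:
            chains[i] = proposal
    return chains
```

Repeated sweeps move the population of chains toward high-density regions, while the between-chain differences tune the proposal scale automatically.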
In the second step, occurring only during the warm-up phase of the MCMC, a few values are probabilistically selected for swaps between chains. This acts as an accelerator and a form of evolutionary propagation of values between the chains, speeding up their journey toward the highest-likelihood probability space.

Gamma tuning parameter

DE-MCMC combines two steps usually kept separate in MCMC, crossover and mutation, within each iteration of the algorithm (Brandon M. Turner, Forstmann, Wagenmakers, et al., 2013). It does this by taking the difference between the value of a parameter on two different chains and adjusting the value of a third chain by a weighted proportion of that difference, γ. In this way, the chains shape one another's proposal distributions and ensure convergence on a solution. γ has traditionally been set to γ = 2.38/√(2d), where d is the number of parameter-space dimensions.

Migration step

The migration step in DE-MCMC swaps values between two chains. At each MCMC iteration, there is a 0.1 probability of a migration proposal. A randomly selected set of parameters is selected to be migrated between a randomly selected set of chains. The proposal is then automatically accepted if it has a higher likelihood than the current set of values. If it has a lower likelihood, it is accepted probabilistically, when the difference in probabilities between the new proposal and the current set of values is less than a uniform randomly selected threshold between 0 and 1. Later in this chapter I will describe how modifying the probability of a migration proposal changed estimation.

Key functions

When implementing DE-MCMC in native R, there are several key functions that structure the running of the model itself. Those are outlined here. In order to estimate the likelihood of each proposed set of parameters for a single-level model, there are three key functions:

1.
log.dens.prior() uses the priors to calculate the density of the currently proposed parameters.

2. log.dens.like.m1() uses the data to calculate the likelihood density of the currently proposed parameters. To do this, it calculates the Expected Value (EV) and the Reward Prediction Error (RPE) of the RL model and the drift diffusion model itself, as described in Section 3.1.1 and Section 3.1.2. Likelihoods are not calculated for parameters in the RL model itself; rather, those produce a drift rate, equal to the EV at each trial, that is passed to the drift diffusion model, which gives the log probability.

3. get.dens.2choice() implements the two-choice drift diffusion density itself.

Other functions in this model perform the crossover and migration steps, producing each proposal. These are described in detail in Brandon M. Turner, Sederberg, et al. (2013); some modifications, including tweaking the γ rate, are discussed in Section 6.3.1.

6.1.2 RL-Drift Diffusion model design

Section 3.1.2 describes implementing a racing diffusion model. Here, I describe my implementation of Brandon M. Turner's (submitted) racing diffusion model applied to our group data. We built a model estimating a racing diffusion process, combining a basic double update model with a racing diffusion model. Although the previous basic double update model used a learning rate (α) and inverse temperature (β) parameter, for this model, I introduce a racing diffusion process model. For this model, in addition to the learning rate, which performs the same role as previously, we estimated two subject-level parameters: decision boundary threshold k and non-decision time τ. In order to estimate these parameters, we have not only the overall performance but also subject reaction times.
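The likelihood pipeline described above (RL-derived drift rates scored by a two-choice racing diffusion density) amounts to multiplying the winning accumulator's first-passage density by the losing accumulator's survival probability. A sketch in Python, assuming the standard inverse Gaussian (Wald) first-passage formulas with unit diffusion noise; the function names echo, but are not, the thesis's R helpers:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_pdf(t, a, v):
    """Wald first-passage density at time t, threshold a, drift v (unit noise)."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * \
        math.exp(-(a - v * t) ** 2 / (2.0 * t))

def wald_cdf(t, a, v):
    """Wald first-passage CDF at time t, threshold a, drift v (unit noise)."""
    return (norm_cdf((v * t - a) / math.sqrt(t))
            + math.exp(2.0 * a * v) * norm_cdf(-(v * t + a) / math.sqrt(t)))

def race_loglik(rt, choice, a, drifts, ndt):
    """Log likelihood of one trial: the chosen accumulator finishes at
    rt - ndt while the other accumulator has not yet finished."""
    t = rt - ndt
    other = 1 - choice
    return math.log(wald_pdf(t, a, drifts[choice])
                    * (1.0 - wald_cdf(t, a, drifts[other])))
```

Summing `race_loglik` over trials gives the data term that the MCMC acceptance step evaluates.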
With a racing diffusion process estimate, the inverse temperature parameter is redundant, because the racing diffusion model (configured using the boundary threshold k, evidence accumulation speed v, and non-decision time τ) itself introduces a subject-level degree of uncertainty into the final selection. Evidence accumulation speed is set to a function of the expected value from the Reinforcement Learning (RL) model. In this way, the outcome of the RL model influences the drift diffusion, and this is how the RL model contributes to the overall likelihood of the model.

In summary, in this model, we estimate learning rate α, decision boundary threshold k, and non-decision time τ. We use the reinforcement learning model to calculate learning and the learning rate, and a function of the resultant EV is used to set the drift rate in the drift diffusion model.

Starting values for hyper-parameters

The functions update.mu.vector and update.sigma.vector are used to initialize hyper-parameters. update.mu.vector is sampled from a normal distribution specified with a mean of

    (μ_prior/σ²_prior + ΣX/σ²) / (1/σ²_prior + n/σ²)    (6.1)

and a sigma of

    (1/σ²_prior + n/σ²)^(−1/2)    (6.2)

update.sigma.vector is the square root of a draw from an inverse gamma distribution with the parameters

    α = α_prior + n/2,  β = β_prior + 0.5 Σ(X − μ)²    (6.3)

Model 6.1: (Section 6.2) Two-level RL DD Model
Estimation Method: DE-MCMC
Software: Turner native R implementation
Levels: 2-level: single-run, multiple-subject, with group estimates
Model: Reinforcement learning with Racing Diffusion Process
Data type: Behavioral only
Parameters: learning rate α, a unit normal distribution transformed from normal; threshold k and non-decision time τ, both exponential distributions transformed from normal

6.2 Two-level model

In this section, I describe a two-level model that uses a single run for each subject and calculates group statistics for all subjects in a group using MCMC. We can then compare the posterior estimates for each group.
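The hyper-parameter initialization in Equations 6.1-6.3 uses the standard conjugate normal and inverse-gamma updates. A hedged Python sketch of what update.mu.vector and update.sigma.vector compute (the R originals operate on whole vectors of parameters; this shows a single parameter):

```python
import math
import random

def update_mu(mu_prior, sigma2_prior, data, sigma2):
    """Draw a mean from its conjugate normal posterior (Equations 6.1-6.2).

    data: observed values; sigma2: their assumed known variance."""
    n = len(data)
    precision = 1.0 / sigma2_prior + n / sigma2
    post_mean = (mu_prior / sigma2_prior + sum(data) / sigma2) / precision
    post_sd = precision ** -0.5
    return random.gauss(post_mean, post_sd)

def update_sigma(alpha_prior, beta_prior, data, mu):
    """Draw an SD as the square root of an inverse-gamma sample (Equation 6.3)."""
    n = len(data)
    alpha_post = alpha_prior + n / 2.0
    beta_post = beta_prior + 0.5 * sum((x - mu) ** 2 for x in data)
    # an inverse-gamma draw is the reciprocal of a gamma draw;
    # gammavariate takes (shape, scale), so scale = 1/rate
    return math.sqrt(1.0 / random.gammavariate(alpha_post, 1.0 / beta_post))
```

With a weak prior and ample data, both draws concentrate near the empirical mean and SD, which is what makes them reasonable starting values for the chains.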
6.2.1 Method

Initial values optimization and implementation

Few changes were made to the optimization and initialization of the initial values design compared to Brandon M. Turner, Forstmann, Love, et al.'s (2016) design. In their design, optimization for group-level parameters used the R function optim to minimize the log density likelihood of the initial values given the data. Initialization of subject-level values started with a set of preconfigured initial values. These were treated as means of a random normal distribution, and for each subject a sample was taken in order to get a measurement. Initialization of group-level parameters was slightly more complex: the DE-MCMC sampling statement update.mu.vector was used to estimate group-level mean and variance estimates. I started with this design, but made some modifications as outlined below.

Running

Following the pattern laid out in Brandon Turner's DE-MCMC implementation, the model built here contains several key functions. The prior log density is calculated by summing the log density across all first-level parameters α, thresh, and τ, based on their respective second-level hyperpriors. All are modeled using a normal distribution before being transformed for exploration in the model itself. Note that these functions sum the log likelihoods, equivalent to taking the log of the product of the likelihood probabilities:

dens = sum(dnorm(level_1_params["alpha"],
                 level_2_params["alpha_mu"],
                 level_2_params["alpha_sigma"], log = TRUE)) +
       sum(dnorm(level_1_params["thresh"],
                 level_2_params["thresh_mu"],
                 level_2_params["thresh_sigma"], log = TRUE)) +
       sum(dnorm(level_1_params["tau"],
                 level_2_params["tau_mu"],
                 level_2_params["tau_sigma"], log = TRUE))

The log density likelihood is calculated using the following calculations, which incorporate both the RL model and the drift diffusion model:

dens = 0
ev = matrix(0, 100, 2)
v_t = matrix(0, nt, 2)
# transform alpha from a normal to a unit normal distribution
alpha_tr <- f_alpha_s_tr(as.numeric(x_s[which(par.names == "alpha")]))

for (tr in seq_along(use.data_s$choice)) {
  if (use.data_s$choice[tr] != 0) {
    # this must come first - this represents the choice being made.
    # there is some transformation based on ev and beta needed
    # before a drift rate can be obtained
    v_t[tr, ] = invlogit(ev[use.data_s$cue[tr], ])

    # prediction error
    PE = use.data_s$outcome[tr] -
      ev[use.data_s$cue[tr], use.data_s$choice[tr]]
    PEnc = -use.data_s$outcome[tr] -
      ev[use.data_s$cue[tr], 3 - use.data_s$choice[tr]]

    # value updating (learning)
    ev[use.data_s$cue[tr], 3 - use.data_s$choice[tr]] =
      ev[use.data_s$cue[tr], 3 - use.data_s$choice[tr]] + alpha_tr * PEnc
    ev[use.data_s$cue[tr], use.data_s$choice[tr]] =
      ev[use.data_s$cue[tr], use.data_s$choice[tr]] + alpha_tr * PE
  }
}

thresh_s_tr <- f_thresh_s_tr(x_s[which(par.names == "thresh")])
tau_s_tr <- f_tau_s_tr(x_s[which(par.names == "tau")])

dens.2choice =
  log(wald.pdf.raw(t[idx1], alpha[1], v[idx1, 1], theta[1]) *
      (1 - wald.cdf.raw(t[idx1], alpha[2], v[idx1, 2], theta[2]))) +
  log(wald.pdf.raw(t[idx2], alpha[2], v[idx2, 2], theta[2]) *
      (1 - wald.cdf.raw(t[idx2], alpha[1], v[idx2, 1], theta[1])))

dens = dens + dens.2choice

Using these two functions, the likelihood of the data at the individual run level was assessed. Then, the likelihood of the group α, thresh, and τ parameters can be measured based on the model priors.

6.2.2 Results

Diagnostics

Traditional diagnostic measures like R̂ are not necessarily applicable for DE-MCMC, so in order to diagnose the performance of chains here, I simply present visual evidence of chain convergence. For group-level α (Figure 6.1), k (Figure 6.2), and τ (Figure 6.3) values, chains seem to have converged nicely at the iteration where I began sampling. We can also look at the distribution of posterior samples for each parameter across each group.
Figure 6.4 below shows parameter distributions for α, thresh, and τ values. Overall, we can see that group parameters appear to have converged. Chains are distributed equally across the space, and distributions for all chains are similar. In order to verify the method has converged nicely, we should also take a look at individual-level parameters. I also examined the chain diagrams (Figure 6.5) and distributions (Figure 6.6) for 10 subjects selected sequentially along the list of subjects. Subject-level distributions were not as clean as group-level distributions, but considering the group-level distributions are what we're most interested in, these seem to converge to an acceptable degree.

Comparing groups

There does appear to be some discrepancy between the estimates for α values for both the reward and punishment groups. In particular, the starkest difference appears between Meth and No Meth (Figure 6.4).

Figure 6.1: α parameter chains over 1000 iterations in the MCMC analysis

Figure 6.2: k parameter chains over 1000 iterations in the MCMC analysis

Figure 6.3: τ parameter chains over 1000 iterations in the MCMC analysis

Figure 6.4: Distribution of mean α, τ, and thresh for each of the two punishment and two reward runs in the two-level model

Figure 6.5: α parameter chain for each of the four runs for 10 pseudorandomly-selected subjects. Missing chains are instances where the subject was missing a run, or where that run was removed according to the pre-existing criteria. Some chains do show signs of non-convergence, but overall, they appear to have stabilized and converged to a single mean.

Figure 6.6: α, τ, and thresh parameter distributions for the 10 subjects and four runs.
All 10 subjects for a particular run and parameter are shown overlapping in different colors. Generally, the subjects have a high degree of overlap for the alpha parameter, but other parameters show a high degree of between-subject variation.

Figure 6.7: Distribution comparing each of the three groups' α values with each other, separately for punishment and reward. The first row shows RiskyMeth minus RiskyNoMeth, the second row shows RiskyMeth minus SafeNoMeth, and the third row shows RiskyNoMeth minus SafeNoMeth. All results here are presented in their normalized form, i.e., in the form in which they were originally sampled, before they were transformed into the model distribution.

We can examine the difference between the Meth and NoMeth groups in Figure 6.7. The graph in the top left shows that the 95% HDI for the alpha parameter in the reward-motivated runs appears to be somewhat lower for the Risky Meth group than for the Risky No Meth group, with 0 just above the 95% HDI (Table 6.2). There are no other group differences apparent within either the reward or punishment conditions (see also Table 6.1).

Figure 6.8: A repeat of Figure 6.7, but combining punishment and reward posterior distributions.
Table 6.1: 95% Highest Density Interval for Punishment Condition

Comparison                 alpha           tau             thresh
RiskyMeth - RiskyNoMeth    [-2.53, 0.30]   [-0.38, 0.26]   [-0.23, 0.35]
RiskyMeth - SafeNoMeth     [-2.07, 0.76]   [-0.31, 0.38]   [-0.31, 0.32]
RiskyNoMeth - SafeNoMeth   [-1.29, 2.11]   [-0.17, 0.35]   [-0.28, 0.19]

Table 6.2: 95% Highest Density Interval for Reward Condition

Comparison                 alpha              tau             thresh
RiskyMeth - RiskyNoMeth    [-2.81, -0.0074]   [-0.36, 0.24]   [-0.24, 0.32]
RiskyMeth - SafeNoMeth     [-2.34, 0.79]      [-0.30, 0.36]   [-0.28, 0.31]
RiskyNoMeth - SafeNoMeth   [-0.64, 1.73]      [-0.16, 0.35]   [-0.27, 0.20]

Table 6.3: 95% Highest Density Interval for all conditions

Comparison                 alpha           tau             thresh
RiskyMeth - RiskyNoMeth    [-2.68, 0.19]   [-0.37, 0.25]   [-0.24, 0.34]
RiskyMeth - SafeNoMeth     [-2.22, 0.79]   [-0.31, 0.36]   [-0.29, 0.32]
RiskyNoMeth - SafeNoMeth   [-1.05, 1.96]   [-0.16, 0.35]   [-0.28, 0.19]

Looking at the group differences across motivations (Figure 6.8, Table 6.3), we can see that the upper bound of the 95% highest density interval sits just above zero, meaning that in a null hypothesis significance test we would be unable to reject a null hypothesis of no difference between groups. However, because we started with a prior of no difference, a distribution of this character is (albeit weak) credible evidence that the Risky Meth group learned the task more slowly than the Risky No Meth group. σ parameters (variability across subjects) did not substantially differ between groups in any consistent way.

6.2.3 Discussion

Overall, there is some evidence presented here of a greater learning rate among the sexually risky, non-meth-using group compared to the sexually risky, meth-using group. The difference reaches significance using a 95% HDI for the Reward condition, but not quite for the Punishment condition. Having obtained some useful information here, we might consider extending the two-level model into a joint model.
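The 95% HDIs reported in Tables 6.1-6.3 are, by definition, the narrowest intervals containing 95% of the posterior draws. A minimal sketch of one generic way to compute them from sorted samples (not necessarily the exact routine used in these analyses):

```python
import math

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, int(math.floor(prob * n)))  # samples inside the interval
    # slide a window of k consecutive sorted samples; keep the narrowest
    widths = [s[i + k - 1] - s[i] for i in range(n - k + 1)]
    best = widths.index(min(widths))
    return (s[best], s[best + k - 1])
```

Unlike an equal-tailed interval, the HDI narrows around the mode, which makes it a natural summary for skewed posterior difference distributions like those in Figure 6.7.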
With two levels of data in a single model, there is enough to consider neural covariance with both trial-level values (e.g., EV and RPE) and subject-level parameters (e.g., learning rate). However, my strongest priority was to extend this to a three-level model, because that would allow us to take all the data into account simultaneously. Particularly considering the varied estimates apparent for reward compared to punishment conditions, it might be helpful to consider all the data together in a single model so that an overall measure can be taken.

6.3 Three-level model

It is uncertain how much we would gain from a three-level model. In the two-level model, we already apply appropriate partial pooling of variance across subjects. Effectively, by gathering 4 runs instead of 1, we increase the sample size by a factor of four; with approximately 30 subjects in each group already, this might not gain much extra power.

However, a three-level model would certainly be more elegant and parsimonious. Considering the considerable variation I observed in the previous section across the four runs, particularly in the Risky Sex Meth group, a three-level model that contains an overall parameter might be instructive. Finally, a three-level model has never been implemented in psychology using DE-MCMC, to our knowledge, so the implementation would represent a genuine methodological novelty, and allow us to explore what DE-MCMC can do for three-level models that other methods cannot.
Model 6.2: (Section 6.3) Three-level RL DD Model
Estimation Method: DE-MCMC
Software: Turner native R implementation
Levels: 3-level: multiple-run, multiple-subject, with group estimates
Model: Reinforcement learning with Racing Diffusion Process
Data type: Behavioral only
Parameters: learning rate α, a unit normal distribution transformed from normal; threshold k and non-decision time τ, both exponential distributions transformed from normal

6.3.1 Method

The three-level model places run-level parameters at the first level, subject-level parameters at the second level, and group-level parameters at the third level. Every parameter is some measure of learning rate (α), threshold (thresh), or non-decision time (τ). In a two-level model, for each of those three parameters, we estimate each parameter once per run. At the second level, we estimate a μ and a σ hyper-parameter for each first-level parameter, yielding 6 second-level parameters overall: α_μ, α_σ, k_μ, k_σ, τ_μ, and τ_σ.

Priors for the second-level hyper-parameters need to be specified. We could estimate all of these as hyper-parameters themselves, or we could make some simplifying assumptions to reduce the number of hyper-parameters at the third level. For instance, α_σ denotes the variance of the α parameter across runs for a particular subject. That variance could itself vary across subjects; we might make a simplifying assumption that all subjects have the same run-level variance; or we could make a more moderate simplification by assuming that all subjects within each group have the same run-level variance. With a three-level model, we also have the choice of assuming whether the variance of subjects within groups is fixed across groups or whether it varies.

I modeled group-level μ and σ values to express the mean and variance of the three parameters across subjects. However, I did not model any group-level run variance.
All across-run, within-subject variances were modeled separately for each subject from weakly informative priors. For both level 2 and level 3, normal distribution priors were set to α = 3, thresh = log(2), and τ = log(0.6 · min(rt)), where the prior for non-decision time was calculated based on the minimum reaction time across all subjects. Prior distribution values were set to be relatively uninformative in the transformed space. For this model and all other models, all parameters were modeled as normal distributions and were transformed into appropriate distributions (see also Section 3.2.2, p. 47; Section 3.2.2, p. 144).

Optimization and initialization: Initial values

Optimization followed the same standard DE-MCMC process that was applied for the two-level model. Initialization had to use an extra level. The extra level mimicked the top level in estimation, in that the initial values were sampled from the optimized parameter distribution, and used the update.mu.vector and update.sigma.vector commands to initialize.

To expand the model to three levels, decisions had to be made about the γ tuning parameter and the migration step. This section explains those decisions.

Gamma tuning parameter

Recall, as introduced at the beginning of the chapter, that the γ tuning parameter controls the proportion of the randomly-selected difference used to iterate chains during the proposal distribution process. γ has traditionally been set to γ = 2.38/√(2d), where d is the number of parameter-space dimensions. When testing the three-level model, I found that this setting often yielded values that would be set to NaN in the drift diffusion probability density function because they fell too far outside the distribution of plausible proposals; thus there were very few accepted proposals, and the chains could not consistently generate plausible samples. In order to deal with this, I reduced the gamma numerator from 2.38 to 0.3.
When using this value, the model ran much more smoothly. This would have come at the cost of the model taking many more iterations than it otherwise would to stabilize and reach the best possible solution, but it was necessary to get the model to start properly.

Migration step

Recall that the migration probability is applied during part of the pre-sampling phase of estimation, and accelerates convergence of the chains onto a solution. I often found that at the beginning of the analysis, one or more chains would be stuck in an outlying part of parameter space that could not easily be iterated out of. By applying the migration step, these chains were able to escape from that space and iterate into the space occupied by other chains. I boosted the migration probability from 0.1 to 0.2 and found that this was much more conducive to producing iterative samples and new proposals that are accepted from one time point to the next. Initially the migration step was only applied at level 1; however, I implemented it at level 2 as well, and this was helpful for generating values at level 2.

6.3.2 Results

Unfortunately, diagnostics for the three-level model indicated a lack of convergence, as described below. I began by attempting to run an analysis with just 15 subjects. Run-level data often did not properly converge, and did not produce useful distributions. This might matter less, except that subject-level data estimating parameters across a subject's runs also frequently failed to converge on a solution across chains. This was true even when I extended the analysis to a large number of iterations, up to 10,000. When analyzing the data with 15 selected subjects, group-level chains did converge nicely, but did not differ much from the prior distribution; however, this is to be expected with such a small number of subjects. I also ran the analysis with a full group of subjects, at the same number of 10,000 iterations.
At the time of publication, the full output from our analysis is available via our server here. At level 1, the run level, chains sometimes failed to converge, but generally showed convergence on a set of values. Often, one or two chains of the total set of 24 did not properly initialize. At level 2, the subject level, chains again often failed to converge. At level 3, the group level, we do generally get proper convergence and adequate chain distribution, at least for the μ values, although σ values sometimes showed poor convergence.

For the small set of just 15 subjects, including just 5 per group, there are less likely to be outlying values among the subjects, including subjects whose chains do not converge or subjects whose parameters are outliers. For the full subject set, there is a greater probability of non-converging or outlying subjects, and thus the estimate for the group is less stable. This would explain why our estimates for the small groups seem more stable than the estimates for the large groups.

I also ran the analysis with a full group of subjects and 16,000 burn-in iterations, with 4,000 iterations subsequent to that, for a total of 20,000 iterations (here). These did yield somewhat narrower group-level parameter estimates, suggesting that with an even greater number of iterations, parameters might be estimated more precisely still. However, even with this number of iterations, chains did not appear to be entirely stationary, but showed signs of drift over time, indicating that the parameter estimates, even though they were narrower, may still fail to be reliable.

A lack of convergence in some outlying subjects, possibly due to an insufficient number of iterations given the efficiency of the algorithm, would suggest that as the number of subjects increases, a stable estimate of group-level values becomes more difficult.
Thus, the slight improvement in parameter estimates observed for the 20,000-iteration analysis lends some further credibility to that explanation for the difference between the 15-subject analysis and the full-group analysis.

Discussion

Modeling subject-level sigma values without attempting to enforce any constraint over these values at the higher level may be the reason why our data was not able to properly converge. A higher-level hyper-parameter specifying the run-level variance across all the subjects could constrain the variance of run-level parameters. This might tighten the estimates for each subject, and in turn help to reveal group differences and reward-punishment differences which might not be detectable otherwise.

One feature of writing a 'bespoke' Differential Evolution MCMC algorithm in R, as I have done here with Brandon Turner, is that each level of the model can be coded to iterate (migrate, crossover, and mutate) as separate steps. This has the advantage that we can apply the steps for each level one at a time, and the changes will immediately have an effect on values calculated at the next level.

Figure 6.9: MCMC three-level model attempt, selected Level 1 (run parameters) diagnostic graphs.
Figure 6.10: MCMC three-level model attempt, Level 2 (subject parameters) diagnostic graphs.

However, one side-effect of this is that the parameter estimates at each level may be too separated from one another. Proposals for level 3 parameters are considered independently of their eventual effect on level 1 parameters and the data itself. The data only indirectly shapes the likelihood of acceptance for level 3 parameter proposals, through shaping the likelihood of acceptance for level 2 proposals. In the other direction, proposals for level 1 parameters use level 2 parameters as priors, but level 3 parameters do not directly influence the acceptance of a level 1 parameter proposal. Level 3 parameters can only indirectly influence the proposals, one level at a time.
Depending on the severity of this problem for generating efficient proposals, this may be the reason that level 2 subject parameters frequently did not converge, and that level 3 parameters stuck very close to simply reflecting the prior distribution.

6.4 General Discussion

I implemented a two-level DE-MCMC model and obtained some results from that implementation, for all subjects and all runs. Thus, I was able to implement a two-level model by analyzing each run separately in DE-MCMC. The results suggested that Group 3 (Sexually Risky Meth Users) had slower learning rates than Group 2 (Sexually Risky Non-Meth Users) during the Reward condition, but not necessarily the Punishment condition.

Unfortunately, I was unable to get the three-level DE-MCMC model to produce a convergent estimate of the model parameters. Because I wanted to implement a joint model, and was unable to implement a three-level model here, I moved back, in the next chapter, to working in stan. Although the RL-LBA model is theoretically simpler than the RL-DD model, it is still able to take reaction time into account within the model. It may also be simpler to implement a joint model using RL-LBA.
[Figure 6.11: MCMC three level model attempt, Level 3 (Group parameters) diagnostic graphs]

Chapter 7

Linear ballistic accumulator Bayesian model

Previously, I used a basic RL model implemented in stan and estimated using NUTS MCMC to explore behavioral differences among risk groups (Chapter 5). That investigation yielded a difference between risk groups. I then moved on to Differential Evolution Markov chain Monte Carlo (DE-MCMC) (Chapter 6), because it was a pre-determined aim to compare DE-MCMC to other state-of-the-art methods, because DE-MCMC is known from Brandon Turner's work to produce good joint models, and because a racing diffusion process model was available for DE-MCMC. Unfortunately, there were substantial problems with implementing a three-level model in DE-MCMC (this has actually never been done before), so in order to have a chance to implement a joint model, I moved to implementing a Reinforcement Learning (RL) Linear Ballistic Accumulator (LBA) model (see Section 7.1) in stan using the No U-Turn Sampler (NUTS). This chapter describes that effort and the results I obtained from it. Because of difficulties implementing a full, three-level model, I built a single-level joint model and examined neural correlates of the behavioral measures we extracted from the model. The joint behavioral-neural correlations were interesting and are worth considering further in future research.

7.1 RL LBA Model

The RL LBA model implemented here is similar to the racing diffusion process model implemented in Chapter 6. Learning is captured by a standard reinforcement learning model as described in Section 3.1.1.
Rather than using a softmax function to determine choice likelihoods at each trial, we use the Expected Value (EV) at each trial from the RL model as the drift rate in the LBA model, just as EV was used as the drift rate in the racing diffusion process model in Chapter 6. This section describes the LBA model and its differences from the racing diffusion process model used in Chapter 6. The reason I switched to the LBA model was ultimately convenience: an implementation of the LBA model for stan was recently published by Annis et al. (2017). Both LBA and the racing diffusion process model describe accumulating and competing evidence for two or more alternatives, eventually resulting in the selection of the first choice that reaches some evidence threshold. Both model a non-decision time: a period, typically around 100-400 ms, after a stimulus is shown but before any evidence begins to accumulate. Then evidence begins to accumulate at some speed v, which differs for each alternative. Eventually the threshold (quantified by k, thresh, or b for boundary in Annis et al., 2017) for one or the other alternative is reached and that alternative is selected. Theoretically, LBA differs from the racing diffusion process model in that it assumes a linear rather than noisy accumulation of information. It was introduced as a series of simplifications of existing racing diffusion models by S. Brown and Heathcote (2005) and S. D. Brown and Heathcote (2008). Both are estimated using response times and choices, and in my application, drift rate as determined by EV. In both models, there are several parameters that may be set in advance or estimated: non-decision time, response threshold, and drift rate. The LBA model also randomly distributes the relative response threshold on a uniform distribution from a specified starting evidence A to the set absolute threshold.
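The accumulation process just described can be sketched outside of stan. The following is an illustrative Python re-implementation of a single LBA trial, not the dissertation's stan/R code; the parameter values (b, A, t0, sv) are made up for the example.

```python
import random

def lba_trial(drift_rates, b=1.2, A=0.3, t0=0.25, sv=1.0):
    """One Linear Ballistic Accumulator trial: each accumulator starts at a
    uniform point in [0, A], draws a trial-specific drift from Normal(v, sv),
    and rises linearly to the boundary b; the first accumulator to arrive
    determines the choice, and the non-decision time t0 is added to the RT."""
    best_choice, best_rt = None, float("inf")
    for i, v in enumerate(drift_rates):
        start = random.uniform(0, A)      # random starting evidence
        rate = random.gauss(v, sv)        # between-trial drift variability
        if rate <= 0:
            continue                      # this accumulator never reaches b
        finish = t0 + (b - start) / rate  # deterministic linear rise
        if finish < best_rt:
            best_choice, best_rt = i, finish
    return best_choice, best_rt

random.seed(1)
choice, rt = lba_trial([2.0, 0.5])  # EV-derived drifts favoring option 0
```

Simulating many trials with these drifts shows that the higher-EV accumulator wins on most trials, with faster average response times, which is the behavior the model-fitted drift rates are meant to capture.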
It also models drift rate on any particular trial randomly (although still linearly), and allows the standard deviation of that randomness to be set. In my application, I have set the starting evidence A to a low constant and have set the drift rate SD to 1. Thus, there is minimal difference between the two models.

7.2 Joint model implementation

In order to implement the neural-behavioral joint model, it was necessary to create a covariance matrix which measures the covariance between each measure or parameter - the neural and the behavioral parameters. In this single-level joint model, I measured the joint covariance between selected regions, as discussed below, and two psychological values, Reward Prediction Error (RPE) and EV. I followed recommended stan best practice, using the following process to create a covariance matrix:

1. Generate a lower-half Omega matrix using an LKJ Cholesky correlation matrix function with an η parameter of 1.
2. Generate a lower-half sigma vector from a Cauchy distribution with μ = 0, σ = 1, with length equal to the number of linked parameters.
3. Take the product of the diagonal matrix formed from the lower-half sigma values and the lower-half Omega matrix.
4. Generate a Sigma matrix as the product of the resulting lower-half Sigma matrix and its transpose.
5. Normalize each linked parameter by subtracting its mean and dividing by its standard deviation.
6. Estimate the log probability of the normalized parameter matrix given the Sigma matrix.

The normalized Sigma matrix is effectively a correlation matrix. The model was the same as in the previous section, except that a correlation matrix was added in which the correlation between four brain regions was estimated. To add the correlation matrix, several changes needed to be made to the stan code. First, in the parameters block, we declare two additional parameters.
In the transformed parameters block, we also define the Sigma matrix itself; the name is preceded with an 'L', indicating the matrix only stores the lower half of the matrix.

parameters{
  ...
  cholesky_factor_corr[TD_N] L_Omega;
  vector<lower=0>[TD_N] L_sigma;
}
transformed parameters{
  ...
  matrix[TD_N, TD_N] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  ...
}

Next, the model block specifies the distribution from which priors for Sigma are drawn. The model block also contains the key commands to estimate the correlation between the data.

model{
  ...
  for (r in 1:NUM_RUNS){
    ...
    theta_delta[:,THETA_rpe] = logit(run_pred_err_c2/4+0.5);
    theta_delta[:,THETA_ev] = logit(trial_expected_val/2+0.5);
    //transfer the deltas into the theta-delta matrix.
    for (d_i in 1:DELTA_N){
      theta_delta[:,THETA_N+d_i] =
        neural_data[run_first_trial_id:run_last_trial_id, d_i];
    }
    for (tdi in 1:TD_N){
      //separately calculate mean for each ThetaDelta var
      td_mean[tdi] = mean(theta_delta[:,tdi]);
      //separately calculate SD for each ThetaDelta var
      td_sd[tdi] = sd(theta_delta[:,tdi]);
    }
    for (i in 1:run_N_trials[r]){
      td_var[i,:] = (to_vector(theta_delta[i,:]) - td_mean) ./ td_sd;
    }
    L_Omega ~ lkj_corr_cholesky(1);
    //these yield standard deviations of each individual value.
    L_sigma ~ cauchy(0,1);
    td_var ~ multi_normal_cholesky(zeros, L_Sigma);
  }
}
The second and third lines simply copy the EV and RPE into the theta_delta matrix of behavioral variables ('thetas') and neural data ('deltas'), transforming them into a space with infinite support (i.e., in the range of negative to positive infinity). The following for loop copies the neural data into that same matrix. Then, a set of commands normalizes each column in the matrix, so that each variable is normally distributed with a mean of zero and a standard deviation of 1. This makes interpretation simpler, because we can simply calculate a correlation matrix rather than a covariance matrix, and these will be equivalent. The final set of three commands actually estimates values; L_Omega estimates correlation, while L_sigma estimates covariance. (I should note the covariance estimation here is redundant because of the transformation of values to standard space; if I were to run this again, I would remove this redundancy and focus on modeling correlation. However, I have run tests for a subset of subjects and found that a correlation-based estimate without this redundancy produces similar results.) I am not aware of any prior implementations of Turner's joint modeling paradigm in stan; the closest analogue was implemented in jags, but did not employ the three-level hierarchical structure described here (Palestro et al., 2018). Thus, I tested that I had implemented the joint modeling covariance matrix in a way that would detect covariances in the data. I sampled from the prior distribution of the matrix derived from this method and confirmed that, although not completely uninformative, with some bias toward zero covariance, priors were loose and enabled wide ranges of proposed covariance values to be considered (Figure 7.1).

[Figure 7.1: Prior distribution for the covariance matrix. This example image shows a 4x4 distribution, but actual covariance matrix sizes depended on the particular run of the model I was testing.]

In order to check whether the overly wide prior distribution may have skewed the results, I repeated the analysis for a limited set of subjects using the prior distribution shown in Figure 7.2, where all proposals were between -1 and 1. This prior yielded similar results to the first prior: behavioral parameters had low or non-existent correlations with neural parameters, while neural parameters had larger correlations with one another.
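The diag_pre_multiply and L_Sigma * L_Sigma' steps can be checked numerically. This Python sketch uses a hypothetical 2x2 case (it is an illustration, not the model code) to reconstruct Sigma from a correlation Cholesky factor and a scale vector, and shows that the implied correlation is recovered:

```python
import math

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical example: correlation rho between two linked variables,
# with per-variable scales playing the role of Stan's L_sigma.
rho, scales = 0.6, [2.0, 0.5]
L_Omega = [[1.0, 0.0],
           [rho, math.sqrt(1 - rho ** 2)]]  # Cholesky factor of [[1, rho], [rho, 1]]

# diag_pre_multiply(L_sigma, L_Omega): scale row i by scales[i]
L_Sigma = [[scales[i] * L_Omega[i][j] for j in range(2)] for i in range(2)]
# Sigma = L_Sigma * L_Sigma'
Sigma = matmul(L_Sigma, [list(r) for r in zip(*L_Sigma)])

# After the linked variables are standardized (all scales 1), Sigma is
# itself the correlation matrix; here the correlation is recovered as:
recovered_rho = Sigma[0][1] / math.sqrt(Sigma[0][0] * Sigma[1][1])
```

With scales of 2.0 and 0.5, Sigma has diagonal entries 4.0 and 0.25 (the variances), and the off-diagonal entry divided by the product of standard deviations returns the original rho, which is why standardizing the columns makes the estimated Sigma directly interpretable as a correlation matrix.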
One novel element of this design is that the correlation matrix is estimated using the same prior matrix independently for each run. This may cause problems if there is large variance in the actual correlation matrices across subjects, but if there are similarities then the model should allow us to recover a correlation matrix reflecting correlations between each of the theta and delta variables.

[Figure 7.2: The joint model was repeated with this distribution; although I do not report those results here, use of this prior did not yield larger correlation estimates.]

7.3 Pre-testing

In this section, I describe testing related to various aspects of the RL-LBA model that preceded the primary tests presented in the following sections with empirical results. This section focuses on testing of suitable priors and other model design features.

7.3.1 Experimenting with initialization

I experimented extensively with initialization for the three-level hierarchical model. Although I wasn't able to get that model to a presentable stage, I explain the testing here because it ultimately informed my initialization and prior settings for the single-level model. The three-level NUTS model would initialize, run, and produce an estimate for small groups of as many as 15 subjects, but could not properly initialize for larger groups. Because initialization seemed to be the main problem, I focused on attempting to resolve initialization problems. The rstan MCMC estimator works by starting chains off at a particular point within the prior distribution, and repeatedly calculates Jacobian transforms away from that point, picking more likely proposals with a randomly determined probability. Thus, it is important that the estimator starts at a point in the probability space from which it can iterate.
If the point is too far at one extreme of the distribution, then the model may fail to properly initialize. I thus experimented with several initialization methods, including randomization and bootstrap methods. The first option for initialization was to simply use the default initial starting values that R uses. Unconstrained variables are set on a uniform distribution between -2 and 2; variables constrained to be non-negative are set on an exponential transform of a normal distribution between exp(-2) and exp(2). The second option for chain initial starting values was random initial values. Here, I started with empirical priors and used them to draw random values distributed around those empirical priors. Thus, we got a unique set of random initial values for each chain. Although pure randomization was unreliable, one method that did turn out to be useful was to start each chain on set values equidistant within the distribution between -2 and 2 standard deviations. This mimics the default behavior in stan, but is a little more intelligent because we set it to operate across whichever distribution is most appropriate for each parameter, rather than a set uniform distribution between -2 and 2. The third option was to bootstrap initial values. Here, I randomly selected, with replacement, a set of subjects equal to the size of the actual dataset (in this case, 10 subjects). I then calculated group-level statistics based on the single-subject models using these groups, and these starting values were used. This method produces a quite tightly-bound set of initial values. Because the method for group-level statistics based on single-subject models often involved the use of medians, some initial priors are actually identical in some cases.
Because this method is almost entirely deterministic (random noise from an empirical distribution is used to allocate starting values to specific subjects and runs), we know it will replicate for any given seed value. However, if our initial values are too narrow, we may not end up properly exploring the parameter space. It might be sensible to apply this method, but use smaller-size groups to generate values. The fourth option (for the three-level model below and for DE-MCMC, not for single-level models) was to use initial values from the raw empirical distributions from single-level models. One plausible error with the bootstrap method described above is that a correlation among parameters may exist; for instance, a very high parameter may only be viable when combined with a very low parameter. In that case, using means of the posterior distributions to define initial values may yield initial values with a negative infinite log likelihood (i.e., an impossible set of parameters). Rather than taking bootstraps of the summary statistics from each first-level model, I sampled directly from the posterior distribution of the single-level models. This ensures that every set used to define a starting parameter is, at least at the run level, a value set with a finite log likelihood. If initial values for the hyper-parameter distributions are set high enough, this should ensure that our model is able to initialize. To select samples from the posterior, posterior samples were ranked from most to least likely using the log probability generated for single-level models. For each chain in the three-level model, I took one sample from each run and used that sample to define the initial values for the three-level model chain. The posterior sample with the median likelihood, the posterior sample with the maximum likelihood, and samples with likelihoods between those two were taken.
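The sample-selection step just described can be sketched as follows. This is an illustrative Python version with made-up sample values, not the actual R code used in the analysis:

```python
def chain_inits_from_posterior(samples, n_chains):
    """Pick one posterior sample per chain, spaced by likelihood rank from
    the median-likelihood sample up to the maximum-likelihood sample, so
    every chain starts at a parameter set with finite log likelihood.
    'samples' is a list of (log_prob, params) pairs from a single-level model."""
    ranked = sorted(samples, key=lambda s: s[0])   # ascending log probability
    median_idx, top_idx = len(ranked) // 2, len(ranked) - 1
    if n_chains == 1:
        idxs = [top_idx]
    else:
        step = (top_idx - median_idx) / (n_chains - 1)
        idxs = [round(median_idx + i * step) for i in range(n_chains)]
    return [ranked[i][1] for i in idxs]

# Hypothetical posterior samples: (lp__, parameter set)
samples = [(-120.0, {"alpha": 0.10}), (-95.0, {"alpha": 0.20}),
           (-80.0, {"alpha": 0.30}), (-70.0, {"alpha": 0.35})]
inits = chain_inits_from_posterior(samples, 2)
```

With two chains, this returns the median-likelihood and maximum-likelihood parameter sets, guaranteeing that no chain starts from an impossible parameter combination.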
Initial values for higher-level parameters were set using summary statistics of the lower-level values. This model is essentially a 'hierarchical model' with an assumption of no pooling. It represents an initial starting point from which MCMC chains can adapt to a higher-likelihood posterior.

7.3.2 Prior specification

Specifying priors is important for the performance of a good model - many times priors can be uninformative, but they must allow for efficient exploration of the best posterior space. Schuurman, Grasman, and Hamaker (2016) emphasize the importance of avoiding prior mis-specification. Thus I wanted to pay particular attention to priors, including variance priors. I used the same analysis to configure priors in the DE-MCMC model, and in that model, variance priors must be specified both at the within-subject level, across runs, and at the across-subject level, within groups. We further make two simplifying assumptions: that for each parameter, subject-level variance parameters are drawn from the same distribution, and that group-level variance parameters are drawn from the same distribution. Thus it is important for us to specify the best priors for those. First examining within-subject variance: a fair empirical prior for the variance of runs within groups might be based on the observed standard deviation of within-subject parameters. In order to facilitate future development of the model into a hierarchical model, at all times I attempted to generate priors using conjugate distributions, as also noted in Section 3.2.2. Thus, parameter priors were all initially estimated on a normal distribution before being transformed into a more appropriate distribution. The learning parameter is on a scale from 0 to 1, where zero represents no learning, and 1 represents a perfectly learned stimulus-response pair on each trial.
An inverse logit transform converts from the normal distribution into a logit-normal distribution. At first, I had erroneously used large sigma values on the normal distribution with the aim of producing a non-informative prior. However, sigma values greater than about 1.7 actually manipulate the shape of the alpha distribution, producing higher probabilities near the 0 and 1 points and lower probabilities at the midpoint. Thus, I settled on a prior with μ = 0, σ = 1.5 for use in the RL LBA model. k and τ were both estimated by starting with normal priors, which were transformed to an exponential distribution with the inverse natural log function e^x.

Calculating empirical priors

I ran the reinforcement learning linear ballistic accumulator single-run model in order to calculate a set of empirical priors. From the estimated best parameter fit for each individual subject, we can calculate overall best priors. The single-run model started with the following priors:

• alpha_pr~normal(-3,3). This prior is set as a weakly informative prior that recognizes a priori the difficulty of the task. We know the task was difficult because subject performance in the last run of the task averaged only around 65 percent.
• k~exp(normal(log(0.5),1)). This is designed to approximate Annis et al. (2017)'s prior, k~normal(0.5,1), but with an exponential distribution. We would expect relatively less sampling toward zero, and relatively more at higher points.
• tau_pr~exp(normal(log(0.5),0.5)). This is designed to approximate Annis et al. (2017)'s prior, tau~normal(0.5,0.5), but with an exponential distribution. We would expect relatively less sampling toward zero, and relatively more at higher points.

These priors were not used in the analysis presented below. They were used to develop an empirical prior, with which I experimented for use in the DE-MCMC model and also in the RL-LBA model.
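The effect of an overly wide sigma on the transformed alpha prior can be demonstrated directly. This Python sketch (an illustration, not the dissertation's code) measures how much prior mass lands near the 0 and 1 boundaries after the inverse logit transform:

```python
import math, random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_mass(sigma, n=20000, seed=0):
    """Fraction of inverse-logit-transformed Normal(0, sigma) draws landing
    within 0.05 of the boundaries of the alpha (learning rate) scale."""
    rng = random.Random(seed)
    draws = (inv_logit(rng.gauss(0.0, sigma)) for _ in range(n))
    return sum(d < 0.05 or d > 0.95 for d in draws) / n

# A wide prior on the normal scale concentrates alpha near 0 and 1,
# while sigma = 1.5 keeps most prior mass in the interior of [0, 1].
wide, moderate = edge_mass(3.0), edge_mass(1.5)
```

Running this shows the sigma = 3 prior puts several times more mass in the extreme bins than the sigma = 1.5 prior, which is why a "non-informative" wide normal is actually informative (and bimodal) on the alpha scale.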
I wanted to ensure that the alpha prior of normal(-3,3) is not low enough to bias the results. In particular, we are most likely to bias results where the true alpha learning rate is at the edges of the distribution; since our prior distribution is centered low, the runs with high true alpha learning rates are of most concern. In order to investigate this further, I selected the runs for which the alpha estimates were highest and re-ran the model with looser priors, and in particular, an alpha level closer to zero:

• alpha_pr~normal(-1,4)
• k~exp(normal(log(0.5),2))
• tau_pr~exp(normal(log(0.5),1))

These were not run for all subjects, as not all subjects could be fit to this data. For the five runs with the highest alpha levels, the log probability estimate did not differ by more than 3.0 between the two prior settings; normed alpha values did not differ by more than 0.03. Thus, we can be reasonably confident that the one-level run initial prior does not strongly bias the posteriors. In order to determine group-level priors, I extracted reasonable estimates from across the single-run models. First, any run with any parameters for which R̂ > 1.05 was excluded. The one-level model samples contained 500 posterior distribution samples for each run. For each parameter, stan calculates the mean of the sample distribution for each run. I used the median run's mean parameter estimate as the group-level mean. Stan also reports 2.5% and 97.5% posterior distribution values for each parameter - these are the boundaries of the contiguous 95% Highest Density Interval for a given posterior distribution. To calculate the prior distribution variance, I found the median run's 2.5% and 97.5% posterior distribution values, and set the deviation to the greater of the differences between each of those and the mean.
We also need to calculate a half-Cauchy parameter to specify a prior distribution for the group distribution of subjects. This was set to twice the standard deviation of posterior distribution means across runs. Finally, I wanted to set a run gamma value. This specifies the prior distribution of within-subject, run-level standard deviations. To get this figure, I calculated the standard deviation of single-run posterior distribution means within each subject. I then found the subject at the 97.5th percentile standard deviation; this was used to set the value of the half-Cauchy parameter. Although this may seem to aim higher than the true value because we used the 97.5th percentile, I wanted to ensure that the prior encompassed most reasonable values; these values are still much more informative priors than the uninformative priors I was previously using.

Comparing empirical priors to weakly-determined priors

I ran two 10-subject stan analyses to compare weakly-defined priors to strongly informed priors that used estimates from group aggregates of the single-subject runs. Considering the efficiency, accuracy, and representativeness of these, I wanted to ensure that all group parameters and at least 90% of subject-level and run-level parameters:

• Efficiency: reached reasonable efficiency on all chains; at least 0.5 per chain-iteration, to build up to an effective sample size of 10,000 (this will require long-running models)
• Representativeness: reached a Gelman-Rubin R̂ statistic of no more than 1.10, and ideally 1.05, indicating chain convergence
• Accuracy: accuracy is typically measured by MCSE. I'll aim for a lower MCSE if possible, but if we control well for other estimation quality measures, it will probably be hard to improve MCSE, which I think will reflect real uncertainty in the model.

I compared models with 500 iterations, sampled from 450 to 500, with a delta adaptation of 0.9.
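The half-Cauchy and run-gamma scale calculations described above can be sketched in Python; this is an illustration with toy numbers, not the analysis code:

```python
import statistics

def group_prior_scales(run_means_by_subject):
    """Empirical hyper-prior scales for one parameter.

    half_cauchy_scale: twice the SD of single-run posterior means across all
    runs (prior scale for the group-level distribution of subjects).
    run_gamma: the 97.5th-percentile within-subject SD of run means (prior
    scale for run-level variation within a subject)."""
    all_means = [m for runs in run_means_by_subject.values() for m in runs]
    half_cauchy_scale = 2 * statistics.stdev(all_means)
    within_sds = sorted(statistics.stdev(runs)
                        for runs in run_means_by_subject.values()
                        if len(runs) > 1)
    idx = min(len(within_sds) - 1, int(0.975 * len(within_sds)))
    return half_cauchy_scale, within_sds[idx]

# Toy per-run posterior means for three subjects
run_means = {"s1": [0.1, 0.3], "s2": [0.2, 0.6], "s3": [0.4, 0.5]}
half_cauchy_scale, run_gamma = group_prior_scales(run_means)
```

With only three subjects the 97.5th percentile reduces to the largest within-subject SD; with the full dataset it sits just below the maximum, which is the point of using a high percentile as a generous but still data-informed scale.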
Both models were identical and used non-centered priors, except that one model used weak, uninformative priors and one model used informative priors. I compared the first 10 subjects in the dataset. Priors are described in Table 7.1.

Table 7.1: Empirical priors and weakly-defined priors for the LBA model

                              Weak Priors   Informative Priors
subject α μ                      -3.00           -3.24
subject α σ                       3.00            1.89
subject α half-Cauchy scale       5.00            2.07
run α γ                           4.00            2.07
subject k μ                      -0.69           -0.31
subject k σ                       1.00            0.12
subject k half-Cauchy scale       3.00            0.51
run k γ                           2.00            0.35
subject τ μ                      -0.69           -1.75
subject τ σ                       0.50            0.18
subject τ half-Cauchy scale       2.00            1.23
run τ γ                           1.00            0.97

Efficiency

I wanted to measure efficiency for each parameter. At maximum efficiency, the effective sample size per chain-iteration is 1; lower efficiencies are degradations of this (Table 7.2).

Table 7.2: Efficiency per chain-iteration by parameter

               Weak priors   Informative priors
subject α μ        0.12             0.43
subject k μ        0.67             0.49
subject τ μ        0.17             0.70
subject α σ        0.87             0.48
subject k σ        0.51             0.53
subject τ σ        0.14             0.34
run α              1.00             1.00
run k              1.00             1.00
run τ              1.00             1.00

For estimating group parameters, efficiency seemed better overall for the informative prior model, although there was no clear winner.

Representativeness

Representativeness is measured by the Gelman-Rubin R̂ statistic, and should ideally be lower than 1.05.

Table 7.3: R̂ score by parameter

               Weak Priors   Informative Priors
subject α μ        1.07             1.05
subject k μ        0.99             1.01
subject τ μ        1.09             1.02
subject α σ        1.00             1.02
subject k σ        1.02             1.01
subject τ σ        1.12             1.00
run α              0.99             1.00
run k              1.00             0.99
run τ              1.01             1.02

I did seem to find evidence that the informative priors allowed for better convergence of MCMC chains in estimating model parameters; these conclusions should be held tentatively until the test can be run several times with varying starting seeds.
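As a quick reference for how the Gelman-Rubin statistic behaves, here is a basic (non-split) R̂ computation in Python; an illustrative sketch rather than the rstan implementation, which uses a split-chain variant:

```python
import statistics

def gelman_rubin(chains):
    """Basic Gelman-Rubin R-hat for equal-length chains: compares the
    between-chain variance of chain means (B) with the average
    within-chain variance (W); values near 1 indicate convergence."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    B = n * statistics.variance(means)     # between-chain variance
    W = statistics.fmean(statistics.variance(c) for c in chains)
    var_plus = (n - 1) / n * W + B / n     # pooled posterior variance estimate
    return (var_plus / W) ** 0.5
```

Chains sampling the same distribution produce R̂ very close to 1, while chains stuck in different regions (as with the weak-prior group mean parameters above 1.05) inflate B relative to W and push R̂ well above 1.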
Still, while using weak priors, two of the three group mean values had R̂ values above 1.05, whereas with informative priors, all values were below 1.05.

Testing the joint model design

Gelman's team recommends the LKJ Cholesky design to model a covariance matrix. This is the design recommended in the stan manual (Stan Development Team et al., 2016, p. 155) for a multivariate normal regression design like

y_n = x_n β + ε_n, where ε_n ~ MultiNormal(0, Σ)

I modified the priors for the distributions slightly compared to the Stan Development Team et al. (2016, p. 155) design, but otherwise kept it as-is, and tested the joint model design with code as described at the beginning of Section 7.2, with just the key joint modeling commands repeated below:

data{
  int TD_N;
  int LENGTH;
}
parameters {
  cholesky_factor_corr[TD_N] L_Omega;
  vector<lower=0>[TD_N] L_sigma;
}
transformed parameters {
  matrix[TD_N, TD_N] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  matrix[TD_N, TD_N] Sigma = L_Sigma * L_Sigma';
}
model{
  vector[TD_N] td_var[LENGTH];
  L_Omega ~ lkj_corr_cholesky(1);
  //these yield standard deviations of each individual value.
  L_sigma ~ cauchy(0,1);
  td_var ~ multi_normal_cholesky(zeros, L_Sigma);
}

It is important to set the priors for the model correctly. We can examine the priors empirically by running a model that generates samples for the parameters without passing in data (other than a matrix size). A test of the model with TD_N=5 (the length of each side of the covariance matrix), with 1500 warm-up iterations and an additional 500 samples, showed that the L_sigma vector, which quantifies the variance of each value represented in the matrix, could not be effectively sampled using the cauchy(0,1) distribution in this model. The specific values do not seem to matter much: cauchy(0,4), and cauchy(0,4) combined with lkj_corr_cholesky(2.5), also fail to return a stable L_sigma.
A subsequent test showed that, when sampling from a lognormal distribution for sigma, changing the prior for the variance sigma unsurprisingly increases the estimate for the prior distribution; however, increasing the Cholesky prior seems to have very little effect on the sigma matrix.

Model 7.1: (Section 7.4) Single-level joint model
  Estimation Method: MCMC
  Levels: Single-run; parametric statistics used to identify trends across subjects
  Parameters: learning rate α, transformed from a normal distribution onto the unit interval; threshold k and non-decision time τ, both exponential distributions transformed from normal
  Software: rstan
  Data type: Neural-behavioral joint model
  Behavioral Model: Reinforcement learning with linear ballistic accumulation
  Neural Model: Activity during trials in L & R Accumbens, L & R suborbital gyrus; 25-region decision-network test

7.4 Single-level joint model

With single-level models, it was easier to quickly run models and also to implement joint models. In single-level reversal learning models, I was able to experiment with various prior values, parameterizations, and models, in order to compare the performance of each of them. This can be done much more quickly than when using a hierarchical model. Some models I compared were the discrimination model (as described in Section 3.3.1) and the OpAL model (Section 3.3.2). I also compared the simple RL model (described more in Chapter 5) with the LBA-RL model (described in this chapter), and compared centered and non-centered parameterization methods as described in Section 3.2.3. I was also able to test the effects of different prior values on the model and try out variations of the covariance matrix estimation method.
Although I did try a number of different single-level joint model configurations, the methods presented in this chapter are the best of my investigations; they are equal to or better than all other single-level joint methods I investigated in terms of log likelihood.

7.4.1 Joint model design

In order to avoid computational challenges with a hierarchical structure, I implemented a single-level joint model measuring covariance of EV and RPE. The single-level model is outlined in previous sections (Section 3.1.3). A simple double-update model (Reiter et al., 2017) is implemented to model the participant choice, and is combined with a linear ballistic accumulator, which models the uncertainty in choice selection. The aim was to apply a joint model of the kind defined in Brandon M. Turner, Rodriguez, Norcia, McClure, and Steyvers (2016), described conceptually in Section 3.5 and technically below in Section 7.2 (see also Section 1.2.1 for a discussion of the motivation). We model covariance between neural structures and behavioral function in an attempt to show links between them. In order to do this, we need many measurements to adequately constrain estimation. In a single-level model, that means we need to constrain on a per-trial basis. The primary estimated behavioral parameters α, k, and τ are estimated across all trials, so these cannot be linked directly to neural markers. However, within the behavioral model there are several per-trial components which may be suitable for linking to a neural model (see the further discussion of this selection process in Section 3.4); most importantly, EV and RPE. EV describes the expectation the participant has for each selection made, assuming that the participant has recognized the image and is trying to choose a response to it. RPE describes the difference between what the participant expected and the reward they actually received in the task.

7.4.2 Method

Palestro et al.
(2018) describe implementing a covariance joint model in JAGS. In this dissertation I used stan, among other things because of its ability to pre-compile models, which can make it faster. I adapted the joint model code in JAGS to write correlation matrix estimation commands in stan. θ (behavioral) and δ (neural) variables were standardized before measuring their covariance. stan's designers generally recommend reparameterization of a model at least so parameters are mean-centered, and often advise that a standard normal distribution be used to calculate variance separately from calculating the mean of a value (see Stan Development Team et al., 2016, page 348). I followed this practice in designing the model. For this reason, parameter covariance as measured was equal to parameter correlation. This had the added benefit of making it easy to quantify comparable correlations between values with respect to their variance. I kept to the priors I had been previously using, as described in Section 7.3.2. The dataset contained 144 subjects across the three risk groups. Most subjects had done two runs of the Punishment Condition and two runs of the Reward Condition. I ran several single-level joint analyses. These ran quickly, without the limitations I experienced running multi-level analyses for joint models.

Initial values

During single-level modeling, initial values were derived from a quantile function to select evenly-spaced values near the center of the prior distribution. These were then randomly allocated to each chain. For instance, the learning rate parameter had a prior mean of -3 on the normal distribution (before transformation using the logit function). Initial values were taken from evenly-spaced quantiles between -1 SD and +1 SD.

Regions

For neural parameters, I examined FSL-derived Harvard-Oxford map anatomical ROIs as well as the freesurfer ROIs presented here.
The correlations between freesurfer subcortical regions and FSL-derived regions were large enough to suggest that they were capturing roughly the same regions. Correspondence between cortical regions could not be determined because the two methods, Freesurfer and FSL, subdivide the cortex using very different regional boundaries. For the first model, I examined potential links with 25 selected freesurfer regions identified as part of the decision-making network. These were freesurfer regions within the medial or lateral OFC, the striatum, insula, anterior cingulate gyrus, and some dorsolateral prefrontal cortex areas.

7.4.3 Results: 25-region analysis

Figures 7.3 and 7.4 summarize the main results, described in the remainder of this section.

Figure 7.3: Reward Prediction Error correlation with regions across subjects: selected decision-making regions

Diagnostics

68 runs did not complete. Runs failed to complete for several reasons, including timing out after 2 hours without converging, or because R̂ values were insufficiently close to 1, indicating poor convergence. 7 subjects were missing from the analysis altogether. All remaining runs had R̂ values sufficiently close to 1.

Figure 7.4: EV correlation with regions across subjects: selected decision-making regions

Neural-behavioral Correlations

Correlation scores were measured individually for every run. Bayesian estimation yields a whole posterior distribution of each correlation. To test whether there was a significant trend in correlations across subjects, I aggregated the mean correlation scores for each linked parameter pair across subjects. To do this, for each parameter correlation, I first took the mean posterior value for each run.
I then averaged these mean posterior values across the runs available for each subject, to find a subject-level mean of mean correlation estimates. This yielded a single correlation score for each parameter and subject. Then, I used a t-test to test whether the correlation scores across all subjects differed from zero.

The Putamen correlated most strongly with RPE. The Accumbens, cingulate gyrus (both ACC and PCG), and caudate also correlated, though less strongly (Figure 7.3). All of these were significantly different from a zero correlation, adjusting for multiple comparisons. EV was not strongly linked to any of the regions studied, but there was a weak, significant relationship with the frontal superior sulcus and putamen. There was a negative relationship between EV and regions across the left and right insula (i.e., low expected values were associated with higher insula activity). Suborbital prefrontal regions did not show the expected effects (Figure 7.4). The suborbital sulcus was linked negatively to EV, and several other relevant areas, in particular the rectus gyrus, did not significantly correlate with the modeled EV signal.

7.4.4 Results: 4-region analysis

I re-ran the single-level model to test whether we could get a stronger covariance result by focusing on a few regions that might alone link EV and RPE to brain activity: the left and right Accumbens, and the left and right Suborbital Sulcus, in the center of the medial OFC.

Diagnostic statistics

Under the LKJ estimator, estimates do covary between cells of the correlation matrix across samples, so it might be important to include only the few values we really expect to show a correlation. In this analysis, 112 runs were missing and 9 subjects had no runs at all. The missing runs are a concern, and I believe that these missing runs could be driving the problems I have been having getting
three-level models running.

Figure 7.5: Mean correlation of regions with Reward Prediction Error across subjects: targeted decision-making regions

Neural-behavioral Correlations

The tiny correlation estimates for θ-δ connections do not seem to be related to the number of parameters. Across the 25-region model and the 4-region model, estimated mean correlations remain of a similar magnitude. The low correlations apparent in this data do not appear to arise from the large number of predictors.

7.4.5 Model Discussion

RPE has a strongly bimodal distribution in this model (Figure 7.6). This could make it difficult to match it closely to the brain area activity, which is normally distributed. EV does not show the same bimodal tendency to the same degree, although it still manifests in some subjects (Figure 7.7). When using a simple RL model, EV and RPE appear to be about equally bimodally distributed. When taking the absolute value of EV and RPE, the two behavioral measures do become more normally distributed (Figures 7.8-7.9).

It may also be that the bimodal distribution in RPE accurately reflects a strongly dichotomous experience of RPE by subjects. In this account, subjects really did fail to establish strong expectations for stimuli, and thus really did experience feedback as a dichotomous signal indicating a strong expectation violation (deviating from a neutral expectation) in either the positive or negative direction. It is unclear whether brain activity in that situation, as measured by fMRI, would also follow a dichotomous pattern. If not, it would be difficult to correlate it back to the modeled, mostly dichotomous RPE.
Furthermore, if subjects failed to establish strong expectations for stimuli, then any expected value signal would be weak, which would explain the lack of a clear correlation between modeled expected value and the ventromedial prefrontal cortex ROIs.

7.5 General Discussion

Results from single-level joint models show that Accumbens activity appears to be linked to RPE. The mean correlation across subjects is low, but highly significant. The result makes sense in light of the Accumbens' known role in reward processing. The role of EV is harder to assess. EV was strongly negatively correlated with RPE. This is not too surprising. In the early rounds of the task, EV is at its lowest, and RPE is likely to be very high regardless of the subject's score (whatever the direction of the error, its magnitude is large). Nevertheless, we see EV showing positive relationships with the superior circular insular sulcus, frontal middle sulcus, and frontal superior gyrus, alongside various other anterior cingulate and prefrontal cortex areas.

All the correlations found between neural and behavioral values are low. This is not true for the neural-neural correlations or for the behavioral-behavioral correlations. It is uncertain why these neural-behavioral correlations are so low.

Figure 7.6: RL-LBA Model, Expected Value by Left Accumbens activity for selected subjects. Here, in fact, expected value does not show a strong bimodal pattern, in contrast to the RPE measure in this model and also to expected value in the simple RL model.

Figure 7.7: RL-LBA Model, RPE by Left Accumbens activity for selected subjects. The bimodal pattern for RPE is clearly visible here, and might present problems for estimation of the joint model.

Figure 7.8: RL-LBA Model, Absolute Reward Prediction Error by Left Accumbens activity for selected subjects.
The absolute measure of reward prediction error is fortunately not bimodal, but it is unclear whether an absolute measure will strongly track brain activity in any region.

Figure 7.9: RL-LBA Model, Absolute Reward Prediction Error by Right Accumbens activity for selected subjects.

Figure 7.10: Simple RL Model, Expected Value. In the simple RL model, there is a strong pattern of expected value gathering at three specific points on the distribution.

Figure 7.11: Simple RL Model, Reward Prediction Error. In the simple RL model, there is a strong pattern of reward prediction error gathering at three specific points on the distribution.

It is possible that because each correlation is only estimated within-subject, the signal-to-noise ratio is quite low. As a result, we only see small correlations; but because, over all the subjects, they do tend to go in one direction, we are able to obtain an average of point estimates that can be distinguished from a zero correlation.

Chapter 8

Discussion

In this thesis, I described a number of Bayesian models of our reversal learning task. One model applied a joint modeling approach to explore the relationship between neural and behavioral parameters in single-level models. I used both Differential Evolution MCMC and the No U-Turn Sampler (NUTS), written in stan, to implement hierarchical models. My Differential Evolution MCMC model was a two-level model that explored each of the four runs independently, allowing me to compare reward and punishment activity. In NUTS, I implemented a three-level reversal learning model that jointly modeled multiple runs within subjects and multiple subjects within groups, to most accurately describe group contrasts in reversal learning parameters.
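The nesting just described (runs within subjects, subjects within groups) can be illustrated with a small generative sketch. This is not the estimation code itself; it is a simulation under hypothetical values, using the normal(μ_g, σ_g) hyper-parameter structure described in the Glossary:

```python
import numpy as np

def logistic(x):
    """Map an unbounded parameter onto (0, 1), as for a learning rate."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Level 3: group-level hyper-parameters on the unbounded (logit) scale.
group_mu, group_sigma = -3.0, 0.5          # hypothetical values

# Level 2: subject-level means drawn from the group distribution.
n_subjects = 8
subject_mu = rng.normal(group_mu, group_sigma, size=n_subjects)

# Level 1: run-level parameters drawn around each subject's mean
# (two Reward and two Punishment runs per subject).
n_runs, run_sigma = 4, 0.2
run_logit = rng.normal(subject_mu[:, None], run_sigma,
                       size=(n_subjects, n_runs))

alpha = logistic(run_logit)                # per-run learning rates in (0, 1)
print(alpha.shape)                         # (8, 4)
```

Estimation reverses the arrows: the sampler infers subject- and group-level hyper-parameters from run-level data, which is what allows partial pooling across runs and subjects.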
8.1 Findings

There were several interesting patterns in the data: one relating to the difference between Sexually Risky Meth-using and Non-Meth-using subjects, and others relating to the relationship between neural and behavioral data.

8.1.1 Group differences

The three-level RL behavioral model suggested that the Sexually Risky Meth-Using group had a lower learning rate than the Sexually Risky No Meth group, at least during the punishment motivational condition. The two-level differential evolution MCMC model showed the same trend, but there we only observed a difference with 95% confidence in the Reward condition. These discrepancies likely reflect natural uncertainty in estimation arising from the different methods used to analyze the data in each case. The first key group differences test used NUTS Monte Carlo Markov Chain(s) (MCMC) with a simple RL model. The second used Differential Evolution Monte Carlo Markov Chain(s) (DE-MCMC) with an RL-Drift Diffusion model. Furthermore, the group difference apparent in the second, DE-MCMC test was estimated separately for each run, whereas run variance was pooled in the first model because all data for each Risk group was estimated in a single model. Any of these differences may account for the slightly different character of the difference between Group 2 and Group 3. In spite of the differences, the consistent feature is some kind of greater learning rate for the Risky Non-Meth group compared to the Risky Meth group. The three-level model suggested that this difference was in the Punishment condition, while the two-level model, using RL-Drift Diffusion (DD), suggested the difference was observable in the Reward condition.

8.1.2 Neural-behavioral results

The single-level joint model analysis confirms that Reward Prediction Error is at least weakly related to Putamen and Accumbens activity, which we expected to see.
We also saw relationships with other regions, including the posterior cingulate, anterior cingulate, insula, and ventromedial prefrontal cortex. Estimated correlations across subjects were very close to zero for the Pallidum, amygdala, and the rectus gyrus (at the base of the ventromedial prefrontal cortex). A more surprising result was a negative relationship between the suborbital sulcus and expected value, and a corresponding positive one with Reward Prediction Error (RPE). Expected Value (EV) and RPE were negatively correlated at an r of approximately -0.8, and as an artifact of that, regions positively related to Reward Prediction Error would be negatively related to Expected Value, and vice versa. A strong negative correlation between RPE and EV occurs when subjects learn quickly after an expectancy is violated. In this instance, the reversal trial could drive the RPE-EV correlation, because in the reversal trial there is often a fairly strong expected value for the subject's response, followed by a large RPE as the subject learns that their learned response no longer applies.

To see whether the low neural-behavioral correlations might arise from some effect of having two highly correlated behavioral variables in the model, I did repeat the single-level model run with RPE as the only behavioral parameter jointly modeled.¹ Unfortunately, including RPE as the one and only linked behavioral value did not improve the correlation. In the subsequent Discussion sections I explore alternative models which might fit the data better than the RL, RL-LBA, and RL-Drift Diffusion models I have used in this thesis.

8.2 Further work

In this section, I explore alternative behavioral models and other forms of joint modeling that might be used to more sensitively measure group differences in the reversal learning task, capture the computational dynamics more precisely, and better measure behavioral-neural relationships.
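Before turning to further work, it is worth noting that part of the strong negative EV-RPE correlation discussed above is structural rather than neural: because RPE_t = r_t - EV_t, it follows that cov(EV, RPE) = cov(EV, r) - var(EV), so unless reward delivery covaries strongly and positively with current EV, the correlation is pulled negative. A small seeded simulation of a delta-rule learner (the reward schedule and parameter values are illustrative only, not the task's) verifies the identity:

```python
import numpy as np

rng = np.random.default_rng(1)

alpha, ev = 0.2, 0.5                  # hypothetical learning rate and prior EV
evs, rpes, rs = [], [], []
for _ in range(500):
    r = float(rng.random() < 0.7)     # reward delivered with probability 0.7
    rpe = r - ev                      # reward prediction error
    evs.append(ev); rpes.append(rpe); rs.append(r)
    ev += alpha * rpe                 # delta-rule update

evs, rpes, rs = map(np.array, (evs, rpes, rs))
lhs = np.cov(evs, rpes)[0, 1]                         # cov(EV, RPE)
rhs = np.cov(evs, rs)[0, 1] - np.var(evs, ddof=1)     # cov(EV, r) - var(EV)
print(np.isclose(lhs, rhs), lhs < 0)                  # True True
```

The -var(EV) term is why some negative EV-RPE correlation is expected even without reversals; reversal trials, where EV is high and RPE is large and negative, then deepen it.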
8.2.1 Alternative Models

I saw how the distributions of Expected Value and Reward Prediction Error were extremely bimodal, and posterior density estimates were wide, indicating that there was a lot of uncertainty in the models. Where this is the case, it is important to ask whether the models used (the RL model, Chapter 5; the RL-LBA model, Chapter 7; or the RL-Drift Diffusion model, Chapter 6) can be improved upon. If we can find a better-fitting model, we can reduce uncertainty, improve correlation with the neural data, and gain more statistical power to detect and quantify differences between our groups.

Although we have modeled the task as a reinforcement learning task, enhanced with a linear ballistic accumulator, another approach may be more suitable. The task certainly involves memory, but it is also a discrimination task. The images used in the task are quite abstract, and it is difficult to tell the difference between them. So error in any given trial could arise from two sources: a failure to correctly identify the reinforced response for a particular cue, and a failure to correctly identify a particular cue.

¹ Albeit with an older and uncorrected version of the neural parameters; testing showed that the correction, though necessary for accurate measurement, did not change correlation values very much.

Modeling recognition: the discriminability model

How do we integrate a model of item recognition? In the reversal learning task, recognition (of the stimulus, i.e., distinguishing each image from other images) and recall (of the correct response, i.e., recalling which response is correct) are both likely to contribute to error. Yet our model assumes perfect recognition and attributes all error to the choice process, and this simplification is likely to contribute to model misfit. In Chapter 3, Section 3.3.1, I describe a Discriminability model.
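To make the two error sources concrete, a choice rule can mix a recognition stage with the usual softmax over learned values. This is only a sketch of the idea, not the Section 3.3.1 model itself, and all parameter values shown are hypothetical:

```python
import math

def p_correct(ev_correct, ev_other, beta, p_recognize):
    """P(correct response) under two error sources.

    With probability p_recognize the subject identifies the cue and
    softmaxes over its learned values (recall); otherwise the cue is
    confused and they respond at chance (binary response assumed).
    """
    softmax = math.exp(beta * ev_correct) / (
        math.exp(beta * ev_correct) + math.exp(beta * ev_other))
    return p_recognize * softmax + (1.0 - p_recognize) * 0.5

# Perfect recognition leaves only recall (value-based) error;
# failed recognition drags performance toward chance.
print(round(p_correct(1.0, 0.0, beta=3.0, p_recognize=1.0), 3))   # 0.953
print(round(p_correct(1.0, 0.0, beta=3.0, p_recognize=0.0), 3))   # 0.5
```

In a full model, p_recognize would itself grow with familiarity, as described below, rather than being fixed.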
In the main models presented in this thesis, it is assumed, as a simplification, that subjects always recognize a stimulus, and then probabilistically decide whether to make a response that matches what they have learned about the stimulus so far. In the simple RL model (Chapter 5), this is quantified through the inverse temperature parameter. In the more complex RL-LBA model or the closely related drift diffusion model, it is quantified by the subject's non-decision time and response threshold (thresh or k). In either case, the properties of the alternative stimulus do not change the modeled behavior. In retrospect, a model of this task must attempt to model the computational processes of stimulus discrimination, because discrimination is too important a component of this task to assume away. In a discriminability model, we assume that as subjects gain familiarity with the stimulus they view, they will be less likely to confuse it with alternative stimuli they have recently viewed, and more likely to match it with the stimulus they are actually viewing. We can incorporate learning to model this process, with a rate which may be the same as the primary S-R RL learning rate, or may be a separate parameter.

Empirical measures of discriminability

Unfortunately, testing showed that the discriminability model built for this dissertation did not yield superior performance over the existing models. One thing the model did not do is impose any a priori perspective on image discriminability. It may be that some of the abstract images (a selection of which are shown on Page 15) have visually more similar properties than others. There are two ways we could determine image similarity. First, it could be theoretically possible to use visual cortex representations to get a measure of the subjective similarity of each image, by looking for pattern similarity in the functional activity associated with each image.
Second, a new set of subjects could be presented with a similar task using an n-back design to test their ability to remember the images. In this n-back design, subjects would be asked to compare each image with the image they viewed two presentations prior and indicate whether the two are the same or different. We would present each combination of confusable images to the subjects in order and compare the relative likelihood that each image pair will be correctly discriminated. Because all images are presented in the same sequence for every subject, the number of such combinations is not too large if we assume that subjects will only confuse each given cue with one of the 3 presented previously. In fact, the total number of comparisons can be calculated to be 438; this may be a feasible number to test.

Opponent-Actor models

I did model the Opponent Actor Learning (OpAL) model (Collins and Frank, 2014), described in Section 1.3 and Section 3.3.2. However, I found it was unable to yield superior log likelihoods relative to the models I used for the current analyses, so I cannot recommend it for modeling the deterministic reversal learning task. In combination with other modeling methods not yet used here, it may turn out to be a better fit, so it should still be considered. Particularly useful is the OpAL model's three-term scaling of learning, whereby a model's existing weight value influences the speed of learning. This could be useful in reversal learning, where information has to be unlearned and learned again, and it could help to decouple the implausibly strong EV-RPE correlation observed in the current models.

Separate parameterization of pre- and post-reversal

In this task, I applied a simple double-update model to learning. Such a model is fully adequate to capture and model learned associations, including during a reversal learning task. Prior work by Reiter et al.
(2017) found that a double-update model with a constant learning rate across the task was actually a better model of prediction than the Sutton K1 model (Sutton, 1992), in which the learning rate itself is a function of prediction error. For that reason, I did not vary learning rates across the task. However, this restricts our ability to measure cognitive flexibility, i.e., the difference between pre-reversal and post-reversal learning rates. This includes group differences in cognitive flexibility, which might be particularly interesting when comparing meth-using and non-meth-using populations. Thus, a future model could explicitly incorporate and estimate best-fit learning rates pre-reversal and post-reversal, in order to better quantify cognitive flexibility.

Chance performance mixture model

One problem arising in Section 6.3.2 was the possibility that subjects within groups varied, particularly in a way not captured by a normal distribution of parameters. In particular, their behavioral strategy may vary. Some subjects may prefer to perform the task by simple random guessing, rather than exert effort to learn the reinforced associations in the task. For those subjects, the reinforcement learning model is inappropriate, and if we assume they are doing reinforcement learning, then their behavior wrongly lowers the estimate of the learning rate achievable by the subjects in each group who actually are doing reinforcement learning. A mixture model is one way out of this. In a mixture model, we weigh the probability of two models against one another; in this case, the probability of reinforcement learning against random guessing.
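Per subject, such a mixture can be scored as a weighted sum of the two likelihoods. The sketch below assumes binary choices and uses hypothetical names; in a full model the weight w_rl would itself be estimated per subject:

```python
import numpy as np

def mixture_loglik(choice_probs_rl, w_rl):
    """Log likelihood of one subject's choices under an RL/guessing mixture.

    choice_probs_rl: per-trial probability the RL model assigns to the
    observed choice. w_rl: probability this subject is doing RL rather
    than guessing (chance = 0.5 per binary trial).
    """
    probs = np.asarray(choice_probs_rl)
    lik_rl = np.prod(probs)               # P(choices | RL model)
    lik_guess = 0.5 ** probs.size         # P(choices | random guessing)
    return float(np.log(w_rl * lik_rl + (1.0 - w_rl) * lik_guess))

# A subject the RL model predicts well is better explained by w_rl near 1:
good_fit = [0.9] * 20
print(mixture_loglik(good_fit, 0.9) > mixture_loglik(good_fit, 0.1))   # True
```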
The group-level estimates for the learning rate and inverse temperature could then be weighted by that probability, so that subjects with a high probability of random guessing have less influence on the group-level estimate of the learning rate.

8.2.2 Joint modeling across tasks

In the previous section, I described using Bayesian joint modeling to estimate parameters of a model using behavioral and neural data from a particular task. We can also use joint modeling to estimate parameters of a model using data from more than one task. Inter-task functional joint models are virtually unaddressed in the current literature and could represent an interesting progression in applications of joint modeling. Thus, they are worth at least considering for viability. Across tasks, we have at least one measure each of reward and punishment learning, one measure of risk aversion (CUPS task), one measure of reward responsiveness, and four measures of inhibitory control (Reversal Learning Reward and Punishment, Stroop, and Go/NoGo). The biggest challenge might be the degrees of freedom introduced by the extra models.

The principles behind an inter-task functional joint model differ minimally from those of a joint model within a task. The primary difference is that in an inter-task joint model (Brandon M. Turner, Sederberg, et al., 2013), parameters cannot be estimated trial-by-trial for each individual: only one parameter estimate, rather than a whole distribution of per-trial estimates, can be obtained for each participant. The hyper-parameters that link one task to the other must be estimated across subjects. This contrasts with intra-task behavioral-functional models, in which, because there is a 1:1 identity mapping between the trials in the behavioral and functional datasets (i.e., the same trials are in both), the individual variation in behavior from trial to trial can be quantified for whatever parameter is being measured.
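The cross-task linkage can be illustrated at the subject level: collect one parameter estimate per subject per task, and estimate the linking hyper-parameter across subjects. In this sketch the hyper-parameter is reduced to a plain across-subject correlation, the data are simulated, and the names alpha_reversal and theta_gonogo are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated subject-level point estimates from two tasks; a shared
# latent trait induces the cross-task relationship to be recovered.
n_subjects = 200
trait = rng.normal(size=n_subjects)
alpha_reversal = 0.8 * trait + rng.normal(scale=0.6, size=n_subjects)
theta_gonogo = 0.8 * trait + rng.normal(scale=0.6, size=n_subjects)

# The across-subject correlation plays the role of the linking
# hyper-parameter: one value for the sample, not one per trial.
rho = np.corrcoef(alpha_reversal, theta_gonogo)[0, 1]
print(rho > 0.4)   # True: the shared trait appears as a correlation
```

A full joint model would estimate this correlation inside the sampler, jointly with the task models, rather than from point estimates; the sketch only shows where the hyper-parameter sits.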
8.2.3 Pain signature data

Previously, I attempted to use pain signature data in the model by assuming that subjects in the punishment condition would respond according to their EV in a particular trial in proportion to the magnitude of the pain signature they felt on that trial. In retrospect, this was a mistake. To test the hypothesis that the intensity of the pain signal incentivizes learning, pain intensity needs to modulate learning, not item selection as I modeled it. A future effort can test other roles of pain intensity in the data.

8.2.4 Testing meth use and sex risk as continuous

Largely because I am following the pre-commitments our team made before collecting the data, I have treated meth use and sex risk as measures of distinct groups. However, these could be treated as continuous variables, modeled along a Poisson distribution. One of the advantages of the Bayesian framework is that it is much easier to apply non-normal distributions, such as a Poisson distribution, to the problem. Because this data is non-normal, a Poisson distribution may be helpful.

8.2.5 Optimization

In this thesis I explored optimization of the γ tuning parameter, suggesting that the default value is too high, and observing that a lower value in this case enabled chains to begin iterating, in contrast to the default value, where they would never start (Section 6.3.1). One additional optimization may be to vary the tuning parameter, starting with a very small value and increasing it after the start of processing, once all chains have successfully begun iterating, but before the model begins to sample from the prior.

8.3 Conclusion

Meth-using subjects appear to have lower learning rates than Non-Meth-using subjects, holding Sexual Risk-taking constant. During reversal learning, bilateral Accumbens activity appears related to RPE.
In fact, RPE is at least as strongly related to the Accumbens as to any other region in the striatum or prefrontal cortex, and in many cases demonstrably more strongly related. There are no clear differences in performance for subjects in the Punishment compared to the Reward condition. DE-MCMC performs comparably to stan NUTS MCMC, but may yield slightly different results, as evidenced here. The two NUTS MCMC analyses suggested that Group 3 had a lower learning rate than Group 2 in the Punishment condition, but not necessarily the Reward condition, while for the DE-MCMC model it was the Reward and not the Punishment condition that seemed to show a difference. Overall, a difference seems credible, but more analysis will be required to show definitively whether it applies to Reward, Punishment, or neither in particular.

Glossary

α represents the learning rate in all RL, RL-LBA, RL-DD, and other alternative RL models presented in this thesis. 47, 111

δ in a neural-behavioral joint model, represents neural parameters or data linked into the model. 56, 152, 157

γ MCMC tuning parameter affecting the magnitude of a single differential evolution iteration step; see Section 6.3.1. 94, 95, 112

τ Non-decision time; a parameter estimated in my RL-LBA model and the RL drift diffusion model. See Section 3.1.2. 40

θ in a neural-behavioral joint model, represents behavioral parameters. Unrelatedly, also used in Equations 3.8-3.9 to refer to a decision threshold. 40, 56, 152, 157

R̂ Gelman-Rubin statistic. 74, 83, 90, 154

k Decision threshold that must be reached by a diffusion process or ballistic accumulator for a decision to be made. In an LBA context this is denoted thresh; in a drift diffusion context it is denoted k; in Equations 3.8-3.9, θ. See Section 3.1.2. 40

thresh See k. See Section 3.1.2. 40

BOLD Blood Oxygen Level Dependent; in the context of BOLD activity, refers to an indicator of neural activity in the brain in an fMRI scan.
Blood flowing through a recently-active brain area is depleted of oxygen, and fMRI detects deoxygenated hemoglobin in the bloodstream; hence Blood Oxygen Level Dependent (BOLD) activity is detected in fMRI and is treated as a proxy for neural activity itself. 28, 175, 179

DE-MCMC Differential Evolution Monte Carlo Markov Chain(s), a Bayesian Monte Carlo Markov Chain(s) (MCMC) estimation technique pioneered by Ter Braak (2006) and introduced to psychology by Brandon M. Turner, Forstmann, Wagenmakers, et al. (2013). Like other Bayesian MCMC estimation techniques, it uses a parallel iterative process. What is unique to DE-MCMC is the evolutionary algorithm, which uses both crossover and migration steps. It is described in more detail in Section ??. 1, 3, 6, 7, 22, 93–95, 110, 126, 131, 135, 144, 145, 168, 174, 179

EV Expected Value, in the current context described in detail in Section 3.1.1 in conjunction with RPE. 20, 42, 52, 56, 95, 96, 136, 137, 139, 151, 152, 155, 156, 158, 168, 171, 173, 179, 186

HDI Highest Density Interval. Analogous to a parametric Confidence Interval, a Highest Density Interval describes the highest contiguous density area of a posterior parameter sample. It can be used to quantify the range in which the true value of a parameter very likely falls. 179

hyper-parameter In Bayesian hierarchical modeling, a hyper-parameter is an argument for a prior for another parameter in a model. For instance, subject learning rates α could be estimated by the sampling statement normal(μ_g, σ_g). In this statement, α is a parameter and μ_g, σ_g are hyper-parameters. x, 25, 96, 111, 114

MCMC Monte Carlo Markov Chain(s), a Bayesian technique to estimate a posterior distribution representing the likelihood distribution of a set of parameters as a function of data and a set of priors.
In MCMC, at each iteration, we sample a set of parameter estimates that deviate slightly from their estimates in the previous iteration, and test the likelihood of those parameters given the data and the prior distribution of each parameter. This is repeated both iteratively, typically hundreds or thousands of times, and in parallel, across a number of chains. 6, 29, 56, 69, 70, 72, 81, 82, 86, 135, 168, 174, 175, 179, 187

NPS Neurologic Pain Signature, the proprietary image created by Wager (Wager et al., 2013) representing brain regions whose fMRI activity correlates with increasing pain. 24, 25, 62–68, 179, 184

ROI Region of interest, a contiguous area of the brain defined according to criteria that may be anatomical or related to a particular function. In this study, ROIs have been anatomically defined based on neurological atlases. 23, 26, 28, 29, 179

RPE Reward Prediction Error, in the current context described in detail in Section 3.1.1 in conjunction with EV. 20, 52, 56, 95, 137, 139, 151, 152, 156, 158–160, 168, 169, 171, 174, 179, 186

Acronyms

BOLD Blood Oxygen Level Dependent. 28, 175, Glossary: BOLD

DD Drift Diffusion. 168

DE-MCMC Differential Evolution Monte Carlo Markov Chain(s). 1, 3, 6, 7, 22, 93–95, 110, 126, 131, 135, 144, 145, 168, 174, Glossary: DE-MCMC

EV Expected Value. 20, 42, 52, 56, 95, 96, 136, 137, 139, 151, 152, 155, 156, 158, 168, 171, 173, 186, Glossary: EV

HDI Highest Density Interval. Glossary: HDI

HMM Hidden Markov Model. 9

LBA Linear Ballistic Accumulator. See Section 7.1. 135, 136

MCMC Monte Carlo Markov Chain(s). 6, 29, 56, 69, 70, 72, 81, 82, 86, 135, 168, 174, 175, 187, Glossary: MCMC

NPS Neurologic Pain Signature. 24, 25, 62–68, 184, Glossary: NPS

NUTS No U-Turn Sampler. 1, 2, 6, 21, 69, 70, 135, 167, 168, 174

OpAL Opponent Actor Learning. 10, 22, 51, 171

RL Reinforcement Learning.
21, 47, 57, 69, 70, 88, 96, 98, 131, 135, 136

ROI Region of interest. 23, 26, 28, 29, Glossary: ROI

RPE Reward Prediction Error. 20, 52, 56, 95, 137, 139, 151, 152, 156, 158–160, 168, 169, 171, 174, 186, Glossary: RPE

List of Models

4.1 (Section 4.1) Correct scoring hierarchical linear model . . . 58
4.2 (Section 4.2) Pain activity hierarchical logistic regression model . . . 63
5.1 (Section 5.1) Variational Bayes and MCMC Model . . . 70
5.2 (Section 5.2) Three-level RL Model . . . 82
5.3 (Section 5.3) Three-level RL Model . . . 89
6.1 (Section 6.2) Two-level RL DD Model . . . 97
6.2 (Section 6.3) Three-level RL DD Model . . . 111
7.1 (Section 7.4) Single-level joint model . . . 151

List of Figures

1.1 Proportion of correct judgments at each time point in the task, relative to reversal. Each colored line represents the aggregate of one subject's data. Black lines represent the averages across the group, and are reproduced with standard deviations in Figure 1.2. Dashed lines represent the mean across subjects at the last time point pre- and post-reversal. . . . 17
1.2 Proportion of correct judgments at each time point in the task, relative to reversal, by group, with error bars indicating standard deviations. . . . 18
2.1 FSL and nltools convolution superimposed on each other. Each series shows both an FSL and an nltools timeseries; they can be seen following distinct paths at each event, where one time series has a systematically higher magnitude than the other. Magnitude is unimportant here because the signal is subsequently normalized.
The middle rows, the error rows, are different because for the FSL comparison, only the first error trial was recognized, while for nltools, we plotted every error trial. . . . 25
2.2 Log histogram of reaction times, across all subjects in the study . . . 32
3.1 Drift rate by Expected Value with either Equation 3.10 or Equation 3.11. . . . 43
3.2 Expected Value by learning iteration with a learning rate of 0.2. . . . 44
3.3 Expected value and drift rate functions by learning iteration with a learning rate of 0.2. ‘F_1’ and ‘f_2’ denote drift rate functions while ‘Expected Value’ denotes expected value . . . 45
3.4 learning; an automated meta-analysis of 1144 studies from neurosynth.org . . . 53
3.5 value; an automated meta-analysis of 470 studies from neurosynth.org . . . 54
3.6 error; an automated meta-analysis of 464 studies from neurosynth.org . . . 54
3.7 prediction error; an automated meta-analysis of 93 studies from neurosynth.org . . . 55
3.8 response inhibition; an automated meta-analysis of 218 studies from neurosynth.org . . . 55
4.1 Subject mean Neurologic Pain Signature (NPS) value by Motivation condition and subject response. In the punishment condition, the pain signature was higher with incorrect responses than correct responses. However, the NPS is not notably higher in the Punishment condition, likely because Reward and Punishment runs were normalized independently. . . . 66
4.2 Difference in subject mean NPS value between correct and incorrect responses, by Motivation condition. In the punishment condition, NPS was higher for incorrect responses than correct responses. However, there was very little difference in the reward condition. . . .
67
5.1 α and β values for the punishment rounds using Variational Bayes within the double update model. Each time the analysis is repeated, we yield a somewhat different result. . . . 72
5.2 α and β values for the punishment rounds using MCMC within the double update model. In contrast to Variational Bayes as displayed in Figure 5.1, results are very consistent each time the analysis is completed. . . . 73
5.3 MCMC parameter estimate density intervals (one run) . . . 75
5.4 MCMC parameter estimate density intervals (one run), chains drawn separately . . . 77
5.5 rstan raw output for the double update model examining just one run at a time. . . . 79
5.7 α and β posterior estimates by group, (a) across Motivational conditions and (b) separated by motivational condition . . . 85
5.8 Group differences in α and β posterior estimates: (a) differences in Motivational condition (Reward-Punishment); (b) differences in Group (Group3-Group2, i.e., Sexually Risky Meth - Sexually Risky No Meth). α and β estimates across Motivational conditions and within condition . . . 87
5.9 Bivariate posterior distribution of Group 3 minus Group 2 posterior estimate. Blue shaded dots are estimates falling within the 95% minimum volume ellipsoid for that particular statistic. . . . 88
6.1 α parameter chains over 1000 iterations in the MCMC analysis . . . 101
6.2 k parameter chains over 1000 iterations in the MCMC analysis . . . 102
6.3 τ parameter chains over 1000 iterations in the MCMC analysis . . . 103
6.4 Distribution of mean α, τ, and thresh for each of the two punishment and two reward runs in the two-level model . . .
104
6.5 α parameter chain for each of the four runs for 10 pseudorandomly-selected subjects. Missing chains are instances where the subject was missing a run, or where that run was removed according to the pre-existing criteria. Some chains do show signs of non-convergence, but overall, they appear to have stabilized and converged to a single mean. . . . 105
6.6 α, τ, and thresh parameter distributions for the 10 subjects and four runs. All 10 subjects for a particular run and parameter are shown overlapping in different colors. Generally, the subjects have a high degree of overlap for the alpha parameter, but other parameters show a high degree of between-subject variation. . . . 106
6.7 Distribution comparing each of the three groups’ α values with each other, separately for punishment and reward. The first row shows RiskyMeth minus RiskyNoMeth, the second row shows RiskyMeth minus SafeNoMeth, and the third row shows RiskyNoMeth minus SafeNoMeth. . . . 107
6.8 A repeat of Figure 6.7, but combining punishment and reward posterior distributions. . . . 108
6.9 MCMC three level model attempt, Selected Level 1 (Run parameters) diagnostic graphs . . . 115
6.9 MCMC three level model attempt, Selected Level 1 (Run parameters) diagnostic graphs (Set 4 of 11) . . . 118
6.10 MCMC three level model attempt, Level 2 (Subject parameters) diagnostic graphs . . . 126
6.11 MCMC three level model attempt, Level 3 (Group parameters) diagnostic graphs . . . 132
7.1 Prior distribution for the covariance matrix. This example image shows a 4x4 distribution, but actual covariance matrix sizes depended on the particular run of the model I was testing. . . .
140
7.2 The joint model was repeated with this distribution; although I do not report those results here, use of this prior did not yield larger correlation estimates. . . . 141
7.3 Reward Prediction Error correlation with regions across subjects: selected decision-making regions . . . 154
7.4 Expected Value (EV) correlation with regions across subjects: selected decision-making regions . . . 155
7.5 Mean correlation of regions with Reward Prediction Error across subjects: targeted decision-making regions . . . 157
7.6 RL-LBA Model, Expected Value by Left Accumbens activity for selected subjects. In fact, expected value here does not show a strong bivariate pattern, in contrast to the Reward Prediction Error (RPE) measure in this model and also expected value in the simple RL model. . . . 159
7.7 RL-LBA Model, RPE by Left Accumbens activity for selected subjects. The bivariate pattern for RPE is clearly visible here, and might present problems for estimation of the joint model. . . . 160
7.8 RL-LBA Model, Absolute Reward Prediction Error by Left Accumbens activity for selected subjects. The absolute measure of reward prediction error is fortunately not bivariate, but it is unlikely that an absolute measure will strongly track brain activity in any region. . . . 161
7.9 RL-LBA Model, Absolute Reward Prediction Error by Right Accumbens activity for selected subjects. . . . 162
7.10 Simple RL Model, Expected Value. In the simple RL model, there is a strong pattern of expected value gathering at three specific points on the distribution. . . .
163
7.11 Simple RL Model, Reward Prediction Error. In the simple RL model, there is a strong pattern of reward prediction error gathering at three specific points on the distribution. . . . 164
List of Tables
1.1 Number of subjects in each condition by run. . . . 13
1.2 Task design . . . 13
1.3 A sample of the beginning and end of the task. Images were shown in the order presented for every subject, for this particular run. Each of the four runs had similar designs with the same number of presentations and reversals, although different image cues were used for each run. . . . 14–15
4.1 Fixed effects with Stan diagnostic measures . . . 60
4.2 Fixed effects for the hierarchical NPS Pain model described in Equation 4.2. . . . 65
5.1 This table shows the vast difference in speed between Monte Carlo Markov Chain(s) (MCMC) and Variational Bayes. These data were pulled from an earlier version of the comparison, so they are not directly descriptive of the rest of the data presented in this section, but the relative speed differences between MCMC and Variational Bayes were consistent across all runs. . . . 72
5.2 Double Update Model (rev8) Safe No Meth Group vars=931 . . . 84
5.3 Double Update Model (rev8) Risky No Meth Group vars=1199 . . . 84
5.4 Double Update Model (rev8) Risky Meth Group vars=687 . . . 84
5.5 Means, SDs, and HDIs for the estimates shown in Figure 5.8b . . .
89
5.6 Double update joint model Safe No Meth Group vars=967 . . . 90
6.1 95% Highest Density Interval for Punishment Condition . . . 109
6.2 95% Highest Density Interval for Reward Condition . . . 109
6.3 95% Highest Density Interval for all conditions . . . 109
7.1 Empirical priors and weakly-defined priors for the LBA model . . . 147
7.2 Efficiency per chain-iteration by parameter . . . 148
7.3 rhat score by parameter . . . 148–149
References
Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., . . . Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14.
Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., . . . Varoquaux, G. (2017). Download and load the Destrieux cortical atlas. Retrieved October 5, 2017, from http://nilearn.github.io/modules/generated/nilearn.datasets.fetch_atlas_surf_destrieux.html#nilearn.datasets.fetch_atlas_surf_destrieux
Ahn, W.-Y. & Haines, N. B. (2017). Retrieved October 1, 2017, from https://ccs-lab.github.io/hBayesDM/reference/index.html
Annis, J., Miller, B. J., & Palmeri, T. J. (2017). Bayesian inference with Stan: A tutorial on adding custom distributions. Behavior Research Methods, 49(3), 863–886.
Anticevic, A., Dierker, D. L., Gillespie, S. K., Repovs, G., Csernansky, J. G., Van Essen, D. C., & Barch, D. M. (2008). Comparing surface-based and volume-based analyses of functional neuroimaging data in patients with schizophrenia. NeuroImage, 41(3), 835–848.
Bari, A. & Robbins, T. W. (2013). Inhibition and impulsivity: Behavioral and neural basis of response control.
Progress in Neurobiology, 108, 44–79. doi:10.1016/j.pneurobio.2013.06.005
Bechara, A. (2005). Decision making, impulse control and loss of willpower to resist drugs: A neurocognitive perspective. Nature Neuroscience, 8(11), 1458.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
Brooks, S. P. & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.
Brown, S. D. & Heathcote, A. (2008). The simplest complete model of choice response time: Linear ballistic accumulation. Cognitive Psychology, 57(3), 153–178.
Brown, S. & Heathcote, A. (2005). A ballistic model of choice response time. Psychological Review, 112(1), 117.
Chang, L. J. [Luke J.] & colleagues. (2016). Neurosynth parcellation. Retrieved October 5, 2017, from https://neurovault.org/collections/2099/
Chang, L. J. [Luke J], Yarkoni, T., Khaw, M. W., & Sanfey, A. G. (2012). Decoding the role of the insula in human cognition: Functional parcellation and large-scale reverse inference. Cerebral Cortex, 23(3), 739–749.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. UK: Taylor & Francis.
Cole, M. W. & Schneider, W. (2007). The cognitive control network: Integrated cortical regions with dissociable functions. NeuroImage, 37(1), 343–360.
Collins, A. G. & Frank, M. J. [Michael J]. (2014). Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review, 121(3), 337.
Cools, R., Clark, L., Owen, A. M., & Robbins, T. W. (2002). Defining the neural mechanisms of probabilistic reversal learning using event-related functional magnetic resonance imaging.
Journal of Neuroscience, 22(11), 4563–4567.
Dale, A. M., Fischl, B., & Sereno, M. I. (1999). Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage, 9(2), 179–194.
Deen, B., Pitskel, N. B., & Pelphrey, K. A. (2010). Three systems of insular functional connectivity identified with cluster analysis. Cerebral Cortex, 21(7), 1498–1506.
Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15.
Dosenbach, N. U., Fair, D. A., Miezin, F. M., Cohen, A. L., Wenger, K. K., Dosenbach, R. A., . . . Raichle, M. E., et al. (2007). Distinct brain networks for adaptive and stable task control in humans. Proceedings of the National Academy of Sciences, 104(26), 11073–11078.
Droutman, V., Bechara, A., & Read, S. J. (2015). Roles of the different sub-regions of the insular cortex in various phases of the decision-making process. Frontiers in Behavioral Neuroscience, 9.
Droutman, V., Read, S. J., & Bechara, A. (2015). Revisiting the role of the insula in addiction. Trends in Cognitive Sciences, 19(7), 414–420.
Fellows, L. K. & Farah, M. J. (2003). Ventromedial frontal cortex mediates affective shifting in humans: Evidence from a reversal learning paradigm. Brain, 126(8), 1830–1837.
Fischl, B., Rajendran, N., Busa, E., Augustinack, J., Hinds, O., Yeo, B. T., . . . Zilles, K. (2007). Cortical folding patterns and predicting cytoarchitecture. Cerebral Cortex, 18(8), 1973–1980.
Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., . . . Klaveness, S., et al. (2002). Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron, 33(3), 341–355.
Frank, M. J. [Michael J.] & Claus, E. D. (2006). Anatomy of a decision: Striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychological Review, 113(2), 300–326.
doi:10.1037/0033-295X.113.2.300
Gelman, A. & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Ghahremani, D. G., Monterosso, J., Jentsch, J. D., Bilder, R. M., & Poldrack, R. A. (2009). Neural components underlying behavioral flexibility in human reversal learning. Cerebral Cortex, 20(8), 1843–1852.
Gläscher, J., Hampton, A. N., & O’Doherty, J. P. (2009). Determining a role for ventromedial prefrontal cortex in encoding action-based value signals during reward-related decision making. Cerebral Cortex, 19(2), 483–495. doi:10.1093/cercor/bhn098
Green, L., Myerson, J., & McFadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory & Cognition, 25(5), 715–723. doi:10.3758/BF03211314
Groman, S. M., Rich, K. M., Smith, N. J., Lee, D., & Taylor, J. R. (2018). Chronic exposure to methamphetamine disrupts reinforcement-based decision making in rats. Neuropsychopharmacology, 43(4), 770.
Hampton, A. N. & O’Doherty, J. P. (2007). Decoding the neural substrates of reward-related decision making with functional MRI. Proceedings of the National Academy of Sciences, 104(4), 1377–1382. doi:10.1073/pnas.0606297104
Izquierdo, A., Brigman, J. L., Radke, A. K., Rudebeck, P. H., & Holmes, A. (2017). The neural basis of reversal learning: An updated perspective. Neuroscience, 345, 12–26.
Izquierdo, A. & Jentsch, J. D. (2012). Reversal learning as a measure of impulsive and compulsive behavior in addictions. Psychopharmacology, 219(2), 607–620. doi:10.1007/s00213-011-2579-7
Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306(5695), 503–507.
Menon, V. & Uddin, L. Q. (2010). Saliency, switching, attention and control: A network model of insula function.
Brain Structure and Function, 214(5-6), 655–667.
Monterosso, J. R., Aron, A. R., Cordova, X., Xu, J., & London, E. D. (2005). Deficits in response inhibition associated with chronic methamphetamine abuse. Drug and Alcohol Dependence, 79(2), 273–277.
Moreau, A. & colleagues. (2016). FreeSurfer cortical parcellation. Retrieved October 5, 2017, from https://surfer.nmr.mgh.harvard.edu/fswiki/CorticalParcellation
O’Doherty, J., Kringelbach, M. L., Rolls, E. T., Hornak, J., & Andrews, C. (2001). Abstract reward and punishment representations in the human orbitofrontal cortex. Nature Neuroscience, 4(1), 95.
Oosterhof, N. N., Wiestler, T., Downing, P. E., & Diedrichsen, J. (2011). A comparison of volume-based and surface-based multi-voxel pattern analysis. NeuroImage, 56(2), 593–600.
Palestro, J. J., Bahg, G., Sederberg, P. B., Lu, Z.-L., Steyvers, M., & Turner, B. M. (2018). A tutorial on joint models of neural and behavioral measures of cognition. Journal of Mathematical Psychology, 84, 20–48.
Peterson, D. A., Elliott, C., Song, D. D., Makeig, S., Sejnowski, T. J., & Poizner, H. (2009). Probabilistic reversal learning is impaired in Parkinson’s disease. Neuroscience, 163(4), 1092–1101. doi:10.1016/j.neuroscience.2009.07.033
Raudenbush, S. W. & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Sage.
Reimann, M. & Bechara, A. (2010). The somatic marker framework as a neurological theory of decision-making: Review, conceptual comparisons, and future neuroeconomics research. Journal of Economic Psychology, 31(5), 767–776.
Reiter, A. M. F., Heinze, H.-J., Schlagenhauf, F., & Deserno, L. (2017). Impaired flexible reward-based decision-making in binge eating disorder: Evidence from computational modeling and functional neuroimaging. Neuropsychopharmacology, 42(3), 628–637. doi:10.1038/npp.2016.95
Remijnse, P. L., Nielen, M. M., Uylings, H. B., & Veltman, D. J. (2005).
Neural correlates of a reversal learning task with an affectively neutral baseline: An event-related fMRI study. NeuroImage, 26(2), 609–618.
Robinson, O. J., Cools, R., Carlisi, C. O., Sahakian, B. J., & Drevets, W. C. (2012). Ventral striatum response during reward and punishment reversal learning in unmedicated major depressive disorder. American Journal of Psychiatry, 169(2), 152–159.
Samson, R. D., Frank, M. J., & Fellous, J.-M. (2010). Computational models of reinforcement learning: The role of dopamine as a reward signal. Cognitive Neurodynamics, 4(2), 91–105. doi:10.1007/s11571-010-9109-x
Schuurman, N., Grasman, R., & Hamaker, E. (2016). A comparison of inverse-Wishart prior specifications for covariance matrices in multilevel autoregressive models. Multivariate Behavioral Research, 51(2-3), 185–206.
Smith, B. J., Xue, F., Droutman, V., Barkley-Levenson, E., Melrose, A. J., Miller, L. C., . . . Christensen, J. L., et al. (2017). Virtually ‘in the heat of the moment’: Insula activation in safe sex negotiation among risky men. Social Cognitive and Affective Neuroscience, 13(1), 80–91.
Snijders, T. & Bosker, R. (1999). Multilevel modeling: An introduction to basic and advanced multilevel modeling.
Stan Development Team et al. (2016). Stan modeling language: User’s guide and reference manual. Version 2.17.0. Stan Development Team. Retrieved from http://mc-stan.org
Sutton, R. S. (1992). Gain adaptation beats least squares. In Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems (Vol. 161168).
Ter Braak, C. J. (2006). A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: Easy Bayesian computing for real parameter spaces. Statistics and Computing, 16(3), 239–249.
Turner, B. M. [Brandon M.]. (submitted). Toward a common representational framework for adaptation.
Turner, B. M. [Brandon M.], Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2016).
Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology. doi:10.1016/j.jmp.2016.01.001
Turner, B. M. [Brandon M.], Forstmann, B. U., Wagenmakers, E.-J., Brown, S. D., Sederberg, P. B., & Steyvers, M. (2013). A Bayesian framework for simultaneously modeling neural and behavioral data. NeuroImage, 72, 193–206. doi:10.1016/j.neuroimage.2013.01.048
Turner, B. M. [Brandon M.], Rodriguez, C. A., Norcia, T. M., McClure, S. M., & Steyvers, M. (2016). Why more is better: Simultaneous modeling of EEG, fMRI, and behavioral data. NeuroImage, 128, 96–115. doi:10.1016/j.neuroimage.2015.12.030
Turner, B. M. [Brandon M.], Sederberg, P. B., Brown, S. D., & Steyvers, M. (2013). A method for efficiently sampling from distributions with correlated dimensions. Psychological Methods, 18(3), 368. Retrieved November 8, 2016, from http://psycnet.apa.org/journals/met/18/3/368/
Turner, B. M. [Brandon M], Van Maanen, L., & Forstmann, B. U. (2015). Informing cognitive abstractions through neuroimaging: The neural drift diffusion model. Psychological Review, 122(2), 312.
van Maanen, L., Turner, B., & Forstmann, B. U. (2015). From model-based perceptual decision-making to spatial interference control. Current Opinion in Behavioral Sciences, 1, 72–77.
Wager, T. D., Atlas, L. Y., Lindquist, M. A., Roy, M., Woo, C.-W., & Kross, E. (2013). An fMRI-based neurologic signature of physical pain. New England Journal of Medicine, 368(15), 1388–1397.
Wager, T. D. & Barrett, L. F. (2017). From affect to control: Functional specialization of the insula in motivation and regulation. bioRxiv, 102368.
Wald, A. (1947). Sequential analysis. New York: John Wiley & Sons.
Woo, C.-W., Chang, L. J., Lindquist, M. A., & Wager, T. D. (2017). Building better biomarkers: Brain models in translational neuroimaging. Nature Neuroscience, 20(3), 365–377.
Woods, D. L., Wyma, J. M., Yund, E. W., Herron, T. J., & Reed, B. (2015).
Factors influencing the latency of simple reaction time. Frontiers in Human Neuroscience, 9, 131.
Woolrich, M. W., Jbabdi, S., Patenaude, B., Chappell, M., Makni, S., Behrens, T., . . . Smith, S. M. (2009). Bayesian analysis of neuroimaging data in FSL. NeuroImage, 45(1), S173–S186.
Xue, G., Xue, F., Droutman, V., Lu, Z.-L., Bechara, A., & Read, S. (2013). Common neural mechanisms underlying reversal learning by reward and punishment. PLOS ONE, 8(12), e82169. doi:10.1371/journal.pone.0082169
Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8), 665.
Asset Metadata
Creator: Smith, Benjamin James (author)
Core Title: Bayesian hierarchical and joint modeling of the reversal learning task
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 04/28/2019
Defense Date: 11/02/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Bayesian, behaviorism, joint modeling, OAI-PMH Harvest, reversal learning, risky decision-making
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Read, Stephen J. (committee chair), James, Gareth (committee member), Kaplan, Jonas (committee member), Monterosso, John R. (committee member), Turner, Brandon (committee member)
Creator Email: benjsmith@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-158581
Unique identifier: UC11660851
Identifier: etd-SmithBenja-7335.pdf (filename), usctheses-c89-158581 (legacy record id)
Legacy Identifier: etd-SmithBenja-7335.pdf
Dmrecord: 158581
Document Type: Dissertation
Rights: Smith, Benjamin James
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA