COMPUTATIONAL PRINCIPLES
IN HUMAN MOTOR ADAPTATION:
SOURCES, MEMORIES, AND VARIABILITY
by
Youngmin Oh
A dissertation presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In partial fulfillment of the
requirements for the degree
DOCTOR OF PHILOSOPHY
NEUROSCIENCE
August 2015
Copyright 2015 Youngmin Oh
Acknowledgments
I would like to thank my academic advisor, Dr. Nicolas Schweighofer, for his enthusiastic and professional advice over the last five years. I also thank all my committee members: Dr. Stefan Schaal guided me toward an interesting theoretical and experimental research paradigm using the exoskeleton robot, and Dr. Carolee Winstein helped me develop insight into the clinical aspects of motor learning and adaptation.
I also give my sincere thanks to all of my co-authors: Dr. Jun Izawa for the first study, on the dissociation of learning mechanisms, and Dr. Michael Mistry for the third study, on adaptation in joint-space. Giovanni Sutanto made the exoskeleton experiment possible with his excellent coding ability and through insightful discussions. Former students in our lab, Dr. Jeong Yoon Lee and Dr. Sungshin Kim, guided me into this research field in the early years of my PhD studies. They are also collaborators on two papers, one each, which are not included in this dissertation. I thank Daniel Furman for his genuine collaboration in seeking the neural substrate of motor adaptation.
I appreciate all the valuable advice and comments from the members of my former guidance committee, Dr. Terence Sanger and Dr. Bosco Tjan, and from other USC faculty members, Dr. James Gordon, Dr. Francisco Valero-Cuevas, and Dr. James Finley.
I thank all current and past CNRL members. They have been friends, colleagues, and advisors.
Last, but not least, I give my sincerest gratitude and love to my family.
Table of Contents
Abstract
Chapter 1. Background and Introduction
Chapter 2. Dissociation of error-based learning from reward-based learning
2.1. Introduction
2.2. Materials and Methods
2.2.1. Experiment design
2.2.2. Data analysis
2.2.3. Computational models
2.3. Results
2.3.1. Experimental results
2.3.2. Simulation results
2.4. Discussion
Chapter 3. Multiple memories account for forgetting in visuomotor adaptation
3.1. Introduction
3.2. Materials and methods
3.2.1. Experimental design
3.2.2. Computational model
3.2.3. Simulation
3.3. Results
3.3.1. Simulation example
3.3.2. Savings and aftereffect
3.3.3. Decay in error-clamp
3.3.4. Triggers in error-clamp
3.4. Discussion
Chapter 4. Motor adaptation in high-dimensional redundant joint-space
4.1. Introduction
4.2. Materials and methods
4.2.1. Exoskeleton and control law
4.2.2. Experiment design
4.2.3. Data analysis
4.3. Results
4.3.1. Movement duration
4.3.2. Jacobian conversion factors
4.3.3. Trajectories in hand-space and joint-space
4.3.4. Procrustes analysis
4.3.5. Functional principal component analysis (FPCA)
4.4. Discussion
Chapter 5. Summary
Bibliography
List of figures
Figure 2.1. Sketch to show how the two learning mechanisms are integrated
Figure 2.2. Experimental paradigm
Figure 2.3. Hand direction in training trials of representative individuals
Figure 2.4. Probe tests
Figure 2.5. Reward probability and trial-by-trial variability in hand direction in RPE
Figure 2.6. Generalization curves
Figure 2.7. Simulation results
Figure 3.1. Experiment design
Figure 3.2. Diagram for mixture of experts
Figure 3.3. Two different scenarios of simulation
Figure 3.4. Simulation result of each experimental condition
Figure 3.5. Group-averaged plot of hand direction and time constant analysis
Figure 3.6. Mean and individual data in error-clamp
Figure 3.7. Distribution of hand direction
Figure 3.8. Simulation examples for trigger conditions in error-clamp
Figure 3.9. Mean and individual data in error-clamp with two trigger trials
Figure 3.10. Regression analysis on effect of triggers
Figure 4.1. Sarcos Master Arm
Figure 4.2. Experiment design
Figure 4.3. Movement duration
Figure 4.4. Jacobian conversion factors
Figure 4.5. Sample trajectories from an individual subject in the abrupt group
Figure 4.6. Group-averaged time-series of each variable
Figure 4.7. Normalized variability of each joint
Figure 4.8. Procrustes analysis
Figure 4.9. Functional PCA for SFE velocity
Abstract
In this dissertation, I conducted behavioral experiments and applied computational theories to understand human motor adaptation. Motor adaptation is a kind of motor learning in which learners gradually return their performance to baseline level in the presence of an external perturbation. Among the various motor adaptation paradigms, I used visuomotor rotation for the first two studies and force-field adaptation for the third study. The central motor behavior I studied was voluntary reaching.
The topics of the three studies are as follows: i) dissociation of different sources of motor adaptation: sensory feedback for error-based learning and reward feedback for reward-based learning; ii) modular decomposition of motor memories to account for repeated learning and washout data, and stochastic behavior in error-clamp; iii) trial-by-trial adaptation dynamics in a high-dimensional redundant system, and structure in the variability among trajectories during adaptation.
Overall, the dissertation found i) dissociable learning mechanisms and their interaction, ii) a modular structure of motor memories and conditions that distinguish two-model learners from one-model learners, and iii) higher deviation and variability of joint trajectories compared to hand trajectories, and structured variability in trial-by-trial adaptation. This computational understanding of various aspects of motor adaptation has rich potential for application to clinical diagnosis and treatment through the analysis of kinetic and kinematic data.
Chapter 1. Background and Introduction
Motor adaptation refers to gradual performance improvement back to baseline level in response to an external perturbation (Huang & Krakauer 2009; Krakauer & Mazzoni 2011; Shadmehr et al. 2010; Haith & Krakauer 2012). Specific experimental tasks include visuomotor adaptation (Ghahramani & Wolpert 1997; Krakauer et al. 2005; Zarahn et al. 2008; Shmuelof et al. 2012; Lee & Schweighofer 2009), force-field adaptation (Shadmehr & Mussa-Ivaldi 1994; Scheidt et al. 2001; Gandolfo et al. 1996), and split-belt treadmill adaptation (Choi et al. 2009; Reisman et al. 2009; Torres-Oviedo & Bastian 2012). Computational approaches to understanding and predicting behavioral data in motor adaptation have proven useful (Jordan 1996; Shadmehr & Krakauer 2008; Schaal & Schweighofer 2005; Haith & Krakauer 2012). Understanding the mechanisms underlying motor adaptation not only advances the basic science of movement, but can also contribute to improving rehabilitation for patients with movement disorders, such as post-stroke patients. For example, recent studies showed that training schedules calculated from computational models of motor adaptation can improve retention of learned performance (Choi et al. 2008; Hidaka et al. 2012; Schweighofer et al. 2011).
In this dissertation, I investigated three fundamental features of motor adaptation using computational modeling and analysis: multiple sources of motor adaptation, acquisition and decay of multiple motor memories, and adaptation in a redundant system with multiple joints. The dissertation assigns one chapter to each of these three studies.
Chapter 2 covers the study on multiple sources of motor adaptation, represented by (sensory) error-based learning and reward-based learning. Although these two learning mechanisms have long been studied separately, only recently has the motor learning community begun to seek an integrative framework uniting different learning mechanisms (Izawa & Shadmehr 2011; Taylor et al. 2014). Sensory and reward feedback are usually bound together in most real-life and experimental conditions; therefore, the individual contributions of these qualitatively different sources have not been clearly understood. In this study, we proposed a new experimental paradigm that dissociated the two types of feedback experimentally. We then proposed computational models to account for adaptation from each mechanism: extended Kalman filtering (Kalman 1960) for error-based learning and a policy-gradient reinforcement learning algorithm (REINFORCE; Williams 1992) for reward-based learning.
Chapter 3 investigates the trial-by-trial dynamics of motor memory acquisition and decay. How to retain a learned motor memory for a long time is critically important for motor learning and rehabilitation. In the field of computational motor adaptation, there are two classes of theories regarding the formation and decay of motor memory. The first class, known as the linear state-space model (Smith et al. 2006), suggests that memory is overwritten whenever the environment changes. According to this theory, savings, or faster relearning, is possible only with meta-learning (an increased learning gain after initial learning). In addition, one should expect gradual and passive decay in error-clamp, in which the feedback error is clamped to zero. The other class of theory, represented by modular decomposition or mixture of experts, assumes multiple protected memories that are updated only when relevant (MOSAIC: Wolpert & Kawato 1998; Haruno et al. 2001). According to this class of theory, savings results from the recall of previously learned memory (Berniker & Kording 2011). A recent study also suggested that memory may not decay passively in error-clamp (Vaswani & Shadmehr 2013). We unite these two classes of theories to predict the conditions that lead to each of the two cases.
Chapter 4 switches gears and studies adaptation in a high-dimensional space: the redundant system of a multi-joint exoskeleton arm. The previous two studies focused on central learning mechanisms with the relatively simple experimental task of visuomotor rotation. In this study, we investigated how the trial-by-trial dynamics of adaptation take different forms in task-space (the end-effector space defined by hand location) and joint-space (the high-dimensional virtual space constructed from joint trajectories). Motor adaptation in a redundant system has important implications because it resembles our real-life experience of motor control and learning. From a previous study (Mistry et al. 2005), we expected non-returning-to-baseline trajectories in joint-space. We also expected larger variability in joint-space, which is a hallmark of exploiting redundancy to achieve a goal in task-space (Cusumano & Cesari 2006; Latash et al. 2002; Schöner & Scholz 2007). We also tested for different adaptation patterns under abrupt vs. gradual application of the perturbation, as the former has been known to be more generalizable to novel contexts (Kluzik et al. 2008; Torres-Oviedo & Bastian 2012).
Finally, in Chapter 5, I summarize the main findings from each study and discuss their implications.
Chapter 2. Dissociation of error-based learning from
reward-based learning
2.1. Introduction
A central concept in motor control and learning is the forward model (FM), which predicts the sensory consequences of a given motor command (Jordan & Rumelhart 1992; Miall & Wolpert 1996; Mehta & Schaal 2002). In theory, the FM is updated by "self-supervised learning" via the sensory prediction error, that is, the difference between the actual sensory input and the prediction.
A number of authors have suggested that FM update accounts for error-based learning in motor adaptation (Wolpert et al. 1995; Imamizu et al. 2000; Krakauer et al. 2006; Burge et al. 2008; Berniker & Kording 2008; Izawa et al. 2012). However, two fundamental questions need to be answered to make a definitive case for a role of the FM in motor adaptation. First, is the sensory prediction error alone sufficient and necessary for FM update? In experiments where a task goal, such as a target, is given, both a prediction error and a performance error are available to the learner (Jordan & Rumelhart 1992). The performance error can notably act as a reward signal that directly updates the motor command (Figure 2.1). A landmark study showed that the FM can be updated with no initial performance error (Mazzoni & Krakauer 2006). However, here again, the distance to the target is in some cases rectified and given as a continuous reward signal (Nikooyan & Ahmed 2015) or as a binary reward signal (Izawa & Shadmehr 2011; Pekny et al. 2015). Even when no explicit reward is given, as long as a target is presented, the distance to the target can act as a reward. Thus, the isolated role of a prediction error alone has not been clearly understood. Second, despite the widely accepted notion of the "inverse model" (IM), which makes use of an FM estimate in action selection, no clear link has been made regarding how the nervous system uses an FM estimate in generating a motor command.
The incomplete dissociation of the FM from action selection has also made it difficult to understand the role of reward in motor adaptation. Whether or not sensory feedback is provided, continuous-valued reward in simple reaching conditions automatically encodes the error gradient and acts as an indirect error signal (Nikooyan & Ahmed 2015; Galea et al. 2015). On the other hand, binary reward without sensory feedback minimizes such error-gradient information (Izawa & Shadmehr 2011), and thus exhibits the hallmark of model-free reinforcement learning, such as reward-dependent modulation of movement variability (Pekny et al. 2015). However, it had not been tested whether even this dissociated reward could update the FM in the course of training.
In order to understand the separate roles of sensory and reward feedback in updating the FM and action selection, we designed two experimentally dissociated groups. In the sensory prediction error (SPE) group, subjects performed a random reaching task without a given target and received only visual feedback. This design isolated a pure prediction error from performance error and any target-associated reward. We hypothesized that i) a prediction error alone could update the FM, and ii) such an updated FM estimate could be used to generate a motor command for a goal-directed movement. We modeled the first hypothesis with a Kalman filter, and the second with an optimal feedback controller (OFC; Todorov 2004) that generates a motor command to minimize a quadratic cost. OFC provided a theoretical basis for how an FM estimate could minimize a cost without assuming a separate IM system. Our second group, the reward prediction error (RPE) group, received binary reward without visual feedback. We hypothesized that iii) the dissociated reward would modulate both the mean and the variance of a motor policy, described by a policy gradient algorithm (REINFORCE; Williams 1992); and iv) that the FM could be partially updated even with binary reward, modeled by a nonlinear outcome prediction of the FM.
Figure 2.1. Sketch to show how the two learning mechanisms are integrated to generate a motor command. (A) A motor command $u$ consists of two distinctive parts: the aiming direction $m$ from the motor policy and the perturbation estimate $\hat p$ from the inverse model. The motor command compensates for the perturbation $p$ and brings the cursor $y$ toward the target (circle). (B) A motor policy seeks a motor command that gives the greatest reward probability. As an example, a Gaussian policy with mean $m$ and variance $\sigma^2$ is shown. (C) The internal model estimates the perturbation from the generative model of the environment. Variables in the shaded circles are directly accessible to subjects, whereas those in the white circles have to be estimated. From the estimates of hand ($\hat h$) and perturbation ($\hat p$), the forward model predicts the sensory and reward consequences, $\hat y$ and $\hat r$, respectively.
2.2. Materials and Methods
2.2.1. Experiment design
Twenty-six young and healthy volunteers (25.3 ± 3.9 years old, 5 males and 21 females) were
randomly assigned to either a sensory feedback condition (N = 13), or a reward feedback
condition (N = 13). All subjects signed the informed consent approved by the Institutional
Review Board at the University of Southern California. Subjects sat in front of an experimental
device that matched hand space with visual space via a two-way mirror (Figure 2.2A), and were
instructed to hold a stylus pen moving on a digitizer tablet (Wacom Intuos 7). The experiment
took place in the dark, and view of the forearm and hand was additionally obscured by the mirror.
A cursor representing the tip of the pen was displayed on the mirror. Before the start of each trial,
the subjects were instructed to position the cursor inside a home circle of a radius of
approximately 3 mm. At the onset of each trial, a 30-degree arc of 10 cm radius appeared. The
middle of the arc was aligned toward the forward direction, such that the arc was roughly parallel
to the subject’s torso. Note that we used a polar coordinate system centered on the home circle,
with 0º defined as the forward direction and positive direction as clockwise deviation (Figure
2.2B).
In all conditions and tasks (see below), subjects were instructed to perform a shooting movement
that crossed the arc, and then to return to the home circle during which only radial location of the
cursor was available. Movement durations of the shooting movements were constrained to be
within 500 to 800 ms, with messages “Too fast” or “Too slow” displayed when durations were
out of this range. The shooting direction and the type of feedback varied across conditions and
probe tasks (see below).
The experiment schedule consisted of repeated blocks. A single block comprised 10 training
trials (SPE or RPE exclusively, one kind for each learning condition) and 2 types of probe trials
(catch and localization). These two probe trials were separated by five training trials (Figure
2.2C). See the following sections for descriptions of the training and probe trials. After 4 initial blocks, a gradual visuomotor rotation was applied, such that the cursor position was rotated from the hand position (tip of the pen) around the home location. The rotation angle changed gradually over blocks from 0º to −8º, in increments of −1º every 2 blocks. After reaching −8º, the rotation retained that value for 6 additional blocks. Sixteen of the 26 subjects were given an extended 8 blocks of −8º rotation after a generalization block (see the following section on generalization). Before the main session, all subjects performed 185 familiarization trials without visuomotor rotation. Familiarization consisted of 60 full-feedback trials (visual + reward), 30 catch trials, 35 generalization trials, and 60 sets of localization tasks.
Sensory prediction error (SPE) group
In the SPE group, subjects were instructed to make shooting movements toward arbitrary directions within the arc, in the absence of any target (Figure 2.2D). Subjects were encouraged to shoot toward various directions evenly across trials. During the shooting phase, online feedback of the rotated cursor location was displayed in the form of a 1.2 mm-radius red dot. Once the cursor crossed the arc, online feedback was disabled, and the crossing point was displayed for 1 s.
Reward prediction error (RPE) group
In the RPE group, subjects were instructed to make shooting movements toward a 3º-radius circular target at the middle of the arc (Figure 2.2D). The cursor became invisible once the hand moved outside the home circle. After the cursor crossed the arc, binary reward feedback was displayed for 1 second to indicate whether or not the cursor had crossed the target area. Success feedback was displayed as a smiley face accompanied by a pleasant bell sound, whereas failure feedback was displayed as a grey face with a low-tone sound. The same schedule of gradual visuomotor rotation as in the SPE condition was applied to the invisible cursor.
Probe tests: catch, localization, and generalization trials
Interspersed with the training trials in both SPE and RPE conditions, we inserted two types of probe tests: catch and localization trials (Figure 2.2E). We designed catch trials to measure the overall adaptation level. In catch trials, subjects were instructed to hit a circular target of 3º radius at the middle of the arc. Neither visual cursor feedback nor reward feedback was provided. The catch angle was defined as the angular distance between the crossed hand position and the center of the target. Localization trials, on the other hand, were designed to measure the degree of forward model update (Izawa & Shadmehr 2011; see Computational models below, and Figures 2.1A and 2.1C). Each localization test consisted of two consecutive trials: in the first trial, subjects were instructed to shoot toward an arbitrary direction within the arc, with no feedback presented. In the second trial, subjects were asked to indicate the direction where they estimated their hand had crossed the arc in the first trial. For this pointing task, subjects used a trackball mouse with their left hand to rotate a clock hand on the screen toward the estimated hand direction, then clicked the mouse button to confirm the selected direction. The difference between the actual hand direction (measured in the first trial) and the pointing direction (measured in the second trial) defined the localization error.
Eight of the 13 subjects in each group (16 subjects in total) were further tested for generalization. Three generalization blocks were inserted (Figure 2.2C): a baseline block before the main session began (block 0), a first post-adaptation block after the first asymptote (block 1), and a second post-adaptation block after the second asymptote (block 2). Each generalization block consisted of 35 consecutive catch-like probe trials with no feedback. The target location varied pseudo-randomly among seven different locations: −15º, −10º, −5º, 0º, 5º, 10º, and 15º. Each target was presented five times, intermixed in a pseudorandom order. As in catch trials, the angular distance between the point where the hand crossed the arc and the center of each target defined the generalization angle. We defined generalization performance as the change in generalization angle from the baseline block to each of the two post-adaptation generalization blocks.
Figure 2.2. Experimental paradigm. (A) The three-layered structure of the device created a visual illusion that matched the physical location of the pen tip with the cursor location, unless perturbed. (B) Task space with the displayed arc. (C) The visuomotor rotation schedule (note that the direction of rotation is negative, and the black line represents the ideal hand direction for adaptation: the negative of a negative, hence the positive direction). The task schedule consisted of blocks; a single block contained 10 training trials ("S" or "R" exclusively, by group) and two probe tests ("C" and "L" for both groups). (D) The two learning conditions. The sensory prediction error (SPE) group had sensory feedback only: the black dashed line indicates the invisible hand movement as an example, and the red line indicates the online visual cursor movement. The reward prediction error (RPE) group had binary reward feedback only; success feedback is shown as an example. (E) The two probe trials for both groups. Catch trials measured the deviation of the hand from the center of the target (vertical dashed line in the second panel of the catch test) without providing any feedback. The localization test measured the error between the actual hand direction and the pointing direction (the blue "clock hand", which was always visible to subjects).
2.2.2. Data analysis
For each of the localization and catch trials, we divided the data into an "incline" phase lasting until the external rotation reached −8º (blocks 1 to 18: 18 data points, two per 1º increment of rotation angle from 0º to −8º), and an asymptotic phase (blocks 27 to 34: 8 data points with the rotation angle at −8º).
Within-group comparison
For the SPE group, we tested the following three hypotheses (see the Computational models section below for the rationale): 1. The forward model estimate is updated, measured by a positive slope (incline phase) and a positive asymptote (asymptotic phase) of localization error. 2. Subjects partially adapt to the perturbation, measured by a positive slope and a positive asymptote of catch angle. 3. Adaptation is driven by the forward model update, tested by whether localization error is greater than or equal to catch angle, in both slopes and asymptotes.
Similarly, for the RPE group, we tested the following three hypotheses: 4. The forward model is partially updated, measured by a positive slope and asymptote of localization error. 5. Subjects adapt to the perturbation, measured by a positive slope and asymptote of catch angle. 6. Early in the incline phase, adaptation is driven by a new motor policy, but later in the incline phase and in the asymptotic phase, the forward model update accounts for adaptation more than the policy does. This hypothesis is tested by whether localization error is smaller than catch angle, in both slopes and asymptotes.
In order to test these hypotheses, we used linear mixed-effects models (R package "lme4"; Bates et al. 2014) in the statistical software R (R Core Team 2014). For the positive-slope tests, blocks were taken as a fixed factor and subjects were taken as random intercepts in each subset of data, with 4 subsets from the group (SPE or RPE) × probe (catch or localization) combinations. For the positive-asymptote tests, a constant ("1" in R) was taken as a fixed factor, and blocks and subjects were taken as random intercepts. For comparing slopes between the two probe tests, we performed model comparison for nested models with the log-likelihood ratio test. Trials were taken as a fixed covariate, the type of measurement (catch vs. localization) as a fixed factor, and subjects as random intercepts. To test whether there was a difference in slopes between the two measurements, we compared the model comprising the interaction between trial and measurement against the null model of no interaction. The test statistic was $\chi^2$ with 1 degree of freedom, the difference in dimensionality between the two models, according to Wilks' theorem (Wilks 1938). Comparison of asymptotes was done by a similar model comparison, now with blocks as random intercepts instead of a fixed covariate. The threshold for significance was p < 0.05.
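For illustration, this nested-model comparison can be reproduced along the following lines in Python with statsmodels (the study itself used R's lme4); the data-frame column names are hypothetical.

import statsmodels.formula.api as smf
from scipy.stats import chi2

def slope_difference_test(df):
    """Log-likelihood ratio test for a trial-by-measurement interaction,
    with subjects as random intercepts (ML fits, as the LRT requires)."""
    null = smf.mixedlm("angle ~ trial + measurement", df,
                       groups=df["subject"]).fit(reml=False)
    full = smf.mixedlm("angle ~ trial * measurement", df,
                       groups=df["subject"]).fit(reml=False)
    lr = 2.0 * (full.llf - null.llf)   # test statistic
    p = chi2.sf(lr, df=1)              # 1 df: one extra parameter (Wilks 1938)
    return lr, p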
Out of the total of 936 data points (26 subjects × 2 types of probe measurement × 18 blocks) in the incline phase, 8 data points were excluded from analysis because of early pointing or an absolute error greater than 15º. In the asymptotic phase, we tested for an overall difference between the two measurements across trials and subjects. Out of the total of 256 data points (16 subjects × 2 types of probe measurement × 8 blocks) in the asymptotic phase, 5 data points were excluded from analysis by the same criteria as in the incline phase.
Between-group comparison
Here we test the hypothesis that the overall adaptation level (measured by catch trials) would not
differ between the two conditions, but that the degree of forward model update would be greater
in SPE than RPE (see Computational modeling section for rationale). As above, we separated
data into the incline phase and the asymptotic phase. In the incline phase, for each catch and
localization trails in turn, we considered blocks and condition (SPE vs. RPE) as fixed effects, and
subjects as random intercepts. For the asymptote phase, we considered condition (SPE vs. RPE)
as a fixed effect, and blocks and subject as random effects.
2.2.3. Computational models
System dynamics
In a real environment, a hand direction $h$ is determined from a motor command $u$:

$h_t = u_t + \epsilon^h_t$   (2.1)

where $\epsilon^h_t \sim N(0, \sigma_h^2)$ is a motor noise term. Variables represent angles in the polar coordinates whose reference point is at the center of the home circle (starting location). A visual cursor location $y$ is rotated by a perturbation angle $p$ from the hand direction $h$:

$y_t = h_t + p_t + \epsilon^y_t = u_t + p_t + \epsilon^{y,h}_t$   (2.2)

where $\epsilon^y_t \sim N(0, \sigma_y^2)$ is a visual noise term and $\epsilon^{y,h}_t \sim N(0, \sigma_y^2 + \sigma_h^2)$ is a mixed noise term. We assume the perturbation dynamics follow a random walk (Berniker & Kording 2008; Izawa & Shadmehr 2011):

$p_{t+1} = a p_t + w_t$   (2.3)

where $0 \le a \le 1$ is a retention factor and $w_t \sim N(0, \sigma_p^2)$ is a process noise term.

Inside a learner's brain, we assume that the internal forward model (FM) predicts a perturbation estimate $\hat p$ (see the Kalman filter section below for the estimate update equations). We further assume that learners approximate their hand estimate $\hat h$ from the visual cursor estimate $\hat y$ in the angular coordinate:

$\hat h_t = \hat y_t = \hat p_{t-1} + u_t$   (2.4)

This approximation is based on visual uncertainty being smaller than proprioceptive uncertainty in the angular direction (van Beers et al. 1999, 2002). $u$ in equation (2.4) is the efferent copy of the original motor command in equations (2.1) and (2.2).

In our localization task, subjects' actual hand direction is given by equation (2.1), whereas their pointing direction is given by equation (2.4). The difference, the localization error, is thus approximately equal to the perturbation estimate:

$\epsilon_{local,t} = \hat h_t - h_t \approx \hat p_{t-1}$   (2.5)
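To make the generative model concrete, here is a minimal simulation sketch of equations (2.1)-(2.3); the retention factor and noise standard deviations are assumed illustrative values, not the parameters fitted in this study.

import numpy as np

rng = np.random.default_rng(0)
a, sig_h, sig_y, sig_p = 0.99, 1.0, 0.5, 0.2   # assumed parameters

def environment_step(p, u):
    """One trial of the generative model: hand from motor command (2.1),
    rotated cursor (2.2), and the perturbation random walk (2.3)."""
    h = u + rng.normal(0.0, sig_h)            # hand direction, eq. (2.1)
    y = h + p + rng.normal(0.0, sig_y)        # cursor = hand + rotation, eq. (2.2)
    p_next = a * p + rng.normal(0.0, sig_p)   # random-walk perturbation, eq. (2.3)
    return h, y, p_next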
Updating $\hat p$ with sensory prediction error or reward prediction error

We define an extended state variable $\mathbf{x}_t = [p_t \; h_t]^T$ following a previous study (Izawa & Shadmehr 2011). The state transition equation is given by:

$\mathbf{x}_k = A \mathbf{x}_{k-1} + B u_{k-1} + \boldsymbol{\eta}^x_{k-1}$   (2.6)

where $A = \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and $\boldsymbol{\eta}^x_t \sim N(0, \Sigma_x)$ is a state noise vector. The output equation for the SPE group is:

$y_k = C \mathbf{x}_k + \epsilon^y_k$   (2.7)

where $C = [1 \; 1]$ is a linear observation matrix for SPE. Inside a subject's brain, the FM simulates this actual dynamics:

$\hat{\mathbf{x}}_{(k|k-1)} = A \hat{\mathbf{x}}_{(k-1|k-1)} + B u_{k-1}$   (2.8)

with a prediction of the output for SPE:

$\hat y_k = C \hat{\mathbf{x}}_{(k|k-1)}$   (2.9)

The sensory prediction error is defined as:

$\epsilon_k = y_k - \hat y_k \approx p_k - \hat p_k$   (2.10)

With the standard KF update rule (Kalman 1960), a learner updates the state estimate:

$\hat{\mathbf{x}}_{(k|k)} = \hat{\mathbf{x}}_{(k|k-1)} + K_k \epsilon_k$   (2.11)

where $K_k$ is a Kalman gain vector. Equations (2.10) and (2.11) show that the state update depends only on the sensory prediction error, and is independent of the specific choice of a motor command $u$ to hit a target; this enables learning without a goal.
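As an illustration of equations (2.6)-(2.11), the following minimal sketch (not the exact simulation code of this study) implements one predict-correct cycle of the filter; the retention factor and noise covariances are assumed values.

import numpy as np

A = np.array([[0.99, 0.0], [0.0, 0.0]])   # state transition, a = 0.99 assumed
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 1.0]])                # cursor observation: y = p + h
Qx = np.diag([0.04, 1.0])                 # process noise covariance (assumed)
Ry = np.array([[0.25]])                   # observation noise variance (assumed)

def kf_step(x_hat, P, u, y):
    """Predict with the efferent copy u, then correct with the sensory
    prediction error y - y_hat; no target is required for the update."""
    x_pred = A @ x_hat + (B * u).ravel()          # eq. (2.8)
    P_pred = A @ P @ A.T + Qx
    innov = y - (C @ x_pred).item()               # SPE, eq. (2.10)
    S = (C @ P_pred @ C.T + Ry).item()
    K = (P_pred @ C.T / S).ravel()                # Kalman gain
    x_new = x_pred + K * innov                    # eq. (2.11)
    P_new = (np.eye(2) - np.outer(K, C)) @ P_pred
    return x_new, P_new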
For the RPE condition, subjects do not receive visual feedback of cursor location. Instead, they receive binary feedback of success or failure depending on hand location. We assume that the brain forms an approximation of binary reward as a continuous probability density function:

$z(h, c) = \dfrac{1}{\sqrt{2\pi}\, d} \exp\!\left( -(h - c)^2 / 2d^2 \right)$   (2.12)

where $c$ is the center of a target and $d$ is the effective target radius. If we set an observation matrix as:

$H_k = \left[ \dfrac{\partial z}{\partial p}, \; \dfrac{\partial z}{\partial h} \right]_{\hat{\mathbf{x}}_{(k|k-1)}} = \left[ 0, \; \dfrac{\partial z}{\partial h}\Big|_{h = \hat h_{(k|k-1)}} \right]$   (2.13)

then we can replace $C$ with $H$ in equations (2.7) and (2.9) to apply the formulae of the extended Kalman filter (EKF) to update the state. In this case, the update signal is the reward prediction error, defined as:

$\epsilon^{RWD}_k = r_k - \hat r_k = r_k - z(\hat h_k, c)$   (2.14)

where the real reward $r_k$ is 1 for success and 0 for failure.
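A minimal sketch of the reward observation model and its linearization follows; replacing C with the matrix returned by reward_jacobian, and the innovation y − ŷ with the reward prediction error r − z(ĥ, c), turns the kf_step sketch above into the EKF update of equation (2.14). The target center c and effective radius d are assumed values.

import numpy as np

c, d = 0.0, 3.0   # target center and effective radius in degrees (assumed)

def reward_model(h):
    """Continuous approximation z(h, c) of binary reward, eq. (2.12)."""
    return np.exp(-(h - c) ** 2 / (2 * d ** 2)) / (np.sqrt(2 * np.pi) * d)

def reward_jacobian(h_hat):
    """Linearized observation H = [dz/dp, dz/dh] at the current estimate;
    z does not depend on p, so the first entry is zero (eq. 2.13)."""
    dz_dh = -(h_hat - c) / d ** 2 * reward_model(h_hat)
    return np.array([[0.0, dz_dh]])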
Motor command generation from feedforward and feedback components

In this section, we describe how a motor command is generated when a target is presented (as in the RPE learning condition and in catch trials). The task goal is to minimize the endpoint error on each trial. We decompose the motor command into a feedforward component and a feedback component:

$u^{net}_t = u^{FF}_t + u^{FB}_t$   (2.15)

The feedback component is determined from the perturbation estimate $\hat p$ obtained from prediction errors (sensory or reward). The feedforward component, on the other hand, is determined from a motor policy, which gives an aiming direction that minimizes the performance error and maximizes the associated reward probability. The next section describes how reward modulates the mean and variance of the motor policy.

Under the assumption of an unbiased estimate of the perturbation, optimal feedback control (OFC) allows us to derive a feedback command that minimizes a cost at each trial, given as:

$J_t = \mathbf{x}_t^T Q \mathbf{x}_t + u_t^T R u_t = h_t^2$   (2.16)

where $Q = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ and $R = 0$ (i.e., no motor cost). Equation (2.16) gives the quadratic cost of a performance error, assuming a target location at 0. In the framework of the linear quadratic regulator (LQR), the system dynamics is given by equation (2.6), and feedback is given by equation (2.7) for SPE; for RPE, $C$ is replaced by $H$ of equation (2.13). Let us express both $C$ and $H$ as $H$ in the following derivation. With an unbiased state estimate, LQR gives the optimal feedback control law with gain $L_t$:

$u^{FB}_t = -L_t \hat{\mathbf{x}}_t$   (2.17)

where

$L_t = (R + B^T S_{t+1} B)^{-1} B^T S_{t+1} A$, with $S_t = Q + A^T S_{t+1} (A - B L_t)$   (2.18)

The condition $R = 0$ makes $L_t = (B^T S_{t+1} B)^{-1} B^T S_{t+1} A$, and substituting this into the second line of equation (2.18) gives:

$S_t = Q + A^T S_{t+1} A - A^T S_{t+1} B (B^T S_{t+1} B)^{-1} B^T S_{t+1} A = Q$   (2.19)

Thus, from the first line of equation (2.18),

$L_t = (B^T Q B)^{-1} B^T Q A = [1 \; 0]$   (2.20)

where the matrices $A$ and $B$ are given by the state dynamics equation (2.6). Thus, the feedback motor command is simply given by:

$u^{FB}_t = -L_t \hat{\mathbf{x}}_t = -\hat p_t$   (2.21)
Feedforward command from a motor policy and its update

The feedback command in equation (2.21) is optimal only with an unbiased estimate $\hat p_t$ of the real perturbation $p_t$. With a changing environment, the estimate is frequently biased, and a feedforward command can be added to decrease the performance error further. Here, we describe a Gaussian motor policy with mean $m$ and variance $\sigma^2$ from which a feedforward command $u$ is generated:

$g_{m_t, \sigma_t}(u_t) = \dfrac{1}{\sqrt{2\pi}\, \sigma_t} \exp\!\left( -(u_t - m_t)^2 / 2\sigma_t^2 \right)$   (2.22)

The policy gradient algorithm REINFORCE (Williams 1992) gives update rules for both the mean and the variance of the policy depending on reward:

$\Delta m_t = \alpha_m (r_t - b) \dfrac{u_t - m_t}{\sigma_t^2}$, with $\Delta \sigma_t = \alpha_\sigma (r_t - b) \dfrac{(u_t - m_t)^2 - \sigma_t^2}{\sigma_t^3}$   (2.23)

where $\alpha_m$ is an update gain for the mean, $\alpha_\sigma$ is an update gain for the standard deviation, $r_t$ is the actual reward, and $b$ is a reward baseline. The choice of $b$ is arbitrary, and the update rules guarantee convergence to local maxima of the reward function (Williams 1992), although there exists an optimal choice of $b$ that minimizes variability in the parameter change (Peters & Schaal 2008). If reward is smaller than baseline as a result of a smaller motor deviation, i.e., $(u_t - m_t)^2 < \sigma_t^2$, then $\sigma_t$ increases, leading to exploration or active search. The opposite scenario leads to exploitation, or a reduction in variability. To account for the minimum variability in human motor command generation, we put a constraint on $\sigma_t$: $\sigma_t = \max(\sigma_{min}, \sigma_{t-1} + \Delta\sigma_{t-1})$, where $\sigma_{min}$ is the minimum standard deviation in motor command generation.

Overall, the net motor command from equation (2.15) is expressed in the two groups as:

$u^{net}_t = u^{FF}_t + u^{FB}_t = \begin{cases} -\hat p_t & \text{(SPE)} \\ u_t - \hat p_t & \text{(RPE)} \end{cases}$   (2.24)

The lack of a feedforward command in SPE is due to the dissociated condition of no reward.
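The update rules of equations (2.22)-(2.23) can be sketched as follows; the gains, baseline, and variability floor are illustrative assumptions rather than fitted values.

import numpy as np

alpha_m, alpha_s, b, sig_min = 0.5, 0.5, 0.5, 0.5   # assumed constants
rng = np.random.default_rng(1)

def policy_step(m, sigma, reward_fn):
    """Sample a feedforward command from the Gaussian policy (2.22), then
    move its mean and standard deviation along the reward gradient (2.23)."""
    u = rng.normal(m, sigma)                            # draw aiming direction
    r = reward_fn(u)                                    # e.g., binary reward
    dm = alpha_m * (r - b) * (u - m) / sigma ** 2
    ds = alpha_s * (r - b) * ((u - m) ** 2 - sigma ** 2) / sigma ** 3
    sigma = max(sig_min, sigma + ds)                    # variability floor
    return u, m + dm, sigma

Rewards below baseline obtained with small deviations push sigma up (exploration), while rewards above baseline pull it down (exploitation), reproducing the reward-dependent variability modulation described above.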
Simulations
In the simulations, we computed catch angles from the overall motor command $u^{net}$ (equation 2.24), and localization errors from $\hat p$ (equation 2.5). For the SPE simulation, we set $u^{FF} = 0$ (for catch) and updated $\hat p$ using the linear KF. For the RPE simulation, we let REINFORCE determine $u^{FF}$ from the policy, and updated $\hat p$ using the EKF.
2.3. Results
2.3.1. Experimental results
Hand direction during training trials
Figures 2.3A and 2.3B show hand direction during training trials for representative individuals in each group, SPE and RPE, respectively. The large inter-trial variability in SPE was due to the instruction of random reaching. Despite this large variability, we can observe a trend of change in the hand direction as the rotation angle increases. In contrast, inter-trial variability was overall smaller in RPE due to the presence of the target. However, note that the hand direction did not simply follow the increase of rotation for this subject. After consistently missing the target, the subject appears to have increased the variability of their hand direction to discover the new hand direction that yielded reward. Variability appears to decrease again in the asymptotic phase, once the subject found the new hand direction that consistently yielded reward. This pattern suggests that subjects use an active search mechanism to seek reward.
Figures 2.3C and 2.3D show the distributions of hand and cursor directions during training trials across all subjects in each group, SPE and RPE, respectively. The means and standard deviations of the distributions were 0.6 ± 8.0º for the cursor in SPE, 4.0 ± 8.4º for the hand in SPE, 2.3 ± 3.6º for the cursor in RPE, and 2.4 ± 4.2º for the hand in RPE. As expected, SPE had larger variability in shooting direction than RPE. The overall difference in mean between cursor and hand arose because data points were collected while the visuomotor rotation gradually increased from 0º to −8º (blocks 1 to 26).
Figure 2.3. Hand direction in training trials of representative individuals in the sensory prediction error (SPE) condition and the reward prediction error (RPE) condition. (A) SPE condition: Subjects reached toward arbitrary directions. Reaching direction was guided by an arc of 30º width. Dashed lines indicate the boundaries of the effective hand space that kept the cursor within the arc area, after taking the visuomotor rotation into account. (B) RPE condition: Subjects reached toward a fixed circular target (6º width) at the center of the arc and received only binary feedback of success (red dots) or failure (blue dots), without visual information on cursor location. Success was determined by whether the invisible cursor crossed the target area, i.e., whether the hand crossed the effective target area (reward zone: bounded by dashed lines). (C) SPE group: distribution of cursor and hand across all subjects in the group. (D) RPE group: distribution of cursor and hand across all subjects in the group.
Probe trials: Catch and Localization
Figures 2.4A and 2.4B show the average time courses of the two probe measurements for all subjects in each group: the overall adaptation level measured by catch angle (red), and the forward model update measured by localization error (blue).
Subjects in the SPE group had positive slopes and asymptotes for both catch angle and localization error (Figure 2.4C). The slope of catch angle was significantly greater than zero: 0.25 ± 0.037º/block ($t_{18} = 6.61$, $p < 10^{-5}$; 56% of the rotation increase rate), and that of localization error was significantly greater than zero: 0.37 ± 0.036º/block ($t_{18} = 10.26$, $p < 10^{-8}$; 83% of the rotation increase rate). Localization error increased faster than catch angle ($\chi^2 = 4.53$, $p = 0.033$; see Materials and Methods for the statistical analyses). The asymptote of catch angle was significantly greater than zero: 3.99 ± 0.49º ($t_2 = 6.77$, $p = 0.01$; 50% of rotation), and that of localization error was significantly greater than zero: 6.31 ± 0.40º ($t_2 = 15.84$, $p = 0.002$; 79% of rotation). The asymptote of localization error was greater than that of catch angle ($\chi^2 = 22.91$, $p < 10^{-5}$).
Subjects in the RPE group also exhibited positive slopes and asymptotes for both catch angle and localization error (Figure 2.4D). The slope of catch angle was significantly greater than zero: 0.30 ± 0.035º/block ($t_{18} = 6.61$, $p < 10^{-5}$; 67% of the rotation increase rate), and that of localization error was significantly greater than zero: 0.17 ± 0.048º/block ($t_{18} = 3.65$, $p < 10^{-3}$; 38% of the rotation increase rate). However, in contrast to the SPE group, catch angle increased faster than localization error ($\chi^2 = 4.26$, $p = 0.039$). The asymptote of catch angle was significantly greater than zero: 6.51 ± 0.65º ($t_2 = 10.02$, $p = 0.0049$; 81% of rotation), and that of localization error was significantly greater than zero: 5.24 ± 0.62º ($t_2 = 8.51$, $p = 0.0068$; 65% of rotation). In contrast to the SPE group, the asymptote of catch angle was greater than that of localization error ($\chi^2 = 11.11$, $p < 10^{-3}$).
We then conducted between-group comparisons of catch angle and localization error (Figures 2.4E and 2.4F). The catch angle increase rate was not significantly different between the two groups ($\chi^2 = 0.78$, $p = 0.38$), but the asymptotic value of catch angle was higher in RPE than in SPE ($\chi^2 = 10.97$, $p < 10^{-3}$). Localization error increased faster in SPE than in RPE ($\chi^2 = 10.12$, $p = 0.0015$), and it also reached a higher asymptote in SPE than in RPE ($\chi^2 = 5.25$, $p = 0.022$).
Figure 2.4. Probe tests. (A) SPE condition: Both catch direction (overall adaptation level) and localization error (forward model update) increased as the rotation angle increased. (B) RPE condition: The increase rate for localization error was much smaller than that of catch direction during the course of training. (C-F) Slopes were estimated in the incline phase (see Materials and Methods) using a mixed linear model that included the interaction with each condition (SPE and RPE) or measurement (catch and localization), for within-group and between-group comparisons, respectively. Similarly, asymptotes were estimated using a general linear model in a similar way (see Materials and Methods). Significance levels: *: p < 0.05, **: p < 0.01, ***: p < 0.001.
Reward probability and trial-by-trial variability of hand direction in RPE
Figure 2.5 demonstrates a negative relationship between success probability and trial-by-trial variability in hand direction ($R^2 = 0.32$, $p < 10^{-27}$) for RPE subjects. For a given trial, reward probability was obtained from the average reward (success as 1 and failure as 0) in a time window of the past 10 trials, whereas the trial-by-trial standard deviation in hand direction was measured from a time window of the next 10 trials. Initially, success probability was high and subjects maintained low variability. Reward probability then gradually decreased below 0.5 around trial 200, accompanied by a gradual increase in the trial-by-trial standard deviation of hand direction. After this point, and especially near the end of training, this trend reversed: success probability increased again and trial-by-trial variability decreased back to its baseline level. Combined with the result from Figure 2.4B, subjects not only changed their reaching direction, but also its variability, as a function of reward probability.
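The two windowed measures can be computed along the following lines (a sketch assuming the 10-trial windows described above):

import numpy as np

def windowed_measures(rewards, hand_dirs, w=10):
    """Reward probability from the past w trials and hand-direction SD
    from the next w trials, evaluated at each interior trial t."""
    n = len(rewards)
    p_rwd = np.array([np.mean(rewards[t - w:t]) for t in range(w, n - w)])
    sd_hand = np.array([np.std(hand_dirs[t:t + w]) for t in range(w, n - w)])
    return p_rwd, sd_hand   # correlate these two series, as in Figure 2.5B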
Figure 2.5. Reward probability and trial-by-trial variability in hand direction in RPE. (A) The red line represents the running average of reward probability, and the black line represents the running average of the trial-by-trial standard deviation of hand direction. Both lines indicate means across subjects and were smoothed with neighboring trials. (B) Negative correlation between the two quantities obtained from (A).
Generalization tests
Figure 2.6 shows the two generalization curves for each condition, defined as the difference in hand direction between the baseline generalization block and each of the two post-adaptation generalization blocks. Both groups showed good generalization, in that hand direction was close to 8º for all seven targets. This indicates that perturbation estimation in SPE was largely direction-independent except at the ±15º edges, and again confirms that the update occurred without explicit targets during training. Similarly, in the RPE condition, adaptation was not limited to the originally trained target location, but generalized to neighboring targets within at least a ±15º range.
Figure 2.6. Generalization curves for the (A) SPE and (B) RPE conditions. The y-axis represents the mean increase in catch direction from the baseline generalization block. Error bars indicate standard error. The shaded area represents the target width of ±3º; data points within this area are therefore considered successful adaptation. Perfect generalization would be aligned along the 8º line.
2.3.2. Simulation results
Figure 2.7 shows simulation results from a similar perturbation schedule as in the experiment. In
simulation of the SPE condition, of the choice of aiming direction was completely random within
the arc range, as in the experiment. However, because the update of the perturbation estimate ˆ p
does not depend on the actual hand direction (see equations 16 and 17), and because the motor
command is equal - ˆ p (equation 7), simulation result shows that both ˆ p and the hand direction
in catch trials compensate for the rotational angle (Figure 2.7A). In simulation of the RPE
condition, the policy gradient (“REINFORCE”) and the optimal feedback controller were both
updated generate the motor command: the policy gradient searched for a new aiming direction,
and the EKF updated perturbation estimation ˆ p from reward prediction error, which was then
used to update the feedback motor command (Figure 2.7B). Because the net hand direction was
27
obtained from summation of the two contributions – the policy and the perturbation estimate,
note how the contribution from the policy decreases as ˆ p increases. Also note that, compared to
the SPE condition, the update of the perturbation estimate is slower than in the SPE condition.
The RPE simulation also replicated the negative correlation between reward probability and the
trial-by-trial variability in hand via updating the policy variance (Figure 2.7C).
Figure 2.7. Simulation results. (A) SPE condition (20 simulation runs): Motor command generated from random aiming between −15º and 15º plus the perturbation estimate $\hat p$. Black line: mean; gray area: std. (B) RPE condition (20 simulation runs): Motor command generated from the Gaussian policy plus the perturbation estimate $\hat p$. Black line: mean; gray area: std. Dissociation of the motor command into the policy mean $m$ and the perturbation estimate $\hat p$ shows the contribution from the two learning mechanisms. The smoothed running average of reward and the smoothed policy variance are also shown. (C) Correlation between reward probability and variability in motor command for the RPE simulation (average across 100 simulations).
2.4. Discussion
We dissociated sensory and reward feedback to understand their individual roles in motor adaptation. The SPE group had pure sensory feedback of the visual cursor, without an externally provided goal (a target). This way, we eliminated sources of internal or external reward information. The SPE group showed increased hand direction in catch trials, though subjects were never trained on the goal-given task. The catch angle approximately matched the localization error, indicating that the inverse model dominated motor command generation. This "learning without a goal" paradigm demonstrated that prediction error alone could update the internal model in the absence of goal-directed error. The RPE group, on the other hand, had only binary reward feedback. The reason for providing binary reward was to minimize the sensory or gradient information, based on proximity to a target, that reward carries when it takes continuous values. The limited access to sensory feedback stimulated active search when reward probability decreased, which shows that the brain actively modulates variability in a motor policy (Pekny et al. 2015). Though this policy update was the leading mechanism when reward feedback was isolated from sensory feedback, we also observed that the internal model updated its perturbation estimate, though to a lesser degree than in SPE. We assumed that the internal model works as a background process: no matter how the motor command is generated, the internal forward model always tries to predict the consequences, either the direct sensory consequence or the indirect reward consequence. Similarly, the internal inverse model always adjusts the aiming direction with the compensating perturbation estimate (Taylor et al. 2014).
Though our study was largely motivated by, and shared some design features with, a previous study on sensory and reward prediction errors (Izawa & Shadmehr 2011), we introduced important differences that led to the main findings described above. First, we removed the target in the sensory condition (SPE) so that we could dissociate the contribution of pure sensory feedback from target-associated reward feedback. Second, we minimized sensory information in the reward condition (RPE) by providing binary feedback in a location other than on top of the target. These first two efforts maximized the dissociation of sensory and reward feedback in the experimental design, allowing their individual contributions to be evaluated separately. Third, we traced the dynamic changes of overall adaptation and forward model update independently from each other, online, during the course of adaptation. This tracing revealed the underlying mechanisms of sensory error-based and reward-based learning when the environment changes dynamically.
Although the internal forward model, with sensory prediction error as its updating signal, has been widely accepted as a primary mechanism of motor adaptation, it had not been clearly verified whether prediction error alone could update the forward model regardless of goal-directed error. Mazzoni and Krakauer (2006) demonstrated that subjects could encode sensory prediction error to update the forward model even when goal-directed error was strategically set to zero at the beginning of adaptation. In that study, the goal-directed error then gradually increased because subjects kept trying to reach toward the pseudo-target even after the updated forward model shifted the actual hand direction away from the pseudo-target location. However, the existence of goal-directed error makes it difficult to confirm that prediction error alone is sufficient and necessary to update the internal model. In addition, how the brain resolves the mismatch between goal-directed error and prediction error was unclear. In our design of the SPE group, we removed the target and all error and reward information associated with it. This directly supported the idea that the internal model can be updated independently of goal-directed error. In other words, the update of the forward model is an autonomous and implicit process, independent of any strategy or policy to achieve a certain goal. This explains how SPE subjects showed increases in overall adaptation and perturbation estimation despite reaching in different random directions during training.
Many reaching tasks, including that of the current study, are practically one-dimensional in the sense that subjects only care about the angular direction. In this case, continuous or numerical reward not only gives the value of a specific movement, but also provides gradient information for increasing reward. On the other hand, exploration is a hallmark of reinforcement learning in many practical situations because of their high dimensionality. Thus, it is important to understand how the brain modulates a motor policy to obtain reward when sensory information is insufficient to calculate a reward gradient. RPE subjects showed active tuning of trial-by-trial variability during training, which implies an update of the motor policy with variance change. We specifically adopted the policy gradient method REINFORCE (Williams 1992) because it works on immediate reward and does not require a specific model of the reward or value function. In another study, a similar design of binary reward also modulated active search in the case of reward vs. non-reward (Pekny et al. 2015). In addition to supporting the active search of that study, the current work suggests that the transition between exploitation and exploration may have intermediate steps, theoretically described as policy parameter updates.
Though policy update must be the leading mechanism that drives motor command change under reward-dominant feedback, the RPE group also showed a non-zero internal model update. Assuming the internal model operates as a background system, we can consider reward prediction error as one of the possible driving forces for updating the internal model. In terms of the internal model update, then, sensory and reward prediction errors differ only in their observation functions. In our RPE setting, actual reward was binary and thus highly non-linear as a function of the hand estimate. Therefore, the degree of internal model update was lower than that of SPE. However, if we had provided continuous reward as a function of hand direction, the degree of update could have been similar to that of the sensory feedback condition. This partial update of the internal model was not found in the previous study with a similar reward condition (Izawa & Shadmehr 2011). One possibility is that reward-driven memory of the internal model may last a very short time, as we measured localization during adaptation, while they measured localization before and after training. Another scenario is high inter-subject variability in the reward condition due to the nature of binary reward and the stochastic policy. These factors lead to highly variable outcomes because adaptation depends on probability: in some cases a learner finds the solution faster than in others. Unlike SPE, reward prediction error does depend on the specific motor command, and thus the update of the internal model may depend on it.
In combining the two learning mechanisms to generate a motor command, we adopted a simple summation framework, as in previous studies (Izawa & Shadmehr 2011; Taylor et al. 2014). The current study can be considered a counterpart to the decomposition in Taylor et al. (2014). In their study, they measured the aiming component by explicitly asking subjects to point to the aiming direction before reaching, and estimated the internal model contribution indirectly by subtracting the aiming direction from the actual reaching direction. In contrast, we directly measured the internal model update with the localization task, and obtained the aiming component from the remainder of the catch angle. While they considered the aiming component a cognitive strategy, we do not necessarily take this component as strategic or cognitive. We hid all information about the perturbation, and our perturbation schedule was gradual and not obviously detectable: around 75% of subjects reported that they did not even notice the existence of a perturbation. Therefore, we suggest that aiming can be either explicitly strategic or internally modulated by reward, as in the current study. What seems certain is that the aiming or policy part of the motor command is clearly distinguished from the internal model part, and that these two learning mechanisms are driven by different sources.
Chapter 3. Multiple memories account for forgetting
in visuomotor adaptation
3.1. Introduction
Trial-by-trial dynamics of motor adaptation have been studied extensively to understand the gain and decay of motor memories. Typical experimental paradigms include initial learning, washout, relearning, and error-clamp. In explaining behavioral data, researchers found that a single-memory model is not sufficient, and devised a state-space model with multiple time constants (Smith et al. 2006). Though this model contains multiple states with different time constants, all of these states are subject to update. In other words, memories are overwritten whenever the environmental perturbation changes. Therefore, the model failed to replicate savings, or faster relearning after complete washout of the learned memory (Zarahn et al. 2008). This class of models therefore introduced meta-learning (an increase of learning gain on the second learning) to explain savings (Zarahn et al. 2008; Herzfeld et al. 2014). However, meta-learning has not been investigated for its theoretical basis or neural substrate. In addition, many of these state-space models take endpoint error or goal-directed error as the input feedback to the system update equation. Thus, motor memories are expected to decay passively in the error-clamp condition, where endpoint error is artificially clamped to zero.
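For reference, the standard two-state instance of this class (Smith et al. 2006) updates a fast and a slow state from the same error $e_n$ on trial $n$:

$x^f_{n+1} = A_f x^f_n + B_f e_n$, $\quad x^s_{n+1} = A_s x^s_n + B_s e_n$, $\quad x_n = x^f_n + x^s_n$

with retention factors $A_f < A_s$ and learning gains $B_f > B_s$. Because every error updates both states, and because the error is held at zero in error-clamp, the net adaptation $x_n$ can only decay geometrically there.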
Another class of theories regarding motor memory acquisition includes mixture-of-experts
systems, represented by the MOSAIC model (Wolpert & Kawato 1998; Haruno et al. 2001).
MOSAIC is a specific instance of the general mixture-of-experts idea (Jacobs et al. 1991; Jordan
1996; Ghahramani & Wolpert 1997), which consists of individual modules or experts that act as
predictors or internal models (Jordan & Rumelhart 1992; Kawato 1999; Wolpert et al. 1995). In
this modular structure, each module makes a prediction on the current environment and is
assigned a weight to update its prediction. In this process, modules not relevant to the current
environment are protected from update. Upon environmental change, the central controller
switches among modules or states to make the best prediction. This paradigm has been
widely used to explain adaptation to multiple tasks without interference (Lee & Schweighofer
2009; Pekny et al. 2011). In this theoretical framework, savings occur due to recall of protected
memory (Berniker & Kording 2011).
Previous studies have favored one of these two classes to explain behavioral data. Recently,
studies found that performance did not decay, or decayed with variable lags, in error-clamp
(Scheidt et al. 2000; Vaswani & Shadmehr 2013). Another study revealed that savings is not a
universal phenomenon, but depends on the schedule in which the perturbation is presented
(Herzfeld et al. 2014). In this study, we propose a modified form of multiple modules to explain
these puzzling behaviors. Specifically, we distinguish two conditions: one in which memory is
overwritten, and one in which memory is protected and switched. For this, we explicitly
introduce the concept of a baseline model, the default module that predicts no perturbation.
Unlike previous theories of multiple experts that assume a given number of fully developed
modules, we suggest that the learning system begins with only one expert – the baseline model –
and one undeveloped model (the “novice model”). The novice model has the potential to become
a new expert. Our hypothesis is that, if a newly introduced perturbation is “small enough” to fall
within the domain of the existing module, the baseline model is overwritten to update its
estimation. In this scenario, memory is actually updated and erased every time the environment
changes, and therefore transitions between states are slow: no or little savings, a long-lasting
aftereffect in washout, and passive decay in error-clamp. On the other hand, if a new perturbation
is “large enough”, the novice model is updated into a new expert specialized in predicting the
perturbation (the “perturbation model”). In this scenario, the system has two experts, and
subsequent environmental changes cause fast switching between these two modules: fast savings,
no or little aftereffect in washout, and probabilistic state transitions in error-clamp. Our theory
provides a criterion to distinguish large vs. small perturbations, and makes predictions on savings
and on error-clamp behavior for different experimental conditions.
3.2. Materials and methods
3.2.1. Experimental design
Forty-six volunteers participated in the study. All participants signed the informed consent
approved by the Institutional Review Board of the University of Southern California. Subjects
sat in front of the visuomotor task apparatus and performed center-out shooting tasks using a
digitizer pen on a tablet. The middle-layer mirror occluded subjects’ hands and generated the
visual illusion of a visual workspace at the bottom layer. A circular target of radius 3º appeared at
a pseudorandom location within 5º around the center of a 60º arc that was 10 cm away from the
starting position. Subjects initiated the shooting movement as soon as a target appeared and
stopped after crossing the arc line. A red dot representing the cursor location disappeared when
the pen tip moved farther than X cm from the starting position. When the pen tip crossed the arc,
the red dot was marked at the crossing point and remained there for 1 s. Subjects were
encouraged to keep movement duration within 300 ms, where movement duration was defined as
the time interval between onset of a target and the arc-crossing time. A “Move faster” sign was
displayed when movement duration exceeded 300 ms. In providing visual feedback of cursor
location on the arc, we applied three different conditions: no rotation, rotation of 10º or 20º, and
error-clamp (Figure 3.1A). The visuomotor perturbation rotated cursor position counterclockwise
by a given angle with respect to the starting position. We also added Gaussian noise, with a
trial-by-trial standard deviation of either 0.5º or 4.0º, on top of a given rotation. This Gaussian
noise was also added to the baseline training of 80 un-rotated trials. Depending on the group
assignment (see below), each individual experienced either the 0.5º or the 4.0º noise level in the
baseline. In error-clamp, visual feedback was sampled from a Gaussian distribution of mean 0º
and standard deviation 0.5º or 4.0º, again depending on the group assignment. Note that visual
feedback in error-clamp was given independently of the actual hand direction.
Figure 3.1. Experiment design. (A) Three states of the visuomotor rotation task. Black lines indicate hand
direction (occluded) and red lines indicate visual feedback of the cursor. Red curves around the visual
feedback line indicate the Gaussian noise added to the output. Left – baseline and washout, no rotation
except small Gaussian noise. Middle – visuomotor rotation of 20º (shown post-adaptation). Right – visual
error-clamp; note that feedback location is independent of hand direction. (B) Three training schedules.
All groups experienced the same perturbation schedule except for its magnitude and noise level. Black
lines in learning and washout blocks indicate the actual rotation applied. All subjects in each group had
exactly the same rotation sequence from a pseudo-Gaussian noise sequence. Red plots in error-clamp
indicate the actual visual feedback provided.
We randomly assigned subjects to one of five different conditions (Figure 3.1B). Groups 1a
(n=11) and 1b had exactly the same learning and de-learning conditions before error-clamp: the
rotation angle was 20º and the noise level was 0.5º (large perturbation and small noise). All
individuals in groups 1a and 1b had the same sequence of visual feedback in error-clamp, except
that two single “trigger trials” were inserted at one half and three quarters of error-clamp,
respectively, for the 1b
group. Trigger trials were simply 20º rotation trials, the same as in the learning condition. Groups
2a and 2b also had the same training schedules as each other: rotation angle 20º and noise level
4.0º (large perturbation and large noise). Like group 1b, group 2b also had two trigger trials at
the same positions. All other trials in error-clamp gave the same sequence of visual feedback for
2a and 2b as well. Finally, group 3 had the same schedule as groups 1 and 2, but with a 10º
rotation and a 0.5º noise level (small perturbation and small noise).
3.2.2. Computational model
We adopted the mixture of experts framework (Jordan & Jacobs 1994; Jacobs et al. 1991;
Wolpert & Kawato 1998; Haruno et al. 2001) with modifications to explain conditional savings
and stochastic decay. Figure 3.2 is a general diagram showing N experts or modules in the
system. We take each module as a perturbation estimator. At each discrete time step $t$, each
module makes a prediction of the external perturbation before feedback is available. We
formulate this prediction as a likelihood function:

$$ l(x^t \mid m_i) = \frac{1}{\sqrt{2\pi}\, S_i^t} \exp\!\left( -\frac{(x^t - \hat{x}_i^t)^2}{2 (S_i^t)^2} \right) \qquad (3.1) $$
where $l(x^t \mid m_i)$ represents the probability density of observing perturbation $x^t$ given a
module $m_i$ with mean estimation $\hat{x}_i^t$ and uncertainty $S_i^t$. We also assign a prior
probability $\pi_i$ to each module such that

$$ \sum_{i=1}^{N} \pi_i = 1 \qquad (3.2) $$

Priors represent the probability that each module is correct in the absence of evidence or
feedback, and are formed from the history of perturbation occurrence. Combining likelihood and
prior, we calculate the posterior probability density:

$$ p(x^t \mid m_i) \propto \pi_i \, l(x^t \mid m_i) \qquad (3.3) $$
After feedback of the perturbation, the gating controller in the system (Figure 3.2) assigns a
weight to each module from Bayes’ theorem:

$$ w_i^t = p(m_i \mid x^t) = \frac{\pi_i \, l(x^t \mid m_i)}{\sum_{j=1}^{N} \pi_j \, l(x^t \mid m_j)} \qquad (3.4) $$

which gives the probability of selecting module $m_i$ given perturbation feedback $x^t$.
Figure 3.2. Diagram of the mixture of experts. Note that only one model (estimator i in this example) is
updated with gain K, while the others maintain their perturbation estimations.
In applying equation (3.4) to a real experiment, subjects do not have direct access to the exact
perturbation. Instead, they have to infer the perturbation from sensory feedback. We assume the
following general relationship between motor command $u^t$ and visual feedback $v^t$:

$$ v^t = u^t + x^t \qquad (3.5) $$

where $x^t$ is the perturbation (rotation), drawn from the distribution $x^t \sim N(\bar{x}, \sigma_x^2)$.
A subject perceives the perturbation by comparing motor command and visual feedback:

$$ \tilde{x}^t = v^t - u^t = x^t \qquad (3.6) $$

Therefore, in the presence of a consistent perturbation (such as in learning or washout blocks),
subjects measure the real perturbation with mean $\bar{x}$ and variance $\sigma_x^2$. In the
error-clamp condition, where visual feedback is bound to be near the target location $y^t$, the
perceived rotation depends on the motor command:

$$ \tilde{x}^t = v^t - u^t = y^t - u^t \qquad (3.7) $$
Comparing equations (3.6) and (3.7), subjects “believe” that a perturbation of magnitude
$y^t - u^t$ exists in error-clamp. Therefore, instead of using the real perturbation as feedback in
equations (3.1) and (3.4), we replace it with the perceived perturbation:

$$ l(\tilde{x}^t \mid m_i) = \frac{1}{\sqrt{2\pi}\, S_i^t} \exp\!\left( -\frac{(\tilde{x}^t - \hat{x}_i^t)^2}{2 (S_i^t)^2} \right) \qquad (3.8) $$

$$ w_i^t = p(m_i \mid \tilde{x}^t) = \frac{\pi_i \, l(\tilde{x}^t \mid m_i)}{\sum_{j=1}^{N} \pi_j \, l(\tilde{x}^t \mid m_j)} \qquad (3.9) $$
The net perturbation prediction for the next trial is given by the weighted average of the modular
estimations:

$$ \hat{x}^t = \sum_{j=1}^{N} w_j^t \, \hat{x}_j^t \qquad (3.10) $$

From this prediction, the inverse model generates the next motor command to compensate for the
predicted perturbation:

$$ u^{t+1} = y^{t+1} - \hat{x}^{t+1} \qquad (3.11) $$

where $y^{t+1}$ is the target location on the next trial. Actual hand direction is corrupted by
motor noise:

$$ h^t = u^t + \epsilon_u^t, \quad \epsilon_u^t \sim N(0, \sigma_u^2) \qquad (3.12) $$
In addition, the gating controller selects one of the N modules to update its estimate and
uncertainty, while preventing the other modules from being overwritten. The probability of
selecting a specific model equals the weight that model was assigned:

$$ \hat{x}_i^{t+1} = \begin{cases} a\,\hat{x}_i^t + b\,(\tilde{x}^t - \hat{x}_i^t), & \text{if chosen} \\ a\,\hat{x}_i^t, & \text{otherwise} \end{cases} \qquad (3.13) $$

where $0 < a \le 1$ is a retention parameter and $b$ is a learning gain. Here, we add the
assumption that estimation uncertainty is also subject to update when the model is chosen:

$$ S_i^{t+1} = \begin{cases} k\,S_i^t, & \text{if chosen and } S_i^t > S_{\min} \\ S_i^t, & \text{otherwise} \end{cases} \qquad (3.14) $$

where $0 < k \le 1$ determines the speed of uncertainty reduction. Such uncertainty reduction is a
common process in Bayesian state estimators such as the Kalman filter, and in uncertainty-based
competition among multiple modules (Daw et al. 2005).
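To make the trial-by-trial loop concrete, the following minimal Python sketch implements equations (3.8), (3.9), (3.13), and (3.14) for a single trial. It is an illustration only: the function name, the default parameter values, and the use of `rng.choice` for the stochastic gating are our own expository choices, not the original simulation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def trial_update(x_hat, S, prior, x_tilde, a=0.999, b=0.3, k=0.8, s_min=0.02):
    """One trial of the modified mixture-of-experts model (illustrative values).

    x_hat   : modular mean estimates, the x_hat_i^t
    S       : modular uncertainties, the S_i^t
    prior   : prior probabilities pi_i (sums to 1)
    x_tilde : perceived perturbation on this trial, eq. (3.6) or (3.7)
    """
    # Likelihood of the perceived perturbation under each module, eq. (3.8)
    lik = np.exp(-(x_tilde - x_hat) ** 2 / (2 * S ** 2)) / (np.sqrt(2 * np.pi) * S)
    # Posterior weights from Bayes' theorem, eq. (3.9)
    w = prior * lik
    w = w / w.sum()
    # Gating: select one module with probability equal to its weight
    i = rng.choice(len(w), p=w)
    # Only the selected module learns; all modules retain with factor a, eq. (3.13)
    x_new = a * x_hat
    x_new[i] = a * x_hat[i] + b * (x_tilde - x_hat[i])
    # The selected module's uncertainty shrinks toward a floor, eq. (3.14)
    S_new = S.copy()
    if S_new[i] > s_min:
        S_new[i] = k * S_new[i]
    # Net perturbation prediction for the next trial, eq. (3.10)
    x_pred = w @ x_new
    return x_new, S_new, w, x_pred
```

The motor command for the next trial would then be $u^{t+1} = y^{t+1} - \hat{x}^{t+1}$, with Gaussian motor noise added as in equations (3.11) and (3.12).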
3.2.3. Simulation
We simulated the experimental condition with a reduced number of learning and washout blocks
followed by error-clamp (learning→washout→relearning→error-clamp paradigm). We tuned the
parameters $\bar{x}$ (mean perturbation) and $\sigma_x$ (noise in perturbation) to simulate the
different experimental groups – 1a: large perturbation and small noise ($\bar{x} = 1.0$,
$\sigma_x = 0.025$); 2a: large perturbation and large noise ($\bar{x} = 1.0$, $\sigma_x = 0.2$);
and 3: small perturbation and small noise ($\bar{x} = 0.5$, $\sigma_x = 0.025$). Mean and
uncertainty parameters are normalized values, and the motor command from simulation was
multiplied by a factor of 20 to match experimental conditions. We also inserted two single
learning trials at one half and three quarters of the error-clamp block to simulate trigger trials
(groups 1b and 2b). Parameters of group 1b were equal to those of group 1a, and parameters of
group 2b were equal to those of group 2a.
Unlike traditional mixture-of-experts models, we did not assume a fixed number of a priori
experts in the system. Instead, we assumed our learner is naïve and has never experienced any
perturbation before. Thus, our learning system began with only one expert, which we call the
“baseline model” or module 0, that initially assumes no external perturbation with small
uncertainty ($\hat{x}_0^1 = 0.0$, $S_0^1 = 0.20$). Instead of having other developed experts
from the beginning, we assigned one undeveloped module, which we call the “novice model”,
that has large uncertainty in its estimation ($\hat{x}_1^1 = 0.0$, $S_1^1 = 0.45$). This
combination of one expert and one novice predicts two completely different scenarios of how
each module is updated after perturbation (Figure 3.3).
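As a sketch only – with block lengths, the target direction (normalized to 0), and the motor noise magnitude chosen by us for illustration – the following loop strings `trial_update()` from the sketch above into the learning→washout→relearning→error-clamp schedule:

```python
def simulate(x_bar, sigma_x, n_learn=80, n_wash=60, n_clamp=120):
    x_hat = np.array([0.0, 0.0])    # module 0: baseline model; module 1: novice model
    S = np.array([0.20, 0.45])      # narrow expert vs. wide novice
    prior = np.array([0.9, 0.1])    # higher prior on the baseline model
    x_pred, hand = 0.0, []
    schedule = ([x_bar] * n_learn + [0.0] * n_wash +
                [x_bar] * n_learn + [None] * n_clamp)   # None marks error-clamp
    for x_true in schedule:
        u = -x_pred + rng.normal(0.0, 0.02)     # aim to cancel the prediction
        if x_true is None:
            x_tilde = -u + rng.normal(0.0, sigma_x)      # perceived rotation, eq. (3.7)
        else:
            x_tilde = x_true + rng.normal(0.0, sigma_x)  # eq. (3.6)
        x_hat, S, w, x_pred = trial_update(x_hat, S, prior, x_tilde)
        hand.append(u)
    # Sign flipped and scaled so that compensation appears as a positive angle
    return -20.0 * np.array(hand)
```

Comparing `simulate(1.0, 0.025)` (group 1a) against `simulate(0.5, 0.025)` (group 3) should reproduce the qualitative two-model vs. one-model behavior described below.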
Figure 3.3. Two different scenarios of simulation. Curves represent the posterior probability density of
each module. Top row: baseline state before onset of perturbation. The baseline model has a narrow
distribution around 0º, while the novice model has a wide distribution. Vertical dashed lines indicate
decision boundaries of model selection. Middle row: when a new perturbation (black vertical line) is
large, and thus the novice model has the higher probability density for that value, the novice model is
selected and updated into a new expert, the “perturbation model”. This is the scenario of a two-model
learner. Bottom row: when a new perturbation is small, the baseline model is updated to shift its mean
estimation toward the new small perturbation. This is the case of a one-model learner.
If the perturbation is “large enough”, i.e., far from the domain of the baseline model, the
likelihood of the novice model is higher than that of the baseline model because of the novice
model’s large uncertainty. Even after combination with priors that usually assign a higher value
to the baseline model, there exists a decision threshold beyond which the posterior of the novice
model equals or exceeds that of the baseline model. In this case, the gating controller selects the
novice model to be updated exclusively while preventing the baseline model from changing its
estimation. As a result, the novice model grows to become a new expert in the system, with its
mean estimation close to the external perturbation and its uncertainty as small as that of the
baseline model. We call this updated novice model the “perturbation model”, as it predicts the
external perturbation. Consequently, the system now contains two expert modules, the baseline
model and the perturbation model. After full update of the perturbation model in the initial
learning block, subsequent rapid changes in motor command are explained as switching between
these two modules. Thus, this scenario of a two-model learner predicts savings due to rapid
switching to the perturbation model, a short-lasting aftereffect due to rapid switching to the
baseline model, and abrupt jumps in motor command in error-clamp.
On the other hand, if the perturbation is “small enough”, i.e., lies within the domain of the
baseline model, the gating controller selects the baseline model to be updated, while preventing
the novice model from being updated. Now the baseline model actually shifts its mean estimation
toward the external perturbation. Therefore, the system has practically only one expert module –
the baseline model – and the following washout and relearning blocks affect the baseline model
directly: i.e., the baseline model is updated every time the perturbation changes. Therefore, this
scenario of a one-model learner predicts no or little savings because no switching occurs, a
relatively long-lasting aftereffect because of an actual washout process, and gradual decay in
error-clamp.
Note that switching in the two-model learner scenario is bidirectional in principle. However, we
assigned a higher prior (0.9) to the baseline model and a lower prior (0.1) to the perturbation
model. This is supported by the observation that people choose the more familiar module when
the environment is unpredictable or corrupted by noise. With these priors, we predicted that the
direction of switching in error-clamp is mostly from the learned direction (20º) to the baseline
direction (0º), and that the probability of switching increases with higher noise levels in sensory
feedback and motor command. Therefore, we predicted a faster drop to baseline in the large-noise
group (2a) than in the small-noise group (1a). However, large noise can also cause occasional
“jumping back” to the perturbation state. Trigger trials were inserted to make this switching back
to the perturbation state happen with higher probability. If a learner developed two models, and
the perturbation model has not decayed despite the return to baseline in error-clamp, then a single
perturbation trial may act as a trigger that makes the system switch back to the perturbation
model.
3.3. Results
3.3.1. Simulation example
Figure 3.4 presents a single simulation run for each of the first three groups: 1a (20º / 0.5º), 2a
(20º / 4.0º), and 3 (10º / 0.5º). The first column (Figure 3.4A) shows the motor command for each
group. Simulations with large perturbations exhibit little or no aftereffect in washout, a rapid rise
in relearning, and sudden jumps in error-clamp. Note that the first drop in error-clamp happened
earlier in the high-noise condition than in the low-noise condition. In addition, the high-noise
condition showed autonomous jumping back to 20º right after the first drop. On the other hand,
the simulation with a small perturbation demonstrated a long-lasting aftereffect, no or little
savings, and gradual decay in error-clamp. The second column (Figure 3.4B) displays the mean
and uncertainty of the perturbation estimation for the two modules. For the large perturbation
conditions, the novice model was selected in the initial learning block and its mean estimate
increased to the level of the external perturbation. Both models maintained their estimations for
the rest of the training schedule, except for the slow time-dependent decay of the perturbation
model. The small perturbation condition shows a very different scenario, in which the baseline
model updates its estimation up and down every time a perturbation or washout is introduced.
The novice model remains unchanged in this scenario, so no switching between modules occurs.
The third column (Figure 3.4C) visualizes the weight map of the perturbation model. The weight
is a function of the external perturbation in learning and washout blocks, and of the motor
command in error-clamp: see equations (3.6), (3.7), (3.8), and (3.9). When a new perturbation is
larger than the decision boundary (green lines), the novice or perturbation model is selected and
updated. In the error-clamp block, the motor command replaces the role of the perturbation, and
transitions occur when the motor command crosses the boundary. Small noise makes such
crossings improbable, and thus allows the system to stay in the perturbation state for a long time.
On the other hand, high noise increases the probability of crossing the boundary by chance, so
state transitions occur relatively early in error-clamp. However, large noise also increases the
chance of crossing back to the perturbation state. The asymmetry in the direction of switching
was realized by assigning a higher prior to the baseline model. In the small perturbation
condition, the baseline model shifts its mean estimation, and the weight map follows the shift.
This keeps the baseline model selected throughout the entire schedule.
Figure 3.4. Simulation result for each experimental condition. Top row – group 1a, middle row – group
2a, bottom row – group 3. (A) Simulated hand direction represented by motor command u. Arrows 1 and
2 point to the slow and gradual state transitions of group 3, compared with the large-perturbation groups.
Arrow 4 indicates the moment of abrupt performance drop in group 1a. Arrow 5 indicates frequent state
transitions in group 2a. (B) Perturbation estimation of each module. Blue – mean and uncertainty of the
baseline model. Red – mean and uncertainty of the (novice) perturbation model. (C) Weight map of the
perturbation model. Areas in red favor the perturbation model, whereas areas in blue favor the baseline
model. Green lines indicate the border of model selection.
3.3.2. Savings and aftereffect
Figure 3.5A shows group-averaged data for each experimental group: 1a (20º / 0.5º), 2a (20º /
4.0º), and 3 (10º / 0.5º). We fit an exponential curve with a single time constant within each
learning block (LB1, LB2, LB3, and LB4) and each washout block (WB1, WB2, and WB3) on an
individual basis. Figure 3.5B shows the resulting mean time constants within each block across
subjects. ANOVA and subsequent multiple comparisons revealed that the time constants of the
second, third, and fourth learning blocks were significantly smaller than that of the initial learning
block for the large-perturbation-small-noise group (1a). Group 2a, the large-perturbation-large-noise
group, also showed a reduced time constant in the fourth learning block compared to the initial
learning block. On the other hand, group 3 did not show any change in time constants across the
four learning blocks. These data show that savings, or faster relearning, occurred only in the large
perturbation conditions, and the effect was larger in the small-noise condition (1a). The result
supports the hypothesis that a large perturbation establishes a newly updated perturbation model,
and that fast switching from the baseline to the perturbation model enables savings after the
initial learning. In contrast, the small perturbation condition had similar time constants across all
four learning blocks, indicating that the memory of the baseline model had to be re-formed at
every learning block. The same logic explains the smaller time constants in the washout blocks of
group 1a compared to the larger time constants of group 3. A small time constant in washout
means a short-lasting aftereffect, implying that washout in this condition is also a state transition
rather than forgetting of memory. A large time constant in washout indicates a long-lasting
aftereffect, suggesting that the updated memory is actually being washed out.
Figure 3.5. Group-averaged plot of hand direction and time-constant analysis. (A) Group-averaged plot.
Red lines indicate perturbation angles. Gray shades around black lines represent plus and minus one
standard deviation. LB – learning block. WB – washout block. (B) Time constants estimated within each
learning and washout block. Time constants were estimated individually. ***: $p < 10^{-3}$, *: $p < 0.05$
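As an aside on methods: the block-wise time constants above can be estimated by nonlinear least squares. The snippet below is a minimal sketch of one way to do the fit; the exponential parameterization and the starting values are our assumptions, since the exact fitting routine is not spelled out here.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_time_constant(hand_dir):
    """Fit y(t) = y_inf + (y0 - y_inf) * exp(-t / tau) to one block of
    hand directions; returns tau in units of trials."""
    hand_dir = np.asarray(hand_dir, dtype=float)
    t = np.arange(len(hand_dir), dtype=float)

    def expo(t, y0, y_inf, tau):
        return y_inf + (y0 - y_inf) * np.exp(-t / tau)

    p0 = (hand_dir[0], hand_dir[-1], len(hand_dir) / 4.0)  # crude initial guess
    (_, _, tau), _ = curve_fit(expo, t, hand_dir, p0=p0, maxfev=10000)
    return tau
```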
3.3.3. Decay in error-clamp
Figure 3.6 isolates the error-clamp trials from the data and shows both group-averaged and
individual hand directions. Although the average data show a trend of continuous decay,
individual data exhibit diverse patterns. Subjects 1, 5, and 10 of group 1a decayed very little, not
even crossing the half angle (10º) by the end. Subjects 2, 7, and 8 show sudden drops at varying
lags after error-clamp began. Others demonstrate a mixture of continuous decay and abrupt
changes or fluctuations. Overall, subjects in group 1a stayed near 20º for a long time, and once
they changed hand direction, they tended to jump abruptly to near 0º, spending little time
between the two angles. Figure 3.7A summarizes this trend by the double peaks in the
hand-direction distribution of group 1a (red curve).

Group 2a shows a faster decay in early error-clamp trials in the mean plot (Figure 3.6B).
However, individual plots again display diverse patterns. Subjects 1, 5, 7, and 8 jumped back
spontaneously to 20º, resulting in oscillatory patterns. In summary, group 2a with high noise also
revealed characteristics of sudden state transitions, but such transitions occurred more quickly
than in group 1a with low noise, and the high noise level also increased the chance of returning to
20º. Figure 3.7A shows the hand distribution of this group (blue curve): subjects stayed near 20º
much less than group 1a because of early transitions.

Group 3 (small perturbation) had individual data with a mixed trend of gradual decay and
oscillation. Because of the small perturbation, the starting angle in error-clamp was around 10º,
and trial-by-trial noise therefore made it difficult to distinguish state transitions from simple
noise. The hand distribution in Figure 3.7A has a single peak around 8º (green curve), indicating
no abrupt change; furthermore, the distribution suggests that decay was gradual and slow.
Considering that both large perturbation groups 1a and 2a have at least one peak around 0º, we
estimate that actual memory decay in error-clamp takes more than 120 trials to complete.

Figure 3.7B shows the corresponding hand distributions from simulations of each experimental
condition. Compared to the data, the simulations show relatively narrower distributions, but they
still replicate the general pattern of the experimental data.
Figure 3.6. Mean and individual data in error-clamp. Trial numbers count from the onset of error-clamp.
(A) Group 1a (20º / 0.5º): blue arrows indicate no or little decay until the end of error-clamp; red arrows
point to sudden performance drops. (B) Group 2a (20º / 4.0º): red arrows point to oscillatory behavior or
sudden changes. (C) Group 3 (10º / 0.5º). Note that the scale is half that of the large perturbation groups;
noisy fluctuations are difficult to distinguish from actual performance changes.
Figure 3.7. Distribution of hand direction in (A) data and (B) simulation. Simulation results came from 16
runs per condition.
3.3.4. Triggers in error-clamp
Figure 3.8 illustrates simulation examples of the effect of trigger trials embedded in error-clamp.
Trigger 1 in the first row (simulating group 1b with low noise) did not cause any change because
the hand was still near 20º. On the other hand, trigger 2, delivered when the hand was near 0º,
caused a sudden jump back to 20º, and hand direction was sustained at that level on subsequent
trials. Triggers produced similar sudden jumps in the high-noise simulation, but the effect lasted
only briefly because of the high noise. The weight maps show that a trigger forced hand direction
back into the domain of the perturbation model, and the system switched to the learned state.
Figure 3.8. Simulation examples for trigger conditions in error-clamp. Top row: simulation of group 1b
(20º / 0.5º). Bottom row: simulation of group 2b (20º / 4.0º).
Figure 3.9 shows group-averaged and individual error-clamp data for subjects in the trigger
conditions (1b and 2b). Although the average plot makes it look as if the trigger effect was only
partial and decayed soon after, the individual data reveal that this was actually a mixture of
all-or-none effects. Subjects 5, 6, and 8 in group 1b responded fully to trigger trials, whereas
subjects 2, 3, 7, and 9 showed no or partial reactions. Note that the latter subjects started decay at
below 20º in error-clamp, suggesting they were partial learners who might not have fully
developed two separate modules. This pattern was similar for subjects in group 2b: subjects 1, 2,
5, 6, and 9 showed a full reaction, returning to 20º if the hand was not already at that level,
whereas subjects 3, 4, and 6 showed no response to triggers. Again, the latter subjects started
error-clamp at an angle smaller than 20º.
Figure 3.9. Mean and individual data in error-clamp with two trigger trials. Two vertical red lines
indicate trigger trials. (A) Group 1b (20º / 0.5º): red arrows point to subjects who reacted to trigger trials,
increasing to 20º right after the triggers. (B) Group 2b (20º / 4.0º): red arrows point to subjects who
reacted to trigger trials, increasing to 20º right after the triggers.
For the correlation in Figure 3.10A, we first pooled all subjects in 1b and 2b and then divided
them into two groups: full learners vs. partial learners. Full learners were defined as those whose
asymptotic adaptation angle at the end of the last learning block (right before error-clamp) was
greater than or equal to 80% of the full angle (20º); partial learners were those who did not meet
this criterion. We suspected that only full learners might have developed a new perturbation
model and thus possess two expert modules, whereas partial learners might have updated the
baseline model, or not fully updated the perturbation model. Figure 3.10B shows simulation
results from this hypothesis. We tuned parameters to replicate the two conditions of full vs.
partial learners, represented as two-model vs. one-model learners, respectively. For two-model
learners, triggers acted as cue signals to switch back to the perturbation model if the hand was
already near 0º by the time of the triggers. For one-model learners, a trigger was only a single
learning trial that produced no or little change.
Figure 3.10. Regression analysis of the effect of triggers in (A) data and (B) simulation. The x-axis
represents the mean hand direction over the 5 trials before each trigger, and the y-axis represents the
increase after the trigger, calculated as the mean hand direction over the 5 trials after each trigger. A slope
of 1 indicates a return to 20º regardless of where hand direction was before the triggers.
3.4. Discussion
We introduced a modular decomposition of experts to explain puzzling experimental data in
washout, savings, and error-clamp. We explained behavioral data of hand direction (and thus
motor command) as the weighted average of the modular estimations. In this framework, changes
in behavior can be due to i) actual update and decay of modular memory, ii) switching among
different modules, or iii) a combination of the two. While most previous studies adopted modular
experts to explain adaptation to multiple tasks (Wolpert & Kawato 1998; Haruno et al. 2001;
Ghahramani & Wolpert 1997; Lee & Schweighofer 2009; Schweighofer et al. 2011), here we
applied the theory to explain learning and forgetting of a single task. An important feature of our
approach is the explicit introduction of a baseline model into the modular set. With a baseline
model, a performance decrease can be understood either as real forgetting or as switching back to
the baseline model while leaving the learned model protected. We also assumed the existence of
a novice model, a module that remains undeveloped until a perturbation arrives. The size of the
perturbation determines whether the novice model develops into a new expert, or the baseline
model updates its estimation while leaving the novice model undeveloped.
In explaining conditional savings, we distinguished the two scenarios of a two-model learner
(novice model updated to a new expert, the perturbation model) and a one-model learner
(baseline model updated). We showed experimentally that a large perturbation activated fast
switching, in the form of no or little aftereffect in washout and savings in subsequent learning
blocks. Our simulation selected the novice model when the perturbation was large, owing to its
non-specificity, and the system switched between baseline and perturbation models after the first
learning block. On the other hand, when the perturbation was small, the experimental results
showed a long-lasting aftereffect in washout and no or little savings in subsequent learning
blocks. This corresponded to our one-model learner in simulation, because the baseline model
was selected, updated, and erased every time the environment changed.
The distinction between two-model and one-model learners also made different predictions about
behavior in error-clamp. If a learner developed a new expert and thus has two models, then the
same kind of switching that occurred during learning and washout blocks can occur during
error-clamp. On the other hand, if a learner had only one expert, the baseline model, then passive
gradual decay is expected during error-clamp. Here, we argue that the error-clamp condition can
give learning or switching cues to expert modules. Assuming subjects cannot explicitly detect the
nature of error-clamp trials (i.e., that the clamped error is independent of their motor command),
they can perceive a non-zero rotation depending on their reaching direction. This gives feedback
to each module, and modules therefore encode different prediction errors in error-clamp.
Mazzoni & Krakauer (2006) showed evidence of non-zero prediction error and update of the
internal model even when goal-directed error was zero at the beginning of adaptation. Because
the perceived rotation equals the (negative) hand direction in error-clamp, hand direction replaces
the role of the external rotation in learning and washout blocks. An increase in feedback noise
increases the chance of switching, and thus causes a faster drop to baseline, but it also increases
the chance of spontaneous switching back to the learned state.
Although we did observe spontaneous returns in some individual data, the probability of such
events was low, possibly due to the higher prior assigned to the baseline model. In order to test
the hypothesis that switching can occur in error-clamp, we devised a new condition with two
single trigger trials inserted at one half and three quarters of the error-clamp block. If decay were
passive, trigger trials could produce only a small and transient effect. However, the data showed
a quick jump and a sustained state right after this single learning trial. The effect was not
universal, though: some subjects did not react to trigger trials, and we found that these subjects
can be considered partial learners and, theoretically, one-model learners.
In our simulation, we implemented a very slow passive decay for all newly developed modules.
We assumed this reflects time-dependent decay of motor memory, and it should be distinguished
from fast switching to the baseline model. Along with the higher prior for the baseline model,
this slow passive decay is one reason we observe a general trend of performance decrease in
error-clamp, even for two-model learners. Therefore, our theory does not exclude passive
forgetting. Instead, we claim that changes in performance can come from two different sources,
switching or passive decay. In reality, these effects can appear intermixed, as can be seen in the
diversity of the individual data.
Our implementation of modular experts differs in some respects from previous studies. The
MOSAIC model and its applications (Wolpert & Kawato 1998; Haruno et al. 2001) share a very
similar structure with our theory. However, MOSAIC assumes the a priori existence of N
already-developed experts. In contrast, we hypothesized that the brain is more conservative,
trying to minimize the number of experts. Naïve subjects who have never experienced a
perturbation are unlikely to have prepared non-baseline modules in the brain’s learning system.
Still, we know the human brain is flexible enough to adapt to new environments while preserving
previous internal models. For this reason, we proposed the novice model, which has the potential
to become a new expert module if necessary. This “growing expert” system enabled the
distinction between two-model and one-model learners, corresponding to large and small
perturbations, respectively. Closer to our approach, Berniker & Kording (2011) suggested a
body-model and a world-model to explain savings. In their theory, the body-model is always
relevant and subject to update at all times. The world-model, on the other hand, is relevant only
when the source of the perturbation is external, and it is thus protected from unlearning in
washout. In our theory, all modules are equivalent in structure, and there is no special distinction
for the baseline model except its higher prior. The relevance of our modules is determined purely
from the posterior probability, which is calculated from a combination of evidence (prediction
error) and prior beliefs. Therefore, our model can distinguish the two scenarios in which the
baseline model (analogous to the body-model) is or is not updated.
In this study, we focused only on error-based learning modules for simplicity. However, recent
studies have shown that there may exist more than one kind of learning mechanism: learning
from reward signals (Izawa & Shadmehr 2011; Pekny et al. 2015; Galea et al. 2015; Nikooyan &
Ahmed 2015; Shmuelof et al. 2012), cognitive strategy (Taylor et al. 2014; Taylor & Ivry 2011),
and multiple time-constant models (Smith et al. 2006; Kording et al. 2007; Lee & Schweighofer
2009). Although we did not explicitly account for these multiple learning mechanisms in the
current study, potential correspondences exist. Following the study showing that reward feedback
can form a new baseline (Shmuelof et al. 2012), repeated reward feedback in the absence of
sensory feedback may enhance the prior for a newly developed model; in this case, a third
perturbation in error-clamp is likely to land at the level of the rewarded model. For cognitive
strategy, the switching process in our theory can be viewed as a cognitive decision-making
process, while each module represents the more internal part of adaptation. Similarly, the update
of fast and slow processes can correspond to switching and to the actual update of our modules,
respectively.
Finally, the current theory has been developed and applied to one of the simplest adaptation
paradigms, visuomotor adaptation. In this task, a scalar quantity such as an angle can easily
parameterize the perturbation and the state of each module. However, such a simple
parameterization may not be achieved easily for more complicated tasks, such as
velocity-dependent force-field adaptation, in terms of both theoretical models and the brain’s
actual learning system. It is known that adapting to two different tasks is very difficult in
force-field adaptation. However, we expect that with extensive training, the brain can eventually
distinguish two different motor experts. After all, this is how the brain can switch among similar
but parametrically different tasks, just as a commercial truck driver can drive both a big truck and
a compact sedan without suffering aftereffects.
Chapter 4. Motor adaptation in high-dimensional
redundant joint-space
4.1. Introduction
Force-field adaptation using robotic manipulanda has been a popular experimental paradigm to
study human motor adaptation in the presence of external perturbation (Shadmehr & Mussa-
Ivaldi 1994; Gandolfo et al. 1996; Scheidt et al. 2001). Typically, subjects learn to compensate
applied velocity-dependent force field perpendicular to movement direction. Although this
paradigm has contributed significantly to understanding mechanism and dynamics of motor
adaptation, most of these studies confined reaching movement in 2-dimensional task space. The
primary reason was that most robot manipulanda had two planar joints corresponding to shoulder
and elbow movement, and it made analysis simple. However, such a confined movement in 2D
with only two joints has limitation to understand human motor adaptation because human
subjects move in 3D space with more DOF joints than three, resulting in motor redundancy
(Schaal & Schweighofer 2005; Haith & Krakauer 2012). Redundancy at joint and muscle levels
can be challenging problems for a controller, but studies have found that human brain may
exploit redundancy to achieve motor goal with flexibility and stability (Cusumano & Cesari 2006;
Latash et al. 2002; Schöner & Scholz 2007). This idea has been theoretically formalized as
uncontrolled manifolds or UCM (Schöner 1995; Scholz & Schöner 1999).
Understanding motor adaptation in redundant systems is also critical for rehabilitation. Patients
post-stroke often have reaching deficits due to abnormal synergy patterns such as the flexor
synergy (coupling of elbow flexion with shoulder abduction; Lum et al. 2003). Abnormal
synergies cause a limited movement range, high energy expenditure, faster fatigue, and increased
risk of injury. These abnormal synergies may have developed during the recovery stage of stroke
in order to meet functional performance criteria at the cost of unnatural compensatory
movements (e.g., trunk movement; Huang & Krakauer 2009). Thus, in order to restore normal
synergies, it is important to train in joint space as well as in task space. A recent study using an
exoskeleton applied channeling “walls” either to the end-effector only or to each joint (Brokaw et
al. 2013). The results showed that training in joint space resulted in better performance in free
movement and inter-joint coordination. Another study showed that a desired synergy could be
learned by imposing a specific joint coordination pattern extracted from therapists’ guiding
movements onto patients (Crocher et al. 2012). The training schedule is also an important factor.
Kluzik and colleagues found that transfer from end-effector training to free reaching was better
with a gradually changing perturbation schedule than with an abrupt change (Kluzik et al. 2008).
In another study, the authors concluded that “Error patterns within a person's natural range might
be experienced as endogenous” (Torres-Oviedo & Bastian 2012).
In this study, we further investigated the relationship between adaptation in task-space and
joint-space, following the original study that used a similar experimental paradigm (Mistry et al.
2005). We used a multi-joint exoskeleton robot to generate velocity-dependent torque on the
elbow joint, and recorded kinematic data from each joint. We analyzed trial-by-trial dynamics of
movement duration and similarity of trajectories to baseline in two different conditions: abrupt
and gradual increases of torque gain. We also investigated the structure of variability in
trajectories, and analyzed how these structural components change with learning, washout, and
relearning. We hypothesized that there would be larger variability in joint-space than in
task-space, exploiting redundancy so that the performance criterion (movement duration) in
task-space can be achieved quickly under an ever-changing environmental perturbation.
4.2. Materials and methods
4.2.1. Exoskeleton and control law
The experimental apparatus is a seven DOF exoskeleton arm actuated hydraulically (Sarcos
Master Arm, Sarcos Inc.). Each exoskeleton joint is mounted with a hydraulic actuator that can
apply torque individually and a potentiometer that can read joint angle with sampling frequency
of 960 Hz. The seven joints of the exoskeleton was designed to match major seven joints of the
human arm (Figure 4.1). There are three joints corresponding to shoulder movements: shoulder-
flexion-extension (SFE), shoulder-abduction-adduction (SAA), and humeral rotation (HR); a
single elbow joint: elbow-flexion-extension (EB); and three wrist joints: wrist rotation (WR),
wrist-flexion-extension (WFE), and wrist-abduction-adduction (WAA). The most proximal joint,
58
SFE, is mounted to a height-adjustable platform and the user wields the exoskeleton by holding a
handle at the most distal part and by strapping the right forearm near elbow. The shoulder
remains unconstrained, but the system places the center of human shoulder joint coincide with
three rotation axes of the exoskeleton shoulder joints.
Figure 4.1. Sarcos Master Arm. The picture shows approximate joint locations; WAA is hidden from
view.
The user controls the exoskeleton by holding a trigger at the handle, which activates a control
law of inertia and gravity compensation. The purpose is to minimize the burden of moving with
the exoskeleton, such as inertial, centrifugal, Coriolis, and gravitational forces. We apply the
following control law after Mistry et al. (2005):

$$ \mathbf{u} = M(\mathbf{q})\,\ddot{\mathbf{q}} + C(\mathbf{q}, \dot{\mathbf{q}}) + G(\mathbf{q}) - K_D\,\dot{\mathbf{q}} + K_P\,(\mathbf{q}_D - \mathbf{q}) \qquad (4.1) $$

where $\mathbf{q}$ is a vector of the current joint angles, $\mathbf{q}_D$ is a filtered desired
position, $M(\mathbf{q})$ is the estimated inertia matrix, $C(\mathbf{q}, \dot{\mathbf{q}})$
denotes the estimated centrifugal and Coriolis forces, $G(\mathbf{q})$ denotes the estimated
gravitational force, $K_D$ is a diagonal matrix of small damping gains, and $K_P$ is a diagonal
matrix of position gains.
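A schematic rendering of this control law, with the rigid-body terms supplied by the robot model, might look as follows. The `model` object and its method names are placeholders for the exoskeleton's dynamics routines (the actual implementation lives in the SL control software described in Section 4.2.3), so this is a sketch rather than the real controller:

```python
import numpy as np

def compensation_torque(q, qd, qdd, q_des, model, Kd, Kp):
    """Sketch of the compensation control law, eq. (4.1)."""
    return (model.inertia(q) @ qdd        # M(q) * q_ddot
            + model.coriolis(q, qd)       # C(q, q_dot): centrifugal + Coriolis
            + model.gravity(q)            # G(q)
            - Kd @ qd                     # small joint damping
            + Kp @ (q_des - q))           # weak attraction to the filtered target
```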
4.2.2. Experiment design
Ten young, healthy volunteers (6 males / 4 females) participated in the experiment. Subjects
visited on two consecutive days. On the first day, subjects completed a familiarization session
followed by the baseline training (Figure 4.2A). The familiarization session was designed to let
subjects gain experience moving under the exoskeleton dynamics. Despite the compensation
control law – equation (4.1) – parameter estimation and the dynamic model of the exoskeleton
are imperfect, and can thus cause small drifting movements. The familiarization session consisted
of three blocks. In the first block, subjects were asked to hold their hand still while pulling the
activation trigger. In the second block, subjects were allowed to execute free movement for a
given amount of time. In the last block, subjects were asked to make directed movements in the
X (left-right), Y (front-back), and Z (up-down) directions.

The baseline training consisted of 80 trials of a point-to-point reaching task (Figures 4.2B and
4.2C). The exoskeleton robot brought the subject’s hand to the starting position (slightly above
shoulder level, close to the right shoulder) after each trial, and generated the ready signal (a train
of frequent beeps). Subjects then pulled the activation trigger and waited for the go signal (a
single beep). Upon hearing the go signal, subjects reached toward a physical target set up in front
of their torso, near navel level. We calibrated the position of the real target (defined as a sphere
in virtual 3D space) to be placed on top of the physical target (a ball with a similar radius).
Although we did not specify a movement trajectory, we asked subjects to keep movement
duration (time from go signal to first contact with the virtual target surface) between 300 ms and
1,000 ms. We provided oral feedback to subjects after each trial: “good” for a movement duration
within the criterion, “slow” when it exceeded 1,000 ms, and “fast” when it was shorter than
300 ms.
On the second day, subjects were exposed to a velocity-dependent force field, defined as:

$$ u_{EB} = k_v \left( \dot{q}_{SFE} + \dot{q}_{SAA} \right) \qquad (4.2) $$

where $u_{EB}$ is the torque added to the elbow joint, $k_v$ is the gain, $\dot{q}_{SFE}$ is the
joint velocity of SFE, and $\dot{q}_{SAA}$ is the joint velocity of SAA. Subjects were randomly
assigned to one of two groups with different gain-increase schedules. In the abrupt group (N=5,
2F / 3M), the gain increased abruptly from 0.0 to 4.9 at trial 41, was maintained at that value for
the next 80 trials, suddenly dropped to and was maintained at 0.0 for the next 60 trials, and
finally increased suddenly to 4.9 again for the remaining 40 trials (Figure 4.2C). In the gradual
group (N=5, 2F / 3M), the initial gain increased from 0.0 to 4.9 gradually over 60 trials and was
then maintained at that value for the next 20 trials. The rest of the gain schedule was identical to
that of the abrupt group.
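For concreteness, the two gain schedules and the perturbation torque can be written down directly from the numbers above; the 40-trial unperturbed lead-in for the gradual group is our reading of Figure 4.2C rather than an explicitly stated value:

```python
import numpy as np

# Abrupt: 40 baseline trials, 80 at k_v = 4.9, 60 washout, 40 relearning.
abrupt = np.concatenate([np.zeros(40), np.full(80, 4.9),
                         np.zeros(60), np.full(40, 4.9)])
# Gradual: a 60-trial ramp to 4.9 plus a 20-trial hold replace the first block.
gradual = np.concatenate([np.zeros(40), np.linspace(0.0, 4.9, 60),
                          np.full(20, 4.9), np.zeros(60), np.full(40, 4.9)])

def elbow_torque(k_v, qd_sfe, qd_saa):
    """Velocity-dependent elbow perturbation, eq. (4.2)."""
    return k_v * (qd_sfe + qd_saa)
```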
Figure 4.2. Experiment design. (A) First-day schedule. (B) Second-day schedule. (C) Perturbation
schedule of the two groups: black line – abrupt group; red line – gradual group.
4.2.3. Data analysis
The SL simulation and real-time control software package (Stefan Schaal; Computational
Learning & Motor Control Lab, USC) recorded joint angles and velocities of all 7 joints with a
sampling frequency of 480 Hz. The software also calculated endpoint (“hand”) position and
velocity in XYZ coordinates, as well as Jacobian elements from the embedded geometric model
of the exoskeleton.

We examined the raw data carefully to remove invalid trials. Invalid trials had either all-zero
values, or one or more “jumping” values in which the displacement in one discrete time step
($1/480 \approx 2$ ms) was more than 100 mm, whereas typical displacements were on the mm
scale. Our basic approach was to take an entire trajectory as a single entity, and to extract a scalar
quantity that characterizes its shape. The other approach we added was functional data analysis,
which investigates trial-by-trial variability in trajectory shape.

In defining a “trajectory”, we first took the time series of X, Y, Z, and each of the joint
coordinates, cut at 1.5 s. We then reconstructed 3D curves in hand-space (task space) and 7D
curves in joint-space. In practice, we reduced the number of joints analyzed from 7 to 3 (or 4) for
visualization purposes and to focus on individual joint contributions. For this purpose, we
analyzed Jacobian elements to determine each joint’s relative contribution to endpoint
displacement. The Jacobian elements are given as:
$$ J = \begin{pmatrix} J_{X1} & J_{X2} & J_{X3} & J_{X4} & J_{X5} & J_{X6} & J_{X7} \\ J_{Y1} & J_{Y2} & J_{Y3} & J_{Y4} & J_{Y5} & J_{Y6} & J_{Y7} \\ J_{Z1} & J_{Z2} & J_{Z3} & J_{Z4} & J_{Z5} & J_{Z6} & J_{Z7} \end{pmatrix} \qquad (4.3) $$

Assuming all joints freeze except the $i$-th one in a certain configuration (as in a virtual
displacement), the incremental displacement of the endpoint is determined from:

$$ |d\mathbf{r}| = \sqrt{J_{Xi}^2 + J_{Yi}^2 + J_{Zi}^2} \; |dq_i| = \bar{J}_i \, |dq_i| \qquad (4.4) $$

We calculated the Jacobian conversion factor $\bar{J}_i$ for each joint along the trajectories, and
evaluated its overall magnitude and change to determine the top 3 or 4 joints contributing most to
endpoint displacement; the other joints were discarded from further analyses.
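In code, the conversion factor of equation (4.4) is simply the column-wise Euclidean norm of the 3×7 endpoint Jacobian (a minimal sketch):

```python
import numpy as np

def conversion_factors(J):
    """Jacobian conversion factors, eq. (4.4): endpoint displacement per unit
    joint displacement, one value per joint, from a 3x7 Jacobian."""
    return np.linalg.norm(J, axis=0)
```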
As a metric of similarity to a given trajectory, we adopted Procrustes analysis. Procrustes
analysis matches two different curves as closely as possible via a set of linear transformations:
shift, scaling, reflection, and rotation. Assume we want to register a curve $\Gamma$ to the
reference curve $\Gamma^0$, where each curve consists of the same number of landmarks. We
transform the original curve to:

$$ \Gamma' = b\,\Gamma R + \mathbf{c} \qquad (4.5) $$

where $b$ is a scaling factor, $R$ is an orthogonal rotation and reflection matrix, and
$\mathbf{c}$ is a constant matrix shifting the points. Procrustes analysis finds the parameters that
minimize the distance between the transformed curve and the reference curve:

$$ \sum_{i=1}^{n} \left( \Gamma'_i - \Gamma^0_i \right)^2 \qquad (4.6) $$

where the index $i$ denotes landmarks within each curve, and $n$ is the total number of
landmark points. We repeated this process for trajectories in the baseline of the second day (trials
11 to 30) until we found the mean trajectory that minimizes the summed distance in equation
(4.6) over all transformed baseline trajectories. We then calculated the distance from this mean
trajectory for all trajectories of the second day.
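A minimal sketch of the Procrustes distance used here, following the classical least-squares solution (translation removal, optimal rotation/reflection from an SVD, optimal scaling); this is our own compact rendering, not the exact analysis script:

```python
import numpy as np

def procrustes_oss(curve, ref):
    """Ordinary sum of squares (eq. 4.6) after optimally shifting, scaling,
    and rotating/reflecting `curve` onto `ref` (both n x d landmark arrays)."""
    X = curve - curve.mean(axis=0)        # remove translation
    Y = ref - ref.mean(axis=0)
    U, s, Vt = np.linalg.svd(X.T @ Y)     # optimal rotation/reflection R = U Vt
    R = U @ Vt
    b = s.sum() / (X ** 2).sum()          # optimal scaling factor
    return (((b * X @ R) - Y) ** 2).sum() # residual distance, the OSS
```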
To investigate the structure of trial-by-trial variability, we used functional principal component
analysis (FPCA). FPCA differs from ordinary PCA in that it takes an entire trajectory as a single
entity and explains variance around the mean trajectory, rather than variance along each
trajectory. The first principal component $\xi_1(t)$ of a set of trajectories $\{x_i(t)\}$ maximizes
the variance:

$$ \mathrm{Var}_i \left[ \int \xi_1(t)\, x_i(t)\, dt \right] \qquad (4.7) $$

where $x_i(t)$ is a single trajectory in the set. The second principal component yields the
second-largest variance while being orthonormal to the first component, and so on. For each
trajectory and each principal component, we calculate the principal component score:

$$ f_{ij} = \int \xi_j(t) \left[ x_i(t) - \bar{x}(t) \right] dt \qquad (4.8) $$
where $\bar{x}(t)$ is the mean trajectory. The score estimates how much a specific principal
component contributes to the deviation of a single trajectory from the mean trajectory. From
these principal components and their corresponding scores, we can reconstruct the original
trajectory as:

$$ x_i(t) = \bar{x}(t) + \sum_{j=1}^{\infty} f_{ij}\, \xi_j(t) \qquad (4.9) $$

Although the summation goes to infinity, in practice the first few principal components explain
roughly 90–99% of the variance. In applying FPCA to our data, we first smoothed each hand and
joint velocity with Fourier bases, and calculated the first four principal components of the time
series. We then plotted the first four principal component scores as a function of trials to observe
how the perturbation schedule might affect variability.
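On a uniform time grid, the integrals in equations (4.7)–(4.9) reduce to dot products, so discretized FPCA can be computed as ordinary PCA on the matrix of smoothed velocity profiles. A minimal sketch (assuming the profiles have already been smoothed and resampled to a common grid):

```python
import numpy as np

def fpca(X, n_comp=4):
    """Discretized FPCA: rows of X are velocity profiles (trials x samples).
    Returns the mean profile, the principal functions xi_j(t), the per-trial
    scores f_ij (eq. 4.8), and the explained-variance ratios."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                            # deviations from the mean trajectory
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_comp]                   # orthonormal principal functions
    scores = Xc @ components.T                 # principal component scores
    explained = s[:n_comp] ** 2 / (s ** 2).sum()
    return x_mean, components, scores, explained
```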
4.3. Results
4.3.1. Movement duration
As the most direct measure of performance, we first plotted the median movement duration of
each group in Figure 4.3. As expected, the abrupt group showed an initial rapid rise of movement
duration at the onset of the perturbation. Over trials, movement duration decreased gradually and
reached an asymptotic value. In the second learning block, movement duration rose again, but
then decreased faster than in the first block. This indicates savings and a sustained internal model
that reduces movement duration in the presence of the external perturbation. Specifically, the
level of movement duration over the last 20 trials of the first learning block was similar to that of
the initial 20 trials of the second block (mean of medians: last 20 trials of the first learning block
= $0.85 \pm 0.09$ s, first 20 trials of the second learning block = $0.85 \pm 0.14$ s). These were
marginally and significantly smaller, respectively, than the movement duration of the first 20
trials in the first learning block ($p = 0.06$ and $p = 0.03$) in multiple comparisons.

In the gradual group, on the other hand, movement duration was initially maintained at the
baseline level, then increased sharply, and decreased again during the first learning block. The
asymptotic value was similar to that of the abrupt group (mean of medians = $0.84 \pm 0.10$ s).
Interestingly, both groups had smaller asymptotic values in the second learning block than in the
first (mean of medians: abrupt group = $0.80 \pm 0.12$ s, gradual group = $0.80 \pm 0.05$ s). The
sharp rise and fall of movement duration in the middle of the first learning block implies that
there exists a certain threshold up to which the original internal model can compensate, but not
beyond.
Figure 4.3. Movement duration, defined as the time interval between the start signal and the first contact
with the virtual target. Lines indicate medians, and shaded areas around the lines represent inter-quartile
ranges. Yellow background indicates perturbation trials.
4.3.2. Jacobian conversion factors
Figure 4.4 shows the Jacobian conversion factors of all joints from equation (4.4), drawn for
movement time up to 1.5 s, across trials and all subjects. The three shoulder joints (SFE, SAA,
HR) and the elbow joint (EB) had larger factors than the wrist joints (WFE, WAA, WR)
throughout the trajectory. This implies that changes in the wrist joints affect endpoint
displacement less than the other joints. The two largest factors were SFE and EB at the beginning
of movement, and SFE and SAA by the end of movement. From this analysis, we chose SFE,
SAA, and EB to compose the trajectory in joint-space. HR also had a significant contribution, but
we limited the DOF to three to match the trajectory in hand-space, and for a fair comparison in
the Procrustes analysis.
Figure 4.4. Jacobian conversion factors. Lines and shaded areas represent mean and standard error across
all subjects and trials.
4.3.3. Trajectories in hand-space and joint-space
To visualize trajectories in hand-space and joint-space, we plotted sample trajectories of one
subject in the abrupt group (Figure 4.5). The general trend is that trajectories in hand-space
remained close to the mean baseline trajectory throughout the adaptation trials. Even trajectories
right after the introduction of the perturbation (red lines) quickly returned to baseline. Asymptotic
average trajectories in the first learning block and the washout block were also close to the
baseline trajectory. On the other hand, trajectories in joint-space deviated from the mean baseline
trajectory. Initial trajectories in the first learning block were biased from the middle of the
trajectory, and even asymptotic trajectories in the learning and washout blocks remained biased
away from the baseline trajectory.
Figure 4.5. Sample trajectories from an individual subject in the abrupt group. (A) Hand trajectories. The
gray sphere indicates the virtual target. (B) Joint trajectories.
Figure 4.6. Group-averaged time series of each variable. Lines and shaded areas represent mean and
standard error, respectively. (A) Abrupt group. (B) Gradual group.
Figure 4.6 shows asymptotic time series of variables in both hand-space and joint-space. Both
groups had largely overlapping time series in X, Y, and Z, except for a small undershoot of the
learning asymptote in Y. In contrast, variables in joint coordinates had distinct learning
asymptotes with relatively large variance.

Figure 4.7 shows the normalized variability of each variable, defined as the standard deviation of
displacement divided by the mean displacement, where displacement is the deviation of the
coordinate value at 1.5 s from that at 0 s. Of the four joint variables, SFE and SAA showed the
largest variability, larger than all three hand coordinates X, Y, and Z.
Figure 4.7. Normalized variability of each joint.
4.3.4. Procrustes analysis
We performed Procrustes analysis to quantify trial-by-trial adaptation in terms of similarity to
baseline. Figure 4.8 shows group-averaged results for both hand-space and joint-space. OSS on
the y-axis represents the ordinary sum of squares in equation (4.6); a lower value means a shape
closer to the mean trajectory. Overall, OSS in joint-space was larger than in hand-space
throughout the trials in both groups. This indicates that trajectories in joint-space did not
resemble each other regardless of condition, whereas trajectories in hand-space remained very
close to the baseline shape. Upon introduction of the abrupt perturbation, OSS in joint-space
increased further from its baseline and then gradually decreased, although it did not recover to
the baseline level. The second learning block did not cause as large a rise as the first learning
block, implying a partial learning effect in joint-space. In contrast, the joint-space OSS of the
gradual group stayed at a level similar to baseline throughout the first learning block. The abrupt
increase in the second learning block caused a small rise, but the effect was not distinguishably
large.
Figure 4.8. Procrustes analysis. OSS represents the ordinary sum of squares, the distance between a
trajectory and the mean baseline trajectory, calculated over the points constituting a single trajectory.
Lines were smoothed by local averaging across 10% of nearby trials.
4.3.5. Functional principal component analysis (FPCA)
We investigated the structure of trial-by-trial variability in the velocity profiles of the joint variables using FPCA. Figure 4.9 shows the analysis for the SFE joint velocity. Figure 4.9A shows the trajectories of all trials of all subjects taken together, after smoothing with Fourier bases. FPCA reveals the structure of variability among these curves, and can display its trial-by-trial change within each group. Figure 4.9B shows the first four principal components in the form of harmonics: blue curves represent the mean velocity profile, green curves the mean plus the principal component, and red curves the mean minus the principal component. These first four principal components explain 94% of the total variability. The first and second are the major components, accounting for 45% and 37% of the variability, respectively. The first component is related to peak phase and the location of the undershoot (compensatory correction: the positive component corresponds to post-compensation, the negative component to pre-compensation). The second component is largely associated with peak amplitude, with a small phase shift. The third and fourth components are related to pre- and post-movement adjustments without affecting the phase and amplitude of the peaks. Figure 4.9C shows the trial-by-trial change in the principal component scores for each component, drawn separately for the two groups. The abrupt group maintained a similar level of PC1 scores throughout, while the gradual group exhibited negative PC1 scores in baseline, indicating an early undershoot followed by a higher-amplitude peak with a phase lag (see the red curve in PC1 of Figure 4.9B). The opposite held for the gradual group's PC2 scores, which stayed mostly positive in baseline and remained near 0 in the learning blocks after reaching asymptote. The abrupt group also had positive PC2 scores in baseline, which became negative upon initial exposure to the perturbation and then recovered to 0. This pattern suggests that the PC2 score may be directly affected by the perturbation, which influences the maximum speed of a trajectory. Figure 4.9D shows a negative correlation between the first two component scores in both groups (F = 30.0, p < 10^-4 for the abrupt group; F = 75.9, p < 10^-4 for the gradual group).
Figure 4.9. Functional PCA for SFE velocity. (A) Velocity profiles of all subjects and trials, smoothed with Fourier bases. (B) First four principal components, expressed as harmonics. Blue lines indicate the mean; green lines, the mean plus each principal component; red lines, the mean minus each principal component. (C) PC scores plotted as a function of trials. Red lines: abrupt group. Blue lines: gradual group. (D) Correlation between the first two principal component scores. Red dots: abrupt group. Blue dots: gradual group.
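A minimal sketch of this analysis pipeline follows: each velocity profile is smoothed with a truncated Fourier basis, and ordinary PCA (via SVD) is then applied to the evenly sampled smoothed curves, which approximates FPCA up to quadrature weights. The basis size, component count, and function names are illustrative assumptions:

import numpy as np

def fourier_design(t, n_harmonics, period):
    # Fourier basis evaluated at times t: [1, sin, cos, sin2, cos2, ...].
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / period
        cols += [np.sin(w * t), np.cos(w * t)]
    return np.column_stack(cols)

def fpca(velocity, t, n_harmonics=6, n_components=4):
    # velocity: (n_trials, n_samples) joint-velocity profiles (e.g., SFE).
    B = fourier_design(t, n_harmonics, period=t[-1] - t[0])
    coef, *_ = np.linalg.lstsq(B, velocity.T, rcond=None)  # basis coefficients
    smooth = (B @ coef).T                                  # smoothed curves
    mean = smooth.mean(axis=0)
    centered = smooth - mean
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)      # variance fraction per component
    pcs = Vt[:n_components]                  # principal-component curves
    scores = centered @ pcs.T                # per-trial PC scores (Fig. 4.9C)
    return mean, pcs, scores, var_explained[:n_components]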
4.4. Discussion
We investigated trial-by-trial adaptation in hand-space and joint-space under two conditions: an abrupt and a gradual increase of the perturbation. The perturbation was a velocity-dependent torque applied to the elbow joint (EB), computed from the sum of the two shoulder joint velocities (SFE and SAA). We measured trial-by-trial performance change with movement duration and Procrustes analysis. Movement duration in the abrupt group rose sharply at the onset of the perturbation and then decreased as learning took place; however, the asymptotic level remained higher than baseline. In the second learning block, the reduction in movement duration was faster than in the first, indicating savings. Movement duration in the gradual group also rose sharply, but only by the middle of the first learning block, when the gain had reached around 2/3 of its maximum value (4.9). This implies the existence of a perturbation threshold that the default internal model can compensate for.
After characterizing overall performance with movement duration, we examined the different trial-by-trial dynamics in hand-space and joint-space. We used kinematic trajectory data, i.e., a reconstructed 3D path in each space. To build the joint-space trajectory, we reduced the DOF from seven to three by selecting the three joints with the largest effect on endpoint displacement: SFE, SAA, and EB. These were also the joints involved in generating the perturbation torque (equation 2). Procrustes analysis transformed trajectories to register them to the mean baseline trajectory. In this process, only the shape of each trajectory was conserved, while size and orientation carried little weight. The distance between the transformed trajectories and the mean trajectory defined similarity, i.e., closeness to the baseline trajectory.
Regardless of experimental condition and perturbation, joint trajectories had much larger variability than hand trajectories. This implies that subjects plan in task space while exploiting redundancy in joint space, as in the theory of uncontrolled manifolds (Scholz & Schöner 1999). Also, consistent with our previous qualitative observations of sample trajectories (Mistry et al. 2005), the similarity of joint trajectories did not return to baseline level at the asymptotic phase of the learning blocks. Furthermore, the washout block was not sufficient to return similarity to the baseline level. However, despite the large variability, the joint-space configuration also showed some learning effect: the distance to baseline decreased gradually with training, and remained at its asymptotic level at the onset of the second perturbation. In the gradual group, the similarity of joint trajectories did not deviate much from its baseline level during the first learning block. This suggests that joint redundancy provides the flexibility to counteract small increases of the perturbation efficiently, supporting the idea of exploiting redundancy to achieve the task goal.
In a high-dimensional system with redundancy, large variability is common, and investigating its source and structure is essential for understanding motor adaptation in redundant systems. We adopted the functional-data-analysis view that treats a single trajectory as a unit data entity. Taking one trajectory as one data "point", the interest is then in how it varies across trials, and in the underlying structure of that variability. For this purpose, we introduced functional principal component analysis (FPCA), an extension of ordinary PCA to functional data, and applied it to the velocity profiles of SFE and SAA. Both velocity profiles revealed similar principal components: the first two components captured the phase and amplitude of the maximum velocity peak. Furthermore, plotting principal component scores (PC scores) as a function of trials revealed a learning effect in the structure of variability. Although the specific interpretation of each PC score requires further investigation, the result suggests that trial-by-trial variability in trajectory is not random, but is shaped by the presence of the perturbation and the adaptation that compensates for it.
Chapter 5. Summary
In this dissertation, I discussed computational approaches to motor adaptation across three topics: the sources of motor adaptation, the modular structure of motor memories, and increased variability in the joint-space of a redundant system.
In the first topic, multiple learning mechanisms, we dissociated error-based learning and reward-based learning with a novel "no target" experimental paradigm. To the best of our knowledge, this was the first study to demonstrate forward-model updating from isolated sensory prediction error alone, in the absence of an external target. We also showed that the updated forward model could be transferred to the inverse model to generate a motor command that achieves a goal-defined motor task. Reward-based learning initially produced a completely different pattern of increased inter-trial variability, indicating exploration in the absence of direct sensory information. Variability decreased again once subjects found the hidden target location, suggesting a shift to exploitation to maximize reward probability. We found a negative correlation between reward probability and active searching behavior, as represented by exploration. In the later phase of reward-based learning, we found evidence of a partial update of the forward model, although to a lesser degree than the update driven by sensory feedback. This result implies that the forward model always attempts a prediction, regardless of the type of feedback available.
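The two mechanisms can be caricatured side by side in the short sketch below. The error-based learner is the standard state-space update from a sensory prediction error; the reward-based learner perturbs the command, keeps the change only when rewarded, and adjusts its exploration noise accordingly, mirroring the variability pattern described above. All constants and function names are illustrative, not the fitted parameters of Chapter 2:

import numpy as np

rng = np.random.default_rng(0)

def error_based_step(x, error, A=0.99, B=0.2):
    # State-space update driven by sensory prediction error
    # (A: retention factor, B: learning rate; hypothetical values).
    return A * x + B * error

def reward_based_step(x, sigma, reward_fn, alpha=0.5):
    # One exploration trial with binary reward, e.g.
    # reward_fn = lambda a: abs(a - hidden_target) < tolerance.
    trial = x + rng.normal(0.0, sigma)
    if reward_fn(trial):
        x = x + alpha * (trial - x)    # exploit the rewarded direction
        sigma = max(0.9 * sigma, 0.1)  # shrink exploration after success
    else:
        sigma = min(1.1 * sigma, 2.0)  # widen exploration after failure
    return x, sigma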
In the second topic, the modular structure of motor memories, we predicted that the formation of a motor memory falls into one of two categories: a two-model learner that generates a new expert responsible for the perturbation, or a one-model learner that updates the existing expert, the baseline model. From the theory of mixtures of expert modules, we provided a theoretical criterion that distinguishes the two scenarios, and the experimental results replicated this theoretical prediction. Subjects exposed to a large perturbation mostly generated a two-model learning system and showed quick transitions between states, manifested as a short aftereffect, quick savings (close to a one-step change), and abrupt switching in error-clamp. In contrast, subjects exposed to a small perturbation mostly turned out to be one-model learners, who updated and overwrote the learned memory every time the environment changed. As a result, these subjects exhibited a long-lasting aftereffect, little or no savings, and gradual passive decay in error-clamp.
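The distinction can be illustrated with a small sketch in the spirit of the mixture-of-experts framework (Jacobs et al. 1991): the observation is assigned to the expert with the highest likelihood, and a new expert is created only when no existing expert explains the perturbation well enough. The threshold and learning rate here are illustrative; the dissertation derives the actual criterion analytically:

import numpy as np

def expert_likelihoods(pred_errors, sigma=1.0):
    # Gaussian likelihood of the observation under each expert's
    # prediction; normalizing these gives the responsibilities.
    e = np.asarray(pred_errors, dtype=float)
    return np.exp(-e**2 / (2.0 * sigma**2))

def assign_or_spawn(experts, observation, lik_threshold=0.05, lr=0.2):
    # experts: list of scalar predictions, one per internal model.
    errors = [observation - p for p in experts]
    lik = expert_likelihoods(errors)
    best = int(np.argmax(lik))
    if lik[best] < lik_threshold:
        experts.append(observation)   # two-model learner: spawn a new expert
    else:
        r = lik / lik.sum()           # responsibilities
        experts[best] += lr * r[best] * errors[best]  # one-model update
    return experts

With a large perturbation, the baseline expert's likelihood collapses and a new expert is created (two-model learner); with a small perturbation, the baseline expert retains a high likelihood and is simply updated (one-model learner).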
In the third topic, adaptation in a redundant system, we investigated trial-by-trial adaptation and variability in both task-space and joint-space. We applied two adaptation schedules: one with an abrupt perturbation increase in the initial learning block, and the other with a gradual increase. We adopted Procrustes analysis to define similarity between trajectories in high-dimensional space. Regardless of the gain schedule, trajectories in joint-space deviated more from baseline and had higher inter-trial variability than trajectories in hand-space; indeed, hand trajectories remained very close to the baseline shape throughout the perturbation trials. The result implies that the brain actively exploits redundancy in joint-space to quickly achieve the goal in task-space. We also analyzed the structure of variability in joint velocities. Despite the overall large variability, it had clearly separable structure captured by the principal components. The PC scores, which measure how strongly each trajectory expresses each principal component, changed systematically as the perturbation appeared and disappeared. Moreover, some of these principal components were correlated with each other.
Overall, the dissertation showed that important aspects of motor adaptation can be well described by computational theories and analyses. One of the greatest advantages of computational theories is that they can make predictions before observations are available. In academic research, these predictions yield testable hypotheses. In practical and clinical applications, theories can help design more efficient training methods and schedules, and can provide individualized, adaptive diagnoses or treatments by estimating a patient's current movement and behavior. I hope my studies contribute to both the academic and clinical fields to improve motor rehabilitation.
Bibliography
Van Beers, R.J., Sittig, A.C. & Denier van der Gon, J.J., 1999. Integration of proprioceptive and visual position-information: an experimentally supported model. Journal of Neurophysiology, 81, pp.1355–1364.
Van Beers, R.J., Wolpert, D.M. & Haggard, P., 2002. When feeling is more important than
seeing in sensorimotor adaptation. Current biology : CB, 12(10), pp.834–7.
Berniker, M. & Kording, K., 2008. Estimating the sources of motor errors for adaptation and
generalization. Nature neuroscience, 11(12), pp.1454–61.
Berniker, M. & Kording, K.P., 2011. Estimating the relevance of world disturbances to explain
savings, interference and long-term motor adaptation effects. PLoS computational biology,
7(10), p.e1002210.
Brokaw, E.B., Holley, R.J. & Lum, P.S., 2013. Comparison of joint space and end point space
robotic training modalities for rehabilitation of interjoint coordination in individuals with
moderate to severe impairment from chronic stroke. IEEE transactions on neural systems
and rehabilitation engineering, 21(5), pp.787–95.
Burge, J., Ernst, M.O. & Banks, M.S., 2008. The statistical determinants of adaptation rate in
human reaching. Journal of Vision, 8, pp.1–19.
Choi, J.T. et al., 2009. Walking flexibility after hemispherectomy: Split-belt treadmill adaptation
and feedback control. Brain, 132(3), pp.722–733.
Choi, Y. et al., 2008. Performance-Based Adaptive Schedules Enhance Motor Learning. Journal
of Motor Behavior, 40(4), pp.273–280.
Crocher, V. et al., 2012. Constraining upper limb synergies of hemiparetic patients using a
robotic exoskeleton in the perspective of neuro-rehabilitation. IEEE transactions on neural
systems and rehabilitation engineering, 20(3), pp.247–57.
Cusumano, J.P. & Cesari, P., 2006. Body-goal variability mapping in an aiming task. Biological
Cybernetics, 94(5), pp.367–379.
Daw, N.D., Niv, Y. & Dayan, P., 2005. Uncertainty-based competition between prefrontal and
dorsolateral striatal systems for behavioral control. Nature neuroscience, 8(12), pp.1704–
1711.
Galea, J.M. et al., 2015. The dissociable effects of punishment and reward on motor learning.
Nature Neuroscience, 18, pp.597–602.
Gandolfo, F., Mussa-Ivaldi, F.A. & Bizzi, E., 1996. Motor learning by field approximation. Proceedings of the National Academy of Sciences of the United States of America, 93(9), pp.3843–3846.
Ghahramani, Z. & Wolpert, D.M., 1997. Modular decomposition in visuomotor learning. Nature,
386(6623), pp.392–395.
Haith, A.M. & Krakauer, J.W., 2012. Theoretical models of motor control and motor learning. In
A. Gollhofer, W. Taube, & J. B. Nielsen, eds. The Routledge Handbook of Motor Control
and Motor Learning. New York: Routledge, pp. 7–28.
Haruno, M., Wolpert, D.M. & Kawato, M., 2001. Mosaic model for sensorimotor learning and
control. Neural computation, 13(10), pp.2201–20.
Herzfeld, D.J. et al., 2014. A memory of errors in sensorimotor learning. Science, 345, pp.1349–
1353.
Hidaka, Y. et al., 2012. Use it and improve it or lose it: interactions between arm function and
use in humans post-stroke. PLoS computational biology, 8(2), p.e1002343.
Huang, V.S. & Krakauer, J.W., 2009. Robotic neurorehabilitation: a computational motor
learning perspective. Journal of neuroengineering and rehabilitation, 6(5), pp.1–13.
Imamizu, H. et al., 2000. Human cerebellar activity reflecting an acquired internal model of a
new tool. Nature, 403(6766), pp.192–5.
Izawa, J., Criscimagna-Hemminger, S.E. & Shadmehr, R., 2012. Cerebellar contributions to reach adaptation and learning sensory consequences of action. The Journal of Neuroscience, 32(12), pp.4230–9.
Izawa, J. & Shadmehr, R., 2011. Learning from sensory and reward prediction errors during
motor adaptation. PLoS Computational Biology, 7(3), p.e1002012.
Jacobs, R.A. et al., 1991. Adaptive mixtures of local experts. Neural Computation, 3(1), pp.79–87.
Jordan, M. & Rumelhart, D., 1992. Forward models: supervised learning with a distal teacher. Cognitive Science, 16, pp.307–354.
Jordan, M.I., 1996. Computational aspects of motor control and motor learning. In H. Heuer & S. Keele, eds. Handbook of Perception and Action. New York: Academic Press.
Jordan, M.I. & Jacobs, R.A., 1994. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6, pp.181–214.
Kalman, R.E., 1960. A New Approach to Linear Filtering and Prediction Problems. Journal of
Basic Engineering, 82(Series D), pp.35–45.
Kawato, M., 1999. Internal models for motor control and trajectory planning. Current Opinion in
Neurobiology, 9, pp.718–727.
Kluzik, J. et al., 2008. Reach adaptation: what determines whether we learn an internal model of
the tool or adapt the model of our arm? Journal of neurophysiology, 100(3), pp.1455–64.
Kording, K.P., Tenenbaum, J.B. & Shadmehr, R., 2007. The dynamics of memory as a
consequence of optimal adaptation to a changing body. Nature neuroscience, 10(6),
pp.779–86.
Krakauer, J.W. et al., 2006. Generalization of motor learning depends on the history of prior
action. PLoS Biology, 4(10), p.e316.
Krakauer, J.W., Ghez, C. & Ghilardi, M.F., 2005. Adaptation to visuomotor transformations:
consolidation, interference, and forgetting. The Journal of Neuroscience, 25(2), pp.473–8.
Krakauer, J.W. & Mazzoni, P., 2011. Human sensorimotor learning: adaptation, skill, and
beyond. Current opinion in neurobiology, 21(4), pp.636–44.
Latash, M.L., Scholz, J.P. & Schöner, G., 2002. Motor control strategies revealed in the structure
of motor variability. Exercise and sport sciences reviews, 30(1), pp.26–31.
Lee, J.-Y. & Schweighofer, N., 2009. Dual adaptation supports a parallel architecture of motor
memory. The Journal of Neuroscience, 29(33), pp.10396–404.
Lum, P.S., Burgar, C.G. & Shor, P.C., 2003. Evidence for strength imbalances as a significant
contributor to abnormal synergies in hemiparetic subjects. Muscle & Nerve, 27(2), pp.211–
21.
Mazzoni, P. & Krakauer, J.W., 2006. An implicit plan overrides an explicit strategy during
visuomotor adaptation. The Journal of Neuroscience, 26(14), pp.3642–5.
Mehta, B. & Schaal, S., 2002. Forward models in visuomotor control. Journal of
neurophysiology, 88(2), pp.942–953.
Miall, R.C. & Wolpert, D.M., 1996. Forward Models for Physiological Motor Control. Neural
Networks, 9(8), pp.1265–1279.
Mistry, M., Mohajerian, P. & Schaal, S., 2005. Arm movement experiments with joint space
force fields using an exoskeleton robot. In IEEE Ninth International Conference on
Rehabilitation Robotics. pp. 408–413.
Nikooyan, A.A. & Ahmed, A.A., 2015. Reward feedback accelerates motor learning. Journal of Neurophysiology, 113, pp.633–646.
Pekny, S.E., Criscimagna-Hemminger, S.E. & Shadmehr, R., 2011. Protection and expression of
human motor memories. The Journal of Neuroscience, 31(39), pp.13829–39.
Pekny, S.E., Izawa, J. & Shadmehr, R., 2015. Reward-Dependent Modulation of Movement
Variability. The Journal of Neuroscience, 35(9), pp.4015–4024.
Peters, J. & Schaal, S., 2008. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), pp.682–97.
Reisman, D.S. et al., 2009. Split-belt treadmill adaptation transfers to overground walking in
persons poststroke. Neurorehabilitation and neural repair, 23(7), pp.735–744.
Schaal, S. & Schweighofer, N., 2005. Computational motor control in humans and robots.
Current opinion in neurobiology, 15(6), pp.675–82.
Scheidt, R.A. et al., 2000. Persistence of motor adaptation during constrained, multi-joint, arm movements. Journal of Neurophysiology, 84, pp.853–862.
Scheidt, R.A., Dingwell, J.B. & Mussa-Ivaldi, F.A., 2001. Learning to move amid uncertainty. Journal of Neurophysiology, 86, pp.971–985.
Scholz, J.P. & Schöner, G., 1999. The uncontrolled manifold concept: Identifying control
variables for a functional task. Experimental Brain Research, 126(3), pp.289–306.
Schöner, G., 1995. Recent Developments and Problems in Human Movement Science and Their
Conceptual Implications. Ecological Psychology, 7(4), pp.291–314.
Schöner, G. & Scholz, J.P., 2007. Analyzing variance in multi-degree-of-freedom movements:
uncovering structure versus extracting correlations. Motor control, 11(3), pp.259–275.
Schweighofer, N. et al., 2011. Mechanisms of the contextual interference effect in individuals
poststroke. Journal of neurophysiology, 106(5), pp.2632–41.
Shadmehr, R. & Krakauer, J.W., 2008. A computational neuroanatomy for motor control. Experimental Brain Research, 185(3), pp.359–81.
Shadmehr, R. & Mussa-Ivaldi, F., 1994. Adaptive representation of dynamics during learning of a motor task. The Journal of Neuroscience, 14(5), pp.3208–3224.
Shadmehr, R., Smith, M. a & Krakauer, J.W., 2010. Error correction, sensory prediction, and
adaptation in motor control. Annual review of neuroscience, 33, pp.89–108.
Shmuelof, L. et al., 2012. Overcoming motor “forgetting” through reinforcement of learned
actions. The Journal of Neuroscience, 32(42), pp.14617–21.
Smith, M.A., Ghazizadeh, A. & Shadmehr, R., 2006. Interacting adaptive processes with different timescales underlie short-term motor learning. PLoS Biology, 4(6), p.e179.
Taylor, J.A., Krakauer, J.W. & Ivry, R.B., 2014. Explicit and implicit contributions to learning in a sensorimotor adaptation task. The Journal of Neuroscience, 34(8), pp.3023–32.
Taylor, J.A. & Ivry, R.B., 2011. Flexible cognitive strategies during motor learning. PLoS Computational Biology, 7(3), p.e1001096.
Todorov, E., 2004. Optimality principles in sensorimotor control. Nature neuroscience, 7(9),
pp.907–15.
Torres-Oviedo, G. & Bastian, A.J., 2012. Natural error patterns enable transfer of motor learning
to novel contexts. Journal of neurophysiology, 107(1), pp.346–56.
Vaswani, P.A. & Shadmehr, R., 2013. Decay of motor memories in the absence of error. The Journal of Neuroscience, 33(18), pp.7700–9.
Wilks, S.S., 1938. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite
Hypotheses. The Annals of Mathematical Statistics, 9(1), pp.60–62.
Williams, R.J., 1992. Simple statistical gradient-following algorithms for connectionist
reinforcement learning. Machine Learning, 8(3-4), pp.229–256.
Wolpert, D., Ghahramani, Z. & Jordan, M., 1995. An Internal Model for Sensorimotor
Integration. Science, 269, pp.1880–1882.
Wolpert, D.M. & Kawato, M., 1998. Multiple paired forward and inverse models for motor control. Neural Networks, 11(7-8), pp.1317–29.
Zarahn, E. et al., 2008. Explaining savings for visuomotor adaptation: linear time-invariant state-
space models are not sufficient. Journal of neurophysiology, 100(5), pp.2537–48.
Abstract
In this dissertation, I conducted behavioral experiments and applied computational theories to understand human motor adaptation. Motor adaptation is a kind of motor learning in which learners gradually return their performance to baseline level in the presence of an external perturbation. Among the various motor adaptation paradigms, I used visuomotor rotation for the first two studies and force-field adaptation for the third. The central motor behavior studied was voluntary reaching.
The topics of the studies are as follows: i) dissociation of different sources of motor adaptation: sensory feedback for error-based learning and reward feedback for reward-based learning; ii) modular decomposition of motor memories to account for repeated learning and washout data, and for stochastic behavior in error-clamp; iii) trial-by-trial adaptation dynamics in a high-dimensional redundant system, and the structure of variability among trajectories during adaptation.
Overall, the dissertation established i) dissociable learning mechanisms and their interaction; ii) a modular structure of motor memories and the conditions that distinguish two-model learners from one-model learners; and iii) higher deviation and variability of joint trajectories compared to hand trajectories, together with structured variability in trial-by-trial adaptation. This computational understanding of various aspects of motor adaptation has rich potential for application to clinical diagnosis and treatment through the analysis of kinetic and kinematic data.