MODELING HUMAN REACHING AND GRASPING:
CORTEX, REHABILITATION AND LATERALIZATION
by
Cheol Han
________________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2009
Copyright 2009 Cheol Han
Acknowledgments
I first thank my advisor, Dr. Arbib, for giving me the chance to start my life in academia and for providing valuable insights on research. Without his patience, insight and knowledge, I could not have finished my PhD. Dr. Schweighofer motivated me and patiently guided me through the details of research skills. His openness to discussion and his great regard for his students allowed me to enjoy research with him and his colleagues. I would like to express my gratitude to Dr. Winstein, Dr. Gordon, Dr. Tretriluxana and Dr. Wolf for making their data available to me and for their valuable comments. I also thank the remaining member of my dissertation committee, Dr. Schaal. This study was supported in part by NIH P20 RR020700-01.
My friends in the USC community and in my home country encouraged and supported me. I am thankful for the exciting discussions and enjoyable research atmosphere shared with Dr. Schweighofer’s students: Younggeun Choi, Jeongyoun Lee, Feng Qi, Yukikazu Hidaka, Jihye Lee, and Duckho Kim. Shu-ya Chen, Yi-Hsuan Lai, and Parikh Neerav collected the BART experimental data on my behalf, which reoriented my research in a practical direction. James Bonaiuto, Dr. Jeff Begley, Anon Prangprasopchok and Jinyong Lee, students of Dr. Arbib, provided a warm environment in our lab, the USC Brain Project. I cannot forget Sunnie Chung and Kwangjin Han from my early days at USC; I will remember the bright times with them during an early academic life filled with chaos and uncertainty. My friends in my home country, Hyungmook Lee and Hyunho Kim, encouraged me when I lost the courage to finish my PhD. Whenever I needed a rest in my home country, my old friends always gave me shelter to recharge: Junwon Lee, Hyungjin Kim, Jinwoo Shin, Jihyuk Moon, Eunkyu Park, Seungmo Kang, Sehoon Cho, Joonseok Lim, Sinhwa Kim and all those whose names I do not recall now.
I thank my family: my parents, Soonjung Han and Aejin Kim; my sister, Songyee Han, and her husband, Dr. Tai Gyu Kim; my wife, Ashley Hyunjin Park, and her family, John, Yookyung, Ellen & Jake, and James. Above all, my parents’ endless support, patience and love before and during my PhD life made it all possible.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1. Introduction
1.1. Human reach-to-grasp behavior
1.2. Organization of the thesis
Chapter 2. Background: Physiology, cortical map and their models
2.1. Physiology of neural systems relevant to trajectory control
2.1.1. Distributed processing in the brain
2.1.2. Physiology of the primary motor cortex
2.1.3. Inflow of motor information towards the primary motor cortex
2.2. Cortical maps and their plasticity
2.2.1. Organization of the primary motor cortex
2.2.2. Previous computational models of the cortex
2.2.3. Advantages of population coding
2.3. Models of trajectory control
2.3.1. Optimality of motor control
2.3.2. Motor control procedures
2.3.3. Integrated models
2.3.4. Analytical model
2.3.5. Limitations of computational models
Chapter 3. Improving spontaneous use of the affected arm after stroke: predictions from a computational model
3.1. Introduction
3.2. Methods
3.2.1. Behavioral setup
3.2.2. Computational model
3.3. Simulation
3.4. Result
3.5. Discussion and predictions
3.5.1. Cortical reorganization after stroke and therapy
3.5.2. Strengths and limitations of the model
3.5.3. Specific and testable predictions derived from the model
3.5.4. Implication for rehabilitation
Chapter 4. Understanding the functional threshold: predictions from a computational model and supporting data from the Extremity Constraint-induced Therapy Evaluation (EXCITE) Trial
4.1. Introduction
4.2. Methods
4.2.1. Computed simulation methods
4.2.2. Re-analysis of clinical data from the EXCITE trial
4.3. Results
4.3.1. Computed simulation results
4.3.2. Re-analysis of clinical data from the EXCITE trial
4.4. Discussion
4.4.1. Limitations and future work
Chapter 5. Variability in detouring in reach-to-grasp: individualized strategies with virtual targets
5.1. Introduction
5.2. Experimental data
5.2.1. Experimental setup
5.2.2. Six patterns in the grasping module
5.2.3. Two patterns in the reaching module
5.2.4. Coordination patterns relate more to grasp strategy than reach strategy
5.2.5. Variability of patterns over subjects, hands and trials
5.3. A computational model and its implications
5.3.1. The Hoff-Arbib model
5.3.2. Extending the model to detouring in the reach-to-grasp
5.3.3. Simulation results
5.4. Discussion
Chapter 6. A cortical model of motor control with extrinsic and intrinsic neural representation
6.1. Introduction
6.2. Methods
6.2.1. Arm model
6.2.2. Cortical model of the motor cortex and motor input map
6.2.3. Kinematic coding for actor-critic framework and motor input map
6.2.4. Simulation
6.3. Results
6.3.1. Motor output map
6.3.2. Motor input map
6.3.3. Stroke rehabilitation
6.4. Discussion
Chapter 7. Conclusion: Optimization through learnable representations
7.1. Summary
7.2. Relationship of models
7.3. Optimization through learnable representation
7.4. Future work
Bibliography
Appendices
Appendix A. Learning rule derivation of a simplified motor cortex model
Appendix B. Effects of learning process on bistability in use and motor performance
Appendix C. Trial numbers of each group, simulation parameters and histogram diagram for reach-to-grasp coordination model
Appendix D. A two-link arm model with six Hill-type muscles
List of Tables
Table 5.1. Summary of features used in assessing reaching and grasping modules.
Table 5.2. Six different groups in grasping module with four criteria
Table 5.3. Parameters for the extended Hoff-Arbib model
Table 5.4. Coordination strategies for the four groups of detour trajectories
Table 6.1. Parameters of the dynamics of cortical models and learnable networks
Table C.1. Trial numbers which are classified to each group
Table C.2. Parameters used for each pattern
Table D.1. Parameters of six muscles: shoulder extensor (E), shoulder flexor (F), elbow opener (O), elbow closer (C), biarticular biceps (B), and biarticular triceps (D)
Table D.2. Parameters of two-link arm
List of Figures
Figure 1.1. Scope of the current work.
Figure 3.1. (a) Experimental setup, (b) Model structure
Figure 3.2. Neuronal population coding (a) and spontaneous use (b) over the
workspace for the affected arm (1) before stroke, (2) after stroke, (3)
after 3000 free choice trials and (4) after 1000 forced use trials
followed by 2000 free choice trials.
Figure 3.3. Time course of directional error (A), normalized population vector
(PV) (B) and spontaneous arm choice (C) in the affected range just
before stroke, following stroke (“acute stroke”), during rehabilitation,
and after rehabilitation (“chronic stroke”).
Figure 3.4. Long-term effect of therapy as a function of the duration of therapy.
Figure 3.5. Directional error (A), normalized population vector (PV) (B), and
spontaneous arm use (C) in the immediate and follow-up tests.
Figure 3.6. Effect of stroke size. (A) Number of rehabilitation trials required to
reach the effective rehabilitation threshold, as a function of lesion sizes.
(B) Normalized population vector (PV) as a function of lesion size in
the follow-up test after 800
Figure 3.7. Changes in reach precision (standard deviation of directional error)
in relation to changes in accuracy (mean of directional error) for (A) the
contralateral (affected) arm, and (B) the ipsilateral (non-affected) arm.
Figure 3.8. Reorganization of the affected (left) hemisphere (A), and non-
affected (right) hemisphere (B) after stroke followed by therapy or no
therapy.
Figure 3.9. Cortical reorganization without the unsupervised learning term in
Equation (3.3). Reorganization of the affected (left) hemisphere (A), and
non-affected (right) hemisphere (B) after stroke followed by therapy or
no therapy.
Figure 3.10. Effect of the supervised learning rate (A), the unsupervised
learning rate (B) and the reinforcement learning rate (C) on directional
error after different durations of therapy (200, 400 and 800 therapy
trials) followed by 3000 trials in the free choice condition.
Figure 4.1. Simulated data of use of the affected arm (A) just after therapy and
(B) 3000 trials after therapy as a function of performance (reach
directional error in the model) just after therapy for 125 simulated
subjects with different lesion sizes and locations.
Figure 4.2. Use of the more affected arm (as recorded by the MAL AOU
subscale) (A) just after therapy and (B) 1 year after therapy as a function
of arm and hand function (Functional Ability Scale) just after therapy
for subjects of the EXCITE trial.
Figure 5.1. Experimental setup (Adapted from Tretriluxana et al.2009) in a
lateral view and a top view.
Figure 5.2. Major features of an experimental trial (C004, left hand, 17th trial)
in reach velocity profile (a), in aperture size profile (b), trajectory (c)
and coordination of reach velocity over aperture size (d).
Figure 5.3. Six different groups of coordination patterns, showing transport
velocity profile (the first column), aperture size profile (the second
column), and coordination between transport velocity profile and
aperture profile (the third column).
Figure 5.4. (a) Pattern separability of grasping features only with t(MAP) and
t(PI), and (b) pattern separability of coordination with hyper-parameter
t(MAP)-t(FVP) and t(PI)-t(MDP).
Figure 5.5. Two types of reach velocity profile.
Figure 5.6. Inter-subject, inter-hand and inter-trial variability
Figure 5.7. (a) A model of time-based coordination between feedback
controllers for reach and grasp. (b) Time-activation based coordination
of reach to grasp with detour around a barrier.
Figure 5.8. Virtual target hypothesis for reaching
Figure 5.9. Comparison of a typical pattern of movement and the result
obtained from the model
Figure 6.1. Overall structure with dual map hypothesis
Figure 6.2. (a) Arm model with six muscles. (b) 8 targets on a circle 15 cm
away from the endpoint of a resting posture.
Figure 6.3. (a) cortical model, (b) Topology of neighborhood neurons.
Figure 6.4. Transforming distorted input space (a) in joint coordinates to a more
balanced input space (b)
Figure 6.5. Developed motor output map for six muscles before training (a) and
after training (b), where white denotes strong connection and black
denotes no connection between each neuron in the motor cortex and a
motoneuron in the spinal cord, which controls six different muscles:
shoulder extensor (E), shoulder flexor (F), elbow opener (O), elbow
closer (C), biarticular biceps (B), and biarticular triceps (D).
Figure 6.6. Averaged motor activation pattern for 8 different directions
between 0 msec and 300 msec after movement initiation (a), where
white denotes strong activation (0.15) and black denotes no activation
and each small box represents 20 by 20 neurons on the motor cortex for
8 different directions, and (b) corresponding movements with their
velocity profile.
Figure 6.7. Directional coding of selected neurons (a) and population vector for
eight different directions (b) based on averaged activation between 0
msec and 300 msec after movement initialization. In (b), thin blue lines
represent individual neurons’ activation on the direction of their own
preferred direction and thick red lines represent population vectors for
each movement direction, which fairly indicate movement direction.
Figure 6.8. Motor performance just after stroke (a) and after re-training (b),
without reorganization of muscle synergies.
Figure 6.9. Motor cortex activation pattern changes just after stroke (a) and
after retraining (b), without reorganization of muscle synergies, where
white denotes strong activation (0.15) and black denotes no activation
and each small box represents 20 by 20 neurons on the motor cortex for
8 different directions.
Figure 7.1. Relationship between models in chapter 5 and chapter 6.
Figure A.1. Approximation of $\delta\theta_p^i / \delta\theta_e$.
Figure B.1. Directional error, the normalized population vector (PV) and
spontaneous arm use after different durations of therapy followed by 0
free choice trials (immediate) and 3000 free choice trials (follow-up)
without supervised learning.
Figure B.2. Directional error, the normalized population vector (PV) and
spontaneous arm use after different durations of therapy followed by 0
free choice trials (immediate) and 3000 free choice trials (follow-up)
without unsupervised learning.
Figure B.3. Directional error, the normalized population vector (PV) and
spontaneous arm use after different durations of therapy followed by 0
free choice trials (immediate) and 3000 free choice trials (follow-up)
without reinforcement learning.
Figure C.1. Histogram of t(FVP), t(MDP), t(PI) and t(MAP) for EP+DEP
patterns and LP+DLP patterns
Abstract
Reach-to-grasp is a principal action in our daily lives: writing, eating, and using tools all start with a reach-to-grasp. Despite abundant experiments and models, the mechanism underlying the reach-to-grasp action is still unclear. This study proposes three models to identify the mechanisms underlying reach-to-grasp in terms of how action selection facilitates skill learning, how an action is planned, and how a plan is executed.
The first model studies the nonlinear interaction between action selection and motor execution. Constraint-induced therapy reverses unilateral effects of stroke in terms of arm use and motor function. If daily use can itself serve as training, motor function may be regained without further expensive therapy (Winstein et al., 2004). Using a computational model, I show that if use of the affected arm exceeds a threshold after a certain amount of therapy, the effects of stroke may be reversed through daily activities.
The second model concerns variability in reach-to-grasp coordination. Whereas previous models capture a typical behavior, new experimental data (Tretriluxana et al.), which examined right-handers reaching to grasp an object either directly or while detouring around an obstacle, showed that healthy adults do not exhibit this typical reach-to-grasp coordination. I quantified the data and showed the existence of distinct coordination patterns. Then, extending the Hoff-Arbib model (Hoff & Arbib, 1993), I reproduced those different patterns.
The third model hypothesizes the role of the motor cortex in reaching, with its high-level and low-level coding. Despite the abundance of experimental results on the motor cortex, the underlying motor control mechanism with a map representation is still debated: does the motor cortex encode reaching movements as movement directions or as muscle synergies? Considering that the motor control procedure may include a transformation of coordinates, I hypothesize that the motor cortex contains both codings, with muscle synergies being primary. A biologically plausible cortical model first learned a motor output map (the projection from the motor cortex to the spinal cord) encoding muscle synergies, and then learned a motor input map (the projection from supra-motor cortices to the motor cortex) that activates the motor cortex appropriately, aligning a given movement direction with a muscle synergy direction. I call this structure a ‘dual map’.
Chapter 1
Introduction
1.1. Human reach-to-grasp behavior
In the late nineteenth century, Charles Darwin observed that humans became
dominant on the earth after bipedalism freed our hands. Our daily lives are filled with
actions using our upper limbs, including eating with our hands, using tools, and rotating
steering wheels. The principal sub-action behind these actions is a reach-to-grasp action.
For example, tool use requires, first of all, reaching for and then grasping the tool.
Both the importance of reach-to-grasp action and ease of designing experiments led
to numerous studies during and since the 1980s (Buccino et al., 2001; Buneo, Jarvis,
Batista, & Andersen, 2002; Cisek, Grossberg, & Bullock, 1998; Cuijpers, Smeets, &
Brenner, 2004; Desmurget et al., 1996; Desmurget et al., 1995; Flash & Hogan, 1985;
Georgopoulos, Schwartz, & Kettner, 1986; Graziano, Cooke, & Taylor, 2000; P Haggard
& Wing, 1995; Harris & Wolpert, 1998; Hoff, 1992; Hogan, 1984, 1985; Jeannerod,
1981; Jeannerod, Paulignan, & Weiss, 1998; Morasso, 1981, 1983; Mussa Ivaldi,
Morasso, & Zaccaria, 1988; Paulignan, Dufosse, Hugon, & Massion, 1989; Paulignan,
Frak, Toni, & Jeannerod, 1997; Paulignan, Jeannerod, MacKenzie, & Marteniuk, 1991;
Paulignan, MacKenzie, Marteniuk, & Jeannerod, 1990, 1991; Reinkensmeyer, Iobbi,
Kahn, Kamper, & Takahashi, 2003; Reinkensmeyer, McKenna Cole, Kahn, & Kamper,
2002; Todorov, 2000a; Tretriluxana, Gordon, & Winstein, 2004; Tretriluxana, Winstein,
& Gordon, 2005). Because the reach-to-grasp action is common among mammals, it can be studied in other primates such as monkeys and chimpanzees, or in rodents such as rats and mice. As a result, it is possible to record the activation of the central nervous system (CNS) directly and to dissect the brains of these animals; direct recording and dissection are not possible in human subjects. These techniques have increased the number of experimental results and expanded our understanding of the reach-to-grasp action.
The reach-to-grasp action recruits various brain regions. The parietal cortex extracts features of an object, including its location (VIP), its orientation (PIP), and its size (PIP). The motor cortex controls simple movements of the limbs, and the premotor cortex is thought to have a role in complex movements. The precise roles of the motor cortex and the premotor cortex are still unclear because of their parallel, distributed structure. An abundance of experimental results with varying interpretations has triggered many controversies, even as the research yields greater understanding. One famous controversy is the “muscle-versus-movement” debate (Asanuma, 1989; Georgopoulos et al., 1986), which concerns whether the activity of neurons in the motor cortex correlates with muscle contraction or with movement of the limbs. By recording neuronal activity during active movements, Georgopoulos et al. (1986) found that neurons in the motor cortex of a monkey activate differently for different movement directions.
A human can learn a new motor skill or improve a previously acquired one, and skill learning certainly induces plastic changes in the relevant neural system. However, the underlying mechanism is still under investigation and debate. In practice, understanding this mechanism would help therapists provide stroke survivors with better rehabilitation programs for recovering from motor impairment. In modern Western societies, the risk of stroke has been rising due to increased dietary cholesterol and increased stress. Strokes leave survivors with disabilities that affect important actions such as reach-to-grasp. To assure a better life for stroke survivors, the importance of rehabilitation programs has been emphasized.
1.2. Organization of the thesis
In this thesis, I explore the roles of the motor cortex and related planning in human reach-to-grasp behavior by proposing and simulating neural models.
Chapter 2 reviews the literature on the physiology of the neural system and the theory of motor control. The first part of the review covers the physiology of brain regions related to motor control. The second part covers biologically plausible cortical models of the motor cortex, in relation to the neural representation in the primary motor cortex. The third part covers computational models of motor control, especially those related to the role of the motor cortex in the reach-to-grasp action.
In the last section of the literature review, I follow a decomposition of reach-to-grasp into motor control procedures (Kawato, Furukawa, & Suzuki, 1987): trajectory formation, transformation of coordinates, and generation of motor commands. Before the movement is initiated, trajectory planning is required to select a target, to determine the speed of movement, and to execute the movement efficiently. Then, the plan is projected to the motor cortex. This projection may include a transformation of coordinates from high-level ‘directional’ coding to intermediate-level ‘joint’ coding or low-level ‘muscle’ coding. The motor cortex executes the actual movement by sending a motor command through the spinal cord and the musculoskeletal system. Following this decomposition, I propose three computational models of the role of the motor cortex in human reach-to-grasp action. Each model uses its own hypothesis to explain one procedure of the reach-to-grasp action, and the three models are presented in Chapters 3, 5 and 6. Each model is simulated to investigate its output, and this output is compared with the results of the appropriate experiments. The model described in Chapter 3 focuses on predicting the nonlinear characteristics of neurorehabilitation. Chapter 4 presents a simulation that matched the experimental data.
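The coordinate transformation step in this decomposition can be illustrated with a minimal sketch. It assumes a planar two-link arm whose link lengths (`L1`, `L2`), posture, and hand speed are hypothetical values chosen only for illustration, and the function names are illustrative rather than taken from the thesis. The sketch maps a desired hand movement direction in extrinsic coordinates ('directional' coding) to joint velocities ('joint' coding) through the arm's Jacobian.

```python
import numpy as np

# Hypothetical link lengths in metres (upper arm, forearm); not from the thesis.
L1, L2 = 0.30, 0.33

def jacobian(q):
    """Jacobian of the hand position of a planar two-link arm.

    q[0] is the shoulder angle and q[1] the elbow angle, both in radians.
    """
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def direction_to_joint_velocity(q, direction, speed=0.1):
    """Convert a hand movement direction (radians) into joint velocities.

    The hand velocity is `speed` (m/s, an assumed value) along `direction`;
    solving J(q) * qdot = v gives the joint velocities that produce it.
    """
    v = speed * np.array([np.cos(direction), np.sin(direction)])
    return np.linalg.solve(jacobian(q), v)

# Example posture: shoulder at 45 degrees, elbow flexed 90 degrees.
q = np.deg2rad([45.0, 90.0])
# Desired hand movement along the positive x axis.
qdot = direction_to_joint_velocity(q, np.deg2rad(0.0))
```

Going one level further, from joint coding to 'muscle' coding, would require a muscle model and is not attempted in this sketch.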
Figure 1.1. Scope of the current work. Three computational models try to answer the
following questions: How does action selection facilitate skill learning? (Chapter 3), How
is an action planned? (Chapter 5) and How is a plan executed? (Chapter 6)
In Chapter 3, the first model integrates simple bilateral motor cortical models with higher-level coding and an action choice module. This study indicates that rehabilitation of the cortical model and adjustment of action selection interact with each other. The mechanism underlying this model is that spontaneous hand use affects the rehabilitation of stroke survivors. Stroke survivors tend to use the less affected limb, and this tendency accelerates degeneration of the affected limb, a phenomenon called ‘learned non-use’ (E. Taub & Uswatte, 2006). The model showed that spontaneous hand use increased the probability of selecting the more affected limb, and that this change in selection gave stroke survivors more opportunities to rehabilitate the affected limb in their daily lives without lengthy and expensive therapy. We can then formulate an optimal dose hypothesis: there exists an optimal length of therapy after which spontaneous arm use and motor performance keep improving even after therapy ends, because high spontaneous arm use facilitates motor learning.
In Chapter 4, using the model described in Chapter 3, a new simulation was conducted to test the previous chapter’s hypothesis more directly against the EXCITE trial (S. L. Wolf et al., 2006), which includes 169 stroke subjects with 2 years of changes in arm use and functional measures after a fixed length of therapy (delivered immediately or after a 1-year delay). The new simulation contains 125 artificial stroke subjects, varying in stroke location and size, with a fixed duration of therapy. The average threshold was then estimated with a novel method based on the difference between arm use just after therapy and long-term arm use. A re-analysis of the EXCITE trial data with our method showed a threshold with characteristics similar to those found in the simulation study.
In Chapter 5, the second model extends the Hoff-Arbib model to reproduce variability in motor planning of detouring reach-to-grasp movements. Whereas previous models of reach-to-grasp action capture the typical behavior patterns of these actions, experimental data from an interdisciplinary study at the University of Southern California (Tretriluxana, Gordon, Arbib, Fisher, & Winstein, 2007; Tretriluxana, Gordon, Fisher, & Winstein, 2007; Tretriluxana, Gordon, & Winstein, 2007), which examined right-handers reaching to grasp an object while detouring around an obstacle, showed that healthy adults do not exhibit this typical reach-to-grasp strategy. Here, I hypothesize that variability in reach-to-grasp coordination is based on different equifinality of subschemas; i.e., evading the obstacle starts and ends together with preshaping, versus preshaping starts after evading the obstacle ends. I also hypothesize that a virtual target is used to plan detouring reaching movements, and I use it to reproduce the experimental data with the extended Hoff-Arbib model.
In Chapter 6, the third model is a cortical model of motor cortex coding that investigates a biologically plausible role of the cortex. This third model attempts to answer the “muscle vs. movement” debate and to provide insight into rehabilitation after stroke. I hypothesize that the motor cortex encodes both muscle coding and movement coding instead of only one of them; I call this a “dual map.” The dual map idea was cautiously suggested by Kakei et al. (1999) on the basis of their experiment. Later, a computational model of the dual map was proposed by Todorov (2000a). However, the model he proposed was an analytical model. I propose a developmental model of the dual map.
Conclusions follow in Chapter 7.
Chapter 2.
Background: Physiology, cortical map and their models
2.1. Physiology of neural systems relevant to trajectory control
In this short review of the physiology of neural systems, I focus on the primary motor cortex and its neighboring cortices, including the sensory cortex, the premotor cortex and the posterior parietal cortex. Even though the reinforcement learning in Chapter 3 may be relevant to the basal ganglia, I exclude the physiology of the basal ganglia here because building a precise model of the basal ganglia is not a concern of the current study. Instead, a short review of the basal ganglia, especially of its function, is provided in Chapter 3. Similarly, even though Chapter 4 may be relevant to the frontal cortex, which accounts for planning, I exclude a review of the frontal cortex. The scope of this review includes the primary motor cortex, the sensory cortex, the premotor cortex, the posterior parietal cortex and the spinal cord.
2.1.1. Distributed processing in the brain
Even before Brodmann (1909) described fifty-two anatomically and functionally distinct brain areas using cytoarchitectonic techniques, the functional decomposition of the human brain had fascinated researchers (Gall and Spurzheim, 1810; Flourens 1824). Functional decomposition holds that a sub-region of the brain is associated with a special function, which accounts for a corresponding behavior, and that the region may have a specialized structure, which may be shaped by its function and distinguishable from the neighboring regions (Brodmann, 1909). I provide a physiological review of the primary motor cortex (section 2.1.2). Then, I briefly review the characteristics of the sensory cortex, the premotor cortex and the posterior parietal cortex, which provide the inflow of motor information towards the primary motor cortex (section 2.1.3). The outflow from the primary motor cortex towards the spinal cord is briefly reviewed in section 2.1.2. Other outflow, including projections towards the basal ganglia and the cerebellum, is out of scope.
2.1.2. Physiology of the primary motor cortex
The primary motor cortex, also known as M1, corresponds approximately to Brodmann area 4. Its laminated organization contains giant pyramidal cells in the fifth layer, which are the major outputs of the primary motor cortex. In particular, the projection towards α motoneurons in the spinal cord is called the corticospinal tract and may account for the somatotopic or topographical organization of the map representation. However, direct corticospinal projections onto α motoneurons are not dominant among projections towards the spinal cord; projections towards spinal interneurons are dominant (Kirkwood, Maier, & Lemon, 2002; RN Lemon, Kirkwood, Maier, Nakajima, & Nathan, 2004; Shadmehr & Wise, 2005). Besides the spinal cord, targets of M1 projections include the basal ganglia and the cerebellum (Kandel, Schwartz, & Jessell, 2000).
Since Evarts (1966, 1968) correlated the firing rate of a single neuron in the primary motor cortex with the amount of exerted force, what the neurons in the primary motor cortex encode has been studied for four decades. However, the precise role of the motor cortex is still unclear. An abundance of experimental results with varying interpretations has triggered many controversies, even as the research yields greater understanding. One famous controversy is the “muscle-versus-movement” debate (Asanuma, 1989; Georgopoulos et al., 1986; R. Lemon, 1988), which concerns whether the activity of neurons in the motor cortex correlates with muscle contraction or with movement of the limbs. By recording neuronal activity during active movements, Georgopoulos et al. (1986) found that neurons in the motor cortex of a monkey activate differently for different movement directions.
Muscle coding had first been supported by the intracortical microstimulation (ICMS) technique introduced by Asanuma and Sakata (1967), which directly stimulates motor cortex neurons with an electrode. Because ICMS may induce indirect activation in addition to direct activation, the resulting motor output map had been coarse. Lemon (1988) later provided strong evidence of muscle representation with the ‘spike-triggered averaging’ technique suggested by Fetz & Cheney (1980). With this technique, Lemon correlated a single neuron with a muscle’s EMG activation, emphasizing cortico-motoneuronal (cortico-spinal) tracts. Lemon also suggested that a set of muscles activated by a single tract may represent a complex muscle synergy. Donoghue and colleagues identified representations of individual muscles (FCR, biceps, and triceps) in the squirrel monkey motor cortex (Donoghue, Leibovic, & Sanes, 1992). They reported that most evoked sites activated several muscles, even though the average number of activated muscles was small (about three muscles per site).
In the 1980s, Georgopoulos et al. (Georgopoulos, Kalaska, Caminiti, & Massey, 1982)
measured the firing rate of neurons in the primary motor cortex of monkeys during a
reaching task in a two-dimensional horizontal plane with a fixed initial location. They
found that each neuron’s firing rate is correlated with the movement direction. Each
neuron has a preferred direction of movement in extrinsic coordinates, and its tuning
curve is a cosine function of the difference between the desired movement direction and
its preferred direction. In other words, if the movement direction coincides with the
preferred direction of a neuron, the neuron is maximally activated. Conversely, if the
movement direction is opposite to the preferred direction, the neuron is minimally
activated or not activated at all. Georgopoulos (1986) extended this directional coding
model by introducing the population vector: the desired movement direction coincides with
the direction of the sum of individual vectors, each of which points in a neuron’s
preferred direction and has a length given by that neuron’s firing rate.
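The cosine tuning and the population vector described above can be illustrated with a minimal sketch. The baseline rate, the gain, the number of neurons and the uniform spacing of preferred directions are illustrative assumptions, not values taken from Georgopoulos et al.; the function names are hypothetical.

```python
import math


def cosine_tuning(movement_dir, preferred_dir, baseline=20.0, gain=15.0):
    """Firing rate as a cosine of the angle between movement and preferred direction.

    baseline and gain (spikes/s) are assumed values chosen so the rate stays positive.
    """
    return baseline + gain * math.cos(movement_dir - preferred_dir)


def population_vector(movement_dir, n_neurons=8):
    """Sum one vector per neuron: direction = preferred direction, length = firing rate."""
    # Assumption: preferred directions are spread uniformly around the circle.
    preferred = [2.0 * math.pi * i / n_neurons for i in range(n_neurons)]
    rates = [cosine_tuning(movement_dir, pd) for pd in preferred]
    x = sum(r * math.cos(pd) for r, pd in zip(rates, preferred))
    y = sum(r * math.sin(pd) for r, pd in zip(rates, preferred))
    return x, y


def decoded_direction(movement_dir, n_neurons=8):
    """Direction (radians) of the population vector."""
    x, y = population_vector(movement_dir, n_neurons)
    return math.atan2(y, x)
```

With uniformly spaced preferred directions, the baseline contributions cancel in the sum, so the decoded direction matches the movement direction; with a non-uniform distribution this would hold only approximately.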
At the end of the twentieth century, Kakei et al. (1999) reported experimental
evidence that the primary motor cortex may encode both muscles and directions. During
a wrist movement task with three different wrist postures (pronation, middle, and
supination), they measured the preferred direction of each neuron in the ventral premotor
cortex and the primary motor cortex, as well as the preferred direction of certain
muscles, in monkeys. They found that the preferred directions of some neurons in the
primary motor cortex rotated with posture, whereas the preferred directions of neurons in
the ventral premotor cortex did not change. They interpreted this as evidence of
muscle coding, because coding in extrinsic coordinates would not be affected by
posture. Kakei et al. (1999) did not claim that motor cortex coding is dominantly
based on muscle coding. Instead, they suggested that the motor cortex may contain both
muscle coding and directional coding. This blending of the two views was strongly
supported by their results and differs from the earlier experiments of Scott and
colleagues (Scott & Kalaska, 1995, 1997; Scott, Sergio, & Kalaska, 1997). Scott et al.
measured the firing rate of neurons in the primary motor cortex in the same way as
Georgopoulos (1986), but for two different postures, and argued against a correlation
between neural coding and movement direction. Because the experiment of Kakei et al.
(1999) tested a simple movement with an isolated single muscle, it provides stronger
evidence for the existence of muscle coding while leaving open the possibility of
movement direction coding in extrinsic coordinates.
The coding in the motor cortex is still controversial. In 2000, Todorov (2000a) and
Scott (2000a) independently argued that the dominant encoding of neurons in the primary
motor cortex is related to muscles, and that directional coding is an ‘epiphenomenon’.
Todorov proposed a computational model of the primary motor cortex whose dominant neural
coding is muscle coding, using a biologically plausible but simple arm model (one that
does not include the musculoskeletal system). He reproduced the population coding result
shown by Georgopoulos (1986). He then asserted that the linear component of the muscle
force coding, which corresponds to the direction of movement, may be measured even if
only muscle coding exists. A detailed review of Todorov’s model is given in section
2.2.3. After an extensive debate over ‘one motor cortex, two different views’ in that
year (Georgopoulos & Ashe, 2000; Moran & Schwartz, 2000; Scott, 2000b; Todorov, 2000b),
no comparably intense debate had taken place as of 2007. However, both sides still
gather evidence for their views.
There are several other views of neural coding: torque coding in joint coordinates
(Kurtzer, Herter, & Scott, 2006; Scott & Kalaska, 1997); joint ‘power’ coding for
efficiency of movement (Scott, Gribble, Graham, & Cabel, 2001); and complex
movement coding, also known as preferred trajectory coding for habitual movements,
revealed with longer durations of ICMS (Graziano, Taylor, & Moore, 2002; Graziano,
Taylor, Moore, & Cooke, 2002).
Another characteristic of the primary motor cortex is that it may represent the dynamics
of movement, although there is some debate over whether dynamics reside in the cerebellum
rather than in the motor cortex. Shadmehr et al. (1994) introduced the force field
reaching adaptation task, which exposes subjects to a novel environment using a robot
manipulator. A subject held the end-effector of the robot manipulator and was asked to
reach to a target. During the movement, the robot manipulator perturbed the movement with
a force that depended on the state of the hand, such as its velocity or location. They
showed that this novel dynamic environment can be learned. The force field task
subsequently suggested that the primary motor cortex may be involved in this adaptation
to dynamics (Conditt, Gandolfo, & Mussa-Ivaldi, 1997). Li et al. (2001) performed the
force field adaptation task while measuring the preferred direction of neurons in the
primary motor cortex. They found that the preferred direction of about 70% of neurons in
the primary motor cortex changed either in the force field condition or in the wash-out
condition. Shadmehr and Wise (2005) concluded that these neurons (about 70%) may be
related to adaptation of dynamics and that the other neurons (about 30%) may be kinematic
cells. The data of Li et al. (2001) did not connect this preferred direction shift to
neural coding in the ‘muscle vs. movement’ debate. In my
opinion, the interpretation depends on whether participants’ cognitive processing affects
planning. If participants understand how the force field affects their movement, they may
change their high-level planning of the movement, which is correlated with directional
coding. In that case, the preferred direction shift merely reflects this new plan.
However, as Shadmehr et al. (1994) expected, if participants could not understand the
novel environment and tried to change a motor program while maintaining the target
direction, this preferred direction shift may be evidence that the primary motor cortex
contains both muscle coding (about 70%, in muscle or joint coordinates) and directional
coding (about 30%, in extrinsic coordinates).
2.1.3. Inflow of motor information towards the primary motor cortex
There are three major inflows from the cortical areas towards the primary motor
cortex: the sensory cortex, the posterior parietal cortex and the premotor cortex.
First, the sensory cortex is adjacent to the primary motor cortex and similarly has a
topographical organization. It receives various types of sensory information through the
spinal cord and thalamus (Kandel et al., 2000). Here, I focus on the proprioceptive
information of the upper limb, which provides the current state of the posture. This
proprioceptive information comes from the spinal cord and, more fundamentally, from
muscles. A muscle has three types of afferents: Ia (primary muscle spindles), Ib (Golgi
tendon organs) and II (secondary muscle spindles). The muscle spindles are modulated by
gamma motoneurons and are closely related to the length of muscles. Ia afferents are
sensitive to the rate of change of muscle length, and II afferents are sensitive to the
muscle length itself (P. B. C. Matthews, 1972; Shadmehr & Wise, 2005). On the other
hand, the Golgi tendon organs sense the magnitude of the exerted force (Schafer,
Berkelmann, & Schuppan, 1999; Shadmehr & Wise, 2005). The details of these afferents are
beyond the scope of this review. Here, I would like to emphasize that the sensory cortex,
especially the part related to reaching and grasping, may encode the length, velocity
and exerted force of muscles, conveyed through the spinal cord.
Second, the posterior parietal cortex is close to the visual cortex and responds to
sensory information, especially visual information. Battaglini et al. (2002) showed that
a lesion of the posterior parietal cortex (V6A) impairs visually guided reaching. The
intraparietal areas are crucial for reaching and grasping. The medial (MIP), lateral
(LIP) and ventral (VIP) intraparietal areas account for localizing the target to be
reached, using multimodal sensory information (Shadmehr & Wise, 2005). In contrast, the
posterior intraparietal area (PIP) and the anterior intraparietal area (AIP) represent
the target in terms of the size, shape and orientation of an object for grasping (Fagg &
Arbib, 1998; Sakata et al., 1998; Sakata et al., 1999; Sakata, Taira, Murata, & Mine,
1995), which allows the ventral premotor cortex to plan a fine grasp (Fagg & Arbib,
1998).
Third, for the premotor cortex, the dorsal premotor cortex (PMd) and the ventral
premotor cortex (PMv) are reviewed with respect to reaching and grasping. These
lateral parts of the premotor cortex are thought to account for externally triggered
movements and may contain associations between an external sensory cue and a specific
movement (Kandel et al., 2000). On the basis of monkey experiments, the ventral premotor
cortex is thought to contain mirror neurons, which respond both to execution of the
animal’s own grasping action and to observation of another’s grasping action (Gallese,
Fadiga, Fogassi, &
Rizzolatti, 1996; Oztop & Arbib, 2002; Rizzolatti & Fadiga, 1998; Rizzolatti, Fadiga,
Gallese, & Fogassi, 1996). On the other hand, the dorsal premotor cortex may account for
reaching movement.
2.2. Cortical maps and their plasticity
This review of cortical maps is intended to support the development of a biologically
plausible cortical model of the primary motor cortex in Chapter 5. Here I do not put
forward a debatable hypothesis of the neural representation in the primary motor cortex.
Instead, I review the lamination of the cortex (section 2.2.1) and computational models
of the cortex (section 2.2.2). I then introduce the general advantages of population
coding from a theoretical, rather than a neural, point of view (section 2.2.3).
2.2.1. Organization of the primary motor cortex
Even though there are different parcellation schemes to define brain areas, the
primary motor cortex is clearly defined, being characterized by giant pyramidal neurons
in layer V, that is, the fifth layer of the neocortex. The neocortex is the outer part of
the mammalian brain; the name means ‘new cortex’, reflecting that it evolved most
recently (Douglas & Martin, 2004).
In general, each layer in the neocortex has different types of neurons and its own
function. The following is a summary of the laminar organization of the visual cortex
(Douglas & Martin, 2004; Gilbert, 1983; Gilbert & Wiesel, 1983). Layer I is the
outermost layer, which receives most of the projections from other cortical areas and
passes the information to the layer V’s neurons. The neurons in layer V project to the
layer VI’s neurons and they project to the layer VI’s neurons in turn. Layer IV also
receives thalamocortical projections and relays input from both the thalamus and layer
VI’s neurons to layer I’s neurons. Even though this description of layers is for the
visual cortex, the primary motor cortex may be similar because both the visual cortex and
the primary motor cortex are part of the neocortex.
The neocortex also has a columnar structure, which accounts for the locality of neurons
by connecting neurons vertically across layers (Hubel & Wiesel, 1962). These columns are
also connected to each other through lateral connectivity, which can be classified as
either intercortical or intracortical. Intercortical lateral connectivity has been found
in layer III, projecting towards other cortical areas (Douglas & Martin, 2004).
Intracortical lateral connectivity has been found in layer I with arborization,
exhibiting a patchy pattern under retrograde labeling (Douglas & Martin, 2004; E. G.
Jones & Wise, 1977). In a forelimb reaching task in rats, skill learning increased the
strength of horizontal (lateral) intracortical connections in layers II and III, as
confirmed in slices of the rat’s motor cortex (Rioult-Pedotti, Friedman, Hess, &
Donoghue, 1998). The long-range lateral connections are excitatory; inhibitory effects
may be obtained through a local inhibitory circuit of GABA interneurons (Donoghue, 1995).
This strengthened or modified excitatory lateral connectivity may be correlated with the
expansion of map representations during recovery after motor cortex deficits (Nudo,
Plautz, & Frost, 2001; Nudo, Wise, SiFuentes, & Milliken, 1996) or during learning, as
opposed to mere repetition (E. J. Plautz, Milliken, & Nudo, 2000).
2.2.2. Previous computational models of the cortex
There are few models that directly reproduce the role of the primary motor cortex.
In contrast, a number of models of the sensory cortex have been proposed. At a more
theoretical level, such cortical models have been conceptualized as the Self-Organizing
Feature Map, also known as SOFM or SOM (Kohonen, 1973; von der Malsburg, 1973).
Because of its simplicity and theoretical clarity, the SOM is a good starting point for
understanding the common role of cortical models. First of all, the purpose of cortical
models is the formation of a topographic map like those the brain exhibits. Although the
meaning of the map representation is closely tied to what the motor cortex encodes, to
avoid that debate we call what is represented a ‘feature’ here; in most cases, it is the
input pattern. Developing a topographic map is then a kind of feature extraction. The
Kohonen map (Kohonen, 1973) has been used to project high-dimensional data onto a
lower-dimensional space, typically two- or one-dimensional, for visualization. In this
mapping, similar input patterns are located close to each other and dissimilar input
patterns are separated. In other words, if two input patterns are similar, their mapped
locations and values on the two-dimensional space should be similar to each other.
The basic configuration contains one layer of neurons that receives an input
pattern, another layer of neurons that represents the network output, and synaptic
weights between the two layers. The two principal components of the cortical model
underlying the SOM are (competitive) Hebbian learning and a local activation rule (von
der Malsburg, 1973). The Hebbian learning rule strengthens synaptic weights between
neurons in the input layer and neurons in the output layer. After sufficient but not
excessive learning, the weights represent the correlation between an input pattern and
the most frequently activated
neuron for the pattern. The local activation rule is the source of spatial ordering. If a
neuron is strongly activated by a certain input pattern, its neighboring neurons are
activated together with it, because similar input patterns also activate the neighboring
neurons. Commonly used local activation rules are Gaussian diffusion, which spreads
activation to neighboring neurons, and Mexican hat activation, which has an on-center,
off-surround activation pattern (Kuffler, 1953).
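The two ingredients described above, competitive learning and a local (Gaussian) activation rule, can be sketched with a one-dimensional map trained on scalar inputs. This is a simplified Kohonen-style update, not the exact formulation of any model cited here; the learning-rate and neighborhood schedules, the map size and the function names are all assumptions made for illustration.

```python
import math
import random


def train_som_1d(n_units=10, n_steps=3000, seed=0):
    """Train a chain of units on scalar inputs drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]
    for step in range(n_steps):
        frac = step / n_steps
        lr = 0.5 * (1.0 - frac) + 0.01      # assumed schedule: learning rate decays
        sigma = 3.0 * (1.0 - frac) + 0.5    # assumed schedule: neighborhood shrinks
        x = rng.random()
        # Competition: the unit whose weight is closest to the input wins.
        winner = min(range(n_units), key=lambda j: abs(weights[j] - x))
        # Local activation rule: Gaussian spread around the winner on the map.
        for j in range(n_units):
            h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
            weights[j] += lr * h * (x - weights[j])
    return weights


def order_violations(weights):
    """Count adjacent pairs that break a monotonic (topographic) ordering."""
    rising = sum(1 for a, b in zip(weights, weights[1:]) if b < a)
    falling = sum(1 for a, b in zip(weights, weights[1:]) if b > a)
    return min(rising, falling)
```

Calling `order_violations(train_som_1d())` gives a rough indication of whether neighboring units ended up representing neighboring input values; a small count would be consistent with the spatial ordering attributed to the local activation rule, although this sketch does not guarantee it for every seed.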
Even though the primary motor cortex contains six layers, most computational models
simplify it to one or two layers of neurons, which stand for layer V because of its
prominence in the primary motor cortex. Interestingly, however, all the models
reviewed here contain lateral connectivity or a resulting collective excitatory
characteristic, both of which correspond to the local activation rule. Short-range
excitatory connectivity accounts for simple Gaussian diffusion and for the on-center
part of the Mexican hat activation in the local activation rule. On the other hand, a
combination of long-range excitatory connectivity and short-range inhibitory
connectivity accounts for the off-surround part of the Mexican hat activation.
Chernjavsky and Moody (Chernjavsky & Moody, 1990a; 1990b), following Pearson et al.
(1987), proposed and compared models of the sensory cortex that contain three layers:
pyramidal neurons and GABA inhibitory neurons in addition to an input layer. They
proposed two different ways in which GABA neurons inhibit pyramidal neurons: an additive
way, in which inhibition acts through negative weights, and a shunting way, in which
activation of a GABA neuron affects the associated pyramidal neurons. They emphasized a
fixed intracortical property, namely the Mexican hat activation pattern, rather than
intracortical plasticity as the means of obtaining map formation. Indeed, their shunting
inhibitory circuit makes it possible
to modify intracortical connectivity while keeping the characteristic Mexican hat
activation and increasing the size of the modules. By modulating lateral connectivity,
they constructed a model whose neurons may have lateral connectivity of varying extent.
The competitive distribution theory (Reggia, D'Autrechy, Sutton III, & Weinrich,
1992) is a proposed mechanism of the neocortex involving multiple cortical areas, under
the assumption that there is very little negative connectivity. Each cortical area was
abstracted as a single layer of neurons on a two-dimensional sheet. Each neuron was
connected to its neighbors through lateral connectivity and to neurons in the other
cortical area through projections. They asserted that the strength of each neuron’s
projection may be affected by its neighbors’ activation level. In other words, if the
neurons surrounding a given neuron are highly active, that neuron innervates other
neurons less. By modulating projection patterns in this way, a model based on the theory
exhibits the Mexican hat activation pattern on a positive bias without inhibitory
connectivity. The theory was applied to thalamocortical reorganization of the
somatosensory cortex after a focal lesion, with a cone-shaped cortico-cortical projection
(Armentrout, Reggia, & Weinrich, 1994; Sutton III, Reggia, Armentrout, & D'Autrechy,
1994), and to development of the somatosensory cortex for proprioceptive sensory
information (Cho & Reggia, 1994). Using a static but biologically plausible arm model, a
loop comprising, in order, the motor cortex, motoneurons in the spinal cord, the arm
model, proprioceptive neurons (Cho & Reggia, 1994) and a proprioceptive cortex was
proposed to study development (Yinong Chen, 1997; Y. Chen & Reggia, 1996) and
reorganization after a focal lesion (Goodall, Reggia, Chen, Ruppin, & Whitney, 1997).
Another approach solves for the asymptotic state of a network with multiple
modes (Dayan, 2003; Olshausen & Field, 1996; von der Malsburg, 1973). Given a fixed
structure and dynamics of a neural network, these authors solved sets of differential
equations and found standing wave modes on a two-dimensional space, which correspond to
the surviving cortical representation. Olshausen and Field (1996) introduced sparseness
of coding, which shed light on how a cortical model can obtain multiple levels of
sparseness.
2.2.3. Advantages of population coding
The central nervous system is noisy both in its incoming sensory information and in its
outgoing motor commands. However, our perception of the external world is quite precise,
and our actions do not fluctuate as much as a neuronal signal does. This insensitivity
of the central nervous system to noise may be due to population coding (Knill & Pouget,
2004; Pouget, Dayan, & Zemel, 2003). When the signals of multiple neurons that receive
the same information (Miller, Jacobs, & Theunissen, 1991; Theunissen & Miller, 1991)
with independent noise are summed, the noise terms tend to cancel each other because
their mean may be zero (Pouget et al., 2003).
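The noise-cancelling effect of pooling can be checked numerically with a small sketch. It assumes additive, independent, zero-mean Gaussian noise on each neuron and uses the average of the responses (the sum divided by the number of neurons) as the pooled estimate; these modeling choices and the function names are assumptions for illustration, not taken from the cited studies.

```python
import random
import statistics


def pooled_estimate(signal, n_neurons, noise_sd, rng):
    """Average the responses of neurons that all see the same signal plus independent noise."""
    responses = [signal + rng.gauss(0.0, noise_sd) for _ in range(n_neurons)]
    return sum(responses) / n_neurons


def estimate_spread(n_neurons, n_trials=2000, signal=1.0, noise_sd=1.0, seed=0):
    """Standard deviation of the pooled estimate across repeated trials."""
    rng = random.Random(seed)
    estimates = [pooled_estimate(signal, n_neurons, noise_sd, rng)
                 for _ in range(n_trials)]
    return statistics.pstdev(estimates)
```

Under these assumptions, comparing `estimate_spread(1)` with `estimate_spread(100)` should show the spread shrinking roughly in proportion to the square root of the number of neurons.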
The Bayesian approach to analyzing the brain’s behavior (Knill & Pouget, 2004) has
recently become popular, even though the probabilistic approach dates back to 1925
(Helmholtz). Because of noise, uncertainty obscures the precise state of the
environment. The optimal way to infer information under such uncertainty is the Bayesian
approach (Knill & Pouget, 2004; Pouget et al., 2003). The cortical map representation
can also be explained in a similar way (Barber, Clark, & Anderson, 2003; Zemel & Dayan,
1997), namely by a Bayesian coding hypothesis (Pouget et al., 2003). Although integrating
two probabilistic
codes requires multiplication or convolution of two probability distributions, if each
distribution is transformed to its logarithm, the integration can be done with a
summation alone, owing to the properties of the logarithm (Rao, 2004).
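The equivalence between multiplying two distributions and summing their logarithms can be illustrated on a discretized variable. The sketch below assumes two Gaussian-shaped cues evaluated on a fixed grid; the cue parameters, the grid and the function names are hypothetical and are not taken from Rao (2004).

```python
import math


def gaussian_log_likelihood(x, mean, sd):
    """Log of an unnormalized Gaussian likelihood (the additive constant is dropped)."""
    return -((x - mean) ** 2) / (2.0 * sd ** 2)


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def combine_by_product(grid, cue_a, cue_b):
    """Combine two cues by multiplying their likelihoods point by point."""
    probs = [math.exp(gaussian_log_likelihood(x, *cue_a)) *
             math.exp(gaussian_log_likelihood(x, *cue_b)) for x in grid]
    return normalize(probs)


def combine_by_log_sum(grid, cue_a, cue_b):
    """Combine the same cues by adding log-likelihoods, then exponentiating once."""
    log_post = [gaussian_log_likelihood(x, *cue_a) +
                gaussian_log_likelihood(x, *cue_b) for x in grid]
    peak = max(log_post)  # subtracting the peak keeps the exponentials well scaled
    return normalize([math.exp(lp - peak) for lp in log_post])


# Assumed grid: 1001 points from -5 to 5; each cue is a (mean, sd) pair.
GRID = [-5.0 + 0.01 * i for i in range(1001)]
```

For two equally reliable cues such as `(0.0, 1.0)` and `(2.0, 1.0)`, both routines yield the same normalized distribution, peaking midway between the two cue means.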
2.3. Models of motor control
2.3.1. Optimality of motor control
Nelson (1983) first suggested the idea that planning in the CNS may be an optimization
of movement that minimizes energy consumption while satisfying the goal of the movement.
In a reaching movement, the optimal movement consumes the least energy among all the
movements that accurately reach the target. Following his idea, several kinematic
characteristics have been proposed as measures of energy consumption: minimum torque
change (Uno, Kawato, & Suzuki, 1989); minimum muscle tension-change (Masazumi
Katayama, 1993; Uno, Suzuki, & Kawato, 1989); minimum jerk (Flash & Hogan, 1985);
and minimum variance (Harris & Wolpert, 1998).
The purpose of these models of optimized planning is not to illustrate development,
and therefore they do not provide a neural structure. Instead, they provide clues as to
which characteristics may reveal optimized behavior. In other words, these models showed
that minimizing energy consumption with respect to a specific characteristic may explain
optimal human behavior. This is not so much a model as an idea that may be used when we
analyze behavior, generate a trajectory, or even generate a set of motor commands.
2.3.2. Motor control procedures
A computational model of voluntary movement proposed by Kawato, Furukawa, &
Suzuki (1987) provided a framework containing speculative descriptions of the
procedures of the central nervous system (CNS): trajectory formation,
transformation of coordinates, and generation of motor commands, regardless of whether
the CNS uses feedback control or feedforward control (Masazumi
Katayama, 1993). The first of these procedures, trajectory formation, includes the
planning of voluntary movement. For visually guided reaching, this planning appears to
be done in task-oriented (peripersonal) coordinates, that is, either shoulder-centered
(Cartesian) coordinates or fixation-centered (visual) coordinates (Buneo et al., 2002;
Shadmehr & Wise, 2005; Wolpert, Ghahramani, & Jordan, 1995a), because recognition of
the target is usually processed in visual coordinates. Planning for reach-to-grasp
starts with the current state of the body, that is, the location and orientation of the
arm and hand, and the current location of the target. The following step, transformation
of coordinates, involves transforming information from the task-oriented planning level
so that it can be passed to lower levels such as joint coordinates and muscle
coordinates. Finally, the information must guide the generation of motor commands to
activate motoneuron pools at the level of the spinal cord. Wolpert (1997) also suggested
a sequential hierarchy of motor control procedures for reaching, comprising extrinsic
task goals, hand path, hand trajectory, joint kinematics, muscle activation, and neural
commands.
Feedforward control is distinguished from feedback control by whether, after initiating
the movement, the central nervous system considers feedback from the current status of
the body; feedforward control rarely uses such feedback. The model of Kawato et al.
(1987) implies a control system with multiple feedback loops. First, the feedback used at
the planning level is a long-loop (conceptual) feedback system using visual processing.
Second, the feedback at the subsequent level (generation of motor commands) is produced
by a short-loop feedback system using the spinal cord.
The following sections briefly introduce each procedure proposed by Kawato et al.
(1987) and review the corresponding control models within their framework.
2.3.2.1. Modeling procedures of trajectory formation
As introduced in section 2.3.1, an optimality principle can generate an optimal
trajectory by minimizing a cost function related to kinematic characteristics. The
best-known kinematic cost function is provided by the minimum jerk model (Flash & Hogan,
1985). The minimum jerk model states that the optimal movement minimizes the squared
jerk, that is, the square of the third derivative of position, integrated over the whole
trajectory. This predicts well the stereotyped smooth reaching movement in Cartesian
coordinates, that is, the almost straight trajectory from the initial position to the
target with a symmetric bell-shaped velocity profile, reported by many authors (Flash &
Hogan, 1985; Morasso, 1981; Paulignan et al., 1997). I note that the model does not
exactly match the procedure of trajectory formation in the motor control procedure of
Kawato et al. (1987). However, the need for a pre-computed trajectory in a feedforward
control framework may justify the use of the model for trajectory formation.
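For a point-to-point movement that starts and ends at rest, the minimum jerk criterion has a well-known closed-form solution: a fifth-order polynomial in normalized time. The sketch below evaluates that solution along one coordinate; the function names and the example distance and duration are illustrative assumptions.

```python
def minimum_jerk_position(t, duration, start, goal):
    """Position along one axis at time t for a rest-to-rest minimum jerk movement."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    s = 10.0 * tau ** 3 - 15.0 * tau ** 4 + 6.0 * tau ** 5
    return start + (goal - start) * s


def minimum_jerk_velocity(t, duration, start, goal):
    """Velocity of the same movement: a symmetric, bell-shaped profile."""
    tau = min(max(t / duration, 0.0), 1.0)
    ds = 30.0 * tau ** 2 - 60.0 * tau ** 3 + 30.0 * tau ** 4
    return (goal - start) * ds / duration
```

For example, for a 0.3 m reach lasting 0.5 s, `minimum_jerk_velocity(0.25, 0.5, 0.0, 0.3)` gives the peak speed, which occurs at mid-movement and equals 1.875 times the average speed.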
The vector integration to endpoint model (Bullock & Grossberg, 1988; Bullock, Grossberg,
& Guenther, 1993), also called VITE, provided the basic idea of a trajectory generator
based on a difference vector. The difference vector, also known as the displacement
vector, is the vector from the hand (end-effector) to the target. This difference drives
the reaching movement. If the hand is far from the target, the difference vector toward
the target is large and the movement is fast. If the hand is close to the target, the
difference vector is small, and the movement is slow and precise so as to decrease
end-point error. Integrating the successive difference vectors with appropriate scaling
constructs the trajectory. However, this naïve difference vector scheme is inspired
purely by a feedback controller, and the resulting movement has an asymmetric velocity
profile. In a fast ballistic reaching movement, the velocity profile is generally
symmetric and bell-shaped, and it is assumed that a feedforward controller is responsible
for the fast movement, because the sensory feedback delay is usually quite long. Thus,
Bullock et al. modulated the difference vector with a GO signal in order to adapt the
movement from one based on a feedback loop to one based on a feedforward loop. Although
Cisek et al. (1998) indicated possible corresponding neural areas, such as area 5d of
the posterior parietal cortex for computing the difference vector, there is no explicit
evidence for the GO signal.
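The effect of the GO signal on the velocity profile can be explored with a one-dimensional sketch in the spirit of VITE: a difference signal tracks the remaining distance to the target, and the position integrates that signal gated by the GO signal. The equations are a simplified reading of the model, and the rate constant, the two GO signal shapes and the Euler integration step are assumptions chosen for illustration.

```python
def simulate_vite(go_signal, target=1.0, alpha=30.0, dt=0.001, duration=1.5):
    """Euler simulation of dV/dt = alpha * (-V + T - P), dP/dt = G(t) * max(V, 0)."""
    position, diff = 0.0, 0.0
    times, speeds = [], []
    for step in range(int(round(duration / dt))):
        t = step * dt
        speed = go_signal(t) * max(diff, 0.0)  # GO signal gates the difference signal
        times.append(t)
        speeds.append(speed)
        diff += dt * alpha * (-diff + target - position)
        position += dt * speed
    return position, times, speeds


def constant_go(t):
    """Assumed constant gain: behaves like a plain feedback controller."""
    return 5.0


def ramp_go(t):
    """Assumed GO signal that grows linearly during the movement."""
    return 10.0 * t


def time_of_peak(times, speeds):
    """Time at which the speed profile reaches its maximum."""
    return times[speeds.index(max(speeds))]
```

Under these assumptions, the constant gain produces a speed peak very early in the movement, whereas the growing GO signal shifts the peak later, which moves the profile towards the bell shape discussed above.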
Another model based on the difference vector is the motor primitive (Ijspeert,
Nakanishi, & Schaal, 2002; S. Schaal, 2003). Instead of using the GO signal, the motor
primitive is modulated through another system, which accounts for movement progress.
In other words, the modulation function is an arbitrary function of how much of the
movement has been completed in position. The virtue of this model is the flexibility of the
modulation function. It can be fit to any given trajectory, including a reaching
trajectory with a bell-shaped velocity profile and a periodic movement with a sinusoidal
velocity profile. This flexibility opens the model to motor command generation. Because
the model can learn a sequence of locations, that is, a trajectory, when we feed it a
sequence of motor commands as a target to learn, the model will learn a sequence of motor
commands in internal coordinates, which connects the motor primitive to imitation
learning (S. Schaal, Ijspeert, & Billard, 2004; S. Schaal, Peters, Nakanishi, &
Ijspeert, 2004).
2.3.2.2. Coordinate transformation from plan to control: Inverse kinematics, inverse
dynamics and the moment arm matrix
It is widely accepted in robotics that the transformation from peripersonal
coordinates to motor commands consists of successive procedures: 1) inverse kinematics
for transformation from peripersonal coordinates to joint coordinates, and 2) inverse
dynamics for transformation from joint coordinates to joint force (torque) coordinates. In
robotics, the inverse dynamics occur at almost the last step, that is, at the generation of
motor commands, because motor commands require joint force, that is, torque. However,
the human body is controlled with muscles, not motors. As Wolpert (1997) noted, there
are two more levels of generating motor commands: desired muscle lengths or desired
muscle tension, and motoneuron activation.
Muscle coordinates (Masazumi Katayama, 1993; M. Katayama & Kawato, 1993;
Schweighofer, Arbib, & Kawato, 1998; Schweighofer, Doya, & Lay, 2001; Schweighofer,
Spoelstra, Arbib, & Kawato, 1998; Spoelstra, Schweighofer, & Arbib, 2000) are related to
the attachment site of a muscle on a bone and to the current posture of the skeleton. The
desired muscle lengths may be transformed to gamma motor commands for slow
feedback control. The desired muscle tensions, which are related to feedforward control,
are computed from the joint torques using a moment arm matrix
(Masazumi Katayama, 1993; M. Katayama & Kawato, 1993; Lan, 2002; Schweighofer,
Arbib et al., 1998; Schweighofer et al., 2001; Schweighofer, Spoelstra et al., 1998).
Theoretically, a neural network with three layers of weights can represent any
non-linear function (Bishop, 1995; Kolmogorov & Fomin, 1957). Kolmogorov’s theorem
states that any continuous function of several variables can be represented as a
superposition of continuous functions of a single variable. The error back-propagation
algorithm (Rumelhart, Hinton, & Williams, 1986) provided a way to adapt a multilayer
neural network. The algorithm uses the values of the error between the outputs of a
network and the desired outputs of the network. These values are used to adjust the last
layer of weights, and are back-propagated to adjust the other layers of weights.
Widrow & Stearns (1985) and Jordan & Rosenbaum (1989) applied a neural
network to the motor control problem without considering neural structures. They
indicated that a ‘learner’ in the brain acquires an associative mapping from the current
state of the plant (an arm posture in reaching behavior) and the desired state of the
trajectory to the motor command. This mapping is one form of an inverse model. If the
learner is a multilayer neural network, the inverse model includes both inverse kinematics
and inverse dynamics. (Note that in the approach of Widrow & Stearns (1985) and Jordan &
Rosenbaum (1989) the control signal is a torque, even though in the framework of
the inverse model it does not matter whether the control signal is a torque or not.) Jordan
& Rumelhart (1992) indicated that the neural network of the inverse model may fail
because inverse kinematics with multiple links is generally ill-conditioned. In other
words, the inverse kinematics may not be a function. This ill-conditioning of the
inverse model emerges not only from inverse kinematics, but also from other
coordinate transformations such as the inverse muscle transformation. As pointed out by
Wolpert (1997), the inverse model generally has one-to-many redundancy, which means
that the inverse model may not be a function. In most cases,
additional constraints resolve this redundancy problem. These constraints may be
characteristics of human behavior, for example, a posture with minimum displacement
in joint angle, representing efficiency of movement, for the ill-conditioned inverse
kinematics; or a neural structure in the human brain, such as one that accounts for
distal supervised learning (Jordan & Rumelhart, 1992). Jordan and Rumelhart suggested
a forward model connected in series with the inverse model. The forward model is a
many-to-one mapping (a function). It provides constraints on the inverse model and
solves the ill-conditioning problem of the inverse model. This conceptual schema of the
forward model and the inverse model was extended to multiple instances of the forward
model and inverse model. This extension is called modular selection and identification
for control (Haruno, Wolpert, & Kawato, 2001; Wolpert & Kawato, 1998).
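The one-to-many nature of inverse kinematics can be seen even in a planar two-link arm, where the same hand position is reached by two postures that differ in the sign of the elbow angle. The sketch below uses the standard closed-form solution for such an arm; the link lengths, the target position and the function names are assumptions for illustration and do not correspond to the arm models of the cited studies.

```python
import math


def forward_kinematics(theta1, theta2, l1=0.3, l2=0.3):
    """Hand position of a planar two-link arm (many-to-one mapping: a function)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def inverse_kinematics(x, y, l1=0.3, l2=0.3):
    """Return both joint-angle solutions (shoulder, elbow) for one hand position."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # clamp: unreachable targets map to full extension/flexion
    solutions = []
    for theta2 in (math.acos(c2), -math.acos(c2)):
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                               l1 + l2 * math.cos(theta2))
        solutions.append((theta1, theta2))
    return solutions
```

Feeding either solution back through `forward_kinematics` returns the same hand position, so an additional constraint (for instance, choosing the posture closest to the current one) is needed to pick a single answer.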
According to Masazumi Katayama (1993), computational implementations of motor
control in human reaching support the idea that coordinate transformation is a possible
motor control procedure, even though he did not indicate a corresponding neural structure.
His modeling approach includes procedures using inverse kinematics, inverse dynamics,
the inverse of the moment arm matrix, and the inverse muscle model. Each procedure usually
involves redundancy. Katayama provided a possible constraint on each procedure: smoothness of
trajectory in the case of inverse kinematics, minimum motor-command change in the
case of inverse dynamics, and the least residual obtained through the pseudoinverse for the
inverse of the moment arm matrix and the inverse muscle model.
Katayama (1993) also provided integrated procedures describing trajectory
formation and partly describing coordinate transformation. First, he decomposed the
inverse model into 1) the inverse static model (ISM) and 2) the inverse dynamic model
(IDM). This decomposition reveals that the IDM may correspond to the virtual trajectory
control hypothesis (Bizzi, Accornero, Chapple, & Hogan, 1984; Hogan, 1984). Katayama
used this virtual trajectory control hypothesis, which is based on the idea of the
equilibrium-point hypothesis (Feldman, 1981, 1986). If a motoneuron is activated for a
long time, an arm controlled by muscles converges to a certain posture, the equilibrium
point (EP), because of the viscoelasticity of the muscles. When the
movement is slow, the trajectory formed by a series of these EPs is the same as the actual
desired trajectory because the ISM is responsible for slow movements. On the other hand, when
the movement is fast, the trajectory formed by a series of these EPs differs from the actual
desired trajectory because the IDM is responsible for fast movements. The virtual trajectory
hypothesis seeks a series of EPs that generates the desired trajectory
after accounting for dynamic forces, including the centrifugal and Coriolis forces
(Masazumi Katayama, 1993; M. Katayama & Kawato, 1993). Other researchers
(Bhushan & Shadmehr, 1999; Shadmehr & Wise, 2005) have also modeled a possible structure
for this procedure, the transformation of coordinates.
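The contrast between slow and fast movements under equilibrium-point control can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the arm is reduced to a one-dimensional point mass, the muscles to a linear spring-damper that pulls the mass toward the EP, and all parameter values and names are hypothetical. It measures how far the actual position departs from a linearly ramped EP.

```python
def simulate_ep_tracking(duration, amplitude=0.2, m=1.0, k=50.0, b=10.0, dt=0.001):
    """Point mass pulled toward a moving equilibrium point (EP) by a spring-damper.

    The EP ramps linearly from 0 to `amplitude` (m) over `duration` (s).
    Returns the largest gap |EP - position| observed during the ramp.
    """
    x, v = 0.0, 0.0
    max_gap = 0.0
    steps = int(round(duration / dt))
    for n in range(steps):
        ep = amplitude * (n * dt) / duration
        acc = (-k * (x - ep) - b * v) / m  # spring toward the EP plus viscous damping
        v += acc * dt                      # semi-implicit Euler integration
        x += v * dt
        max_gap = max(max_gap, abs(ep - x))
    return max_gap

if __name__ == "__main__":
    print("slow movement, max gap:", simulate_ep_tracking(duration=5.0))
    print("fast movement, max gap:", simulate_ep_tracking(duration=0.3))
```

In this sketch the slow movement stays within a few millimeters of the EP trajectory, whereas the fast movement lags it by several centimeters, so a fast movement would need an EP trajectory that differs from the desired one.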
The feedback error learning scheme proposed by Kawato & Gomi (1992) showed that, in a
structure with both a feedforward control module and a feedback control module, the motor
command of the feedback module can teach the feedforward module, which is an
application of inverse dynamics. It has been suggested that this feedback error
learning procedure may reside in the cerebellum (Schweighofer, Arbib et al., 1998;
Schweighofer et al., 2001; Schweighofer, Spoelstra et al., 1998). These authors indicated
that the neural structure of the cerebellum may account for the feedback error learning
procedure; for example, the inferior olive carries error signals. Albus (James S.
Albus, 1975; 1975; Albus, 1979) proposed the cerebellar model articulation controller
(CMAC) as another model of motor control using the cerebellum.
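The idea that the feedback command can serve as the teaching signal for the feedforward module can be sketched on a one-dimensional point mass. This is a simplified illustration, not the implementation of Kawato & Gomi (1992): the plant, the sinusoidal desired trajectory, the single learned weight `w` (an estimate of the mass), the gains, and the learning rate are all assumptions.

```python
import math

def feedback_error_learning(m=2.0, kp=50.0, kd=10.0, lr=0.5, dt=0.001, duration=40.0):
    """Learn a one-parameter inverse dynamics model u_ff = w * a_d of a point mass.

    The feedback command u_fb is used as the error signal for the feedforward
    weight w. Returns the learned weight, which should approach the true mass m.
    """
    w = 0.0          # feedforward weight (assumed inverse model: u_ff = w * a_d)
    x, v = 0.0, 1.0  # start on the desired trajectory x_d(t) = sin(t)
    for n in range(int(round(duration / dt))):
        t = n * dt
        x_d, v_d, a_d = math.sin(t), math.cos(t), -math.sin(t)
        u_ff = w * a_d                          # feedforward command
        u_fb = kp * (x_d - x) + kd * (v_d - v)  # feedback (PD) command
        acc = (u_ff + u_fb) / m                 # plant: m * acceleration = total command
        w += lr * u_fb * a_d * dt               # feedback command teaches the weight
        v += acc * dt                           # semi-implicit Euler integration
        x += v * dt
    return w

if __name__ == "__main__":
    print("learned weight:", feedback_error_learning())
```

As the weight approaches the true mass, the feedforward command accounts for almost all of the required force and the feedback command shrinks toward zero.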
2.3.2.3. Generation of motor commands
The generation of motor commands in the CNS may take place along two different
pathways, a descending pathway and a local-loop pathway. The spinal cord is the neural
structure closest to the actuator, the muscle. The descending signal from the motor cortex,
which contains both feedforward control signals and long-loop feedback control
signals (long-loop reflex), drives motoneurons in the spinal cord. The local-loop pathway
(short-loop reflex) includes the gamma-motoneuron circuit, which stabilizes the human
body. Computational models of this local-loop pathway have been proposed (Shadmehr
& Wise, 2005; Song, Lan, & Gordon, 2006).
Studies of motor command generation focus on how the brain
generates the descending signal in the motor cortex. There are generally two
views. The first view uses the inverse muscle model, which conceptualizes the
transformation from joint force to motoneuron activation (Masazumi Katayama, 1993; M.
Katayama & Kawato, 1993; Schweighofer, Arbib et al., 1998; Schweighofer et al., 2001;
Schweighofer, Spoelstra et al., 1998). The second view is based on the possibility that
simple adaptive weights between the motor cortex and motoneurons of the spinal cord
learn this association (Armentrout et al., 1994; Yinong Chen, 1997; Y. Chen & Reggia,
1996; Goodall et al., 1997; Spoelstra et al., 2000).
2.3.3. Integrated models of motor control procedures
Section 2.3.2 described the decomposition of motor control procedures using
the framework of Kawato et al. (1987). However, it is still unclear whether the central
nervous system carries out these procedures in the exact sequence they described. A
single process may include both trajectory formation and transformation of
coordinates, or even all of the separate motor control procedures (Uno,
Kawato et al., 1989). In some models, several procedures are fused together or are not
required at all. Models that do not follow the framework of Kawato et al. (1987)
are reviewed in this section.
2.3.3.1. Optimality based controllers
The Hoff-Arbib model (Hoff, 1992; Hoff & Arbib, 1993) does not control a multi-link arm.
Thus, it does not produce a motor command that directly controls muscles or joints.
Instead, it produces the third derivative of position (jerk) as a control signal, and the
output of the model is the trajectory of a point mass. Its authors extended the minimum jerk
model (Flash & Hogan, 1985) with an estimate of the time required to finish the movement,
and applied the estimated time to the reach-to-grasp problem. In the original minimum
jerk model, the target position and the movement time were fixed. However, the Hoff-Arbib
model adapted the required time according to the current
state of the hand and the state of the target. This allowed the model to account for
perturbed-target experiments, in which the target is perturbed after
initiation of a movement (Paulignan, Jeannerod et al., 1991; Paulignan et al., 1990;
Paulignan, MacKenzie et al., 1991). The Hoff-Arbib model showed that grasping affects reaching.
The authors investigated coordination between reaching and grasping in cases in which
the time required for reaching and the time required for grasping do not agree, by
assigning the longer of the two times to both movements.
The Hoff-Arbib model (Hoff, 1992; Hoff & Arbib, 1993) derived a control law from a
simple cost function, the integrated squared jerk. The difference from the minimum jerk model
(Flash & Hogan, 1985) is that the Hoff-Arbib model used a (filtered) feedback signal to
generate the motor command online, whereas the minimum jerk model used the target location
as a boundary condition of the optimization, which cannot be updated online. In essence, the
Hoff-Arbib model is a feedback controller. However, the resulting trajectory is almost
identical to the trajectory of the minimum jerk model and shows a characteristic of a
feedforward controller: a bell-shaped velocity profile. This feedback controller with the
characteristics of a feedforward controller later inspired the optimal feedback control
framework (Todorov, 2004, 2005; Todorov & Jordan, 2002).
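For reference, the bell-shaped velocity profile of the minimum jerk model can be reproduced from its well-known closed-form solution for a rest-to-rest point-to-point movement, a fifth-order polynomial in normalized time. The sketch below evaluates only this open-loop solution with fixed movement time; it does not implement the online feedback law of the Hoff-Arbib model, and the function name and the example distances are assumptions.

```python
def minimum_jerk(x0, xf, T, t):
    """Position and velocity at time t of a rest-to-rest minimum jerk movement.

    x0, xf: start and target positions; T: movement time (fixed in advance).
    """
    tau = min(max(t / T, 0.0), 1.0)                      # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5           # normalized position
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / T   # time derivative of s
    return x0 + (xf - x0) * s, (xf - x0) * ds

if __name__ == "__main__":
    T, distance = 0.8, 0.4
    for i in range(9):
        t = i * T / 8
        pos, vel = minimum_jerk(0.0, distance, T, t)
        print(f"t={t:.2f} s  pos={pos:.4f} m  vel={vel:.4f} m/s")
```

The velocity is zero at both ends, symmetric about the midpoint, and peaks at 1.875 times the average velocity, which gives the bell shape mentioned above.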
The minimum variance theory (MVT) was proposed by Harris & Wolpert (1998).
The theory is that minimizing the motor variance during the post-movement period
determines the optimal trajectory. Here the motor variance arises from
signal-dependent noise (Clamann, 1969; Harris & Wolpert, 1998; K. E. Jones, Hamilton, &
Wolpert, 2002; P. B. Matthews, 1996): a neural signal carries noise whose variance is
proportional to the square of the signal’s magnitude. This cost function
yields sufficient speed without undershooting the target, because a signal
with a larger magnitude induces greater noise and greater noise induces wider
variance. Minimizing motor variance during the post-movement period thus regulates the size
of the neural command. Harris & Wolpert (1998) showed that MVT produces stereotyped
bell-shaped velocity profiles with agonist-antagonist motoneuron activation time courses
similar to those observed in EMG data. Simmons & Demiris (Simmons & Demiris,
2004, 2005, 2006) suggested a variation of the minimum variance theory by introducing
an adaptive (optimal) controller. The original MVT model (Harris & Wolpert, 1998)
generated a feedforward signal through cost optimization. In other words, the original MVT
model did not use a feedback signal at all. In contrast, Simmons and Demiris’ modified
model computed an adaptive gain that multiplies the feedback signal.
2.3.3.2. Direct mapping from displacement vector to motor command
Kuperstein (Kuperstein, 1988a, 1988b; 1988c) proposed a model of direct mapping
from the visual displacement map. In his model, the mapping was obtained through a
target map and an input map, and applied to the muscle map. He used a supervised
learning rule to adapt the weights between the target map and the muscle map. What is
novel in this model is that there is no external teaching signal. Instead, during a motor
babbling phase, the model internally generates an error signal to adapt the weights,
as in unsupervised learning. In summary, the model learns a direct mapping from
the target to the muscle motor command for reaching. This direct visuomotor
transformation has also been studied by Baraduc et al. (2001) and Mel (1991).
If this visual displacement map is updated online, this network works in the same way
as a position-based feedback controller. The difference between this network and a classical
feedback controller in control theory is that the feedback gain varies with
displacement: each connection strength represents the feedback gain
associated with a certain amount of displacement. To include the derivative term of a
feedback controller, we may extend the visual displacement map to a map of visual
displacement and velocity.
There are two major ways to obtain the association between feedback gain and amount of
displacement. The first is optimization, either through the calculus of variations, as Hoff
and Arbib (Hoff & Arbib, 1993) did, or through a linear quadratic regulator (LQR), as
Todorov and Jordan (Todorov & Jordan, 2002) did. The second is reinforcement
learning.
First, optimization can associate the difference vector with the motor command by
minimizing a certain cost function. The Hoff-Arbib model used the calculus of variations to
minimize the integrated squared jerk (for details, see section 2.3.3.1). Another method is the
linear quadratic regulator (LQR) in the presence of signal-dependent noise (Todorov, 2004,
2005; Todorov & Jordan, 2002). Because the feedback signal is noisy and delayed, a
feedback controller may be unstable. To compensate for this instability, the current location
should be estimated with an optimal estimator such as the Kalman filter (Kalman, 1960;
Wolpert, Ghahramani, & Jordan, 1995b) or Bayesian sensory integration (Kording &
Wolpert, 2004; van Beers, Wolpert, & Haggard, 2002). Todorov and his colleagues
(Todorov, 2004, 2005; Todorov & Jordan, 2002) called a motor system that contains
both an LQR and a Kalman filter an ‘optimal feedback controller’. Early optimal feedback
controllers controlled a point mass instead of a realistic arm, and thus did not provide a
direct mapping to motor commands for muscles or joints; more recent versions can generate
muscle commands through a hierarchical structure (W. Li, Todorov, & Pan, 2004, 2005).
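How an optimal feedback gain follows from a quadratic cost can be illustrated with the simplest possible case, a scalar discrete-time LQR. This sketch is a textbook construction and not the controller of Todorov and colleagues: it omits signal-dependent noise, sensory delay, and the Kalman filter, and the system parameters `a`, `b` and cost weights `q`, `r` are hypothetical.

```python
def scalar_lqr_gain(a, b, q, r, iterations=500):
    """Scalar discrete-time LQR for x[t+1] = a*x[t] + b*u[t].

    Cost: sum over t of q*x[t]**2 + r*u[t]**2.
    Iterates the Riccati recursion to a fixed point and returns (gain, p),
    where the optimal control law is u[t] = -gain * x[t].
    """
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)
    return gain, p

if __name__ == "__main__":
    gain, p = scalar_lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
    print("feedback gain:", gain, "cost-to-go coefficient:", p)
    # Closed-loop simulation: the state (e.g., a displacement) decays toward zero.
    x = 1.0
    for t in range(5):
        u = -gain * x
        x = 1.0 * x + 1.0 * u
        print(f"t={t + 1}  x={x:.4f}")
```

Raising `r` relative to `q` penalizes the command more strongly and yields a smaller gain, which is the kind of trade-off that the cost function encodes.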
Second, it is also possible to obtain such a feedback controller through
reinforcement learning (Richard S. Sutton & Barto, 1998). Reinforcement learning may
be viewed as a variation of unsupervised learning that takes the form of a learning rule
(Schultz, 1998) but has a ‘goal-directed’ reward signal. This signal modulates the
unsupervised learning negatively when the movement is unsuccessful and positively when the
movement is successful. Reinforcement learning with a temporal difference (TD) reward
signal works well in classical control problems, such as the inverted pendulum or the
cart-pole (Barto, Sutton, & Anderson, 1983; Doya, 2000b; R. S. Sutton, 1988; Richard S.
Sutton & Barto, 1998). Usually, the reward signal is given after a long delay. Because of this
delay, temporal credit assignment is an issue in reinforcement learning: it is unclear which
motor command, issued at which time, contributed to the successful or unsuccessful movement.
This evaluation is only available when the movement is finished. Thus, there may be a
learning system that estimates the contribution of the current motor command: a critic in the
conceptual schema, perhaps located in the basal ganglia of the brain. The existence of
temporal difference reinforcement learning is supported by research on dopamine
conditioning in the basal ganglia (Barto & Sutton, 1982; Schultz, 1998; Schultz,
Tremblay, & Hollerman, 1998). If an unexpected reward arrives after a delay, the basal
ganglia initially emit dopamine. However, after a few trials with the delayed reward, the basal
ganglia come to expect the delayed reward, and dopamine emission decreases.
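The decrease of the reward-related signal over trials is what a temporal difference critic predicts. The following is a minimal tabular TD(0) sketch in which the TD error stands in for the dopamine signal; the trial length, learning rate, discount factor, and function name are assumptions chosen for illustration and are not taken from the cited studies.

```python
def run_td_trials(n_trials, n_steps=5, alpha=0.3, gamma=1.0):
    """Tabular TD(0) critic on a fixed-length trial with reward 1 at the last step.

    Returns (errors_at_reward, values): the TD error at the moment of reward
    on each trial, and the learned value of each time step.
    """
    values = [0.0] * (n_steps + 1)  # values[n_steps] is the terminal state
    errors_at_reward = []
    for _ in range(n_trials):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            delta = reward + gamma * values[t + 1] - values[t]  # TD error
            values[t] += alpha * delta                          # critic update
            if t == n_steps - 1:
                errors_at_reward.append(delta)
    return errors_at_reward, values

if __name__ == "__main__":
    errors, values = run_td_trials(50)
    print("TD error at reward, first trials:", [round(e, 3) for e in errors[:5]])
    print("TD error at reward, last trial:", errors[-1])
    print("learned values:", [round(v, 3) for v in values])
```

On the first trial the reward is fully unexpected and the TD error equals the reward; as the critic learns to predict the delayed reward, the error at the time of reward shrinks toward zero.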
Reinforcement learning has been applied to obtain a direct mapping from the
displacement map to motor commands for the arm’s reaching movement (Bissmarck,
Nakahara, Doya, & Hikosaka, 2008; Franklin, 1988; Izawa, Kondo, & Ito, 2004;
Kambara, Kim, Shin, Sato, & Koike, 2009; Shibata & Ito, 2003; Shibata, Sugisaka, & Ito,
2000). Kambara et al. (2009) emphasized that a feedback controller can be obtained
through reinforcement learning and can teach the forward model more precisely
through feedback error learning (Kawato & Gomi, 1992). Earlier, Bissmarck et al. (2008)
also obtained such a feedback controller, mapping a feedback signal to a motor command
in the form of torque, whereas the outputs of Kambara et al. were muscle activation patterns.
Bissmarck et al. (2004) showed that the different time lengths of sensory feedback delays
induce switching from a visual feedback controller to a somatosensory (proprioceptive)
feedback controller. At the beginning of a training session, the visual feedback controller
works better than the somatosensory feedback controller, which is not yet adapted, even
though the visual feedback controller is still imperfect. After some trials with
reinforcement learning, due to the shorter delay of somatosensory feedback, the
performance of the somatosensory feedback controller improves. Then the learning
procedure concentrates on adapting the somatosensory feedback controller. As a result,
Bissmarck et al.’s model (2004) can generate appropriate motor commands without a
visual feedback module. In summary, the coarse visual feedback module with a longer
delay helps adaptation of the somatosensory feedback module with a shorter delay and
finally the latter dominates. These results may imply switching from a feedback
controller to the feedforward controller, as shown in feedback error learning (Kawato &
Gomi, 1992).
A virtue of reinforcement learning is that it may also incorporate optimality through the
reward signal. When the reward signal accounts for how efficient movements are, it can
capture various characteristics of motor behavior. Because the objective of
reinforcement learning is to maximize this reward, the learning procedure can be considered
an optimization procedure (Richard S. Sutton & Barto, 1998). Combined with the
optimality principle, reinforcement learning may lead to efficient and human-like motor
behaviors through an appropriate reward function.
2.3.3.3. Motor cortex models based on directional coding
The “muscle-versus-movement” debate (Asanuma, 1989; Georgopoulos et al., 1986)
concerned whether the activity of neurons in the motor cortex
correlates with muscle contraction or with movement of the limbs. Georgopoulos et al. (1986)
reported that neurons in the motor cortex of a monkey are activated differently for
different directions of movement, a finding obtained by recording neuronal activity during
active movement. These directionally tuned neurons collectively generate
motoneuron activation. The collective integration of directional tuning implies population
coding. A trajectory may be reconstructed by integrating the
population vector over time with appropriate scaling. Lukashin & Georgopoulos (1993)
proposed a model that integrates the population vector and reconstructs the trajectory using
neuron-like units. The model does not explain how motor cortex activation drives
motoneuron activation in the spinal cord, because it focuses on projections from areas
upstream of the motor cortex, such as the premotor cortex or the prefrontal cortex.
Lukashin, Amirikian, & Georgopoulos (1996) complemented the model of
population coding (Georgopoulos et al., 1986) by adding a two-layered neural network
consisting of an interneuron (IN) layer and a motoneuron (MN) layer. Here, the supra-
spinal units (SS) have characteristics of population coding. In this model, projection from
SS to IN activates the IN neurons, which correspond to specific equilibrium positions in
the workspace. Then, the IN layer projects to the MN layer in order to exert the force
command to the muscles.
Reinkensmeyer and his colleagues (Reinkensmeyer et al., 2003) developed a model
of the motor cortex to explain previous data (Reinkensmeyer et al., 2002) showing
that stroke affects the distribution of population coding. Their model had neither a
multi-link arm nor a point mass, so it did not generate any motor command for the
trajectory. Instead, the population vector that indicates the initial movement direction is
constructed before and after stroke, explaining the greater variability of initial directions
in stroke subjects reported by Reinkensmeyer et al. (2002).
2.3.4. Analytical model of the motor procedure in the motor cortex
Todorov (2000a) proposed an analytical model of the primary motor cortex, whose
dominant neural coding is muscle coding, with a biologically plausible but simple arm model
(not including the musculoskeletal system). This model is not developmental but
optimal, not causal but predictive: it propagates reaching behavior back up to the primary
motor cortex through the muscle model and the cortico-motoneuronal projection, with
physiologically sound assumptions at each step and theoretically optimal principles for
motor control. He started from a desired trajectory. Then, considering an arm model with
muscles, he obtained the desired motoneuron activation that generates the desired
trajectory. Using a principle of optimality called cosine tuning, he derived the
desired motor cortex activation pattern from the desired motoneuron activation. That is, if
a motor cortex activation pattern generates a movement, the movement helps to find the
motor cortex activation pattern. This predictive procedure (the reverse of the causal link)
indicated that the motor cortex activation still contains the population coding
suggested by Georgopoulos et al. (1986).
However, Todorov’s (2000a) conclusion differs from that of Georgopoulos et
al. (1986). After statistical analysis of his computational model, Todorov (2000a)
concluded that the motor cortex activation pattern based on the arm model with
muscles and population coding is only one aspect of the activation pattern. He
statistically analyzed neuronal activities against the velocity (direction), position and
acceleration of movements and classified them. He then asserted that the linear
component of the muscle coding, which is the direction of movement, may be measured
even if only muscle coding exists. Todorov’s model (2000a) showed the motor
cortex activation that corresponds to a specific movement, but did not explain how the
motor cortex activation pattern originates.
Guigon and his colleagues (Guigon, Baraduc, & Desmurget, 2007a, 2007b)
developed an analytical model of motor cortex activation. The major difference between
this model and Todorov’s model (Todorov, 2000a) is whether the trajectory is optimized.
Todorov’s model generated a sequence of motoneuron activations along a given
trajectory and estimated the sequence of neuronal activations in the motor cortex through
the cosine tuning hypothesis (Todorov, 2002). In contrast, Guigon and his colleagues
first optimized a trajectory and the corresponding motoneuron activation sequence given
only the target information (Guigon et al., 2007b). They assumed that each neuron in
the motor cortex is fully connected to the motoneurons with random weights
uniformly distributed over [-1, 1], which may represent a muscle synergy pattern. Then,
given the sequence of motoneuron activations and the weight matrices between motor cortex
neurons and motoneurons, they computed the motor cortex activation sequence
that minimizes the total activation over the motor cortex.
Although this model reproduced many physiological data,
it has fundamentally the same structure as Todorov’s model, in that the
motor cortex activation is computed from the motoneuron patterns (indeed, from an optimal
trajectory). In other words, neither model (Guigon et al., 2007a; Todorov, 2000a)
computes motor cortex activation patterns over time directly from a given target in a
straightforward way.
2.3.5. Limitation of computational models
In summary, the current models reviewed in Chapter 2 have limitations. First, in most
cases, they are not biologically plausible. Most approaches to supervised learning in
the motor cortex (neocortex) that use a direct error signal are not possible (Doya, 2000a)
because the motor cortex does not have a neural structure such as the inferior olive,
which carries error signals. Moreover, the back-propagation algorithm assumes that the
learning algorithm knows the activation of all neurons. This assumption does not hold for
the neocortex because such a learning rule is not local, whereas unsupervised learning and
reinforcement learning rules are local (Barto, 1985; Crick, 1989; Roelfsema & van Ooyen,
2005). Second, models of motor control based on the optimality principle or population
coding are not developmental. Third, no model except those of the Reggia group (Armentrout
et al., 1994; Yinong Chen, 1997; Y. Chen & Reggia, 1996; Cho & Reggia, 1994; Goodall
et al., 1997) includes map formation in the motor cortex. Without map formation,
it is impossible to study reorganization of the motor cortex following a stroke.
Fourth and finally, the Reggia group’s models also have limitations. Although these
models can show reorganization of the motor cortex after stroke, they mostly lack
goal-directed signals, so map formation was explained developmentally rather than
functionally. Moreover, it is hard to say how ‘movement’ is retrained, even though
the models showed that retraining enabled an arm to reach a location that it could not
reach after a stroke. Yinong Chen (Yinong Chen, 1997; Y. Chen & Reggia, 1996) modeled the
mapping from a goal-directed displacement map to motor cortex activation through
unsupervised learning. However, the coding of the displacement map is too simple, with just
nine units for eight different directions and the null direction; moreover, unsupervised
learning cannot learn a convex problem.
Chapter 3.
Improving Spontaneous Use of the Affected Arm after Stroke:
Predictions from a Computational Model
1
3.1. Introduction
Stroke is the leading cause of disability in the US, and about 65% of stroke survivors
experience long-term upper extremity functional limitations (Dobkin, 2005). Although
patients may regain some motor functions in the months following stroke due to
spontaneous recovery, stroke often leaves patients with predominantly unilateral motor
impairments. Indeed, recovery of upper extremity function in more than half of patients
with severe paresis after stroke is achieved solely by compensatory use of the less-affected
limb (Nakayama, Jorgensen, Raaschou, & Olsen, 1994). Improving use of the
more affected arm is important, however, because difficulty in using this arm in daily tasks
has been associated with reduced quality of life (Duncan et al., 1999).
There is now definite evidence, however, that physical therapy interventions targeted
at the more affected arm can improve both the amount of spontaneous arm use and arm
and hand function after stroke (S. L. Wolf et al., 2006). Further, even after motor re-
training is terminated, performance can further improve in patients with less severe
strokes in the months following therapy (C. J. Winstein et al., 2004; S. L. Wolf et al.,
2007). A possible interpretation of this result is that the repeated attempts to use the
affected arm in daily activities are a form of motor practice that can lead to further
improvements in motor performance (C. J. Winstein et al., 2004).
1 Stroke rehabilitation reaches a threshold, CE Han, MA Arbib, N Schweighofer, PLoS
Comput Biol., 2008 Aug 22; 4(8): e1000133 (selected as a featured research of the
month)
The neural correlates of motor training after stroke have been investigated in
animals with motor cortex lesions (Kleim, Barbay, & Nudo, 1998; Nudo et al., 1996).
Specifically, a focal infarct within the hand region of the primary motor cortex causes a
loss of hand representations that extends beyond the infarction. However, several weeks
of rehabilitative training can overcome this loss of representation, and yield an expansion
of the hand area to its pre-lesion size; the larger area in turn has been correlated with
a higher level of performance (Conner, Culberson, Packowski, Chiba, & Tuszynski, 2003).
Long-term potentiation in pyramidal neuron to pyramidal neuron synapses has been
demonstrated in horizontal lateral connections (Rioult-Pedotti, Friedman, & Donoghue,
2000), and may provide the basis for map formation and reorganization in the motor
cortex (Sanes, Suner, Lando, & Donoghue, 1988), and motor skill learning (Rioult-
Pedotti et al., 2000).
Contrasting with the increase in performance due to spontaneous recovery, a
concurrent decrease of spontaneous arm use has been proposed to occur following stroke.
This decrease may be due both to the higher effort and attention required for successful
use of the impaired hand and to the development of learned non-use (Sunderland & Tuke,
2005), in that the preference for the less affected arm is learned as a result of repeated
unsuccessful attempts to use the affected arm (Sterr, Freivogel, & Schmalohr, 2002; E. Taub
& Uswatte, 2003; E. Taub, Uswatte, Mark, & Morris, 2006). The Constraint Induced
Therapy (CIT) protocol, which forces the use of the affected limb by restraining the use
of the less affected limb with a mitt, has been specifically developed to reverse learned
non-use (E. Taub, Uswatte, & Elbert, 2002). Although its “active ingredients” are still not
well understood (Luft & Hanley, 2006), CIT has been shown to be effective in the
recovery of arm and hand functions after stroke in multi-site randomized clinical trials (S.
L. Wolf et al., 2006). Because 50% of the eventual improvement in use (as measured by
the questionnaire-based “motor activity log”) is seen at the end of the first day of CIT, it
has been suggested that CIT is effective in reversing learned non-use (E. Taub et al., 1993).
To our knowledge, however, there are no longitudinal data tracking the development of
learned non-use just after stroke and during recovery.
In summary, increase in performance after stroke due to spontaneous recovery,
rehabilitation, or both does not appear to correlate simply with spontaneous arm use, and
a yet-to-be clarified non-linear mechanism seems to be at play. Here, we focus on
rehabilitation in the control of reaching post-stroke, a prerequisite for successful
manipulation. We developed a biologically plausible model of bilateral control of
reaching movements to investigate the mechanisms and conditions leading to such
positive or negative changes in spontaneous choice of which arm to use. Our central
hypothesis, based on the above observations, is the existence of a threshold in
spontaneous arm use: If re-training after brain lesion (or spontaneous recovery) increases
spontaneous arm use above this threshold, performance will keep increasing, as each
attempt to use the affected arm will act as a form of motor re-learning. The patient will
then enter a virtuous circle of improved performance and spontaneous use of the affected
arm, and therapy can be terminated. In contrast, if spontaneous use of the arm does not
reach this threshold after either natural recovery or rehabilitation, or both, performance
will not improve after stroke, and compensatory strategies with greater reliance on the
less affected arm will either remain or even develop further.
3.2. Methods
3.2.1. Behavioral Set-up
To model spontaneous use of one arm or the other, and changes in motor
performance, we simulated horizontal reaching movements towards targets distributed
along a circle centered on the initial (overlapping) positions of the two arms (Figure
3.1.A). Our computational model of bilateral arm use in arm reaching contains a left and
a right motor cortex, and a single action choice module (Figure 3.1.B). We first trained
the full model (the “normal subject”) to reach with either hand, but with a bias for using
the hand closer to the eventual target. Spontaneous arm use was recorded in a free choice
condition, in which the action choice module can select either arm to reach targets that
are randomly generated anywhere along the circle. Motor performance was evaluated by
the directional error between the desired movement direction and the actual hand
direction.
To simulate stroke, we partly lesion one hemisphere (i.e., remove a set of simulated
neurons from the simulation). We first simulate a spontaneous recovery period in which
the action choice module determines the choice of arm, and the state of motor cortex
determines error in reaching, with consequent changes in synaptic weights. We then
mimic CIT with a forced use condition in which only the use of the affected arm (i.e., that
contralateral to the lesioned cortex) is allowed. We study in simulations the conditions
that lead to successful recovery, that is, to high levels of spontaneous use and
performance with the affected arm in appropriate regions of space, and low reliance on
compensatory movements with the less affected arm.
Figure 3.1. (a) Experimental setup. (b) Model structure. Solid line: information
signal; dashed line: activation signal; dotted line: reward-based (reinforcement) learning;
double dotted line: error-based (supervised) learning.
3.2.2. Computational model
Our model has two distributed interacting and adaptive systems: the motor cortex for
motor execution and the action choice module for decision-making.
3.2.2.1. Motor cortex model
We made two assumptions to model the motor cortex with a left and a right module
for control of the contralateral arm:
1) The motor cortex contains neurons coding direction of hand movement
(Georgopoulos et al., 1986) with signal dependent noise (Lee, Port, Kruse, &
Georgopoulos, 1998; Reinkensmeyer et al., 2003). Although the issue of correlation
versus coding for hand directions is a subject of intense debate (Georgopoulos & Ashe,
2000; Moran & Schwartz, 2000; Scott, 2000b; Todorov, 2000b), computational models
have developed the view that motor cortex neurons linked to arm muscles exhibit activity
strongly correlated with hand direction in the initial phase of the movement (Guigon et al.,
2007a; Todorov, 2000a). This assumption allowed us to simplify the model considerably
by not requiring us to model a spinal cord, muscles, and arms linking the output of the
motor cortex to the behavior.
The activation rule of each motor neuron is given by a truncated cosine function
(Todorov, 2002) based on the empirical data of Georgopoulos et al. (1986), which
correlates the firing rate of neuron i with the difference between the “preferred direction”
$\theta_p^i$ (that associated with maximal firing of this neuron) and the currently chosen hand
direction, $\theta_d$:

$$y_i = \left[\cos(\theta_d - \theta_p^i) + N(0, \sigma_{SDN}^i)\right]^+, \quad \text{where } [x]^+ = \begin{cases} x, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases} \qquad (3.1)$$

where $y_i$ is the firing rate of the $i$th neuron. $N(0, \sigma_{SDN}^i)$ is normally distributed signal
dependent noise with zero mean and standard deviation proportional to the mean signal
size (Harris & Wolpert, 1998; K. E. Jones et al., 2002; Lee et al., 1998; Reinkensmeyer et
al., 2003; Todorov, 2002), that is, $\sigma_{SDN}^i = k\,\bar{y}_i$, where $\bar{y}_i$ is the noiseless activation
$\bar{y}_i = \left[\cos(\theta_d - \theta_p^i)\right]^+$.
Summation of individual neuron vectors (with each vector length given by Equation
(3.1), and the vector direction given by the preferred direction) yields a population vector
that has been shown to be well aligned with the initial actual (executed) hand direction $\theta_e$
(Georgopoulos et al., 1986). In our model, at each action, one half (left or right) of motor
cortex is chosen to control the next reaching movement (see below). Thus, we take the
actual reaching direction to be that given by the direction of the population vector of the
chosen motor hemi-cortex.
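As an illustration of Equation (3.1) and of the population vector read-out, the following is a minimal sketch, not the simulation code used in this work. It assumes angles in radians and uniformly spaced preferred directions; the function names (`rectify`, `firing_rates`, `population_vector`) are hypothetical and chosen only for this example.

```python
import math
import random

def rectify(x):
    """Truncation [x]^+ of Equation (3.1): non-positive values become 0."""
    return x if x > 0.0 else 0.0

def firing_rates(theta_d, preferred, k=0.15, rng=random):
    """Noisy truncated-cosine activations, one per neuron (angles in radians)."""
    rates = []
    for theta_p in preferred:
        mean = rectify(math.cos(theta_d - theta_p))   # noiseless activation
        noise = rng.gauss(0.0, k * mean)              # SD proportional to the mean
        rates.append(rectify(mean + noise))
    return rates

def population_vector(rates, preferred):
    """Sum of vectors of length y_i pointing along each preferred direction."""
    px = sum(y * math.cos(p) for y, p in zip(rates, preferred))
    py = sum(y * math.sin(p) for y, p in zip(rates, preferred))
    return math.atan2(py, px), math.hypot(px, py)     # (direction theta_e, magnitude)

n = 500
preferred = [2.0 * math.pi * i / n for i in range(n)]  # assumed uniform spacing
theta_e, magnitude = population_vector(firing_rates(math.pi / 4, preferred), preferred)
```

With an intact, uniformly tuned population, the decoded direction stays close to the desired direction because the noise contributions of the many active neurons largely cancel.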
2) The motor system learns to generate reaching movements by minimizing error
bias and by recruiting more neurons for frequently used movement, in effect minimizing
directional variance (Reinkensmeyer et al., 2003). We now specify how neurons’
preferred directions in the active hemisphere are slowly modified after each trial.
Mathematically, we view a learning rule as an adjustment of parameters that serves to
improve the performance of the system with respect to some criterion. As we shall see
below, such learning is not always best for other behavioral criteria. For the motor cortex,
we measure performance with the following cost function, which is a function of
reaching error and total neuronal activity:
$$E(\theta_d) = \frac{1}{2}(\theta_d - \theta_e)^2 - \lambda \sum_i y_i^2 \tag{3.2}$$
where $\theta_d$ is the desired direction, $\theta_e$ is the direction specified by the population vector of
the motor cortex (a function of the synaptic weights therein), and $\lambda$ is a free parameter.
The first term of the right hand side of equation (3.2) measures the directional error, and
the second part the total neural activity, which is related to the magnitude of the
population vector.
The cost can, with some approximation, be decreased by applying the following
motor cortex learning rule (see Appendix part A):
$$\theta_p^i \leftarrow \theta_p^i + \alpha_{SL} \cdot (\theta_d - \theta_e) \cdot y_i + \alpha_{UL} \cdot (\theta_d - \theta_p^i) \cdot y_i \tag{3.3}$$
where $\alpha_{SL}$ and $\alpha_{UL}$ are learning rates. The first term of the learning rule, a supervised
learning term that resembles a standard supervised learning rule in linear neurons (Hertz,
Krogh, & Palmer, 1991), decreases the global directional error. Support for this term of
the rule stems from monkey experiments, in which adaptation to an external force field or
to visuo-motor rotations induces neuronal reorganization of preferred direction in primary
motor cortex neurons (C. S. Li et al., 2001; Paz, Boraud, Natan, Bergman, & Vaadia,
2003). The second term of the learning rule, an unsupervised learning term that resembles
the standard unsupervised competitive learning rule (Hertz et al., 1991), orients the
neurons’ preferred directions towards the desired reaching direction.
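The two terms of the learning rule of Equation (3.3) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in our simulations: angular differences are wrapped to (-π, π], which the equation does not state explicitly, and the names `wrap` and `update_preferred` are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angular difference to (-pi, pi]; an assumption of this sketch."""
    return math.atan2(math.sin(angle), math.cos(angle))

def update_preferred(preferred, rates, theta_d, theta_e,
                     alpha_sl=0.005, alpha_ul=0.002):
    """One application of Equation (3.3) to every neuron of the active hemisphere."""
    error = wrap(theta_d - theta_e)                   # global directional error
    updated = []
    for theta_p, y in zip(preferred, rates):
        supervised = alpha_sl * error * y             # same shift direction for all active neurons
        unsupervised = alpha_ul * wrap(theta_d - theta_p) * y  # pull toward theta_d
        updated.append((theta_p + supervised + unsupervised) % (2.0 * math.pi))
    return updated
```

Because both terms are scaled by the firing rate, silent neurons (those tuned far from the desired direction) are left unchanged by a trial.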
3.2.2.2 Action choice module
In reinforcement learning, actions that maximize outcomes are selected based on
estimates of future cumulative rewards, or “values” (Richard S. Sutton & Barto, 1998).
Reinforcement learning provides a plausible framework for human adaptive decision-making
with desirable theoretical and biological properties (Kawato & Samejima, 2007;
Samejima, Ueda, Doya, & Kimura, 2005; Schweighofer et al., 2006). There is evidence
that values are acquired by cortico-basal ganglia networks (Knutson, Taylor, Kaufman,
Peterson, & Glover, 2005; O'Doherty, 2004; Samejima et al., 2005), under the influence
of the dopaminergic system (Dominey, Arbib, & Joseph, 1995; Reynolds & Wickens,
2002). Further, it is likely that basal ganglia output releases inhibition of the motor cortex
for selected actions (Mink, 2003). Our action choice module (Figure 3.1.B) thus utilizes
reinforcement learning to learn how to choose which arm to use in reaching each target
based on a comparison of the values of using one arm or the other. Such “action” values
have been recently shown to be represented in the striatum (Samejima et al., 2005). The
action values are learned from the reward prediction error δ, the difference between the
actual reward, which evaluates the executed action, and the predicted reward, as
estimated by the action value (Richard S. Sutton & Barto, 1998). We now turn to the
definition of these quantities.
Here, we use a total (internal) reward $r_{total}$ with two components: First, healthy
subjects tend to use the left arm to reach to the left, and similarly for the right, but with a
handedness preference near the midline (Mamolo, Roy, Bryden, & Rohr, 2005). As each
subject’s level of comfort correlates with arm use (Mamolo et al., 2005), we model
workspace preference of hand with a reward term that is positive if the right arm is used
in the right hand side workspace (RHS) or the left arm is used in the left hand side
workspace (LHS). Second, we use a performance-related reward term, which is high
when the executed direction $\theta_e$ is close to the given desired direction $\theta_d$ and low if the
direction of the actual movement deviates from the desired direction. The total reward is
thus given by:
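$$r_{total} = r_{direction}(\theta_d, \theta_e) + \rho, \quad \text{where } r_{direction}(\theta_d, \theta_e) = \exp\left(-\frac{(\theta_d - \theta_e)^2}{\sigma_{reward}^2}\right) \text{ and } \rho \begin{cases} > 0, & \text{right arm \& } \theta_d \in \text{RHS, or left arm \& } \theta_d \in \text{LHS} \\ = 0, & \text{otherwise} \end{cases} \tag{3.4}$$
where $\sigma_{reward}$ is the broadness of the reward function and $\rho$ gives the workspace
preference of the hand.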
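The total reward of Equation (3.4) can be sketched as below. This is an illustrative sketch only: it assumes that the right hand side workspace corresponds to desired directions with a positive cosine (first and fourth quadrants), that every other direction counts as the left hand side workspace, and that the directional difference is wrapped to (-π, π]. The function name `total_reward` is hypothetical.

```python
import math

def total_reward(theta_d, theta_e, arm, sigma_reward=0.2, rho=0.2):
    """Equation (3.4): accuracy reward plus a workspace-preference bonus."""
    diff = math.atan2(math.sin(theta_d - theta_e), math.cos(theta_d - theta_e))
    r_direction = math.exp(-(diff ** 2) / sigma_reward ** 2)
    in_rhs = math.cos(theta_d) > 0.0                  # assumed definition of RHS
    preferred_side = (arm == "right" and in_rhs) or (arm == "left" and not in_rhs)
    return r_direction + (rho if preferred_side else 0.0)
```

For a perfectly accurate reach the directional term equals 1, so the reward is 1 + ρ when the arm matches its preferred workspace and 1 otherwise.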
The action choice module selects one of the arms for movement execution by
comparing the action values $Q(a_i, \theta_d)$, that is, the reward expected by selecting arm $a_i$ for
the desired direction $\theta_d$, with $a_i \in \{\text{left}, \text{right}\}$ and $\theta_d \in [0°, 360°]$. Although a number of
function approximators can be used to learn the action values, our results are not
dependent on the exact choice of approximators. Here we used two radial basis function
(RBF) networks to estimate the action values, one for each of the two possible actions.
RBF is a form of linear regression with exponential basis functions; the estimated values
are thus computed with:
$$Q(a_j, \theta_d) = \sum_{i=1}^{n} w_i^j \, \phi_i(\theta_d), \quad \text{where } \phi_i(\theta_d) = \exp\left(-\frac{(\theta_d - \theta_i)^2}{\sigma_{ACM}^2}\right) \tag{3.5}$$
where $Q$ is the estimated action value, $w_i^j$ are tunable weights for action $a_j$, $n$ is the
number of RBFs, $\theta_i$ is the center of the $i$th RBF, and $\sigma_{ACM}$ is the broadness of each RBF,
which is chosen to be equal to π/n as this allows good generalization (Doya, 2000b).
After each movement, the action value of each arm is updated with the reward
prediction error, that is, the difference $\delta = r_{total} - Q(a, \theta_d)$ between the actual reward and
the expected reward. The weights $w_i^a$ are updated to minimize the square of the reward
prediction error:
$$w_i^a \leftarrow w_i^a + \alpha_{ACM} \cdot \delta \cdot \phi_i(\theta_d) \tag{3.6}$$
where $\alpha_{ACM}$ is a learning rate.
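Equations (3.5) and (3.6) can be illustrated with the following minimal sketch, which is not the code used in our simulations. It assumes uniformly spaced RBF centers, a wrapped angular distance, the default broadness π/10, and that only the network of the arm that was used is updated on a given trial; the names `basis`, `q_value`, and `update_weights` are hypothetical.

```python
import math

N_RBF = 20
SIGMA_ACM = math.pi / 10
CENTERS = [2.0 * math.pi * i / N_RBF for i in range(N_RBF)]  # assumed uniform centers

def basis(theta_d):
    """phi_i(theta_d) for every RBF, using a wrapped angular distance (assumption)."""
    out = []
    for c in CENTERS:
        d = math.atan2(math.sin(theta_d - c), math.cos(theta_d - c))
        out.append(math.exp(-(d ** 2) / SIGMA_ACM ** 2))
    return out

def q_value(weights, theta_d):
    """Equation (3.5): weighted sum of basis activations for one arm."""
    return sum(w * phi for w, phi in zip(weights, basis(theta_d)))

def update_weights(weights, theta_d, reward, alpha_acm=0.1):
    """Equation (3.6): move the weights along the reward prediction error."""
    delta = reward - q_value(weights, theta_d)
    return [w + alpha_acm * delta * phi for w, phi in zip(weights, basis(theta_d))]

weights = {"left": [0.0] * N_RBF, "right": [0.0] * N_RBF}    # one network per arm
for _ in range(200):
    weights["right"] = update_weights(weights["right"], 0.5, reward=1.2)
```

Repeating the same rewarded reach drives the estimated value of that arm toward the received reward for that direction, while the value for the other arm is untouched.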
Based on the action values, the module probabilistically selects which motor cortex
will be used to execute a movement according to the softmax function (Richard S. Sutton
& Barto, 1998):
$$p(a_i \mid \theta_d) = \frac{1}{1 + \exp\left(-\beta \cdot (Q(a_i, \theta_d) - Q(a_j, \theta_d))\right)} \tag{3.7}$$
where the parameter $\beta$ controls the variability of action choice, with a large $\beta$ yielding
less variability, $a_i \in \{\text{left}, \text{right}\}$ and $\theta_d \in [0°, 360°]$.
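The two-action softmax of Equation (3.7) reduces to a logistic function of the difference in action values. A minimal sketch, with hypothetical function names `p_choose` and `choose_arm`, is:

```python
import math
import random

def p_choose(q_i, q_j, beta=10.0):
    """Equation (3.7): probability of choosing arm i over the other arm j."""
    return 1.0 / (1.0 + math.exp(-beta * (q_i - q_j)))

def choose_arm(q_left, q_right, beta=10.0, rng=random):
    """Sample an arm; a larger beta makes the choice less variable."""
    return "right" if rng.random() < p_choose(q_right, q_left, beta) else "left"
```

With equal action values each arm is chosen with probability 0.5; with the default β = 10, a value advantage of 0.2 already yields a choice probability of about 0.88.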
3.3. Simulations
Strokes seem to affect only a certain range of movement directions. Outside this
range, reaching is relatively spared (Beer, Dewald, Dawson, & Rymer, 2004). To model
this effect, we removed the neurons with preferred directions in the first quadrant of the
left motor cortex (50% of the neural population coding for the right hand side workspace,
as shown in Figure 3.2.A.5), which controls the right arm (unless otherwise noted). The
results would be the same had we chosen the other arm, or any other quadrant. We also
tested stroke models in which neurons were affected probabilistically as a function of the
range angle (with neurons being removed with 100% probability for the central angle of
the simulated lesion and then with lower probability as the angles on each side of the
lesion center increase); simulation results with these stroke models were qualitatively
similar to those with the “hard boundary” model and thus for simplicity are not presented
here. We also tested different stroke patterns, including a lesion ranging from 45 degrees
to 145 degrees, and lesions with asymmetric bimodal distributions. Simulations (results
not shown) confirmed that such lesions did not produce results qualitatively different
from those presented here.
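The two families of simulated lesions can be sketched as follows. This is an illustration only: the Gaussian fall-off of the removal probability in `soft_lesion` is a hypothetical choice, since the text specifies only that the probability is 100% at the lesion center and lower on each side, and both function names are assumptions of this example.

```python
import math
import random

def hard_lesion(preferred, lo=0.0, hi=math.pi / 2):
    """Remove neurons whose preferred direction lies in [lo, hi) (radians)."""
    return [p for p in preferred if not (lo <= p % (2.0 * math.pi) < hi)]

def soft_lesion(preferred, center=math.pi / 4, width=math.pi / 4, rng=random):
    """Remove neurons with a probability that falls off away from the lesion center."""
    survivors = []
    for p in preferred:
        d = math.atan2(math.sin(p - center), math.cos(p - center))
        removal_probability = math.exp(-(d / width) ** 2)   # assumed fall-off
        if rng.random() >= removal_probability:
            survivors.append(p)
    return survivors
```

Applied to 500 uniformly tuned neurons, the hard-boundary lesion of the first quadrant removes about a quarter of the population, that is, half of the neurons coding for the right hand side workspace.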
We used two measures of motor performance:
1. The absolute value of the directional error between the intended reach direction
and the population vector direction.
2. The magnitude of the population vector, normalized by the magnitude of the
population vector before stroke.
We chose these two performance measures in our model because they can be linked
to actual patient performance measures. Initial directional error has been used in
characterizing reaching in stroke patients (e.g., Reinkensmeyer et al., 2002). Although
the population vector is normally not directly observable in patients, it can be regarded as
a measure of force exerted by arm muscles on the hand (Evarts, 1968; Kalaska,
Cohen, Hyde, & Prud'homme, 1989; Todorov, 2000a), and low force generation is a
characteristic of stroke (Chae, Yang, Park, & Labatia, 2002). Because both use and
performance are stochastic, we report averages of 10 uniformly distributed samples over
the affected range in all graphs (except the pie charts of Figures 3.2, 3.8 and 3.9).
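The two performance measures and the sampling of the affected range can be sketched as below. This is a minimal illustration with hypothetical names; it assumes angles in radians and takes the 10 samples at the midpoints of equal sub-intervals of the affected range, which is one possible reading of "uniformly distributed samples."

```python
import math

def directional_error(theta_d, theta_e):
    """Absolute directional error in radians, wrapped to [0, pi]."""
    return abs(math.atan2(math.sin(theta_d - theta_e), math.cos(theta_d - theta_e)))

def normalized_pv(magnitude, pre_stroke_magnitude):
    """Population-vector magnitude relative to its pre-stroke value."""
    return magnitude / pre_stroke_magnitude

def sample_directions(lo=0.0, hi=math.pi / 2, n=10):
    """n evenly spaced desired directions over the affected range (assumed spacing)."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]
```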
The changes in performance and spontaneous arm use of the affected arm were
recorded in four consecutive phases: (i) an acquisition phase of normal bilateral reaching
behavior in 2000 free choice trials (partially shown), (ii) an acute stroke phase of 500 free
choice trials, (iii) a rehabilitation phase in a forced use condition (variable number of
trials) and (iv) a chronic stroke phase consisting of 3000 free choice trials. Values of
performance and spontaneous use just after rehabilitation are called “immediate”; their
long-term values at the end of the chronic phase are called “follow-up.”
In all phases, targets were randomly generated at the start of each trial, distributed
uniformly across all possible angles. Unless otherwise stated, we used the following
parameters: Each motor cortex had 500 neurons, with initial preferred directions $\theta_p$
uniformly distributed. The coefficient of variation of the signal-dependent noise, $k$,
was 0.15. The motor cortex learning rates were $\alpha_{SL}$ = 0.005 and $\alpha_{UL}$ = 0.002. The action
choice module contained two networks of 20 radial basis function neurons with $\sigma_{reward}$ =
0.2 (in radians, ≈11.46°), $\rho$ = 0.2, $\sigma_{ACM}$ = π/10 (in radians, =18°), $\alpha_{ACM}$ = 0.1, and $\beta$ = 10.
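For reference, the default parameter set and phase lengths stated above can be collected in a single configuration object. The sketch below is only a convenience for readers who wish to reproduce the setup; the class and field names are hypothetical, and the length of the rehabilitation phase is omitted because it varies across simulations.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DefaultParameters:
    n_neurons: int = 500            # neurons per motor cortex
    k_sdn: float = 0.15             # signal-dependent noise coefficient k
    alpha_sl: float = 0.005         # supervised learning rate
    alpha_ul: float = 0.002         # unsupervised learning rate
    n_rbf: int = 20                 # RBF neurons per action-value network
    sigma_reward: float = 0.2       # broadness of the reward function (radians)
    rho: float = 0.2                # workspace preference
    sigma_acm: float = math.pi / 10 # broadness of each RBF (radians)
    alpha_acm: float = 0.1          # action choice module learning rate
    beta: float = 10.0              # softmax parameter
    acquisition_trials: int = 2000  # free choice trials before stroke
    acute_trials: int = 500         # free choice trials just after stroke
    chronic_trials: int = 3000      # free choice trials after rehabilitation

params = DefaultParameters()
```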
3.4. Results
The first (pre-lesion) phase provided a normal baseline for reaching behavior. For
each desired direction, learning achieved zero mean directional error (Figure 3.2.A.1) and
a tendency of right arm use for the right-hand-side workspace, and left arm use for the
left-hand-side workspace (Figure 3.2.B.1).
Just after stroke, however, the population vectors showed directional errors in and
around the affected range (Figure 3.2.A.2). Sufficient therapy (1000 forced use trials,
Figure 3.2.A.4) resulted in redistributing the preferred directions within the affected side
of motor cortex, with the population vectors re-aligned to the desired directions. Although
the re-alignment was not perfect, and a small range of preferred directions was still
missing, the directional errors were much reduced. This resulted in increased rewards in
these directions, thus increasing the action value for the affected arm, preparing the way
for increased use of the affected arm once free choice was allowed. Lack of therapy on
the contrary resulted in a still large missing range of directions (Figure 3.2.A.3).
At the end of the “acute stroke” period, the less affected arm largely compensated
for the more affected arm in the affected range (Figure 3.2.B.2). If no therapy followed,
this behavioral compensation remained (Figure 3.2.B.3). Sufficient therapy, however, led
on the resumption of free choice trials to increased spontaneous arm use of the more
affected arm (right arm) in the affected range (Figure 3.2.B.4) and almost restored it to its
pre-stroke levels.
Figure 3.2. Neuronal population coding (A) and spontaneous use (B) over the
workspace for the affected arm (1) before stroke, (2) after stroke, (3) after 3000 free
choice trials and (4) after 1000 forced use trials followed by 2000 free choice trials. In
(A), each population vector figure shows the desired reach directions (thin black arrows),
the neuron activation levels along their preferred directions (thin gray lines), and the
resulting population vector (thick black arrows). Note that there are no “votes” for
directions corresponding to the lesioned directions in A.2 and A.3 but that in A.4, many
neurons have become retuned to yield votes in the lesioned directions. In (B), the pie
plots show the probability of using the affected right arm to reach to targets arrayed on
a circle around the central position. In B.2 and B.3, the less affected arm reaches into the
lesioned quadrant, but this effect is reversed with therapy (B.4).
We then studied the time courses of motor performance measures and spontaneous
arm use (Figure 3.3). In the acute stroke phase, the free choice condition resulted in some
spontaneous recovery in performance, as the repeated attempts to use the arm, although
generated with poor performance, produced directional errors that re-tuned the motor
cortex. However, the poor performance of these initial repeated attempts to use the
affected arm caused a decrease in the action value for this arm in the affected directions,
leading in turn to a reduction in spontaneous arm use. Thus, a “learned non-use”
phenomenon occurred despite improving performance. After 500 trials of natural
recovery, a number of rehabilitation trials were given in the forced use condition.
Rehabilitation improved performance as expected, but its lasting effects on spontaneous
arm choice depended on the intensity of therapy. Spontaneous arm use returned
close to 0% soon after the end of therapy if only 200 trials of therapy were given.
If 400 trials of therapy were given, spontaneous arm use held steady after therapy. If more
therapy was given, spontaneous arm use was high after therapy and kept improving for a
large number of trials thereafter.
The model thus exhibits a threshold for the intensity of rehabilitation. To precisely
quantify the threshold, we computed the change in spontaneous arm use following
rehabilitation by fitting a simple linear model with trials post stroke as predictor; the
number of trials corresponding to a null slope corresponds to this threshold. As shown in
Figure 3.4, with the default parameter set, there was a threshold at 420 forced use
trials, above which spontaneous arm use increased even after therapy was
discontinued. Below this number of forced use trials, spontaneous arm use decreased to
minimal levels after rehabilitation; therapy was “in vain.” The zero crossing in the slope in
Figure 3.4 implies bistability of spontaneous arm use: when the number of rehabilitation
trials is larger than the number of trials required to reach the threshold (420 trials), the
spontaneous arm use improves in the following free choice condition until it saturates;
conversely, when the number of rehabilitation trials is less than the number of trials
required to reach the threshold, the spontaneous arm use deteriorates (Figure 3.5.C).
Similar bistability is also shown in the directional error (Figure 3.5.A) and normalized
population vector (Figure 3.5.B).
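The threshold estimate described above (a linear fit of post-therapy spontaneous use, followed by a search for the therapy duration at which the fitted slope is null) can be sketched as follows. This is a hedged illustration rather than the analysis code used here: the linear interpolation between tested durations and the function names `slope` and `zero_crossing` are assumptions of this example.

```python
def slope(ys):
    """Least-squares slope of ys against trial index 0..len(ys)-1 (needs >= 2 points)."""
    n = len(ys)
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx

def zero_crossing(durations, slopes):
    """Interpolate the therapy duration at which the post-therapy slope changes sign."""
    pairs = list(zip(durations, slopes))
    for (d0, s0), (d1, s1) in zip(pairs, pairs[1:]):
        if s0 < 0.0 <= s1:
            return d0 + (d1 - d0) * (0.0 - s0) / (s1 - s0)
    return None  # no sign change within the tested durations
```

Here `slope` would be applied to the spontaneous arm use recorded over the free choice trials that follow each tested therapy duration, and `zero_crossing` to the resulting list of slopes.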
Figure 3.3. Time course of directional error (A), normalized population vector (PV)
(B) and spontaneous arm choice (C) in the affected range just before stroke, following
stroke (“acute stroke”), during rehabilitation, and after rehabilitation (“chronic stroke”).
Five different durations of therapy were used (0, 200, 400, 800, or 3000 trials). The
spontaneous arm use is an average selection probability from 10 uniformly distributed
desired directions on the affected range. The threshold of effective rehabilitation for this
stroke size is shown by the horizontal dotted line of (C). If the rehabilitation leads to
performance above this threshold, then a virtuous circle between spontaneous arm use
and performance will take place and performances will continue to improve without the
need for further rehabilitation.
Figure 3.4. Long-term effect of therapy as a function of the duration of therapy. We
plotted the average slope of spontaneous arm use in the 1000 trials following
rehabilitation as a function of the intensity of therapy. Above 420 trials (with the default
parameter set), spontaneous arm use increases after therapy. Below this number of trials,
it decreases.
Figure 3.5. Directional error (A), normalized population vector (PV) (B), and
spontaneous arm use (C) in the immediate and follow-up tests. The directional error
performance following few rehabilitation trials worsens after therapy. On the contrary, the
directional error performance after sufficient rehabilitation trials improves even after
therapy. Similar bistable patterns are shown for the normalized population vector and
spontaneous use shown in (B) and (C).
Figure 3.6. Effect of stroke size. (A) Number of rehabilitation trials required to reach
the effective rehabilitation threshold, as a function of lesion size. (B) Normalized
population vector (PV) as a function of lesion size in the follow-up test after 800
rehabilitation trials.
As expected, the minimal intensity of effective therapy depends on lesion size
(Figure 3.6.A). Compared to smaller lesions, large lesions require longer rehabilitation
sessions to reach the threshold of spontaneous arm use above which therapy can be
terminated. In our model, although directional error recovered almost perfectly for
lesion sizes smaller than 50% for the right hand side workspace (follow-up test after 800
rehabilitation trials; results not shown), the long-term normalized population vector
correlates almost linearly with the lesion size (same simulation conditions, see Figure
3.6.B).
Motor performance can be judged according to two different criteria: accuracy (low
bias of error) and precision (low variance of error). Figure 3.7 shows the effects of stroke
and therapy, or the lack of it (‘no therapy’), on the accuracy and precision of the reach
directional error over the affected range for the affected arm (contralateral to the lesion,
Figure 3.7.A) and for the non-affected arm (ipsilateral to the lesion; Figure 3.7.B).
Although stroke leads to an immediate and large deterioration of accuracy and precision
for reaching movements with the affected arm (Figure 3.7.A, thick solid line), therapy
restores accuracy to near pre-stroke level (Figures 3.7.A, dotted line). Because the
number of available neurons is reduced after stroke, however, precision remains low after
therapy compared to pre-stroke levels (Figure 3.7.A). Lack of therapy (‘no therapy’ in
Figure 3.7.A, thin solid line) results in further deterioration of accuracy and precision for
the affected (right) arm after stroke. In contrast, while stroke and therapy have almost no
effect on performance of the non-affected arm in our model (Figure 3.7.B, dotted line),
the increased frequency of compensatory reaching movements in the no therapy
condition results in an increase of accuracy on these reaching movements (Figure 3.7.B,
thin solid line).
Figure 3.7. Changes in reach precision (standard deviation of directional error) in
relation to changes in accuracy (mean of directional error) for (A) the contralateral
(affected) arm, and (B) the ipsilateral (non-affected) arm. In each panel, the thick solid
line corresponds to the changes occurring from just before stroke to the 500th free choice
trials following stroke onset. The thin solid line represents additional changes in a no
therapy condition (3000 free choice trials). The dotted line represents additional changes
in a therapy condition (1000 therapy trials followed by 2000 free choice trials). After
stroke, accuracy and variability of the contralateral arm worsened. Following therapy,
accuracy improved but with little change in variability. With no therapy, behavioral
compensation with the non-affected arm further developed, resulting in improved
accuracy for this arm (B).
We then studied the organization and reorganization of the cells’ preferred directions
in each hemisphere before lesion, after lesion, and after therapy. Using pie histograms
(Figure 3.8) which show the number of neurons whose preferred directions are in a
certain range of directions, we observed a cortical reorganization pattern similar to that
observed in animals that undergo rehabilitation or not after motor cortex lesions (see
Discussion). Before lesion, more cells coded for the movements that were more often
performed. After lesion, therapy or the lack of it affects the reorganization of neurons’
preferred directions in both hemispheres.
Therapy: Motor training with the affected arm has a profound effect on
reorganization in the affected hemisphere. After sufficient therapy, the distribution of the
surviving cells’ preferred directions is similar to the pre-lesion distribution; with, however,
fewer cells coding each direction, because the total number of cells is reduced (Figure
3.8.A.4). During therapy, the directional error decreases, ensuring concordance of the
supervised and unsupervised learning rules; the unsupervised learning rule is “adaptive”
as it reinforces the supervised learning rule (Figure 3.8.A.4). Conversely, motor training
has almost no effect on the cell population of the non-affected arm (Figure 3.8.B.4).
No therapy: Two patterns of reorganization are noteworthy in the affected
hemisphere. First, the size of the affected range increased compared to just after the
lesion; second, a large number of cells now code for movements in the fourth quadrant. If
no therapy or insufficient therapy is provided, the directional error of the affected arm
does not decrease (Figures 3.3.A and 3.7.A). This results in discordance between the
supervised and unsupervised learning rules, and the unsupervised learning rule, based on
desired but not actual directions, becomes “maladaptive,” further increasing the lesion
size (Figure 3.8.A.3) and largely increasing the representation of compensatory
movements (Figure 3.8.B.3) whose performance improves (decrease both in directional
error bias and in directional error variability, and increase in normalized population
vector). In the non-affected hemisphere, a number of cells shift their preferred directions
to the first quadrant, because the non-affected arm must now compensate for the
movements previously performed by the affected arm (Figure 3.8.B.3).
Figure 3.8. Reorganization of the affected (left) hemisphere (A), and non-affected
(right) hemisphere (B) after stroke followed by therapy or no therapy. In each panel,
histograms of the cells’ preferred directions are shown (1) before stroke, (2) after stroke
with 500 free choice trials, and (3) after 3000 free choice trials or (4) after 1000 forced
use training trials and subsequent 2000 free choice trials. The gray area in (A.2) shows
the lesion site. Before the lesion, the left hemisphere contains more neurons with
preferred directions in the right workspace, and the right hemisphere contains more
neurons for the left workspace because of the bias for workspace preference. Just after
lesion, the left hemisphere is affected. If no therapy follows, the size of the affected range
increases, and the number of neurons for the fourth quadrant increases in the affected
hemisphere (mal-adaptation; A.3), whereas the number of neurons for the first quadrant
increases in the non-affected (right) hemisphere due to compensation (B.3). After
therapy (1000 forced use trials followed
by 2000 free choice trials), however, the distributions of directions are similar to the pre-
lesion distribution in both hemispheres.
Figure 3.9. Cortical reorganization without the unsupervised learning term in
Equation (3.3). Reorganization of the affected (left) hemisphere (A), and non-affected
(right) hemisphere (B) after stroke followed by therapy or no therapy. In each panel,
histograms of the cells’ preferred directions are shown (1) before stroke, (2) after stroke
with 500 free choice trials, and (3) after 3000 free choice trials or (4) after 1000 forced
use training trials and subsequent 2000 free choice trials. The gray area in (A.2) shows
the lesion site.
Without the unsupervised learning term, reorganization follows different patterns:
Therapy has less of an effect on reorganization, and lack of therapy does not lead to over-
representation of compensatory movements in the affected hemisphere or in the non-
affected hemisphere (see Figure 3.9).
To better understand the respective roles of each of the supervised, unsupervised,
and reinforcement learning rates on behavior we then performed a sensitivity analysis for
these three parameters on directional error for different durations of therapy (200, 400
and 800 therapy trials) followed by 3000 free choice trials. As shown in Figure
3.10.A, directional error decreased as the supervised learning rate increased for any
amount of therapy. Figure 3.10.B shows, however, a more complex pattern for the
unsupervised learning rate. For a number of rehabilitation trials sufficient to reach
threshold in the default parameter set (420 therapy trials with an unsupervised learning
rate of 0.002), there is an optimal unsupervised learning rate for which
long-term performance (after 3000 free choice trials) is enhanced compared to either zero
unsupervised learning or too large an unsupervised learning rate. Thus, for appropriate
learning rates, unsupervised learning is “adaptive,” as it enhances performance. However,
both the absence of unsupervised learning and too large an unsupervised learning rate are
detrimental to performance.
A similar pattern is shown for the reinforcement learning rate, although the interpretation
is more difficult as very little spontaneous use occurs with a reinforcement learning rate
set at 0 (to perform the sensitivity analysis for the reinforcement learning rate, we used
the default parameter set until the end of the acute-stroke phase, then the different
reinforcement learning rates were tested starting with therapy condition).
Figure 3.10. Effect of the supervised learning rate (A), the unsupervised learning rate
(B) and the reinforcement learning rate (C) on directional error after different durations of
therapy (200, 400 and 800 therapy trials) followed by 3000 free choice trials. The
default parameters used in simulations are shown with the gray vertical lines.
We further studied the conditions under which the threshold appears by setting each
of the three rates to 0 and keeping the other two to the default values. With such learning
rate settings, we plotted the directional error, normalized population vector, and
spontaneous hand use (see Figures B.1, B.2 and B.3 in Appendix part B) just after therapy
and 3000 trials after therapy as a function of the number of rehabilitation trials, as in
Figure 3.5. Unlike for the full default parameter set (Figure 3.5), if one of the learning
rates is set to zero, the bistable behavior disappears, as shown by the non-crossing of the
curves for 0 (immediate test) and 3000 free choice trials (follow-up test). In other words,
the threshold observed in the complete model is an emergent property of the three types
of learning. If supervised learning or reinforcement learning is not present, directional
error worsens after 3000 free choice trials compared to just after rehabilitation, for any
number of rehabilitation trials. If unsupervised learning is not present, however,
directional error improves after 3000 free choice trials for any amount of rehabilitation
trials.
3.5. Discussion and Predictions
We proposed a novel model of bilateral reaching that links different levels of analysis,
as it combines a simplified but biologically plausible neural model of the motor cortex, a
biologically plausible (but non-neural) model of reward-based decision-making, and
physical therapy intervention at the behavioral level. Because our model is based on
sound theoretical principles and neural mechanisms, it allows us to explore the non-linear
interactions between performance and spontaneous use in stroke recovery.
3.5.1. Cortical reorganization after stroke and therapy
Our motor cortex model, by learning to minimize both directional errors and
variability, accounts for the reversal of the loss of cortical representation after
rehabilitation, and the increase of this loss together with the increase of the representation
of neighboring areas without rehabilitation (Goodall et al., 1997; Nudo et al., 1996).
In the lesioned cortex, during therapy, the supervised learning rule ensures that under-
represented directions are “repopulated,” decreasing average reaching errors. However,
because there are fewer surviving neurons overall after stroke, stroke leads to a decrease
in population vector magnitude (Figure 3.3.B) and increased movement variability
(Figure 3.7.A) – as previously shown in (Reinkensmeyer et al., 2003). The supervised
learning component of our rule is consistent with monkey data showing that learning new
skills, but not repetitive use, leads to motor cortical reorganization (Plautz, Milliken,
& Nudo, 2000). However, supervised learning-like plasticity has not been reported in the
cerebral cortex, although it is thought to occur in the cerebellum (Schweighofer,
Arbib et al., 1998). A possibility is that the reduction of error due to rehabilitation, and
the associated cortical reorganization, is driven by important cerebellar projections to the
motor cortex. Lesion of the error signal driving cerebellar learning, presumably carried
by the inferior olive (Kitazawa, Kimura, & Yin, 1998), could be performed in animal
models of stroke to test this possibility.
During therapy, the unsupervised learning rule is “adaptive” as its effect reinforces
that of the supervised learning rule (compare Figures 3.8.A.4 and 3.9.A.4). By recruiting
a greater number of neurons for often-performed actions it can counter neuronal noise
and decrease directional error (Reinkensmeyer et al., 2003); it is thus an adaptive
process in the normal brain. After stroke, however, such unsupervised plasticity may
become mal-adaptive. A comparison of Figures 3.8.A.3 and 3.9.A.3 shows that
unsupervised learning further augments the effect of stroke if no therapy is given. As
compensatory movements, or movements unaffected by the stroke, compete for the
surviving neurons, fewer neurons code for directions around the affected area (Figure
3.8.A.3), leading to further deterioration of performance (Figures 3.3.A & B and Figure
3.7.A). The representation of compensatory movements is increased and performance of
these movements improves (Figure 3.7, decreased directional error bias). Without the
unsupervised learning term, reorganization follows different patterns: Therapy has less of
an effect on reorganization, and lack of therapy does not lead to over-representation of
compensatory movements in the affected hemisphere or in the non-affected hemisphere
(see Figure 3.8).
3.5.2. Strengths and limitations of the model
To our knowledge, the present computational neural model is the first developed to
make specific behavioral and neuronal predictions on the efficacy of physical therapy
interventions. Two previous models have been developed to account for behavior after
stroke (Reinkensmeyer et al., 2003; Scheidt & Stoeckmann, 2007), but these models do
not address plastic changes. The model by Goodall et al. (Goodall et al., 1997) predicts
that focal lesions result in a two-phase map reorganization process in the intact peri-lesion
cortical region, but this model does not account for the development of compensatory
movements and reorganization of choice after training.
Our model is in accord with the most recent understanding and comprehensive view
of the basal ganglia function in adaptive selection of alternative actions (Bogacz &
Gurney, 2007; Dominey et al., 1995; Lo & Wang, 2006) via release of inhibition of motor
cortex activity (Mink, 2003). A different decision-making mechanism was, however,
recently proposed by Cisek (Cisek, 2006), who analyzed the time-course of cortical
activation before and after decision to reach one of two targets with a single arm. Unlike
in our model, target choice was resolved in a distributed manner, by competition between
neurons within cortical layers. Further experiments are needed to study how targets are
selected when both limbs can be used, and how this selection is reorganized after lesion
and therapy.
In a recent motor cortex model (Rokni, Richardson, Bizzi, & Seung, 2007), as in our
model, reorganization of preferred directions is due to a learning rule containing two
terms: a supervised error-correcting term, and an (unsupervised) weight decay term.
Because our unsupervised learning rule is based on the activation of neighboring neurons,
however, it explains mal-adaptation and increase of lesion size in the no-therapy
condition (Figure 3.8.A.3). Furthermore, the sensitivity analysis of the three learning
rates (supervised, unsupervised and reinforcement learning, Figure 3.10) showed that the
bistability of performance and spontaneous arm use (Figures 3.4 and 3.5) requires the
combination of all three types of learning (Figures B.1, B.2 and B.3 in appendix part B).
Because of its simplicity, our model provides clear insights into a range of factors
affecting recovery of arm use after stroke. However, it does suffer from a number of
limitations:
1. The simplistic coding of the reach movements by the motor cortex neurons does
not account for how activity of motor cortex neurons also correlates with joint torque and
muscle activity (E. V. Evarts, 1968; Herter, Kurtzer, Cabel, Haunts, & Scott, 2007; Kakei
et al., 1999). The current motor cortex model was based on the directional coding of hand
movement (Georgopoulos et al., 1986). Even though a possible mechanism behind
execution of directional coding on the motor cortex was set forth (Georgopoulos, 1996)
and computational models have suggested correlation between directional coding of a
neuron and a linear component of the direction of force which the neuron exerted (Guigon et
al., 2007a; Todorov, 2000a), there is little evidence, except (Beer et al., 2004), of stroke
lesions impairing specific hand directions. The key point is not the actual coding
(important though directional coding undoubtedly is) but rather to see how a lesion
affects a range of movements, and how learning may be maladaptive or adaptive by
returning some control of that range to the unaffected or affected hand, respectively. Our
assumption about how a lesion affects the distribution of neurons in the motor cortex may be
valid only when neurons in the motor cortex form a topography of directional coding. Our
unpublished computational model of the motor cortex showed that a topography of
population vector direction exists and that this direction of force would be correlated with
directional coding. Nevertheless, in the present model, as a result of such simplistic
coding, directional error is highly correlated with lesion size; this may not be highly
realistic as directional error after mild or moderate stroke in humans is not much affected
(Reinkensmeyer et al., 2002).
2. A related limitation is the lack of proximal and distal representation in our motor
cortex model. In the biological motor cortex, individual joints are controlled by
somewhat overlapping neural groupings forming somatotopically organized and plastic
motor cortical maps. Empirical results of map reorganization after lesion have focused on
remapping of the hand region (Kleim et al., 1998; Nudo et al., 1996). Note,
however, that although our model focuses on redistribution of the representation of
reaching directions within the area of cortex, our results accord well with the type of
reorganization shown in these empirical results.
3. A third limitation is our simplistic model of stroke, akin to that used in animal
models of stroke. These ignore the motor impairments due to diffuse lesions to a number
of brain areas and tracts, and not just to the motor cortex. In particular, our model cannot
study the differential effect of cortical, sub-cortical and combined cortical-sub-cortical
strokes and thus cannot account for differential response to rehabilitation for different
stroke locations (e.g. (Miyai, Blau, Reding, & Volpe, 1997)).
To resolve the limitations, in the future we will expand our model by adding arm and
muscle models controlled by neurons grouped in adaptive motor cortical maps. We plan
to investigate the tradeoff between proximal and distal regions, with cortical motor maps
that change during training on tasks that require more skilled use of the hand itself.
Moreover, the notion that the action choice model may correspond to the basal ganglia
opens up promising lines of investigation.
In summary, despite our considerable simplifications of movement representation in
the motor cortex and of the simulated lesions, our results show that our proposed
mechanism of motor learning and plasticity, and the ensuing results (recovery, threshold,
and neural reorganization) are general and not particular to the specifics of our model.
3.5.3. Specific and testable predictions derived from the model
Our model makes the following testable behavioral and neural predictions.
Prediction 1. If spontaneous use of the affected arm is above a threshold level after
therapy, repeated spontaneous attempts to use the affected arm lead to further
improvements in motor performance, which in turn increase the ‘value’ of using the arm
(Figure 3.3).
Prediction 2. If spontaneous arm use is below this threshold after therapy,
compensatory movements are reinforced. Consequently, spontaneous use and motor
performance of the affected limb decrease (Figure 3.3).
Prediction 3. The dose of task practice necessary to reach the threshold depends on
stroke severity, and no amount of rehabilitation will be sufficient to reach this threshold
for most strokes that are classified as severe (Figures 3.4 and 3.6).
Prediction 4. Unless the stroke impairment is too severe, the dose of rehabilitation
can be adjusted for each patient such that spontaneous arm use reaches this critical
threshold after rehabilitation. If the stroke is too severe however, motor re-training is “in
vain” (Figures 3.4 and 3.6). Of course, the dose of task practice also depends on
parameters within the model, and these may represent inter-subject variability of stroke
patients that complements the effects of lesion size.
Prediction 5. After effective motor re-training, movement accuracy can return close to
its pre-stroke levels, but movement variability will be higher than pre-stroke (Figure 3.7).
Prediction 6. After non-effective re-training, compensatory movements, either with
the same limb or the other limb or both, will become less variable (Figure 3.7).
Prediction 7. The hemisphere contralateral to the lesion undergoes reorganization of
preferred reach directions along with the development of compensatory reach movements
in the affected range (Figure 3.8).
Prediction 8. Both supervised learning–like (error driven) and unsupervised learning-
like (use driven) plastic phenomena drive reorganization in the motor cortex during skill
learning in the normal brain and after stroke (Figures 3.8 and 3.9).
3.5.4. Implication for rehabilitation
In our model, neural reorganization generates bi-stability at the behavioral level: after
therapy, spontaneous arm use will stabilize at either a low or a high value, depending on
the amount of therapy. Specifically, therapy is effective and could be stopped if
spontaneous arm use reaches a certain threshold, as the repeated spontaneous arm use
following therapy provides a form of motor learning that further “bootstraps”
performance. Below this threshold, however, motor re-training is “in vain” – there is no
or little long-term spontaneous arm use after training, and the model exhibits “learned
non-use”, as has been proposed in patients with brain lesions (E. Taub et al., 2006).
We thus predict that a measure of spontaneous arm use may be a good indicator to
determine optimal duration of the therapy. In current rehabilitation practice, all
rehabilitation is concentrated in the weeks following stroke. Our model suggests that
rehabilitation protocols adopt instead a spaced and adaptive Train-TestA-Wait-TestB-
Train Paradigm: short bouts of training (Train) are followed by a spontaneous arm use
test (TestA), no training for several weeks (Wait), and another spontaneous arm use test
(TestB). If spontaneous arm use measured on TestB has increased relative to TestA,
the threshold is reached, and rehabilitation can be terminated. If spontaneous arm use is
still low or has decreased since TestA, another bout of rehabilitation is called for. This
pattern is repeated until the threshold is reached. Note that such a training paradigm will
have the additional benefit of making use of the “spacing effect”, in which spaced
training leads to superior retention of learned skills (Schmidt & Lee, 2005). We plan to put
this hypothesis to empirical test using a novel laboratory-based objective test of bilateral
limb use.
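The spaced protocol described above can be sketched as a simple control loop. This is only an illustration under stated assumptions, not part of the model: the callbacks `train_bout`, `measure_use` and `wait`, the `max_bouts` cap, and the `ToyPatient` class with its drift values are all hypothetical stand-ins chosen to show the stopping rule.

```python
from typing import Callable


def adaptive_rehab(train_bout: Callable[[], None],
                   measure_use: Callable[[], float],
                   wait: Callable[[], None],
                   max_bouts: int = 10) -> int:
    """Repeat Train-TestA-Wait-TestB until use rises during the wait.

    Returns the number of training bouts delivered.
    """
    for bout in range(1, max_bouts + 1):
        train_bout()              # Train: short bout of therapy
        test_a = measure_use()    # TestA: spontaneous use right after training
        wait()                    # Wait: several weeks without training
        test_b = measure_use()    # TestB: spontaneous use after the wait
        if test_b > test_a:       # use kept rising on its own: stop therapy
            return bout
    return max_bouts              # cap reached without crossing the threshold


class ToyPatient:
    """Hypothetical bistable patient: use drifts up above 0.5, down below it."""

    def __init__(self, use: float = 0.2) -> None:
        self.use = use

    def train(self) -> None:
        self.use = min(1.0, self.use + 0.15)

    def wait(self) -> None:
        drift = 0.1 if self.use > 0.5 else -0.05
        self.use = min(1.0, max(0.0, self.use + drift))

    def measure(self) -> float:
        return self.use


patient = ToyPatient(use=0.2)
bouts = adaptive_rehab(patient.train, patient.measure, patient.wait)
```

In this toy setting the first two bouts are followed by a decline during the wait, and the third bout pushes use past the assumed threshold, so the loop stops after three bouts.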
Chapter 4.
Understanding the Functional Threshold:
Predictions from a Computational Model and Supporting Data from the
Extremity Constraint-Induced Therapy Evaluation (EXCITE) Trial [2]
4.1. Introduction
Stroke is the leading cause of disability in the US, and about 65% of stroke survivors
experience long-term upper extremity functional limitations. In more than half of patients
with severe paresis after stroke, recovery of upper extremity function is achieved solely
by use of the less-affected limb (Nakayama et al., 1994). Improving use of the more
affected arm is important, however, because difficulty in using this arm in daily tasks has
been associated with both reduced participation and quality of life (Duncan et al., 1999;
Mayo, Wood-Dauphinee, Cote, Durcan, & Carlton, 2002). There now is evidence that
intensive task-specific practice, in which patients actively engage in repeated attempts to
produce motor behaviors beyond their present capabilities, is effective for improving
upper extremity function and use after stroke (Butefisch, 1995; Kwakkel, 1999; C.
Winstein & Wolf, 2008; S. Wolf, Blanton S, Baer H, Breshears J, Butler AJ, 2002; S. L.
Wolf et al., 2006; S. L. Wolf et al., 2008). Constraint Induced Movement Therapy
(CIMT) in particular, which involves intense functionally oriented task practice of the
more affected upper limb along with restraint of the less-impaired limb for 90% of
[Footnote 2: A functional threshold for long-term use of hand and arm function can be predicted:
predictions from a computational model and supporting data from the extremity
constraint-induced therapy evaluation (EXCITE) trial, N Schweighofer, CE Han, SL
Wolf, MA Arbib, CJ Winstein, Phys Ther. 2009;89. In press.]
waking hours, has been shown in a Phase III clinical trial (EXCITE trial) to largely
improve limb function and arm use compared to usual and customary care (S. L. Wolf et
al., 2006; S. L. Wolf et al., 2008).
Empirical studies in animals (Kleim et al., 1998; Nudo et al., 1996) and in humans
(Liepert et al., 2000) have further shown that several weeks of challenging rehabilitative
training with the upper limb contralateral to the lesion reverses, at least partially, the loss
of cortical representation due to stroke through recruitment of adjacent brain areas. Such
reorganization may last several years after the initial injury (E. Taub, Uswatte, & Morris,
2003), and has been linked to improved performance (Conner et al., 2003).
Long-term change in arm and hand use in the months following therapy is, however, variable
among patients. In some patients, use continues to improve in the months
following therapy (Fujiwara et al., 2008; C. J. Winstein et al., 2004; S. L. Wolf et al.,
2008). Earlier, we proposed that the repeated attempts to use the affected arm in daily
activities can promote motor learning that improves performance and function (C. J.
Winstein et al., 2004); this improvement in function could, in theory, further increase arm
use. However, for other patients, improvements can be short lasting. For a number of
patients in the EXCITE trial for instance (see results), and presumably, for a larger
number of patients who receive usual and customary care, a decrease in use of the
affected arm appears in the year following therapy. If this decrease is large, rehabilitation
is “in vain”. Accordingly, understanding the conditions that lead to sustained gains in arm
use following an intense bolus of therapy is important.
To achieve a uni-manual task, such as drinking from a glass, a patient recovering
from stroke can be conceptualized as a decision-maker who chooses to use either his
more or his less affected limb (the decisions here need not be made consciously). The
choice of limb use will presumably depend on many factors, including lesion
characteristics, impairment and functional levels, motor training, previously rewarding or
punishing experience after stroke, and motivation.
In a recent computational model (Han, Arbib, & Schweighofer, 2008), we modeled
bi-lateral control of reaching movements in patients recovering from stroke to explore the
interactions between adaptive decision process of limb use and limb performance.
Therapy is a simulation of CIMT: the simulated subject is forced to use the affected arm.
When no therapy is given, the subject is free to choose one of the two arms to reach a
target appearing on a circle centered on (overlapping) initial hand locations. Our neural
model contains two (independent) motor cortices, each controlling the contralateral arm,
with one being affected by stroke [3]. Before each movement, one motor cortex is selected
by an adaptive decision making system, tentatively located in a cortico-striatal network.
In our model, neural reorganization in the motor cortex was modeled with a neural
learning rule that aims at achieving two goals concurrently: The first goal is to
reorganize the neural code to increase arm performance, via error-based learning (also
called “supervised learning”). Here, “arm performance” is defined as the directional error
of the movement towards the selected target and is the equivalent of “arm and hand
function” in patients. The second goal is to maximize neural resources for particular
desired movement directions to minimize movement variability, via Hebbian learning (a
model of long-term potentiation, which is also called “unsupervised learning” because
there is no teacher).
[Footnote 3: Note that our model does not account for the approximately 20% of uncrossed fibers
from the cortico-spinal tract.]
Furthermore, in the model, the decision to use one limb or the other is made by
comparing the “action value” of each limb in the adaptive decision making system. The
values for each arm are updated based on reward prediction errors (this type of learning is
also called “reinforcement learning”: if performance-based rewards are greater than
expected, the arm will be chosen more often for this particular movement. If not, the arm
will be chosen less often). It is noted that in the model “arm performance” is the
equivalent of “arm and hand function” in patients.
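The value update described above can be illustrated with a short sketch. It is a simplification under stated assumptions rather than the model's implementation: the softmax choice rule, the learning rate `alpha`, the inverse temperature `beta`, the fixed rewards in `mean_reward`, and the use of a single value per arm (the model keeps values for each particular movement) are all hypothetical.

```python
import math
import random


def choose_arm(values, beta=5.0, rng=random):
    """Softmax choice between arms from their action values (assumed rule)."""
    weights = [math.exp(beta * v) for v in values]
    r = rng.random() * sum(weights)
    for arm, w in enumerate(weights):
        r -= w
        if r <= 0:
            return arm
    return len(values) - 1


def update_value(values, arm, reward, alpha=0.1):
    """Move the chosen arm's value toward the reward by the prediction error."""
    rpe = reward - values[arm]   # reward prediction error
    values[arm] += alpha * rpe   # better than expected: value goes up
    return rpe


rng = random.Random(0)
values = [0.5, 0.5]              # index 0: affected arm, 1: less-affected arm
mean_reward = [0.3, 0.9]         # hypothetical performance-based rewards
for _ in range(500):
    arm = choose_arm(values, rng=rng)
    update_value(values, arm, mean_reward[arm])
```

Because the affected arm is rewarded less than expected in this toy example, its value falls and it is chosen less often, which is the simple form of non-use that the decision system can produce without therapy.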
Simulation results showed that after small doses of therapy, the model exhibited a
simple form of learned non-use (for a more complete account of learned non-use, see (S.
L. Wolf, 2007)). Following larger doses of therapy, use increased after therapy. For a
specified intermediate dose of therapy, there is no change in use following therapy. In
other words, there is a threshold for rehabilitation. This threshold is an emergent property
of the learning dynamics, and only exists when the three types of learning, error-driven,
Hebbian-like, and reward-based are implemented (Han et al., 2008). The model further
showed that, similar to observations in monkeys with motor cortex lesions, sufficient
CIMT in the model reversed the cortical representation loss, principally because of
synergistic effect of supervised learning and unsupervised learning, which was
“adaptive”. No rehabilitation, or too little rehabilitation, increased this loss, principally
because of reorganization due to unsupervised learning, which was then “mal-adaptive”
(see Han et al. 2008 for further details and results).
The present paper directly tests the hypothesis that there is a threshold level for
function of the paretic arm and hand after therapy: if function is above this threshold,
spontaneous use will increase in the months following therapy. In contrast, if function is
below this threshold, spontaneous use will decrease in the months following therapy or
deteriorate, and compensatory use of the non-paretic hand will further develop. We first
present new computer simulations of the model to directly show the emergence of the
hypothesized performance threshold that determines average long-term limb use in
simulated patients. We then test our prediction of a threshold with a re-analysis of
clinical data from the EXCITE clinical trial (S. L. Wolf et al., 2006).
4.2. Methods
4.2.1. Computer simulation methods
The objective of these computer simulations is to test the hypothesis that there is a
threshold level for function of the paretic arm and hand after therapy in simulations that
mimic the conditions of the EXCITE trial. Our model, although necessarily highly
simplified, contains key ingredients that make it applicable to the participants in the
EXCITE trial. First, as described above, our model is a model of stroke recovery that
contains up-to-date knowledge about plastic processes underlying stroke recovery.
Second, therapy in the model has been modeled after CIMT. Third, the model makes
predictions of long-term changes in arm use as a function of performance just after
therapy, and the EXCITE data precisely capture such variables in these timeframes.
Finally, in the model, lesion sizes can be easily varied to capture the diversity of stroke
patients in the EXCITE trial. In previous work (Han et al., 2008), “identical” simulated
patients (same lesion size and location) received various doses of therapy. Here, in
contrast, we simulated patients with various lesion locations and sizes, and all simulated
patients received the same dose of therapy.
Specifically, we simulated 125 patients with locations of the lesion center randomly
chosen within the affected motor cortex. The lesion sizes were empirically determined
within a range of 16% to 43% of the affected motor cortex size, because such lesion
sizes led to effects that are neither too mild (to need therapy) nor too severe (to benefit
from therapy). All patients received the same dose of “400 forced use trials” of therapy
(corresponding to the 2 weeks of constraint induced movement therapy which was the
standard dose in EXCITE). Then for each simulated patient, we measured use of the
affected arm just after therapy (corresponding to the one week post-test in EXCITE) and
in a delayed follow-up test, given after “3000 free choice trials” following therapy
(corresponding to the 1 year post-test in the EXCITE Trial). All other simulation
parameters can be found in the previous section.
We developed sigmoid models of spontaneous arm use as a function of the error
both immediately after therapy and 3000 trials after therapy. For this purpose, we fitted
the equivalent linear models to the data: Y = a ERROR + b, where Y = log(USE/(100% - USE)),
USE is the spontaneous arm use taken either just after therapy or 3000
trials after therapy, and ERROR is the average directional error (in degrees) just after
therapy, computed from the average of 100 trials over the affected range. The parameter a
is the sigmoid slope, the parameter b is another free parameter, and log is the natural
logarithm. We report the p values obtained from these linear regression models with
transformed Y values. We compared linear fits and sigmoid fits with root mean squared
error (RMSE) in spontaneous arm use.
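The fitting procedure above can be sketched as follows. This is a minimal illustration using ordinary least squares on the transformed values, and it assumes that every USE value lies strictly between 0% and 100% so that the transform is defined. The helper names and the data are made up for the example; the data are generated from a known sigmoid so that the fit can be checked.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_sigmoid(errors, use_pct):
    """Fit USE = 100 / (1 + exp(-(a*ERROR + b))) via the logit transform."""
    ys = [math.log(u / (100.0 - u)) for u in use_pct]  # Y = log(USE/(100% - USE))
    return fit_line(errors, ys)


def sigmoid_use(error, a, b):
    """Predicted spontaneous use (percent) for a given directional error."""
    return 100.0 / (1.0 + math.exp(-(a * error + b)))


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up data drawn from a known sigmoid (a = -0.3, b = 6.0).
errors = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
use = [sigmoid_use(e, -0.3, 6.0) for e in errors]

a, b = fit_sigmoid(errors, use)
sig_rmse = rmse(use, [sigmoid_use(e, a, b) for e in errors])

la, lb = fit_line(errors, use)   # linear model fitted directly to USE
lin_rmse = rmse(use, [la * e + lb for e in errors])
```

Both RMSE values are computed in units of spontaneous arm use, so the sigmoid and linear fits can be compared on the same scale.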
4.2.2. Re-analysis of clinical data from the EXCITE trial
The objective of this reanalysis of the data from the EXCITE clinical trial (S. L.
Wolf et al., 2006) is to test the prediction, derived from our simulations, of a performance
threshold, on average, in patients post-stroke.
We performed a retrospective analysis of data from the 169 patients enrolled in the
EXCITE trial who did not withdraw from the trial (S. L. Wolf et al., 2006). Briefly, in
EXCITE, two groups of subjects were randomly assigned to an immediate or delayed
(wait) CIMT group. The immediate group received two weeks of therapy from time Pre1
(t = 0); the second group received no immediate therapy but did receive two
weeks of therapy after a one-year delay, from Pre2 (t = 1 year). Subjects were tested with
the Wolf Motor Function Test (WMFT) and the Motor Activity Log (MAL) at Pre1, at
Post1 (t = 3 weeks; one week after the immediate group received therapy), at Pre2 (t = 1
year), and at Post2 (t = 1 year + 3 weeks; one week after the delayed group received
therapy). All subjects were tested again at 24 months after inclusion in the study at MT24
(t = 2 years) (C. J. Winstein et al., 2003; S. L. Wolf et al., 2006). In the current analysis,
we are comparing use just after therapy with use a year later, so time of therapy is not a
factor in this analysis. We thus combined all data in EXCITE at these points and studied
function and use in the post test just following therapy (Post1 for the immediate group
and Post2 for the delayed group) and use in the delayed 1 year post test (Pre2 for the
immediate group, and MT24 for the delayed group).
Briefly, the WMFT (S. Wolf, Lecraw DE, Barton LA, Jann BB, 1989; S. L. Wolf et
al., 2001; S. L. Wolf et al., 2005) measures performance time (up to 120 seconds)
required for patients with stroke to perform 15 laboratory-based arm function tasks
requiring use of the more affected upper extremity. The WMFT also contains a shoulder
strength task and a grip strength task. Additionally, the quality of motor function during
the timed tasks is assessed by independent raters using the 6-point Functional Ability
Scale (FAS). The WMFT has good reliability and validity, and has been shown not to suffer
from a learning effect (S. L. Wolf et al., 2005). In our analyses, as in Wolf et al., the
natural logarithm of the time score of WMFT is used to normalize the distribution. In the
MAL, the participants (or their caregivers) rate how well (Quality of Movement QOM
scale) and how much (11-point Amount of Use AOU scale) the paretic arm is used
spontaneously to accomplish 30 activities of daily living outside of the laboratory
(Uswatte, Taub, Morris, Light, & Thompson, 2006; Uswatte, Taub, Morris, Vignolo, &
McCulloch, 2005). Each item on the MAL has an 11-point scale from 0 (no use) to 5
(normal) via increments of 0.5. Validity and reliability of the MAL have been established
(Uswatte et al., 2006). In our analyses, we present average MAL AOU scores.
Our neural computational model predicts that if performance is above threshold after
therapy, use will increase between just after and 1 year after therapy (positive change). If
performance is below threshold, use will decrease (negative change). Thus, as a first test
of this threshold prediction, we performed a correlation analysis between the difference
in average MAL AOU between 1 year post- and just post-therapy and the logarithm of
the WMFT time score as in (S. L. Wolf et al., 2006). We then verified this theoretically-
driven prediction with a data-driven (hypothesis-free) analysis using a stepwise linear
regression (See published paper’s appendix for details). In this analysis, a large number
of variable predictors are initially entered in the linear model that predicts long-term
changes in MAL AOU. If a measure of function just after therapy is a significant
predictor of long-term changes in use, this stepwise linear regression would give further
support to the threshold hypothesis.
In view of the predictions of our simulations on the sigmoidal shapes of the
relationship between performance just after therapy and use both just after therapy and
after a delay of 1 year, and because the MAL AOU is bounded between 0 and 5, we
developed sigmoid models of the average MAL AOU as a function of the WMFT FAS
both immediately (1 week) after therapy and 1 year after therapy. For this purpose, we
fitted the equivalent linear models to the data: Y = a FAS + b, where Y = -log(5/MAL - 1),
MAL is the MAL AOU taken either in the 1 week or 1 year post-test, FAS is the WMFT
FAS, the parameter a is the sigmoid slope, the parameter b is another free parameter, and
log is the natural logarithm. We report the p values obtained from these linear regression
models with transformed Y values. We compared linear fits and sigmoid fits with root
mean squared error (RMSE) in MAL.
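The equivalence between the linear model on transformed scores and a sigmoid bounded between 0 and 5 can be checked by solving the transform for MAL. The derivation below uses only the symbols defined above:

```latex
\begin{align*}
Y &= -\log\!\left(\frac{5}{\mathrm{MAL}} - 1\right) = a\,\mathrm{FAS} + b \\
\frac{5}{\mathrm{MAL}} - 1 &= e^{-(a\,\mathrm{FAS} + b)} \\
\mathrm{MAL} &= \frac{5}{1 + e^{-(a\,\mathrm{FAS} + b)}}
\end{align*}
```

The transform is undefined at the bounds MAL = 0 and MAL = 5, so scores at either bound cannot enter the regression.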
In this analysis with sigmoidal models, we only included patients with WMFT and
MAL data available at both time points (although all subjects included in our analysis
completed the trial, a number of data points were missing at either the 1 week or the 1
year post-test). Further, to obtain convergence of the sigmoidal fit, we removed two
subjects with perfect MAL AOU (scores = 5 just after therapy; no patient with scores = 5
at 1 year following therapy). We thus analyzed data from N = 132 subjects. We also
compared sigmoidal models and linear models after removal of outliers, defined as
individuals with residuals (observed - predicted values) greater than two standard
deviations of the residuals.
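The outlier criterion can be sketched as below. Two details are assumptions that the text does not settle: the comparison is applied to the absolute residual, and the standard deviation is the sample standard deviation (n - 1 in the denominator). The function name and the data are hypothetical.

```python
import math


def keep_non_outliers(observed, predicted, n_sd=2.0):
    """Return indices whose residual lies within n_sd standard deviations."""
    residuals = [o - p for o, p in zip(observed, predicted)]  # observed - predicted
    mean = sum(residuals) / len(residuals)
    var = sum((r - mean) ** 2 for r in residuals) / (len(residuals) - 1)
    sd = math.sqrt(var)                                       # sample SD (assumed)
    return [i for i, r in enumerate(residuals) if abs(r) <= n_sd * sd]


# Made-up scores: the last subject sits far from the model prediction.
observed = [2.0, 2.1, 1.9, 2.0, 2.05, 1.95, 2.0, 2.1, 1.9, 4.5]
predicted = [2.0] * 10
kept = keep_non_outliers(observed, predicted)
```

The models would then be refitted on the retained subjects only.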
We compared the sigmoid models at 1 week and at 1 year, by computing and
comparing the confidence intervals at 95% and 90% of the slope parameter of the
sigmoid regression models. The prediction is that the slope is greater at 1 year than at 1
week. Finally, the performance threshold is obtained at the crossing of the sigmoids at 1
week and 1 year after therapy. The prediction is that because the slope at 1 year is greater
than at 1 week, the two curves will cross. Above this crossing point, use will be greater at
1 year than at 1 week, on average. Below this crossing point, use will be smaller at 1 year
than at 1 week, on average. Thus, this crossing represents an average threshold.
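Because the two sigmoids share the same bounds, they cross where their linear arguments are equal, that is where a1 x + b1 = a2 x + b2. A minimal sketch of this calculation follows; the function names and the parameter values are purely illustrative and are not the fitted EXCITE values.

```python
import math


def sigmoid(x, a, b, ceiling=5.0):
    """Sigmoid bounded between 0 and `ceiling` with slope a and offset b."""
    return ceiling / (1.0 + math.exp(-(a * x + b)))


def crossing_point(a_early, b_early, a_late, b_late):
    """x at which the two sigmoids are equal; requires different slopes."""
    if a_early == a_late:
        raise ValueError("equal slopes: the curves do not cross at a single point")
    return (b_early - b_late) / (a_late - a_early)


# Illustrative parameters: the later curve is steeper than the earlier one.
a_early, b_early = 1.0, -3.0
a_late, b_late = 2.0, -6.5
threshold = crossing_point(a_early, b_early, a_late, b_late)
```

With a steeper later curve, predicted use is higher at the later time point for function above the crossing and lower below it, which is the sense in which the crossing acts as an average threshold.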
We set the significance level at p = 0.05.
4.3. Results
4.3.1. Computer simulation results
To test the hypothesis that spontaneous use will increase in the months following
therapy if arm performance after therapy is above a threshold, we first investigated arm
use at 0 trials and at 3000 trials following therapy as a function of performance just after
therapy. For simulated patients above threshold, the spontaneous arm use should, on
average, increase from 0 to 3000 trials. For patients below threshold, on average, use
should decrease from 0 to 3000 trials.
In the model, performance is measured as angular error between the target direction
and the actual arm movement direction. Spontaneous arm use is measured as percentage
of use of the affected arm to the targets in the range of movement directions most
affected by the lesion (in the model, the lesions affect some ranges more than others, see
Han et al. 2008 for details). Figure 4.1A shows that for most (simulated) subjects with
high performance after therapy, spontaneous arm use is high and saturates to maximum
use (100% in the model). Conversely, for subjects with low performance, use saturates
near zero. Because of these ceiling and floor effects, arm use after therapy is fit better by
a sigmoidal function of performance (RMSE = 17.08, p < 0.0001) than a linear function
of performance (RMSE = 18.74, p < 0.0001). Figure 4.1B shows that arm use in the
follow-up test at 3000 trials after therapy is also fit better by a sigmoidal function (RMSE
= 18.98, p < 0.0001) than a linear function (RMSE = 19.36, p < 0.0001). Finally, the
slope of the sigmoid model in the long-term follow-up test is steeper than that just after
therapy (slope just after therapy = 0.31; slope one year after therapy = 0.52).
Figure 4.1. Simulated data of use of the affected arm (A) just after therapy and (B)
3000 trials after therapy as a function of performance (reach directional error in the
model) just after therapy for 125 simulated subjects with different lesion sizes and
locations. Note that we reversed the x axis, to compare with the data of Figure 4.2. (C)
Comparison of the sigmoidal fit of use just after and 3000 trials after therapy. The
intersection of the two curves gives the threshold in arm performance above which use
increases, and below which use decreases. The upward arrow indicates that 89.1% of the
simulated subjects above threshold show increase in arm use after therapy. The
downward arrow indicates that 87.0% of the simulated subjects below threshold show
decrease in arm use after therapy.
The intersection of the two curves gives an average (or “group”) threshold in arm
performance, corresponding to a reaching error of 22.8 degrees. Above this threshold, the
arm use of most (89.1%), but not all, simulated patients improves spontaneously following
therapy; below this threshold, the arm use of most simulated patients worsens following
therapy (87.1%). Thus, 89.1% of the simulated subjects showed an increase in use if above
threshold, compared to only 12.9% if below threshold.
These simulation results make three important testable predictions. First, the
relationship between use after therapy and function after therapy is sigmoidal, and this is
true if spontaneous use is measured just after therapy or in a delayed follow-up test.
Second, the sigmoid is steeper in the follow-up test than just after therapy. Third, the
intersection of the sigmoid for use just after therapy with the sigmoid for use in the long-
term follow up test (Figure 4.1C) gives an (average) threshold in performance above
which use improves spontaneously following therapy and below which use worsens. We
next tested these predictions with a re-analysis of clinical data from the EXCITE Trial.
4.3.2. Re-analysis of clinical data from the EXCITE trial
The correlation analysis between the difference in average MAL AOU between 1
year post- and just post-therapy and the logarithm of the WMFT time score shows that
function (WMFT) is positively correlated with change in use (MAL AOU); the coefficient is
negative because shorter WMFT times indicate better function (Pearson correlation
r = -0.172, p = 0.046, N = 135). A similar correlation analysis shows that the
WMFT FAS correlates better with long-term changes in use (r = 0.22, p < 0.010, N = 134).
Among a large number of variables included as potential predictors of change in
MAL AOU after therapy in the stepwise regression analysis, only two were included in
the final model (p < 0.0005, R² = 0.123, N = 133): the WMFT FAS post therapy (p <
0.0001, standardized coefficient, 0.42); and the “weight to box” task with the more
affected arm (a strength task part of the WMFT) measured upon enrollment (p < 0.005,
standardized coefficient, -0.336). (See the Appendix for details.)
The sigmoid models of the average MAL AOU as a function of the WMFT FAS
showed good fit to the data both immediately (1 week) after therapy (RMSE = 0.83; p<
0.0001) and 1 year after therapy (RMSE = 0.86; p < 0.0001). The slope of the sigmoid
model is larger at 1 year after therapy (mean slope: 1.71; standard error (SE) of slope =
0.14; 95% confidence interval [1.43; 1.99]; 90% confidence interval [1.47; 1.94]) than at
1 week after therapy (mean slope: 1.31; SE = 0.13, 95% confidence interval [1.05; 1.58],
and 90% confidence interval [1.09; 1.53]). The corresponding linear models fit the data
with similar RMSE (RMSE = 0.83, p < 0.0001 and RMSE = 0.87, p < 0.0001 for
the 1 week and 1 year models, respectively).
Figure 4.2. Use of the more affected arm (as recorded by the MAL AOU subscale)
(A) just after therapy and (B) 1 year after therapy as a function of arm and hand function
(Functional Ability Scale) just after therapy for subjects of the EXCITE trial. (C)
Comparison of the sigmoidal fit of use just after and 1 year after therapy. The intersection
of the two curves gives the functional threshold (FAS = 3.44) above which use increases,
and below which use decreases. The upward arrow indicates that 68.2% of participants
above threshold showed an increase in arm use after therapy. The downward arrow indicates
that 63.5% of participants below threshold showed a decrease in arm use after therapy.
The strength of the relationships between MAL AOU and FAS at 1 week and 1 year
increased after removal of outliers. Based on our criterion for outlier removal, 4 subjects
were removed in the 1 week after therapy model and 5 subjects in the 1 year after therapy
model; 3 subjects were outliers for both the 1 week and 1 year models (so N = 128 for the
1 week model and N = 127 for the 1 year model). Compared to the sigmoid models without
outlier removal, these models with outlier removal (see Figure 4.2A and B) show improved
fit to the data (1 week after therapy: RMSE = 0.76, p < 0.0001; 1 year after therapy:
RMSE = 0.79, p < 0.0001) and smaller confidence intervals of the slopes both just after
and 1 year following therapy (1 week after therapy: mean slope = 1.44; SE of slope = 0.13;
95% confidence interval [1.19; 1.68]; 90% confidence interval [1.23; 1.64]. 1 year after
therapy: mean slope = 1.91; SE = 0.14; 95% confidence interval [1.63; 2.18]; 90%
confidence interval [1.67; 2.14]). Finally, the sigmoid models with outlier removal fit
the 1 week and 1 year data better than the linear models with outlier removal (linear
models: 1 week after therapy: N = 130, RMSE = 0.78, p < 0.0001; 1 year after therapy:
N = 129, RMSE = 0.83, p < 0.0001).
Thus, the slope of the sigmoidal model 1 year after therapy is greater than the slope
of the sigmoidal model at 1 week after therapy, although there is a very small overlap of
the slope confidence intervals at 95% (by 0.05), but no overlap of the confidence intervals
at 90%. The results are qualitatively similar without outlier removal, but the confidence
intervals at 95% for the sigmoid slopes overlap somewhat (by 0.15); there is less overlap
of the confidence intervals at 90% (by 0.06), however. The steeper slope of the sigmoid
at 1 year post-therapy compared to 1 week post-therapy is well illustrated in Figure 4.2C,
in which we plotted the two sigmoids (models without outliers).
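The overlap figures quoted above can be checked directly from the reported confidence intervals. The following is a minimal sketch of that arithmetic only; the helper name interval_overlap and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def interval_overlap(a, b):
    """Length of the overlap between closed intervals a and b (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

# Slope confidence intervals reported in the text, as (1 week, 1 year) pairs.
slope_ci = {
    ("outliers removed", 95): ((1.19, 1.68), (1.63, 2.18)),
    ("outliers removed", 90): ((1.23, 1.64), (1.67, 2.14)),
    ("all subjects", 95): ((1.05, 1.58), (1.43, 1.99)),
    ("all subjects", 90): ((1.09, 1.53), (1.47, 1.94)),
}

overlaps = {
    key: round(interval_overlap(week, year), 2)
    for key, (week, year) in slope_ci.items()
}

for (condition, level), value in overlaps.items():
    print(f"{condition}, {level}% CI overlap: {value:.2f}")
```

With the intervals as reported, only the 90% intervals of the models with outlier removal are disjoint.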
The intersection of the two sigmoids with mean slopes gives the average threshold
in function, given by WMFT FAS = 3.44. Among 22 subjects with function above this
threshold, 15 (68.2%) showed an increase in use in the year following therapy (model
without outliers). Conversely, among 104 subjects with function below this threshold,
only 38 (36.5%) showed an increase in use in the year following therapy (See Figure
4.2C). Thus, as predicted by the computational model, when function just after therapy is
above this average threshold, subjects on average show improvements in use in the year
following therapy; when function just after therapy is below this average threshold,
subjects on average experience a worsening of use in the year following therapy.
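As an illustration of how such a group threshold can be located, the sketch below finds the crossing of two logistic curves by bisection and recomputes the proportions quoted above from the subject counts. The logistic parameterization, the midpoints and the upper asymptote are assumptions made for the example (this section does not give the fitted functional form); only the two slopes and the subject counts are taken from the text.

```python
import math

def logistic(x, slope, midpoint, upper=5.0):
    """Logistic curve; this parameterization is an assumption of the sketch."""
    return upper / (1.0 + math.exp(-slope * (x - midpoint)))

def crossing(f, g, lo, hi, tol=1e-9):
    """Bisection on f - g over [lo, hi]; requires a sign change on the interval."""
    d_lo = f(lo) - g(lo)
    if d_lo * (f(hi) - g(hi)) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        d_mid = f(mid) - g(mid)
        if d_lo * d_mid <= 0:
            hi = mid            # the crossing lies in [lo, mid]
        else:
            lo, d_lo = mid, d_mid  # the crossing lies in [mid, hi]
    return 0.5 * (lo + hi)

# Slopes reported in the text (models with outlier removal).
SLOPE_WEEK, SLOPE_YEAR = 1.44, 1.91
# Hypothetical midpoints, chosen only to make the example run.
MID_WEEK, MID_YEAR = 3.0, 3.1

threshold = crossing(
    lambda x: logistic(x, SLOPE_WEEK, MID_WEEK),
    lambda x: logistic(x, SLOPE_YEAR, MID_YEAR),
    lo=0.0, hi=5.0,
)

# Proportions of subjects whose use increased, from the counts in the text.
pct_above = round(100.0 * 15 / 22, 1)
pct_below = round(100.0 * 38 / 104, 1)
print(threshold, pct_above, pct_below)
```

For two logistic curves sharing the same upper asymptote the crossing also has a closed form, (slope2 * mid2 - slope1 * mid1) / (slope2 - slope1); bisection is used here because it carries over to fits whose asymptotes differ.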
4.4. Discussion
To better understand the interactions between arm function and use after therapy, we
have presented new simulations with conditions that mimic those of the EXCITE trial.
These simulations made three new predictions, all of which were confirmed in a re-
analysis of arm use and function data from the EXCITE trial.
First, the relationship between use after therapy and performance is better fit by a
sigmoidal model than by a linear regression model, and this sigmoidal relationship holds
whether use is measured just after therapy or in a follow-up test. The difference between
the linear fits and the sigmoidal fits is not very large, however (as shown by the
relatively small differences in RMSE between linear and sigmoidal models at 1 week and at
1 year in the outlier removal conditions). Thus, this sigmoidal relationship needs to be
verified with other clinical databases. A sigmoidal relationship can result from two
possible, non-exclusive reasons. The first possible reason is simply that the MAL AOU does not
adequately measure arm use when use is high or low. The second possible reason is an
actual non-linear relationship between function and use in patients, at least on average. A
floor effect would suggest that when function is low, the arm is not used at all. A
ceiling effect would suggest that when function is high but less than maximal, the arm is
used as if the patient had not suffered a stroke.
Second, the sigmoid is steeper in the 1-year follow-up test than just after therapy.
This indicates that, for the average patient in EXCITE, if function is high just after
therapy, use improves. Conversely, and again on average, if function is low just after
therapy, use worsens.
Third, the intersection of the sigmoid for use just after therapy with the sigmoid for
use in the 1-year follow-up test gives an average (or group) threshold in performance. If
performance is above the functional threshold just after therapy, use of the more affected
arm in the year following therapy improves, on average. On the contrary, if function is
below this threshold, use is not sustained, on average. Thus, functional abilities just after
therapy predict change in use in the long term following therapy, and on average, a
functional threshold can be determined.
In the stepwise regression analysis of the EXCITE data, only a measure of arm and
hand function just after therapy (the WMFT FAS) and a measure of shoulder strength (the
“weight to box” task) were retained as predictors of the change in use. Furthermore,
shoulder strength was negatively correlated with the future change in use. How can this
result be interpreted? One possibility is that patients with greater shoulder strength
may be better able to compensate with the proximal arm for the lower functioning distal
upper extremity. Because of these compensatory movements, use of the hand is not
reinforced and decreases. Another, non-exclusive, possibility is that
an increase in arm strength correlates with a shrinkage of neural areas encoding the distal
representation; this would explain the negative correlation. In any case, because stepwise
regression is an exploratory tool, this result and these predictions require further
investigation by studies that combine brain imaging or recording with behavioral data analysis.
4.4.1. Limitations and future work
A practical clinical implication of the present work is the determination of a
stopping criterion to determine the dose of therapy for each patient. The intersection of
the sigmoid curves in our re-analysis for the EXCITE data gives a “group” threshold,
above which most patients improve spontaneously.
A goal of our future research is therefore to provide the means to assess when an
individual reaches his/her personal threshold so that therapy may be stopped without
adverse effect. More research is needed, however, before we are able to provide robust
threshold values for individual patients and before such a stopping rule can be
implemented in the clinic for the following two reasons. First, the predictive capability of
our simple sigmoidal models is rather low: only 68.2% of subjects above the threshold
showed improvement in use; conversely, 36.5% of subjects below the threshold also showed
improvement. Such low predictive values arise because our sigmoidal models are simple
averaged models that do not consider individual lesion size, location, or any other
patient characteristics such as self-efficacy, besides function just after therapy.
To be able to determine thresholds for individual patients we are currently extending
this work in two aspects. First, we are developing individualized predictive models that
can be used for individualized determination of threshold and dose. We previously
argued from a theoretical viewpoint (Han et al., 2008), and confirmed to some extent
here with clinical data, that stroke recovery is a time-varying process (arm use changes in
the year following therapy) that is non-linear (arm use can increase or decrease in
the year following therapy depending on arm function after therapy). We are currently
developing time-varying and non-linear models of recovery that will use neurological data
such as stroke lesion location and volume as regressors. Second, both the WMFT FAS and the
MAL AOU are lengthy tests that are impractical for use in the clinic. To address this, we
are developing a novel, easy-to-administer, reliable, and valid measure of arm use that is
based on actual arm choices in a bilateral arm reaching task (S. Y. Chen et al., 2008).
Based on predictions from these novel models and tools, we expect to determine the
functional threshold with high confidence for individual patients, and as a result
determine the patient-specific dose of therapy that will bring function above this
threshold. Such an understanding of non-linear relationships between limb function and
use is important for the future development of cost-effective interventions and the
prevention of “rehabilitation in vain.”
Chapter 5.
Variability in detouring in reach-to-grasp:
Individualized strategies with virtual targets
5.1. Introduction
Previous experimental studies of reach-to-grasp action captured the typical behavior
pattern in which the hand directly approaching a target shows a bell-shaped reach
velocity with the peak aperture for the grasp occurring during deceleration at about 70 to
85% of total movement time (Desmurget et al., 1996; Flash & Hogan, 1985; P Haggard &
Wing, 1995; Jeannerod, 1981; Paulignan et al., 1997; Paulignan, Jeannerod et al., 1991;
Paulignan et al., 1990). Many experiments have investigated the effects of perturbation of
target size or location after movement onset (Paulignan, Jeannerod et al., 1991; Paulignan
et al., 1990), and when obstacles required a detour in approaching the target (Alberts,
Saling, & Stelmach, 2002; P. Haggard & Wing, 1998; Mon-Williams, Tresilian, Coppard,
& Carson, 2001; Sabes & Jordan, 1997; Sabes, Jordan, & Wolpert, 1998; Saling, Alberts,
Stelmach, & Bloedel, 1998; Tresilian, 1998). However, our recent studies of right-
handers reaching to grasp an object directly or detouring to avoid an obstacle have
demonstrated variation in strategies not only between subjects but even depending on
which hand is used by a given subject (Tretriluxana, Gordon, Arbib et al., 2009;
Tretriluxana, Gordon, Fisher, & Winstein, 2009; Tretriluxana et al., 2004; Tretriluxana,
Gordon, & Winstein, 2008; Tretriluxana et al., 2005). As far as we know, no other study
has demonstrated such variation.
A basic model of the reach-to-grasp action (M.A. Arbib, 1981; Jeannerod, 1981)
described separate visuomotor processes that drive a reaching module and a grasping
module. The reaching schema plans movements of the arm in order to transport the hand
to a given target object. The grasping schema controls preshaping of the hand and its
subsequent closing on the given object to complete grasping. In the grasping module, a
key control parameter is the aperture between opposing surfaces such as the thumb and
index finger (Iberall & Arbib, 1990). Preshaping sets a grip aperture larger than the target
size to reduce the probability of accidental collision with the object before securing the
grasp.
Interdependency between the reaching and grasping modules has been demonstrated
in online perturbation experiments (Desmurget et al., 1996; Paulignan, Jeannerod et al.,
1991; Paulignan et al., 1990; Paulignan, MacKenzie et al., 1991) and with computational
models (Hoff & Arbib, 1993; Rand, Shimansky, Hossain, & Stelmach, 2008; Ulloa &
Bullock, 2003). The Hoff-Arbib model (1993) hypothesized time-based coordination to
meet the equifinality constraint that grasping and reaching movements should finish at
the same time. Each module has optimal estimators for the required duration of its task
under given accuracy constraints. Equifinality means that the time required for reaching
equals the sum of the times required for preshaping and enclosing. If, due to a
perturbation of target location, the time required for reaching increases, the model
specifies how the grasping module will slow down and exhibit a bimodal aperture
trajectory. Similarly, if the target size is perturbed, the extra time required by the grasping
module sets a new duration for reaching. By contrast, Ulloa and Bullock (2003)
hypothesized uni-directional interdependency from the reaching module to the grasping
module. In their model, larger reaching velocity increased the grip aperture size with
some delay. There is also an indirect modulation from the grasping module to the
reaching module, resetting both modules.
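The time-based coordination described above for the Hoff-Arbib model can be sketched as a simple scheduling rule: each module reports the time it needs, and the slower one sets the common duration. This is a minimal sketch of that rule only, under the assumption that the common duration is the larger of the two requirements; the optimal estimators and trajectory generators of the actual model are not reproduced, and the function name and the durations (in milliseconds) are hypothetical.

```python
def coordinate(reach_ms, preshape_ms, enclose_ms):
    """Common movement duration under equifinality, plus the slack per module.

    Assumption of this sketch: the module needing more time sets the shared
    duration, and the other module stretches its movement by the slack.
    """
    grasp_ms = preshape_ms + enclose_ms
    duration_ms = max(reach_ms, grasp_ms)
    return {
        "duration_ms": duration_ms,
        "reach_slack_ms": duration_ms - reach_ms,
        "grasp_slack_ms": duration_ms - grasp_ms,
    }

# Hypothetical unperturbed trial: both modules need the same time.
baseline = coordinate(reach_ms=600, preshape_ms=400, enclose_ms=200)

# Target location perturbed: reaching needs longer, so grasping slows down.
location_perturbed = coordinate(reach_ms=800, preshape_ms=400, enclose_ms=200)

# Target size perturbed: grasping needs longer, so reaching is rescheduled.
size_perturbed = coordinate(reach_ms=600, preshape_ms=550, enclose_ms=200)

print(baseline, location_perturbed, size_perturbed)
```

In the second case the grasping module has 200 ms of slack to absorb, which is where the model described above predicts a slowed, bimodal aperture trajectory.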
Computational models which reproduced detouring reach-to-grasp are rather
uncommon (Rosenbaum, Meulenbroek, Vaughan, & Jansen, 1999; Simmons & Demiris,
2006; Vaughan, Rosenbaum, & Meulenbroek, 2001), compared to computational models
which reproduce the direct reach-to-grasp (Flash & Hogan, 1985; Hoff, 1992; Hoff &
Arbib, 1993; Meulenbroek, Rosenbaum, Jansen, Vaughan, & Vogt, 2001; Rand et al.,
2008; Rosenbaum, Engelbrecht, Bushe, & Loukopoulos, 1993a, 1993b; Rosenbaum,
Loukopoulos, Meulenbroek, Vaughan, & Engelbrecht, 1995; Ulloa & Bullock, 2003) and
detour reaching without grasping (Bendahan & Gorce, 2006; Flash & Hogan, 1985;
Simmons & Demiris, 2005). Most of the previous computational models reproduced the
trajectories of a representative averaged subject instead of an individual subject.
The first purpose of this study is to investigate what coordination variability exists
and what factors would lead to this variability. The next section summarizes the
experimental data (Tretriluxana, Gordon, Arbib et al., 2009; Tretriluxana, Gordon, Fisher
et al., 2009; Tretriluxana et al., 2004, 2008; Tretriluxana et al., 2005), using a new
statistical analysis that sets forth features of the data that are crucial to understanding the
various strategies they exhibit. We then introduce the Hoff-Arbib model (Hoff and Arbib,
1993) of the coordination of reach and grasp, and provide hypotheses which extend the
model to address the data on the effects of detouring on this coordination. Finally, we
provide simulation results and relate them to our analysis of the strategies exhibited in
our experimental data.
5.2. Experimental data
We summarize the experimental data (Tretriluxana, Gordon, Arbib et al., 2009;
Tretriluxana, Gordon, Fisher et al., 2009; Tretriluxana et al., 2004, 2008; Tretriluxana et
al., 2005) and offer a novel analysis which makes explicit the variation in strategies
which set new challenges for modeling reach-to-grasp in detouring (Tretriluxana, Gordon,
Arbib et al., 2009).
5.2.1. Experimental setup
Data from six nondisabled right-handed adult volunteers (five women, one man,
aged 24-59 years) were used for this analysis. Briefly, subjects were instructed to reach
from a start position to a goal position “as quickly as possible” in order to grasp and lift a
pre-positioned cylinder (2.4 cm in diameter and 10 cm in height), located 30 cm from the
start, with their thumb and index finger (Figure 5.1). Subjects could see the target
cylinder before initiating movement, but a wooden shield occluded vision of the arm and
hand during the movement. After the subject’s hand was in the start position, an LED
turned on as the “start” signal. There were two paradigms. In the non-barrier (NB)
paradigm, subjects reached the target with a fairly straight reach trajectory. In the barrier
(B) paradigm, another cylinder (30 cm in height), positioned 15 cm in front of the start
position and 2.5 cm lateral to the midline, served as a barrier. Subjects had to reach
around the barrier with a curved reaching trajectory. Reach-to-grasp data were collected
using each hand in each subject.
During each movement, position data were recorded at 120 Hz with the
MotionMonitor (Innsport, Inc) and Mini-Bird (Ascension Technologies) sensors. The
velocity profile of the reach movement was computed from the sensor on the forearm and
the grip aperture profile of grasping was computed from the distance between two
sensors on the thumb and index fingers. The grip aperture sometimes had artifactual,
very large peaks; these were removed manually and interpolated with a cubic spline.
Further details of the experimental setup can be found in the previous papers (Tretriluxana,
Gordon, Arbib et al., 2009; Tretriluxana, Gordon, Fisher et al., 2009; Tretriluxana et al.,
2004, 2008; Tretriluxana et al., 2005).
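The two profiles described above can be derived from sampled sensor positions roughly as follows. This is a minimal sketch: the 120 Hz sampling rate is taken from the text, but the use of central differences for the velocity is an assumption (the differentiation method is not stated here), and the function names and toy samples are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 120.0  # sampling rate stated in the text

def speed_profile(positions, rate_hz=SAMPLE_RATE_HZ):
    """Tangential speed from 3-D positions using central differences.

    Returns one value per interior sample (the two end samples are dropped).
    """
    dt = 1.0 / rate_hz
    return [
        math.dist(positions[i + 1], positions[i - 1]) / (2.0 * dt)
        for i in range(1, len(positions) - 1)
    ]

def aperture_profile(thumb, index):
    """Grip aperture as the thumb-to-index sensor distance at each sample."""
    return [math.dist(t, f) for t, f in zip(thumb, index)]

# Toy data (cm): a forearm sensor moving 1 cm per sample along x.
forearm = [(float(i), 0.0, 0.0) for i in range(5)]
speeds = speed_profile(forearm)

# Toy data (cm): thumb and index sensors 5 cm apart at a single sample.
apertures = aperture_profile([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)])

print(speeds, apertures)
```

A displacement of 1 cm per sample at 120 Hz corresponds to 120 cm/s, which is the order of magnitude of the peak reach velocities plotted in Figure 5.2.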
Figure 5.1. Experimental setup (adapted from Tretriluxana et al., 2009) in a lateral
view and a top view. The target object is a cylinder, 30 cm from the start switch. The
barrier is another cylinder, 15 cm in front of the start switch and 2.5 cm away from the
midline from the start switch to the cylinder. A wooden shield occludes the vision of the
subject’s hand except when the hand is near the target object. NB indicates a typical hand
path when no barrier is present; B when the barrier is present.
[Figure 5.2 panels: (a) reach velocity (cm/sec) vs. normalized time; (b) aperture (cm) vs. normalized time; (c) hand trajectory (m) from start to goal; (d) reach velocity (cm/sec) vs. aperture (cm). Marked features: FVP, MAP, PI, MDP.]
Figure 5.2. Major features of an experimental trial (C004, left hand, 17th trial) in
reach velocity profile (a), in aperture size profile (b), trajectory (c) and coordination of
reach velocity over aperture size (d). FVP is the first peak in the velocity profile, MDP is
the point of maximum deviation of the reach trajectory from the midline, MAP is the
maximum in the aperture profile, and PI is preshape initialization. We will see that these
graphs correspond to just one of several different strategies for the reach-to-grasp in the
barrier condition.
The previous papers (Tretriluxana, Gordon, Arbib et al., 2009; Tretriluxana, Gordon,
Fisher et al., 2009; Tretriluxana et al., 2004, 2008; Tretriluxana et al., 2005) indicated
strategy variability qualitatively; here we re-analyze the data to assess the variability
quantitatively. We defined a set of features which may characterize reach-to-grasp
strategies and are easy to extract automatically. Figure 5.2 shows a detouring
reach-to-grasp trial which resembles the results of online perturbation experiments.
Although the reaching module may exhibit a double velocity peak when detouring, there are
cases where the second peak is too small to distinguish from the sensor noise (Figure 5.2a).
Thus, we excluded the second velocity peak from the basic feature set. Instead, to account
for the velocity dip which is present when there are two velocity peaks, we selected the
point of maximum deviation of the reach trajectory from the midline (MDP, Figure 5.2c). This
point of maximum deviation may distinguish the B and NB trajectories. The grasping
pattern shown in Figure 5.2b has two aperture peaks and one aperture dip between them.
However, we exclude the minor aperture peak from our key feature set because it is
absent on some trials. We do include the maximum aperture in preshape (MAP, Figure
5.2b). We assumed that the maximum aperture peak represents the preshape for the
object, so we call the starting point of this peak the preshape initialization (PI). We
collected these features for each subject and each hand, and correlated the features for
reaching and grasping. Because trials may differ in movement duration, we normalized the
occurrence times of these features. All occurrence-time features are expressed in
normalized time unless stated otherwise. Table 5.1 summarizes the selected features.
Feature    Description
t(FVP)     Normalized time of the first peak in the reach velocity profile
A(FVP)     Reach velocity at the first peak in the reach velocity profile
t(MDP)     Normalized time when the hand is maximally deviated from the midline
A(MDP)     Amplitude of the maximum deviation from the midline
t(MAP)     Normalized time of the major peak in the aperture profile
A(MAP)     Maximum aperture size
t(PI)      Normalized time of the initialization of the major aperture peak
Table 5.1. Summary of features used in assessing reaching and grasping modules.
All occurrence-time features are expressed in normalized time unless stated otherwise.
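Several of the features in Table 5.1 can be extracted automatically along the following lines. This is a minimal sketch on a toy trial: the function names and the toy values are hypothetical, real recordings would need smoothing before peak detection, and t(PI) is omitted because its precise operational definition is not given in this section.

```python
def normalized_time(i, n):
    """Map sample index i of an n-sample trial onto [0, 1]."""
    return i / (n - 1)

def first_peak(values):
    """Index of the first local maximum, or None if there is none."""
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            return i
    return None

def argmax(values):
    """Index of the largest value (the first one in case of ties)."""
    return max(range(len(values)), key=values.__getitem__)

def extract_features(speed, deviation, aperture):
    """FVP, MDP and MAP features from three equal-length sampled profiles."""
    n = len(speed)
    i_fvp = first_peak(speed)
    if i_fvp is None:
        raise ValueError("no velocity peak found")
    i_mdp = argmax([abs(d) for d in deviation])
    i_map = argmax(aperture)
    return {
        "t(FVP)": normalized_time(i_fvp, n), "A(FVP)": speed[i_fvp],
        "t(MDP)": normalized_time(i_mdp, n), "A(MDP)": abs(deviation[i_mdp]),
        "t(MAP)": normalized_time(i_map, n), "A(MAP)": aperture[i_map],
    }

# Toy trial with seven samples (velocity in cm/s, deviation and aperture in cm).
toy_speed = [0, 40, 90, 60, 70, 30, 0]
toy_deviation = [0, 1, 3, 6, 4, 2, 0]
toy_aperture = [1, 1, 2, 3, 5, 4, 2.4]
features = extract_features(toy_speed, toy_deviation, toy_aperture)
print(features)
```

Only the first velocity peak is used, mirroring the choice described above of excluding the second velocity peak from the basic feature set.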
5.2.2. Six patterns in the grasping module
We collated all the trials regardless of subject and hand, and then classified the
data into six groups with the following criteria. First, we used the occurrence time of the
primary aperture peak. As shown in Figures 5.3b and 5.3c (in Section 5.2.2, references to
Figure 5.3 are to its second column), there were trials whose primary peak time, t(MAP),
is earlier than 50% of the movement time. Second, we used the difference between the
maximum aperture size and the final aperture size. Even though we asked subjects to
precision pinch the target object (the object’s diameter is 2.4 cm), some subjects tended
to grip around the object, so the final aperture size was sometimes smaller than the
object size, as shown in Figure 5.3. As shown in Figure 5.3a, some subjects tended to
preshape with the maximum aperture at around 50% of the movement time and then sustained
it, so that the difference between the maximum aperture size and the final aperture size is
relatively small. Third, we used the occurrence time of preshape initialization, t(PI).
In some groups the start of the primary aperture peak was delayed
(Figures 5.3e and 5.3f), with t(PI) later than 30% of the total movement time.
Fourth, we used the existence of an apparent secondary aperture peak, as shown in
Figures 5.3c and 5.3f. These four criteria distinguish each grasping pattern from
the others. The detailed classification is summarized in Table 5.2, followed by a
description of each pattern.
Pattern in grasping     t(MAP) < 50%   Small A(MAP) – A(Final)   t(PI) > 30%   Apparent secondary peak   Related figure
Fast Preshape           X              O                         X             X                         Figure 5.3a
Early Preshape          O              X                         X             X                         Figure 5.3b
Double Early Preshape   O              X                         X             O                         Figure 5.3c
Independent Preshape    X              X                         X             X                         Figure 5.3d
Late Preshape           X              X                         O             X                         Figure 5.3e
Double Late Preshape    X              X                         O             O                         Figure 5.3f
Table 5.2. Six different groups in the grasping module with four criteria (O: criterion met; X: criterion not met).
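Table 5.2 amounts to a lookup from four yes/no criteria to a pattern label, which can be sketched as below. The 50% and 30% cut-offs are those given in the text; the cut-off for a "small" difference between maximum and final aperture (small_diff_cm) is a hypothetical parameter, since no numerical value is given in this section, and the presence of a secondary peak is passed in as a flag rather than detected.

```python
# Keys: (t(MAP) < 50%, small A(MAP) - A(Final), t(PI) > 30%, secondary peak)
PATTERNS = {
    (False, True, False, False): "FP",    # fast preshape
    (True, False, False, False): "EP",    # early preshape
    (True, False, False, True): "DEP",    # double early preshape
    (False, False, False, False): "IP",   # independent preshape
    (False, False, True, False): "LP",    # late preshape
    (False, False, True, True): "DLP",    # double late preshape
}

def classify(t_map, a_map, a_final, t_pi, secondary_peak, small_diff_cm=0.5):
    """Return the pattern label of Table 5.2, or None for unlisted combinations.

    small_diff_cm is an assumed threshold for a 'small' aperture difference.
    """
    key = (
        t_map < 0.5,
        (a_map - a_final) < small_diff_cm,
        t_pi > 0.3,
        bool(secondary_peak),
    )
    return PATTERNS.get(key)

# Hypothetical trials (normalized times; apertures in cm).
label_early = classify(t_map=0.35, a_map=6.0, a_final=2.4, t_pi=0.10, secondary_peak=False)
label_double_late = classify(t_map=0.70, a_map=5.0, a_final=2.4, t_pi=0.45, secondary_peak=True)
label_fast = classify(t_map=0.55, a_map=2.8, a_final=2.5, t_pi=0.10, secondary_peak=False)
label_unlisted = classify(t_map=0.35, a_map=2.8, a_final=2.5, t_pi=0.10, secondary_peak=False)
print(label_early, label_double_late, label_fast, label_unlisted)
```

Combinations of criteria that do not appear in the table are returned as None rather than forced into the nearest group.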
[Figure 5.3 panels: rows (a) FP, (b) EP, (c) DEP, (d) LP, (e) DLP, (f) IP; columns: velocity (cm/s) vs. normalized time, aperture (cm) vs. normalized time, velocity (cm/s) vs. aperture (cm).]
Figure 5.3. Six different groups of coordination patterns, showing transport velocity
profile (the first column), aperture size profile (the second column), and coordination
between transport velocity profile and aperture profile (the third column). Each row
represents a different group: (FP) fast preshape, (EP) early preshape, (DEP) double early
preshape, (LP) late preshape, (DLP) double late preshape, and (IP) independent preshape.
We show only 10 trials for each group. We note that these six different groups may each
contain different subjects. A typical trajectory for each pattern is shown in red.
Figure 5.4 shows how our criteria separate the data set into three distinct groups.
Because our time-occurrence features of the grasping module are t(MAP) and t(PI), we
could not show “fast preshape (FP)” patterns, which did not have a distinct maximum
aperture peak. Moreover, our features do not characterize the secondary aperture peak. We
thus grouped “double early preshape (DEP)” with “early preshape (EP)”, and “double late
preshape (DLP)” with “late preshape (LP)”. We called “SP”, “EP+DEP”, “LP+DLP”,
and “IP” the four major groups. We performed pair-wise Hotelling’s two-sample T-squared
tests (F = 798.9221 between EP+DEP and LP+DLP, F = 246.1691 between
LP+DLP and IP, F = 204.5983 between EP+DEP and IP, all significant with p < 0.05).
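For reference, the standard two-sample Hotelling T-squared statistic and its F transform can be computed for two-dimensional features such as (t(MAP), t(PI)) as sketched below. This is a textbook formulation with a pooled covariance matrix, not necessarily the exact procedure used for the values above; the function names and the toy data are hypothetical, and the p-value (which requires the F distribution) is left out.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Sums of squares and cross-products of deviations: (sxx, sxy, syy)."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def hotelling_two_sample(a, b):
    """Return (T2, F, (df1, df2)) for two samples of 2-D points."""
    n1, n2, p = len(a), len(b), 2
    ma, mb = mean2(a), mean2(b)
    sa, sb = scatter2(a, ma), scatter2(b, mb)
    dof = n1 + n2 - 2
    # Pooled covariance matrix entries.
    sxx, sxy, syy = ((sa[k] + sb[k]) / dof for k in range(3))
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Quadratic form d' S^-1 d for a symmetric 2x2 matrix S.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n1 * n2 / (n1 + n2) * quad
    f = (n1 + n2 - p - 1) / (p * dof) * t2
    return t2, f, (p, n1 + n2 - p - 1)

# Toy groups (not data from the study): two well-separated unit squares.
group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (3.0, 3.0)]
t2, f_stat, dfs = hotelling_two_sample(group_a, group_b)
print(t2, f_stat, dfs)
```

The F statistic is compared against an F distribution with (p, n1 + n2 - p - 1) degrees of freedom, where p is the number of features.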
[Figure 5.4 panels (a) and (b): scatter plots over t(PI) and t(MAP), and over t(PI)-t(MDP) and t(MAP)-t(FVP), for the EP+DEP, LP+DLP and IP groups.]
Figure 5.4. (a) Pattern separability of grasping features only with t(MAP) and t(PI),
and (b) pattern separability of coordination with hyper-parameter t(MAP)-t(FVP) and
t(PI)-t(MDP), where MAP, PI, FVP, and MDP represent maximum aperture peak in
aperture profile, preshape initialization in aperture profile, first velocity peak in transport
velocity profile, and maximally deviated point in the trajectory respectively. The legend
denotes EP=early preshape, DEP=double early preshape, LP=late preshape, DLP=double
late preshape and IP=independent preshape. “Fast preshape” patterns are not shown on
the figure because they do not have a major peak in the grip aperture profile. The
separability with grasping features only is slightly better than pattern separability with
hyper parameters.
5.2.3. Two patterns in the reaching module
The data include two types of reach velocity profile: one which contains a single
peak (Figure 5.5a, left), and the other which contains two peaks (Figure 5.5a, right). We
decompose the reach velocity into a velocity profile in the direction towards the given
goal (Figure 5.5b) and a velocity profile in the deviating direction (Figure 5.5c). Because
the reach velocity profile is a combination of these two velocity profiles, its shape is
affected by the maximum velocity peak in the velocity profile towards the given goal
(Figure 5.5b) and by the amplitude of the velocity profile in the deviating direction
(Figure 5.5c), as well as by their shapes. When the amplitude of the deviating component is
small, the velocity profile towards the given goal is prominent in the combined velocity
profile (Figure 5.5, left column). However, in both cases, the occurrence of the first
velocity peak (or the start of the velocity plateau) was determined by the first peak of
the velocity profile in the deviating direction. When the velocity profile towards the
given target was symmetrical, the combined velocity profile had two velocity peaks whose
values were almost the same (not shown). Similarly, when the velocity profile towards the
given target was skewed, one of the velocity peaks was larger than the other (Figure 5.5a,
right). Thus, we may consider the single velocity peak pattern as an extreme case of the
asymmetric velocity peak pattern. Because the zero-crossing in the velocity profile in the
deviating direction indicates the maximally deviated point, the velocity valley in the
combined velocity profile is generally aligned with the maximally deviated point, even
though the actual maximally deviated point is shifted towards the larger velocity peak in
the combined profile.
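The way the two components combine can be illustrated with synthetic profiles. In the sketch below the goal-directed velocity is a half sine and the deviating velocity a full sine (out and back), both chosen only for illustration; the amplitudes are hypothetical and the profiles are not fitted to the experimental data. With a small deviating amplitude the combined speed has one peak; with a large one it has two peaks and a valley at the zero-crossing of the deviating velocity.

```python
import math

def synthetic_trial(goal_peak, dev_peak, n=101):
    """Synthetic velocity components (cm/s) over normalized time."""
    t = [i / (n - 1) for i in range(n)]
    v_goal = [goal_peak * math.sin(math.pi * x) for x in t]
    v_dev = [dev_peak * math.sin(2.0 * math.pi * x) for x in t]
    return t, v_goal, v_dev

def combined_speed(v_goal, v_dev):
    """Speed of the hand from two orthogonal velocity components."""
    return [math.hypot(g, d) for g, d in zip(v_goal, v_dev)]

def count_peaks(values):
    """Number of strict local maxima among the interior samples."""
    return sum(
        1 for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )

def zero_crossing(values):
    """Index of the first positive-to-non-positive sign change, or None."""
    for i in range(len(values) - 1):
        if values[i] > 0 and values[i + 1] <= 0:
            return i
    return None

# Small deviating amplitude: the goal-directed component dominates.
t, g_small, d_small = synthetic_trial(goal_peak=100.0, dev_peak=20.0)
peaks_small = count_peaks(combined_speed(g_small, d_small))

# Large deviating amplitude: two peaks with a valley between them.
_, g_large, d_large = synthetic_trial(goal_peak=60.0, dev_peak=80.0)
peaks_large = count_peaks(combined_speed(g_large, d_large))

# Maximal deviation occurs where the deviating velocity changes sign.
t_mdp = t[zero_crossing(d_large)]
print(peaks_small, peaks_large, t_mdp)
```

Because the synthetic goal-directed profile is symmetric, the two peaks in the second case are of nearly equal height, as in the symmetric case described above.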
There was no group difference in t(FVP) (SP: 0.278±0.096, EP&DEP: 0.257±0.093,
LP&DLP: 0.227±0.081 and IP: 0.267±0.081) or in t(MDP) (SP: 0.459±0.055,
EP&DEP: 0.458±0.067, LP&DLP: 0.428±0.077 and IP: 0.456±0.088). Due to sensing
noise and the smooth transition between these two patterns, it is hard to assert whether a
velocity pattern has one peak or two. Here, we classified them manually. Double aperture
peak patterns (DEP and DLP) are more often associated with a single peak in the reach
velocity profile, and the independent preshape (IP) with a double velocity peak (number of
trials with one/two velocity peaks: SP: 22/22, EP: 18/16, DEP: 29/17, LP: 20/21,
DLP: 33/20 and IP: 3/19).
[Figure 5.5 panels: rows (a), (b), (c); left and right columns; velocity (cm/s) vs. normalized time.]
Figure 5.5. Two types of reach velocity profile. The left column shows velocity
profiles with a single velocity peak, and the right column shows velocity profiles with two
velocity peaks in (a) the combined velocity profile. We also show (b) the velocity profile
towards the given goal, and (c) the velocity profile in the deviating direction.
5.2.4. Coordination patterns relate more to grasp strategy than reach strategy
As shown in the previous section, the reaching module’s two patterns were not
closely related to the grasping module’s four patterns. We now consider how the
coordination patterns correlate with the features of the reaching module and grasping
modules. The apices of the coordination pattern are closely related to the occurrence
times of the features we defined in section 5.2.1. Most of the sample trials in Figure 5.3
(in section 5.2.4, references to Figure 5.3 are to its third column) show that the
left-most edge (starting edge) represents a rapid increase in reaching velocity while the
aperture does not increase much (except in Figures 5.3b and 5.3c). The upper-left apex
always occurs at t(FVP) because the transport velocity is maximal there. In contrast, the
upper-right apex has different characteristics. In Figure 5.3a, where the coordination
pattern looks like a rectangle, the upper-right apex corresponds to the moment at which the
aperture size starts to be sustained. In Figures 5.3b and 5.3c, where the coordination
pattern looks diagonally aligned, the upper-right apex occurs when t(MAP) and t(FVP) are
closely correlated, because A(FVP) and A(MAP) are both maximal at that point. In Figures
5.3d, 5.3e and 5.3f, where the coordination pattern looks like a triangle, the right apex
occurs when A(MAP) is the largest, which does not coincide with t(FVP) or t(MDP). The
fundamental difference between Figure 5.3d and Figures 5.3e and 5.3f is t(PI) in our
feature set; however, it cannot be captured by this diagram. We note that Figure 5.3d has
the larger maximum reaching velocity, correlated with a shorter movement time than the
other cases. We found that t(MDP) predicts t(PI) well (R > 0.58) in the LP+DLP group
(R < 0.3 in other groups), which implies that in the LP+DLP group the preshape may be
initialized around the time when the hand is maximally deviated from the midline.
To show this relationship for different grasping patterns, we re-draw Figure 5.4a
with the hyper-parameters t(MAP)-t(FVP), which would be close to zero for the EP+DEP
group, and t(PI)-t(MDP), which would be close to zero for the LP+DLP group (Figure 5.4b).
This graph also shows that in the EP+DEP pattern, t(MDP) could not predict t(MAP)
(R < 0.2) and t(MAP) is smaller than t(MDP) (one-tailed t-test, p < 1e-17), which implies
that the maximum aperture occurs while detouring. Even though the groups in this graph are
less separable than in Figure 5.4a, the separation is still significant (F = 613.8481
between EP+DEP and LP+DLP, F = 218.5505 between LP+DLP and IP, F = 125.2069 between
EP+DEP and IP, all significant with p < 0.05). Thus we concluded that the reaching
component did not induce variability in the coordination pattern. By contrast, the
coordination pattern is influenced by the grasping pattern.
5.2.5. Variability of patterns over subjects, hands and trials.
We show the grasping patterns defined in section 5.2.1 for each subject and
each hand. Each subject has his/her own dominant patterns (inter-subject
variability). A single subject may have different patterns for the two hands (intra-subject
variability; Figure 5.6, Subject 1 and Subject 14). Even a subject using a given
hand may show different patterns across trials (inter-trial variability). Each percentage
is determined from a 20-trial dataset. As shown in Figure 5.6, Subject 2, Subject 4,
Subject 6, Subject 14’s left hand and Subject 18 have their own dominant patterns. Subject 1
and Subject 14’s right hand did not show any trend toward a particular pattern emerging as
dominant by the end of the 20-trial session.
The pattern variability shows no learning effect, as shown in Table C.1 in
Appendix part C. There is also no age-related trend (C001: 52, C002: 59, C004: 24,
C006: 34, C014: 56, and C018: 44 years old).
Figure 5.6. Inter-subject, inter-hand and inter-trial variability.
5.3. A computational model and its implications
Here we extend the Hoff-Arbib (1993) model of reach-grasp coordination to
reproduce the variability in strategy. The data in the previous section can be classified
into four groups in the grasping module, with four correspondingly different groups
in coordination pattern. To reproduce detouring reach behavior, we introduce a virtual
target hypothesis and relaxed time-based coordination in section 5.3.2. Based on the
correlation shown in Figure 5.7a, we used the time-based coordination (Hoff & Arbib, 1993)
with an appropriate activation signal for each module. Using the correlation between
features, we reproduce a representative trial of each strategy pattern.
5.3.1. The Hoff-Arbib model
The minimum jerk model (Flash & Hogan, 1985) of reaching to an endpoint states
that optimal behavior minimizes the integral of the square of the jerk (the third derivative
of position) over the whole trajectory. The resultant optimal trajectory is a smooth
reaching movement with a straight path from the initial position to the target and a
symmetric bell-shaped velocity profile. Trajectories close to this optimum have been
observed by many authors (Flash & Hogan, 1985; Morasso, 1981; Paulignan et al., 1997).
The Hoff-Arbib model (Hoff, 1992; Hoff & Arbib, 1993) extends the minimum jerk
model in three ways: (i) by converting computation of the reach trajectory from an
optimization of the whole trajectory to an on-line controller; (ii) by introducing an
optimized controller for grasp aperture; and (iii) by using an estimate of the time
required by each controller to finish its movement as the basis for coordination in the
reach-to-grasp (Figure 5.7), so as to meet the equifinality constraint that grasping and
reaching movements should finish at the same time. Each module has optimal estimators to
find the required duration of reaching or preshaping under given accuracy constraints. The
time required for reaching is then constrained to equal the sum of the time required for
preshaping and a constant enclosing time.
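As an illustration of the minimum jerk profile described above, the following minimal sketch evaluates the standard closed-form solution for a rest-to-rest movement, x(t) = x0 + (xf - x0)(10τ³ - 15τ⁴ + 6τ⁵) with τ = t/duration. The function name and the 30 cm, 0.5 s reach are assumptions chosen for illustration only; they are not values taken from the data.

```python
def minimum_jerk(x0, xf, duration, t):
    """Position and velocity at time t of a rest-to-rest minimum-jerk movement."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clipped to [0, 1]
    shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    shape_rate = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration
    position = x0 + (xf - x0) * shape
    velocity = (xf - x0) * shape_rate
    return position, velocity


if __name__ == "__main__":
    # Hypothetical 0.30 m reach lasting 0.5 s (illustrative values only).
    for t in (0.0, 0.125, 0.25, 0.375, 0.5):
        pos, vel = minimum_jerk(0.0, 0.30, 0.5, t)
        print(f"t={t:.3f} s  x={pos:.4f} m  v={vel:.4f} m/s")
```

The velocity is symmetric about the movement midpoint and peaks there at 1.875 times the mean velocity, which is the bell shape referred to in the text.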
[Figure 5.7 diagram. Panel (a): Transport, Preshape and Enclose schemas with a max-based
time coordinator; signals "time need" and "duration". Panel (b): two Transport schemas
(delayed initialization; switch to a given target), Preshape and Enclose schemas with a
time-activation based coordinator; signals "time need" and assigned "duration".]
Figure 5.7. (a) A model of time-based coordination between feedback controllers for
reach and grasp. The transport and preshape controllers each compute the time needed to
complete their task; the coordinator slows down the faster one so that the duration of
transport = the duration of preshape + enclose (adapted from Hoff and Arbib (1993)). (b)
Time-activation based coordination of reach to grasp with detour around a barrier. We
now postulate two activations of the transport schema: the first for movement towards a
virtual target positioned to yield a trajectory that avoids the barrier; the second toward the
given target. A time-activation based coordinator (high-level controller) controls the
speed of each component by setting the required time to finish and the activation of (or
transition to) the next subschemas in both components. The general framework allows diverse
strategies depending, for example, on the timing of the switch between the two transport
schemas, and the time of initialization of the preshape. We also reject the Hoff-Arbib
assumption of constant time for enclose. Thin solid lines denote information flow and
thick dotted lines denote activation signals.
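The max-based rule of Figure 5.7a can be sketched in a few lines. This is only an illustration of the constraint stated in the caption (duration of transport = duration of preshape + enclose, with the faster controller slowed down); the function name and the millisecond values are hypothetical, and the actual model recomputes the time needs on-line from the current state.

```python
def coordinate(transport_need, preshape_need, enclose_time):
    """Return (transport_duration, preshape_duration) such that
    transport_duration == preshape_duration + enclose_time and neither
    controller is given less time than it asked for."""
    total = max(transport_need, preshape_need + enclose_time)
    return total, total - enclose_time


if __name__ == "__main__":
    # Hypothetical time needs in msec.
    print(coordinate(600, 300, 200))  # transport is slower: preshape is stretched
    print(coordinate(500, 450, 200))  # grasp is slower: transport is stretched
```

In the first call the preshape is given more time than it needs; in the second the transport is slowed so that both components still finish together.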
Whereas the target position and the movement time were fixed in the original
minimum jerk model, the Hoff-Arbib model adapted the value of the time required so that
it was based on the current state (position & velocity) of the hand and target. This
allowed the model to explain the results of experiments with perturbation of both target
location and target size (Paulignan, Jeannerod et al., 1991; Paulignan et al., 1990;
Paulignan, MacKenzie et al., 1991) after movement onset has occurred. The Hoff-Arbib
model showed that alterations in coordination could be explained by enforcing the
equifinality constraint. If, due to a perturbation of a given target, the time required for
reaching increases, then the grasping module will slow down its behavior, and vice versa
for perturbation of target size.
Interestingly, each controller in the model acts as both a feedforward controller and a
feedback controller. Each controller estimates the current arm-and-hand state and the time
required, “D”. If the feedback signals are on pace, this D value is also on pace. In this
case, the model acts as a feedforward controller which, in the absence of perturbation,
produces a trajectory satisfying the minimum jerk criterion (Flash & Hogan, 1985).
However, if errors occur (e.g., due to perturbation), the system will respond to the error
after a suitable delay. The reader may consult Hoff and Arbib (1993) for the equations
which characterize the model.
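The feedforward/feedback duality can be illustrated with the feedback form of the minimum-jerk solution: at every instant the jerk is chosen to be optimal from the current state (position x, velocity v, acceleration a) given the remaining time D. The sketch below is not the full Hoff-Arbib controller: it assumes a fixed total duration rather than an adaptive estimate of D, omits the delays, and uses simple Euler integration; all function and parameter names are hypothetical. The gains 60, 36 and 9 follow from solving the quintic boundary-value problem from the current state.

```python
def jerk_command(x, v, a, target, time_to_go):
    """Jerk that is minimum-jerk optimal from state (x, v, a) with time_to_go left."""
    d = time_to_go
    return 60.0 * (target - x) / d**3 - 36.0 * v / d**2 - 9.0 * a / d


def simulate_reach(target, duration, dt=0.0005, stop_before=0.02, perturb=None):
    """Euler-integrate the controller from rest at x = 0.

    perturb=(time, new_target) moves the target during the movement.
    Integration stops stop_before seconds early because the law is singular
    when time_to_go reaches zero.
    """
    x = v = a = 0.0
    t = 0.0
    while duration - t > stop_before:
        if perturb is not None and t >= perturb[0]:
            target = perturb[1]
        j = jerk_command(x, v, a, target, duration - t)
        x, v, a = x + v * dt, v + a * dt, a + j * dt
        t += dt
    return x, v


if __name__ == "__main__":
    print(simulate_reach(0.30, 0.5))                       # unperturbed reach
    print(simulate_reach(0.30, 0.5, perturb=(0.1, 0.35)))  # target jumps at 0.1 s
```

Without perturbation the state stays on the planned trajectory, so the controller behaves as a feedforward generator; when the target jumps, the same law corrects the movement towards the new target.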
5.3.2. Extending the model to detouring in the reach-to-grasp
Our presentation in Figure 5.4 of pattern separability of both grasping features and
coordination patterns supported a division of the data acquired with the apparatus of
Figure 5.1 into 4 groups: EP + DEP, LP + DLP, IP and (not shown in the figure) FP. Our
task in what follows is to show that the variation between these groups (as distinct from
the variability of trajectories within each group) can be explained by expanding the Hoff-
Arbib model of Figure 5.7a to the model shown in Figure 5.7b – where we show that
different strategies of time-activation based coordination correspond to these 4 classes of
detour behavior.
We focused on how movement may exhibit this variability with the minimum set of parameters
required; we did not model how those parameters are selected during
planning. Thus, once we defined those parameters, we directly or indirectly estimated
them for use in simulation of a given trajectory, thus showing how a high-level choice of
strategy can determine which of the patterns of Figure 5.3 will be exhibited.
To reproduce a curved trajectory in the barrier paradigm, we propose the virtual
target hypothesis (Figure 5.8), in which the initial movement is toward a virtual target
positioned to avoid the barrier, and then (a “virtual perturbation”) the target is switched
to the actual target. The dynamics of the reaching component act as a filter that smooths
out the effect of this switching. The virtual target is defined
by 1) its distance from the start, 2) the initial direction and 3) the time of switching to the
actual target.
According to the Hoff-Arbib model, when the speed control parameter is set to a
certain constant, the distance between the start location and virtual target will determine
the movement time of the detouring movement. The initial direction can be extracted
directly from the experimental data. We note that we could not generalize how to
compute the initial direction because the experimental data contains only one
combination of target location and barrier location in detouring movement conditions.
The switching time from the virtual target to the given target is crucial. It determines
how long the evading movement lasts and thus affects the shape of the trajectory. In the
virtual target hypothesis, the movement towards the virtual target is always straight. Only
when the switch to the actual target occurs does the trajectory start to curve. We can thus
estimate the switch time from the experimental data.
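The virtual target hypothesis can be sketched as a two-dimensional simulation in which the same feedback law drives the hand first towards a virtual target and then, after the switch time, towards the actual target. This is a simplified illustration under stated assumptions: each phase is given a fixed duration (virtual_duration and total_duration are hypothetical parameters standing in for the time estimates that the model derives from R and the distances), delays are omitted, and the target positions and times are invented for the example.

```python
def jerk(x, v, a, target, d):
    """Minimum-jerk feedback law for one axis with time-to-go d."""
    return 60.0 * (target - x) / d**3 - 36.0 * v / d**2 - 9.0 * a / d


def detour_reach(virtual_target, actual_target, switch_time,
                 virtual_duration, total_duration, dt=0.0005, stop_before=0.02):
    """Simulated (x, y) path that heads for a virtual target, then switches
    to the actual target at switch_time. The hand starts at rest at (0, 0)."""
    if not switch_time < virtual_duration:
        raise ValueError("switch must occur before the virtual target is reached")
    pos, vel, acc = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    t, path = 0.0, [(0.0, 0.0)]
    while total_duration - t > stop_before:
        if t < switch_time:
            target, time_to_go = virtual_target, virtual_duration - t
        else:
            target, time_to_go = actual_target, total_duration - t
        for i in range(2):  # x and y axes are controlled independently
            j = jerk(pos[i], vel[i], acc[i], target[i], time_to_go)
            pos[i], vel[i], acc[i] = (pos[i] + vel[i] * dt,
                                      vel[i] + acc[i] * dt,
                                      acc[i] + j * dt)
        t += dt
        path.append((pos[0], pos[1]))
    return path


if __name__ == "__main__":
    # Hypothetical geometry (m) and timing (s): detour to the left of the midline.
    path = detour_reach((-0.15, 0.15), (0.0, 0.30), 0.2, 0.5, 0.8)
    print("end point:", path[-1])
    print("max lateral deviation:", max(-x for x, _ in path))
```

Before the switch the simulated path is straight, as the hypothesis requires; the curvature appears only after the switch, because the controller carries the velocity and acceleration of the evading movement into the approach.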
[Figure 5.8 diagram labels: Start, End, Virtual Target, Initial direction and distance,
Switch Time.]
Figure 5.8. Virtual target hypothesis for reaching, where the black curved line is the
reach path from experimental data and the red straight line is the estimated path to the
virtual target.
The grasping movements in the double late preshape and late preshape
patterns have three phases: 1) an initial phase before preshape initialization, which may
include minor peaks, 2) preshaping towards a maximum grip aperture, and 3) enclosing
towards a given object, which may include minor peaks (Figure 5.2b; Figure 5.3, second
column). During the initial phase, grip aperture increases by a certain amount. We note
that there is only one object size in this detouring movement condition in the available
empirical data (Tretriluxana et al., 2008) and so we have no data on how preshape
correlates with object size. Instead, we assume that this increase reflects a
tendency to hold a resting grip aperture size (~2 cm). The constant enclosing time
postulated in the Hoff-Arbib model (Hoff, 1992; Hoff & Arbib, 1993) does not match the
detour data, so we instead simply measure the time of the peak aperture in the detouring
reach-to-grasp experimental data. In the early preshape and double early preshape
patterns, the initial phase is treated as omitted, while the enclosing time is
relatively long. In the independent preshape and fast preshape patterns, the initial phase
is likewise treated as omitted.
The original Hoff-Arbib model coordinates the reach and grasp by their total
duration. However, in the presence of the barrier, the reach and grasp are each divided
into subschemas. We hypothesize that variability in the patterns of equifinality among
subschemas is responsible for the variability in coordination pattern seen in the data. In
our detouring movement, reaching movements contain three subschemas (evading, switching
and approaching), while grasping movements can contain three subschemas (initial phase,
preshaping, and enclosing).
Section 5.2 extracted features which may indicate these subschemas’ ending or
starting times. We estimated that
• the first velocity peak (FVP) is near the switch time between the virtual target and
the given target
• the maximally deviated point in reaching movement (MDP) represents the ending
time of switching the target and the starting time of approaching the real target
• the preshape initialization in the grasping movement (PI) represents the ending
time of the initial phase and the starting time of the preshaping, and
• the maximum aperture peak (MAP) represents the ending time of the preshaping
and the starting time of enclosing.
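The four features listed above can be located in sampled kinematic data roughly as follows. This is a hedged sketch: the precise definitions are those of Section 5.2.1, and the simple rules used here (first local maximum of speed, largest absolute lateral deviation, largest aperture, first sample at which the aperture opening rate exceeds a threshold) are assumptions made for illustration, as are all function names and the threshold value. Real recordings would normally be filtered first. The 120 Hz sampling rate is the one stated in the text.

```python
SAMPLING_RATE_HZ = 120.0  # sampling rate stated in the text


def first_velocity_peak(speed):
    """Index of the first local maximum of the transport speed (FVP)."""
    for i in range(1, len(speed) - 1):
        if speed[i - 1] < speed[i] >= speed[i + 1]:
            return i
    return max(range(len(speed)), key=lambda i: speed[i])  # fallback: global peak


def maximally_deviated_point(lateral):
    """Index at which the hand is farthest from the midline (MDP)."""
    return max(range(len(lateral)), key=lambda i: abs(lateral[i]))


def maximum_aperture_peak(aperture):
    """Index of the largest grip aperture (MAP)."""
    return max(range(len(aperture)), key=lambda i: aperture[i])


def preshape_initialization(aperture, rate_threshold):
    """Index of the first sample after which the aperture opens faster than
    rate_threshold (cm/sec); the threshold is a hypothetical tuning value."""
    for i in range(len(aperture) - 1):
        if (aperture[i + 1] - aperture[i]) * SAMPLING_RATE_HZ > rate_threshold:
            return i
    return None


def to_msec(index):
    """Convert a sample index to time in msec."""
    return 1000.0 * index / SAMPLING_RATE_HZ
```

Differences such as t(PI)-t(MDP) or t(MAP)-t(FVP) can then be formed from the returned indices after conversion with to_msec.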
In Figure 5.4, we showed that late preshape and double late preshape patterns have
similar values of t(PI) and t(MDP), which may imply that, in the (double) late preshape
pattern, the switching movement and the initial phase end together (equifinality). In the
same manner, the early preshape and double early preshape patterns have similar times of
the first velocity peak and the maximum aperture peak, which may imply that the evading
movement and preshaping end together. To be more precise, we quantitatively fit the extended
Hoff-Arbib model to the given data. The actual trajectory then depends on the equifinality
pattern as well as the numerical parameters shown in Table 5.3:
Module | Parameter name | How to compute
Reaching | delays | From Hoff (1992)
Reaching | speed control parameter, R | Tuned to match the movement time and the movement length
Reaching | Location of a virtual target | Analyze the trajectory to obtain an initial direction. The distance is defined through R.
Reaching | Switch time from a virtual target to a given target | Comparing the data and the simulated trajectory, which were re-sampled for point-by-point comparison.
Reaching | D | Computed from the speed control parameter and the movement length
Grasping | delays | From Hoff (1992)
Grasping | timing of preshape initialization | Measured from the grip aperture profile, characterized by the drastic change in grip aperture velocity.
Grasping | size of maximum grip aperture | Measured from the grip aperture profile
Grasping | timing of maximum grip aperture | Measured from the grip aperture profile
Grasping | final aperture | Measured from the grip aperture profile
Table 5.3 Parameters for the extended Hoff-Arbib model.
We do not offer a model of how the parameters shown in Table 5.3 are generated by
the brain of the subject; we do offer a hypothesis on how a high-level choice of strategy
can determine which of the patterns of Figure 5.3 will be exhibited. As shown in Figure
5.7b, a high-level scheduler controls the speed of each component by setting the required
time to finish and the activation of (or transition to) the next subschemas in both
components. Table 5.4 presents the coordination strategies that we posit for our 4
different classes of detour behavior. An assessment of these hypotheses will be offered
with the simulation results in Section 5.3.3.
Trajectory Class | Coordination Strategy
FP | grip aperture reaches the size of object with less overshoot and is sustained until the hand is very close to the target
EP + DEP | evading movement and the preshape start and end together
LP + DLP | preshaping starts with approaching movement
IP | preshape is independent from reaching component.
Table 5.4. Coordination strategies for the four groups of detour trajectories.
We now describe how we used the data to extract the parameters of Table 5.3 as the
basis for showing how the model fits the data as it independently executes reaching and
grasping with the given parameter values. Using a nonlinear optimization procedure, a
speed-controlling parameter R, which might differ for each hand, was initially estimated
in the non-barrier paradigm. This parameter is then used for the barrier case, with fine
tuning. Because the distance to the virtual target is not given, we may change it to control
the speed of the evading movement differently from the speed of the approaching movement.
The switch time from the virtual target to the given target is found by comparing the
actual trajectory with simulated trajectories in Cartesian coordinates. Even though the
grasping module has more parameters, obtaining them by analyzing the data is rather
simple. The size and timing of the maximum grip aperture and the size of the final
aperture were simply measured from the grip aperture profile. We used t(PI) from the
previous analysis as the ending time of the initial phase.
Because our data exhibit a certain level of inter-trial variability, averaging across
trials may smear each pattern’s characteristics. We therefore first separated the data by
subject and by hand to determine the pattern, and then showed the inter-subject and
inter-hand variability in the coordination pattern.
5.3.3. Simulation results
We randomly selected experimental data from each strategy group and fitted our
model to it. For the reaching component, the initial direction and speed-controlling
parameters are measured from the data, while the switch time from the virtual target to the
real target is found manually to match A(MDP), t(MDP) and t(FVP). For the grasping
component, we measured t(PI), t(MAP), and A(MAP) from the data and used the
statistical analysis results for coordination. We computed a root-mean-squared error to
measure the similarity between the experimental data and the simulation data.
In the following 4 subsections, we explain the coordination schemes (Table 5.4)
which we hypothesize explain the inter- and intra-subject variability in coordination
patterns, and present a typical simulation result in each case. The parameters used are
listed in Table C.2 in Appendix part C.
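A root-mean-squared comparison between a measured and a simulated path requires the two to have the same number of points. One possible way to do this is sketched below; resampling by linear interpolation at equal arc-length steps is an assumption made for this example (the text states only that trajectories were re-sampled for point-by-point comparison), and the function names are hypothetical.

```python
import math


def resample(path, n):
    """Resample a 2-D path to n points equally spaced along its arc length."""
    cum = [0.0]  # cumulative arc length at each original point
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    out, seg = [], 0
    for k in range(n):
        s = cum[-1] * k / (n - 1)
        while seg < len(path) - 2 and cum[seg + 1] < s:
            seg += 1  # advance to the segment containing arc length s
        span = cum[seg + 1] - cum[seg]
        w = 0.0 if span == 0.0 else (s - cum[seg]) / span
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out


def rmse(path_a, path_b):
    """Root-mean-squared point-by-point distance between equal-length paths."""
    sq = [(xa - xb) ** 2 + (ya - yb) ** 2
          for (xa, ya), (xb, yb) in zip(path_a, path_b)]
    return math.sqrt(sum(sq) / len(sq))
```

A candidate switch time could then be scored by computing rmse(resample(measured, n), resample(simulated, n)) for each simulated trajectory and keeping the value with the smallest error.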
[Figure 5.9 panels, one row per pattern: (a) FP, (b) EP, (c) LP, (d) IP. Plot axes:
trajectory (m vs. m); transport velocity (cm/sec) vs. time (msec); aperture (cm) vs. time
(msec); velocity (cm/sec) vs. aperture (cm).]
Figure 5.9. Comparison of a typical pattern of movement (shown in black solid) and
the result obtained from the model (shown in red dots) using numerical parameters (Table
C.2 in Appendix part C) extracted from the data for (FP) fast preshape, (EP) early
preshape, (LP) late preshape and (IP) independent preshape. In each case, the first
column represents its trajectory, the second column its transport velocity profile, the third
column its aperture profile and the fourth column the coordination between the transport
velocity profile and the aperture profile. The coordination schemes are described in Table
5.4 and the text.
5.3.3.1. Fast preshape (FP)
Fast preshape (FP) has a sustained grip aperture after a fast preshape (Figure 5.9a). In
terms of interdependency, this is similar to the IP pattern (Section 5.3.3.4): the two
visuospatial channels seem to be independent. However, we noted that the sustained grip
aperture starts around 50% of movement time, which approximately coincides with the
maximally deviated point.
5.3.3.2. Early Preshape (EP) and double early preshape (DEP)
In early preshape (EP), the major peak in the grip aperture profile occurs before 50%
of movement time. Its auxiliary pattern is the double early preshape (DEP), which has a
secondary peak, in addition to the first peak, after 50% of movement time. For the EP
pattern in grip aperture (Figure 5.9b), early activation of the preshaping module with a
short required time for preshaping, which ensures the equifinality of the evading reaching
movement and preshaping, would induce a larger maximum peak aperture than in other patterns,
as is supported by experimental results (Rand, Squire, & Stelmach, 2006) and insights from
computational models (Simmons & Demiris, 2006; Smeets & Brenner, 1999). Then,
because enclosing starts earlier than in other patterns, there is enough time for
enclosing to assure equifinality of the approaching movement and enclosing. This long
enclosing time allows a slow settling to the final aperture, which is similar to experiments
with initial aperture (Timmann, Stelmach, & Bloedel, 1996). For the DEP pattern, there
possibly exists another small preshaping to ensure grasping. We suspected that when
enclosing starts, there might also be a transition between subschemas in the reaching
component, which may be related to the switch time between the virtual target and the
actual target. Here, we note that the time of occurrence of the minor aperture peak in DLP
(183 msec) was within 25 msec of t(MAP) in EP+DEP (208 msec), where the sampling rate
is 120 Hz.
5.3.3.3. Late Preshape (LP) and double late preshape (DLP)
In Late Preshape (LP), the major peak in the grip aperture profile occurs after 50%
of movement time and the dominant preshape starts after 30% of movement time. Its
auxiliary pattern, double late preshape (DLP), has a small peak before the major preshape.
For the LP pattern in grip aperture (Figure 5.3c), activation of preshaping is delayed until
a certain point, which our statistical analysis identifies as the maximally deviated point
in the trajectory. This delayed activation of preshape has been reported by Cuijpers,
Smeets and Brenner (Cuijpers et al., 2004), who showed that the orientation difference
between the opposition axis of the thumb and index finger and the opposition axis of the
target played an important role in initiating preshaping. This orientation difference in
detoured reach-to-grasp varied widely and needs to be analyzed. In her thesis, Tretriluxana
(2008) suggested that the minimal reach velocity between two reach velocity peaks may
trigger the delayed initiation of preshaping. We note that this point generally matches
the maximally deviated point, considering x-y movement decomposition (Section
5.2.3 and Figure 5.5, right). However, we are skeptical of this idea because the valley in
the reach velocity is sometimes missing in other velocity profiles (Figure 5.5, left).
For DLP, activation of the major preshaping is the same as for LP. Its pattern is
generally very similar to that seen in online perturbation experiments (Desmurget et al.,
1996; Paulignan, Jeannerod et al., 1991; Paulignan et al., 1990; Paulignan, MacKenzie et al.,
1991). So, we might consider this pattern as an online perturbation: first, the
preshape is initiated with a certain amount of required time; second, the
aperture closes when there is a target switch (from a virtual target to the real target);
then the major preshape is initiated. This explanation naturally includes a switch time of
the virtual target and implies that this target switch requires a longer time to complete
the reaching component. In response to this longer movement time, the grasping
component reaches a smaller aperture size.
5.3.3.4. Independent Preshape (IP)
In Independent Preshape (IP), the major peak in the grip aperture occurs after 50%
of movement time and the dominant preshape starts before 30% of movement time. This
pattern is very similar to LP in the occurrence time of the maximum aperture, and to SP
in the size of the maximum aperture. For the IP pattern (Figure 5.9d), the grip aperture
profile was similar to the profile of the non-barrier condition. This pattern could be
treated as a separation between the two visuospatial channels. As an alternative view, we
noted that even in the non-barrier condition, the trajectory was slightly curved
(Tretriluxana, 2008; Tretriluxana et al., 2008). This curvature is presumably required to
position the wrist slightly off-center to accommodate the grasp of the target.
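The timing criteria given in Sections 5.3.3.2 to 5.3.3.4 can be collected into a simple decision rule, sketched below. It uses only the two thresholds stated in the text (major aperture peak before or after 50% of movement time; dominant preshape start before or after 30% of movement time). It does not separate FP, nor the double variants (DEP, DLP), which depend on sustained apertures and secondary peaks; the handling of values exactly at a threshold is an assumption, and the function name and example times are hypothetical.

```python
def classify_pattern(t_map, t_pi, movement_time):
    """Classify a trial as 'EP', 'LP' or 'IP' from t(MAP), t(PI) and movement
    time (all in the same unit). FP, DEP and DLP are not distinguished here."""
    map_fraction = t_map / movement_time
    pi_fraction = t_pi / movement_time
    if map_fraction < 0.5:
        return "EP"  # major aperture peak before 50% of movement time
    if pi_fraction > 0.3:
        return "LP"  # late peak, dominant preshape starts after 30%
    return "IP"      # late peak, dominant preshape starts before 30%


if __name__ == "__main__":
    # Hypothetical trials with a 700 msec movement time.
    print(classify_pattern(208, 50, 700))
    print(classify_pattern(450, 300, 700))
    print(classify_pattern(450, 100, 700))
```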
5.4. Discussion
The virtual target hypothesis suggests that the detouring reaching movement may have
subschemas for both an evading movement and an approaching movement. We
fitted our revised model to the individual data (Tretriluxana et al., 2004, 2008) and
showed that the virtual target characterizes detouring movements. The time-activation
coordination hypothesis formalizes how equifinality of subschemas, using the times of
occurrence of major features in the reaching and grasping components, can explain
the variability in grasping and its corresponding coordination patterns.
Previous controllers (Flash & Hogan, 1985; Rosenbaum et al., 1999; Simmons &
Demiris, 2006) of two-phase detouring movements are based on a via-point (or via-posture)
approach. Except in the via-posture approach of Rosenbaum et al. (1999), the
via-point is a point on the trajectory itself which can be used to define the curve in the
reaching movement, and which carries constraints on velocity and/or acceleration. In
contrast, our virtual target hypothesis invokes a virtual target that determines the initial
trajectory but is not passed through. The approach is flexible, accommodating cases with
either one or two peaks, a variation identified in our experimental dataset.
Our model captured the variability in coordination pattern found in the experiments
(Tretriluxana et al., 2004, 2008). In her thesis, Tretriluxana (2008) indicated that a
majority of subjects (10/12, over 6 subjects’ right and left arms) showed a delayed peak
aperture with a smaller peak aperture. Except for EP+DEP dominant subjects, subjects’
dominant patterns (LP+DLP, IP) have the peak aperture after 50% of movement time.
Moreover, because movement time in the barrier paradigm is elongated relative to the
non-barrier paradigm (Mon-Williams et al., 2001; Saling et al., 1998; Tresilian, 1998;
Tretriluxana, Gordon, Arbib et al., 2009), which may have the same effect as slowing down,
the maximum aperture size is decreased, as reported by Rand et al. (2006). By contrast,
previous models (Hoff & Arbib, 1993; Simmons & Demiris, 2006; Ulloa & Bullock, 2003) have
too tight a coupling between the two modules, and so can only exhibit a double late preshape
coordination.
Sabes and Jordan (1997) indicated that the vector from the point in the trajectory
nearest to an obstacle (a big triangular tip) lies along the minor eigenvector of the
sensitivity matrix, which is an inverse of endpoint stiffness, where variability in the
direction of the minor eigenvector is minimal. They indicated that the kinematics of the
arm may be important because of this sensitivity and, so, the optimal detouring
movement would be affected by postures and direction of movements. However, the
movement studied by Sabes and Jordan excludes grasping. Thus, there is no interference from
the grasping module. The experimental data used here (Tretriluxana et al., 2004, 2008) show
different reaching strategies affected by the grasping module. Further, even though
subjects in Sabes and Jordan’s study cannot see their arms, they can see their endpoints
(fingertips) with respect to the obstacle, whereas subjects in Tretriluxana et al.’s study
cannot see their arms and fingers at all until the fingers are near the actual target. Thus
there is no visual feedback to use with respect to the obstacle.
This lack of visual feedback may be related to the variability of reaching movements.
If visual feedback were provided, subjects could see whether their hands were on
course to collide with the barrier. Without it, evading movements may be larger (i.e.
more curved) or longer (i.e. slower) than when visual feedback is available.
Variability in our data set may also arise because our experiment is less constrained
than previous experiments. Instead of providing a via-point or a via-posture
(Rosenbaum et al., 1999; Simmons & Demiris, 2006; Vaughan et al., 2001), we asked
subjects simply not to collide with the barrier, so they could more freely choose the shape
of the trajectory. The shape of the barrier, which was a thin cylinder, also allowed
subjects to choose the trajectory freely, in contrast to the triangular tip in Sabes and
Jordan (1997). In the final posture, though we asked subjects to use a precision grip, they
sometimes used a power grip, which wraps the fingers around the object. The final aperture
size of the independent preshape (IP) pattern is less than 2.4 cm. Also, part of the early
preshape and double early preshape patterns had a final aperture size of less than 2.4 cm.
Given that a straighter reaching movement may require less movement time, and noting the
shorter movement time of two patterns (FP and IP) than of the others (EP+DEP and LP+DLP),
the approaching direction is shallower in those two patterns. In contrast, an ellipsoidal
target may constrain the approaching direction (Cuijpers et al., 2004). Cuijpers et al.’s
data showed that the opposition axis alignment may delay preshape initialization. In our
data set, this constraint was also relaxed. The high inter-trial variability in subjects 1
and 14 may imply a smooth transition between patterns, because each pattern’s parameters
had a broad distribution (Figure 5.4).
In summary, the reasons for variability in detouring reach-to-grasp strategy may be 1)
uncertainty about the barrier location when visual feedback is not available, related to
A(MDP) and/or t(MDP), 2) less constraint in trajectory, related to A(MDP) and/or
t(MDP), 3) less constraint in final posture, related to grasping precision requirement, and
4) less constraint in approaching direction, related to t(PI).
Our hypotheses give a hint of interdependency between the two components, which
is observed as equifinality of subschemas. As an example, in the EP+DEP pattern, the
evading movement and the preshape start and end together, while in the LP+DLP pattern,
the preshape starts with the approaching movement (Figure C.1 in Appendix part C). Though
we indicated how variability occurs, we did not answer why there were only six (four)
patterns. This provides a fascinating challenge for future research.
Having said this, we close by noting several shortcomings of the present work that
point the way to new research that explores the important coordination patterns we have
revealed. First, we have only a relatively small number of subjects (six) to
confirm the variability of patterns analyzed with our extended model. Second, because
we did not consider the kinematics of the arm, we could not analyze the effects of
kinematic constraints, such as the endpoint stiffness matrix (Sabes & Jordan, 1997; Sabes
et al., 1998) and direct control of joints (Rosenbaum et al., 1999). Third, we assumed
a constant speed during the evading phase and the approaching phase in the reach module.
However, when the velocity profile towards the goal is skewed, this assumption is violated,
resulting in poor performance of our computational model in reproducing the given data.
Fourth, we did not perform statistical analysis of the switch time with other major features
because the data were marred by artifacts due to miscommunication between sensors and
computer. We removed those artifacts manually because automatic removal failed, thus
limiting our ability to analyze the switch time statistically. Finally, further studies are
needed to analyze how the switch time of the virtual target will change over different
movement lengths, different obstacle locations or different object sizes.
Chapter 6.
A cortical model of motor control
with extrinsic and intrinsic neural representation
6.1. Introduction
Since Penfield and Rasmussen (1955) localized the motor cortex area through
correlation between electrical stimuli and muscle twitches, the motor cortex has been
viewed as the center of motor control. However, the precise role of the primary motor
cortex is still open to debate even with abundant physiological evidence.
First, there is no model which can explain skill learning and topographical map
representation in reach-to-grasp behavior. The primary motor cortex contains
topographical map representations. Skill learning may yield plastic changes in the
primary motor cortex (Nudo et al., 2001; Nudo et al., 1996). Nudo and colleagues located the
map representation of hand muscles of a monkey, then lesioned that area of the primary motor
cortex, deteriorating hand function. After having the monkey pick up a small object
inside a small well, they located the map representation of hand muscles again. They found
that the map representation of hand muscles expanded after appropriate training,
reversing the effects of the lesion. Even though it is strongly believed that skill learning
changes map formation and plasticity in the primary motor cortex, there are few
computational models that explain the underlying mechanisms (Armentrout et al.,
1994; Y. Chen & Reggia, 1996; Goodall et al., 1997). Most of the models developed for
motor control ignore map representations and their reorganization (see section 2.3 for
details) whereas cortical models which explain topographical maps have not focused on
motor control (see section 2.2.2 for details). The cortical models using a competitive
distribution theory (Reggia et al., 1992) used a loop including the motor cortex,
motoneurons in the spinal cord, and a simple but biologically plausible arm model to
explain development of map representation (Yinong Chen, 1997; Y. Chen & Reggia,
1996) and to explain reorganization after a focal lesion (Goodall et al., 1997). However,
since their arm model is static, using only the relationship between high activation of a
motoneuron in the spinal cord and short length of the corresponding muscle, instead of its
muscle synergy or exerted tension (E.V. Evarts, 1968), their representation of the primary
motor cortex is rather simple.
Second, since Evarts (1968) correlated the firing rate of single neurons in the
primary motor cortex with the exerted force, four decades of research on what the
neurons in the primary motor cortex encode has focused on the “muscle-versus-movement”
debate (Georgopoulos et al., 1986; Sakamoto, Arissian, & Asanuma, 1989;
Selemon & Goldman-Rakic, 1988). This controversy concerns whether the activity
of neurons in the motor cortex correlates with muscle contraction or with movement of the
limbs. Georgopoulos et al. (1986) reported that neurons in the motor cortex of a monkey
are activated differently for different movement directions, a finding obtained by
recording neuronal activity during active movements. In contrast, others have
asserted that the primary motor cortex encodes muscle-related characteristics (Asanuma and
Sakata, 1967; R. Lemon, 1988; S.H. Scott, 2000; E. Todorov, 2000).
At the end of the twentieth century, Kakei et al. (1999) reported experimental
evidence that the primary motor cortex may encode both muscles and directions. During
a wrist movement task with three different wrist postures (pronation, middle, and
supination), they measured the preferred direction of each neuron in the ventral premotor
cortex and the primary motor cortex, and the preferred direction of certain muscles, in
monkeys. They found that the preferred directions of some neurons in the primary
motor cortex rotated with posture, whereas the preferred directions of neurons in the
ventral premotor cortex did not change. Even though Todorov
(2000a) argued that neurons in the primary motor cortex dominantly encode
muscles and that direction coding is an ‘epi-phenomenon’, he showed that the primary
motor cortex may exhibit both aspects of neural coding at the same time.
Here, inspired by Kakei et al. (1999) and Todorov (2000a), I hypothesize a dual map
(Figure 6.1) in which the primary motor cortex contains both muscle coding and
directional coding. Following Todorov’s (2000a) hypothesis, the primary motor cortex
dominantly encodes muscles, especially muscle force, and the map formed by these
neurons is called the motor output map (Figure 6.1, red arrow), reflecting the
corticomotoneuronal projection (R. Lemon, 1988). Directional coding may also be
represented in the primary motor cortex through the motor input map (Figure 6.1, gray arrow),
which learns posture-dependent associations between high-level motor commands
(kinematic coding) from the premotor cortex and muscle coding in the primary motor cortex.
The motor output map provides an action pool, which contains muscle force coding. This
muscle force coding implies a preferred force direction for each muscle, and this direction
should be associated with the movement direction. The motor input map may learn this
association through either reinforcement learning, supervised learning or both. The aims
for the modeling presented here are first, to confirm this dual map hypothesis, and second,
to develop a computational model of how learning may reverse effects of stroke.
Figure 6.1. Overall structure for the dual map hypothesis of motor cortex. The gray
arrow indicates a motor input map learned through reinforcement learning, and the red
arrow indicates a motor output map, learned through competitive Hebbian learning.
6.2. Methods
Todorov’s (2000a) model may explain the dual map hypothesis, but it is not
developmental and uses an overly simple arm model (without a musculoskeletal
system). Cortical models based on competitive distribution theory (see Section 2.2.2) use a
static arm model, and thus the corresponding motor control is not based on muscle forces.
Here, I used a realistic arm model which represents the force exerted and a cortical model
whose motor output map is detailed enough to explain map representations and their
reorganization, and whose motor input map is precise enough for learning to improve
skills with the realistic arm model. Todorov (2000a) indicated that the characteristics of
muscles may be propagated up to the primary motor cortex. Thus, the better the arm model,
the better the cortical model.
The overall structure of the dual map hypothesis is shown in Figure 6.1. The
sensory feedback signal is fed to the kinematic coding, which encodes the difference between
the target location and the current location in joint coordinates, together with the current joint
angular velocity. This difference vector might be encoded in the premotor cortex: Pesaran,
Nelson, and Andersen (2006) found that the premotor cortex represents target
position in eye-centered coordinates, hand position in eye-centered coordinates, and
target position in hand-centered coordinates. The last of these corresponds to the
difference vector in our model. There is no strong evidence on whether the premotor cortex uses
joint coordinates or extrinsic coordinates. Here I followed previous motor
control models, whose kinematic coding used joint coordinates (Bissmarck et al.,
2008; Kambara et al., 2009).
This association between a feedback (difference) signal and a certain muscle
synergy is similar to the optimal gain applied to a feedback signal in an optimal feedback
controller (Todorov & Jordan, 2002). Optimization of a cost that includes both accuracy
and efficiency can also be achieved by reinforcement learning, which maximizes the sum of
rewards (R. S. Sutton, 1995; Richard S. Sutton & Barto, 1998). The cost-to-go in
optimal control theory can be captured by a value function, which represents expected
rewards. Thus, I used reinforcement learning for motor input map adaptation. In the
terminology of reinforcement learning, the kinematic coding serves as the state space (R.
S. Sutton, 1995; Richard S. Sutton & Barto, 1998). The kinematic coding units are fully
connected to the motor cortex model. Each motor cortex neuron represents a certain muscle synergy, which is
defined by the motor output map, the weight matrix from the motor cortex neurons to the
motoneurons in the spinal cord. Thus, the full connectivity from the kinematic coding to
the motor cortex neurons, the motor input map, recruits these piece-wise muscle
synergies to construct a movement along the difference vector from the start to a
given target. The motor output map is learned through competitive Hebbian learning
with Mexican hat activation patterns (Chernjavsky & Moody, 1990a, 1990b), based on
the coincidence of random activation of the motor cortex and the resulting motoneuron
activation patterns. The motor input map is learned through temporal difference
reinforcement learning within the actor-critic framework (Doya, 2000b; Richard S. Sutton
& Barto, 1998).
Motor learning consists of two phases: motor babbling (Kuperstein, 1988c)
without any target, and motor training with specified targets. During the motor babbling
phase, competitive Hebbian learning develops the motor output map based on
random movements. In this phase, because there is no target, no reward is given.
Thus, the motor input map is unchanged. During the motor training phase, rewards are
available and the motor input map is learned. Ideally, the motor input map and the
motor output map would learn concurrently; here, for simplicity, I disabled adaptation of the
motor output map while adjusting the motor input map. Concurrent learning may
raise the issue of balance between two learning processes, as shown with a simplified model in
Chapters 3 and 4 (see Appendix B). Likewise, during rehabilitation after stroke, only the motor
input map is adapted.
6.2.1. Arm model
The arm model (Figure 6.2) combines an appropriate skeletal system (Masazumi
Katayama, 1993; Lan, 2002; Spoelstra et al., 2000) with six Hill-type muscles (Cheng,
Brown, & Loeb, 2000; Lan, 2002; Zajac & Gordon, 1989): shoulder extensor (E),
shoulder flexor (F), elbow opener (O), elbow closer (C), biarticular biceps (B), and
biarticular triceps (D). The naming of the muscles follows Armentrout et al. (1994).
Figure 6.2. (a) Arm model with six muscles, redrawn from Kambara et al. (2009) and
Masazumi Katayama (1993). The six muscles are: shoulder extensor (E), shoulder flexor (F),
elbow opener (O), elbow closer (C), biarticular abductor (B), and biarticular adductor (D).
The radius of the outer circle of the shoulder joint is $a_1$, the radius of the inner circle of the
shoulder joint is $a_3$, the radius of the outer circle of the elbow joint is $a_4$, and the radius of
the inner circle of the elbow joint is $a_2$. (b) Eight targets on a circle 15 cm from the resting
posture’s endpoint. The leftmost target is called target 1, and the others are numbered
counterclockwise.
Details of the arm model are given in Appendix D. Our muscle model is based on
the work of Lan (2002) and Zajac & Gordon (1989), while the parameters and notation for the
arm dynamics are borrowed from Kambara et al. (2009). Even though a muscle receives input
from multiple motoneurons, following Lan (2002), I simplified this to one motoneuron per
muscle, whose output abstracts the recruitment rate of the motoneurons.
6.2.2. Cortical model of the motor cortex and motor output map
Figure 6.3. (a) Cortical model, redrawn from Chernjavsky & Moody (1990a; 1990b).
Excitatory neurons receive full connectivity from the external units (for simplicity,
only one external unit is shown), and are laterally connected to their neighbors and to
inhibitory neurons. An inhibitory neuron shunts the excitatory neuron when it is
activated (shown as blank arrows). The weight matrix is set to follow the physiological
structure of the neocortex; the thickness of the solid arrows roughly indicates weight
strength. (b) Topology of neighboring neurons, redrawn from Reggia and his colleagues
(Armentrout et al., 1994; Y. Chen & Reggia, 1996; Cho & Reggia, 1994; Goodall et al.,
1997; Sutton III et al., 1994). Each neuron is surrounded by six neurons.
The cortical dynamics for the motor output map are based on the model of
Chernjavsky & Moody (1990a; 1990b). This cortex model follows the Mexican hat
activation pattern: when one neuron is activated, it also activates nearby neurons,
whereas neurons farther from the center are inhibited. Because excitatory connections are
generally short-range and inhibitory connections are long-range through GABA neurons,
two layers of neurons are required: a pyramidal excitatory layer and a GABA inhibitory
layer (Figure 6.3). Chernjavsky and Moody used shunting inhibition from the GABA layer to the pyramidal
layer, which is more stable. The dynamics of the two types of neurons are given by
equation (6.8), where $V_i$ denotes the activation level of the ith pyramidal excitatory neuron and
$Q_i$ denotes the activation level of the ith GABA inhibitory neuron, which produces a shunting
inhibitory signal $S_i$.
$$
\begin{aligned}
\tau_e \frac{dV_i}{dt} &= -V_i + (1 - S_i)\Big(E_i + \sum_j L_{ij} V_j\Big) \\
S_i &= \mathrm{sigmoid}\!\left(\frac{Q_i - \delta_{cortex}}{\alpha}\right) \\
\tau_i \frac{dQ_i}{dt} &= -Q_i + \sum_j R_{ij} V_j
\end{aligned}
\tag{6.8}
$$
where $E_i$ is the external activation of the ith neuron (Section 6.2.3), $L_{ij}$ is the lateral
connection weight from the jth excitatory neuron to the ith excitatory neuron, $R_{ij}$ is
the connection weight from the jth excitatory neuron to the ith inhibitory neuron, $\tau_e$ is the
time constant of the excitatory neurons, and $\tau_i$ is the time constant of the inhibitory neurons.
Note that inhibition shunts both self-excitation and lateral excitation of the pyramidal neurons.
Here, $L_{ij}$ and $R_{ij}$ are fixed so as to produce the Mexican hat activation pattern.
Connection strength is defined by a difference of Gaussians, $g(0,\sigma_E)-g(0,\sigma_I)$, as a
function of the distance between the ith neuron and the jth neuron, where $g(0,\sigma)$ denotes a
Gaussian function whose mean is zero and width is $\sigma$, $\sigma_E$ is the width of excitatory
connectivity and $\sigma_I$ is the width of inhibitory connectivity, following Chernjavsky &
Moody (1990a; 1990b). After setting $L_{ij}$ to $g(0,\sigma_E)-g(0,\sigma_I)$ and $R_{ij}$ to
$g(0,\sigma_I)-g(0,\sigma_E)$, I removed all negative weights to obtain the fixed connectivity
matrices. $S_i$ represents the shunting signal from the GABA neurons; when the activation of the
ith GABA neuron is close to one, the ith pyramidal neuron hardly activates.
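The two-layer dynamics can be illustrated with a minimal Python sketch of one forward-Euler step. It assumes the reading of equation (6.8) written out in the code comments, uses a 1D ring of units rather than the 2D sheet, and takes the time constants, $\delta_{cortex}$ and $\alpha$ from Table 6.1 and dt from Section 6.2.4. The function names, the normalised form of the Gaussian and the widths passed to `build_weights` are assumptions made only for this illustration.

```python
import math

def gaussian(d, sigma):
    """Normalised 1D Gaussian of distance d (zero mean, width sigma)."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def ring_distance(i, j, n):
    """Shortest distance between units i and j on a ring of n units."""
    d = abs(i - j)
    return min(d, n - d)

def build_weights(n, sigma_e, sigma_i):
    """L keeps the positive part of g_E - g_I, R the positive part of g_I - g_E."""
    L = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = ring_distance(i, j, n)
            dog = gaussian(d, sigma_e) - gaussian(d, sigma_i)
            L[i][j] = max(dog, 0.0)   # negative weights removed
            R[i][j] = max(-dog, 0.0)  # negative weights removed
    return L, R

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(V, Q, E, L, R, dt=0.01, tau_e=0.05, tau_i=0.05, delta=0.5, alpha=0.175):
    """One Euler step of:
         tau_e dV_i/dt = -V_i + (1 - S_i) (E_i + sum_j L_ij V_j)
         S_i           = sigmoid((Q_i - delta) / alpha)
         tau_i dQ_i/dt = -Q_i + sum_j R_ij V_j
    """
    n = len(V)
    S = [sigmoid((Q[i] - delta) / alpha) for i in range(n)]
    new_V, new_Q = [], []
    for i in range(n):
        lateral = sum(L[i][j] * V[j] for j in range(n))
        to_gaba = sum(R[i][j] * V[j] for j in range(n))
        new_V.append(V[i] + dt * (-V[i] + (1.0 - S[i]) * (E[i] + lateral)) / tau_e)
        new_Q.append(Q[i] + dt * (-Q[i] + to_gaba) / tau_i)
    return new_V, new_Q
```

Calling `step` repeatedly with an impulse in `E` lets the activation spread to nearby units through `L` while more distant units are shunted through `R` and `S`.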
I extended the 1D cortical model of Chernjavsky & Moody (1990a; 1990b) to a 2D
cortical model using the neuron topology of Reggia and his colleagues
(Armentrout et al., 1994; Y. Chen & Reggia, 1996; Cho & Reggia, 1994; Goodall et al.,
1997; Sutton III et al., 1994), in which each neuron is surrounded by six neighboring neurons,
all at equal distance. Chernjavsky & Moody
(1990a; 1990b) used a one-dimensional array of neurons. However, to form a map,
I need a two-dimensional array of neurons. On a square grid with a 3-by-3 neighborhood, the
diagonally located neighbors are farther away than the others. The hexagonal neighborhood places
all the neighbors at the same distance, so the connectivity defined by Chernjavsky &
Moody (1990a; 1990b) using a difference of Gaussians works well. Also, to remove
edge effects, following Reggia and his colleagues, I constructed a torus, whose leftmost
edge is connected to its rightmost edge, and whose uppermost edge is connected to its
bottommost edge. The size of the cortex is 20 by 20. This torus structure helped the motor
output map to be more evenly distributed.
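One way to index such a hexagonal torus is sketched below in Python. The "odd-row shifted" offset layout and the function name `hex_neighbors` are assumptions for illustration; the text does not specify how the hexagonal sheet was stored.

```python
def hex_neighbors(r, c, rows=20, cols=20):
    """Six neighbours of cell (r, c) on a hexagonal sheet wrapped into a torus.

    Assumed layout: odd rows are shifted half a cell to the right, so the
    diagonal neighbours differ between even and odd rows. With an even number
    of rows the parity pattern stays consistent across the wrapped edge.
    """
    if r % 2 == 0:
        offsets = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    else:
        offsets = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    # Modulo arithmetic joins the left/right and top/bottom edges.
    return [((r + dr) % rows, (c + dc) % cols) for dr, dc in offsets]
```

Every cell, including those on the former edges, then has exactly six neighbours, and the neighbour relation is symmetric.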
Each neuron in the motor cortex is fully connected to the six motoneurons in the
spinal cord through $W_{ij}$, which denotes the connection strength from the jth neuron in the
motor cortex to the ith motoneuron in the spinal cord. The jth neuron in the motor cortex
has a specific connectivity to the six muscles and thus represents a muscle synergy. The
motoneurons follow the dynamics below.
$$
\begin{aligned}
\tau_m \frac{da_i}{dt} &= -a_i - c_m b_i + \sum_j W_{ij} V_j \\
u_i^{MCX} &= \mathrm{sigmoid}(a_i - \mu) - \nu
\end{aligned}
\tag{6.9}
$$
where $\tau_m$ is the time constant of the motoneurons, $b_i$ denotes the activation level of the
antagonist to the muscle with activation level $a_i$, and $u_i^{MCX}$ abstracts the recruitment rate of
the motoneurons assigned to a specific muscle, following Lan (2002). The lateral
connectivity $c_m$ plays an important role in summing up muscle synergies. When the motor
cortex is activated too strongly, the motoneurons in the spinal cord are also activated too
strongly and are clamped by the sigmoid activation function. The lateral connectivity
prevents this clamping effect. $\mu$ shifts the activation level into the sensitive range of the
sigmoid and $\nu$ controls the co-activation level of the muscles.
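A minimal Python sketch of the motoneuron stage follows. It assumes the form of equation (6.9) stated in the code comments, with $\tau_m$, $c_m$, $\mu$ and $\nu$ taken from Table 6.1 and dt from Section 6.2.4. The antagonist pairing (E with F, O with C, B with D) and the function names are assumptions for illustration.

```python
import math

# Assumed antagonist pairs, indexed E=0, F=1, O=2, C=3, B=4, D=5.
ANTAGONIST = {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4}

def motoneuron_step(a, drive, dt=0.01, tau_m=0.05, c_m=2.0):
    """One Euler step of tau_m da_i/dt = -a_i - c_m b_i + drive_i,
    where b_i is the activation of the antagonist of muscle i and
    drive_i stands for the cortical input sum_j W_ij V_j."""
    return [a[i] + dt * (-a[i] - c_m * a[ANTAGONIST[i]] + drive[i]) / tau_m
            for i in range(6)]

def recruitment(a, mu=2.0, nu=0.12):
    """u_i = sigmoid(a_i - mu) - nu, the abstracted recruitment rate."""
    return [1.0 / (1.0 + math.exp(-(a_i - mu))) - nu for a_i in a]
```

With these parameter values a motoneuron at rest (`a_i = 0`) yields an output close to zero, because sigmoid(-2) is approximately 0.12.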
The motor output map is learned through competitive Hebbian learning. This is
basically Hebbian learning, but the Mexican hat activation patterns support competition
between all but the closest neurons. The cortical model for the motor output map maps
the pyramidal layer’s output to the motoneurons in the spinal cord. To take the effect of the
inhibitory neurons into account, the learning rule needs the original difference of Gaussians.
Because pyramidal neurons receive connections through $|g(0,\sigma_E)-g(0,\sigma_I)|$ and GABA
neurons receive connections through $|g(0,\sigma_I)-g(0,\sigma_E)|$, $V_j - Q_j$ is used (equation 6.10).
Weight normalization of the outflow synaptic weights, which sets the length of each weight vector to 1,
is used for stability. With a random stimulus applied to neurons in the pyramidal layer, the
induced motor output pattern is associated with the stimulus (Armentrout et al., 1994;
Kuperstein, 1988c).
$$
dW_{ij} = \eta_{MCX} \cdot (V_j - Q_j) \cdot u_i^{MCX} \tag{6.10}
$$
where $\eta_{MCX}$ is a learning rate.
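The Hebbian update with outflow normalization can be sketched in Python as follows. The learning rate 0.1 is the value listed for the motor output map in Table 6.1; the function name and the list-of-lists storage of $W$ are assumptions for illustration.

```python
import math

def hebbian_update(W, V, Q, u, eta=0.1):
    """Apply dW_ij = eta * (V_j - Q_j) * u_i, then rescale the outflow
    weight vector of every cortical neuron j to unit length.

    W[i][j] is the weight from cortical neuron j to motoneuron i.
    """
    n_moto, n_ctx = len(W), len(W[0])
    for i in range(n_moto):
        for j in range(n_ctx):
            W[i][j] += eta * (V[j] - Q[j]) * u[i]
    for j in range(n_ctx):
        norm = math.sqrt(sum(W[i][j] ** 2 for i in range(n_moto)))
        if norm > 0.0:  # leave an all-zero column untouched
            for i in range(n_moto):
                W[i][j] /= norm
    return W
```

Normalizing per cortical neuron keeps the weights bounded, so a neuron can only strengthen its projection to one muscle at the expense of its projections to the others.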
6.2.3. Kinematic coding for actor-critic framework and motor input map
The motor input map was learned through temporal difference reinforcement
learning (R. S. Sutton, 1995; Richard S. Sutton & Barto, 1998) in the actor-critic
framework (Bissmarck et al., 2008; R. S. Sutton, 1988, 1995; Richard S. Sutton & Barto,
1998) in continuous time and space (Doya, 2000b).
The brain may also learn the kinematic coding adaptively. Each kinematic coding unit can be
thought of as having a receptive field. The kinematic coding units should cover the workspace
entirely and smoothly. At the same time, if a certain region of the workspace is used more
often, the receptive fields of the neurons assigned to that region should be smaller than those
elsewhere. This automatic adaptation of receptive fields can be achieved through unsupervised
learning. Here, however, for simplicity, I used a uniformly distributed kinematic coding
with a minor correction.
The feedback signal computed in joint coordinates activates a normalized radial basis
function (NRBF) network (Bissmarck et al., 2008; Doya, 2000b; Kambara et al., 2009), which
covers the input space smoothly. The NRBF network is fully connected to the cortical
model (Section 6.2.2), and these connections constitute the motor input map. The motor
input map thus associates the difference vector in joint position and joint velocity with
the activity of motor cortex neurons, each of which encodes a muscle synergy via the
motor output map. This transformation via the input and output motor maps results in a set of
varying feedback gains (see Section 2.3.3.2).
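The normalized radial basis function (NRBF) network (Doya, 2000b; Kambara et al.,
2009) follows equation 6.11, where $\mathbf{x}=[\theta_{target}-\theta_{current},\ 0-\theta'_{current}]$,
and $\mathbf{c}_k$ and $\mathbf{S}_k$ represent the center and shape of the kth RBF.
$$
b_k(\mathbf{x}) = \frac{\exp\!\left(-\lVert \mathbf{S}_k(\mathbf{x}-\mathbf{c}_k)\rVert^2\right)}{\sum_j \exp\!\left(-\lVert \mathbf{S}_j(\mathbf{x}-\mathbf{c}_j)\rVert^2\right)} \tag{6.11}
$$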
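The normalized basis activations can be computed as in the following Python sketch, which assumes a diagonal (axis-aligned) shape matrix so that each RBF is described by one width entry per input dimension. The function name and argument layout are assumptions for illustration.

```python
import math

def nrbf(x, centers, widths):
    """Normalised RBF activations b_k(x).

    centers[k] is the centre c_k; widths[k] holds the diagonal of S_k.
    Each raw activation is exp(-||S_k (x - c_k)||^2); dividing by the sum
    over all units makes the activations add up to one.
    """
    raw = [
        math.exp(-sum((s * (xi - ci)) ** 2 for s, xi, ci in zip(S, x, c)))
        for c, S in zip(centers, widths)
    ]
    total = sum(raw)
    return [a / total for a in raw]
```

For example, with three centres spaced one unit apart along the first axis and width entries of 2 (the inverse of half the spacing), an input placed on the middle centre activates that unit most strongly and its two neighbours equally.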
Because the input space encodes the difference between a given target location and
the current location of the arm in joint coordinates, when the arm has reached the target and
stopped, $\mathbf{x}$ is all zeros. When the targets on a circle (Figure 6.2b) are plotted in the
input space (Figure 6.4a), they are skewed. I therefore found the principal component, which
points in the direction of the second target, and its orthogonal component. Then I rotated the
input vector and rescaled the two axes, in a manner related to the inverse magnification rule
(Figure 6.4b). This rescaling based on principal component analysis corrects the skewness of the
joint coordinates, providing an almost equal number of kinematic coding units for each
movement direction, similar to the inverse magnification rule. After this input space
transformation with a 25% margin, the eight targets are well distributed over [-1, 1] for
joint position. We used 11 by 11 by 3 by 3 units for the kinematic coding over the range
[-1:1, -1:1, -π:π rad/s, -π:π rad/s]. We simplified the broadness matrix $\mathbf{S}_i$ to be
axis-aligned, so it is a diagonal matrix whose element in each direction is the inverse of half
the distance to the nearest neighbor in that direction. The transformation matrix is
$$
T_{trans} = A_{trans} \cdot R_{trans} =
\begin{bmatrix} 0.8 & 0 \\ 0 & 2.8571 \end{bmatrix} \cdot
\begin{bmatrix} 0.4712 & 0.8820 \\ 0.8820 & -0.4712 \end{bmatrix}
\tag{6.12}
$$
Figure 6.4. Transforming the distorted input space in joint coordinates (a) into a more
balanced input space (b), where the blue dot at the center represents the target location
and each magenta cross represents the initial location, in the feedback signal, of the ith
movement shown in Figure 6.2b.
The reward contains two terms (Kambara et al., 2009): a goal reward, a smooth
curve based on the distance $\lVert\mathbf{x}_t\rVert$ between the current location and the target in
Cartesian coordinates, whose width is given by $\sigma_r$ (= 6 cm), with a negative bias $\lambda_r$
(= 0.5); and an efficiency reward, the sum of the squared activations of all motoneurons,
with a mixing coefficient $c_e$ (= 0.1). This reward signal combines accuracy and energy
consumption, similar to Chapter 3 (for a detailed argument, see Section 3.2.2.1).
Minimizing energy consumption may improve precision in the presence of signal-dependent
noise (Harris & Wolpert, 1998; Todorov, 2002).
$$
r(t) = \exp\!\left(-\frac{\lVert \mathbf{x}_t \rVert^2}{\sigma_r^2}\right) - \lambda_r - c_e \cdot \sum_{j=1}^{6} \left(u_j^{MCX}\right)^2 \tag{6.13}
$$
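A minimal Python sketch of this reward is given below. It assumes a Gaussian-shaped goal term exp(-d²/σ_r²) with no extra factor in the exponent, positions expressed in metres (so that σ_r = 6 cm becomes 0.06), and the parameter values quoted in the text. The function and argument names are assumptions for illustration.

```python
import math

def reward(endpoint, target, u, sigma_r=0.06, lambda_r=0.5, c_e=0.1):
    """Goal reward minus bias minus efficiency cost.

    endpoint, target: Cartesian positions in metres.
    u: the six motoneuron outputs.
    """
    d2 = sum((p - q) ** 2 for p, q in zip(endpoint, target))
    goal = math.exp(-d2 / sigma_r ** 2)          # accuracy term
    effort = c_e * sum(v * v for v in u)         # efficiency term
    return goal - lambda_r - effort
```

Under these assumptions the reward is 0.5 when the hand rests on the target with silent motoneurons, and approaches -0.5 far from the target, so the bias makes lingering away from the target costly.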
The critic estimates the sum of rewards from the current state to the goal, using
$$
V(\mathbf{x}) = \sum_j w_j^C\, b_j(\mathbf{x}) \tag{6.15}
$$
where $w_j^C$ is learned through temporal difference learning (Doya, 2000b) with an eligibility
trace (equation 6.17).
$$
\delta(t) = r(t) - \frac{1}{\tau_{TD}} \cdot V(\mathbf{x}) + \frac{\partial V(\mathbf{x})}{\partial t} \tag{6.14}
$$
$$
\dot{w}_j^C(t) = \eta^C\, \delta(t)\, e_j^C(t) \tag{6.16}
$$
$$
\tau^C \frac{de_j^C(t)}{dt} = -e_j^C(t) + b_j(\mathbf{x}) \tag{6.17}
$$
where $\eta^C$ is the learning rate of the critic network, $\tau_{TD}$ is the time constant for
discounting future rewards (= 100 msec), and $\tau^C$ is the time constant of the eligibility trace
of the critic network.
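A discrete-time Python sketch of one critic update follows. It assumes the continuous-time TD error of Doya (2000b), δ = r - V/τ_TD + dV/dt, approximates dV/dt by a backward difference over one time step, and integrates the trace and weights with forward Euler. The learning rate and time constants are those of Table 6.1; the function name and argument layout are assumptions for illustration.

```python
def critic_step(w, e, b_prev, b_now, r, dt=0.01, tau_td=0.1, tau_c=0.1, eta_c=1.0):
    """One update of the critic weights w and eligibility traces e.

    b_prev, b_now: basis activations at the previous and current time step.
    Returns the new weights, the new traces and the TD error.
    """
    v_prev = sum(wj * bj for wj, bj in zip(w, b_prev))
    v_now = sum(wj * bj for wj, bj in zip(w, b_now))
    # delta = r - V / tau_TD + dV/dt (backward-difference estimate of dV/dt)
    delta = r - v_now / tau_td + (v_now - v_prev) / dt
    # tau_C de_j/dt = -e_j + b_j
    e = [ej + dt * (-ej + bj) / tau_c for ej, bj in zip(e, b_now)]
    # dw_j/dt = eta_C * delta * e_j
    w = [wj + dt * eta_c * delta * ej for wj, ej in zip(w, e)]
    return w, e, delta
```

Starting from zero weights, a positive reward produces a positive TD error and raises the value only of the basis units that were recently active, because only their traces are non-zero.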
The actor, the motor input map, is a fully connected matrix from the kinematic
coding to the neurons in the 20 by 20 cortex model (Equation 6.18). We added
exploratory noise to the motor cortex neurons. The exploratory noise is an impulse activation on
the motor cortex whose amplitude is uniformly distributed between -exp(-V(x)) and
exp(-V(x)), whose location is randomly selected, and whose radius is one; a new impulse is drawn
every 100 msec. exp(-V(x)) is small when the value is large enough, which results in
smaller exploratory noise. Because the cortex model is a dynamical system, the transition
between impulses is smooth. The motor input map is updated through
reinforcement learning with an eligibility trace (Equations 6.19 and 6.20), combined with
equation 6.14. Although the exploratory noise changes only every 100 msec, the reinforcement
learning updates the motor input map on-line.
$$
E_i(\mathbf{x}) = \sum_j w_{ij}^A\, b_j(\mathbf{x}) + n_i(t) \tag{6.18}
$$
$$
\dot{w}_{ij}^A(t) = \eta^A\, \delta(t)\, e_{ij}^A(t) \tag{6.19}
$$
$$
\tau^A \frac{de_{ij}^A(t)}{dt} = -e_{ij}^A(t) + n_i(t) \tag{6.20}
$$
where $\eta^A$ is the learning rate of the actor network and $\tau^A$ is the time constant of the
eligibility trace of the actor network.
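The actor output and the value-dependent exploration amplitude can be sketched in Python as follows. The sketch covers only equation 6.18 and the amplitude rule described in the text; the spatial spread of the impulse over neighbouring neurons and the weight update are left out. The function names and the explicit random generator argument are assumptions for illustration.

```python
import math
import random

def exploration_amplitude(value, rng):
    """Draw an impulse amplitude uniformly from [-exp(-V), exp(-V)].
    A large value estimate V therefore shrinks the exploration."""
    bound = math.exp(-value)
    return rng.uniform(-bound, bound)

def actor_output(w_a, b, noise):
    """External input to each cortical neuron i:
    E_i = sum_j w_a[i][j] * b[j] + noise[i]."""
    return [sum(wij * bj for wij, bj in zip(row, b)) + ni
            for row, ni in zip(w_a, noise)]
```

In a simulation loop under these assumptions, a new amplitude and location would be drawn every 10 steps of dt = 0.01 sec, matching the 100 msec interval given in the text.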
Dynamical models              Time constant (msec)   Other parameters
Cortical model (pyramidal)    50
Cortical model (GABA)         50                     $\delta_{cortex}$ = 0.5, $\alpha$ = 0.175
Motoneurons                   50                     $c_m$ = 2, $\mu$ = 2, $\nu$ = 0.12

Learnable networks            Learning rate          Time constant for the eligibility trace (msec)
Motor output map              0.1
Motor input map               50                     100
Critic                        1                      100
Table 6.1. Parameters of the dynamics of cortical models and learnable networks.
6.2.4. Simulation
First, the motor output map is developed through 4000 random activations, each of which
lasts 1.2 sec with dt = 0.01 sec. Then, I disabled learning of the
motor output map and developed the motor input map as described in Section 6.2.3.
Reinforcement learning adapts the motor input map over 500 pseudorandom trials
of the 8 targets on the circle. Each trial lasts 1.2 sec with dt = 0.01 sec, and a new
exploratory activation is given every 100 msec. To simulate a stroke, I set the
activation of all neurons in the right half of the cortex to zero. In the current study, therapy
follows the same schedule as the one used during development, namely pseudorandom trials over 8
targets, and only updates the motor input map.
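The simulated lesion can be sketched in Python as a mask applied to the cortical activation after every update. Treating the "right half" as the upper half of the column indices, and the function name itself, are assumptions for illustration.

```python
def lesion_right_half(activation):
    """Return a copy of a 2D activation sheet with the right half silenced.

    activation[r][c] is the activation of the neuron in row r, column c;
    columns with index >= cols // 2 are taken to be the right half.
    """
    cols = len(activation[0])
    return [[v if c < cols // 2 else 0.0 for c, v in enumerate(row)]
            for row in activation]
```

On the 20 by 20 sheet this silences 200 of the 400 neurons, so any muscle synergy represented mainly in those columns is lost until the motor input map is retrained to use the remaining neurons.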
6.3. Results
6.3.1. Motor output map
The motor output map, which is based on muscle coding, can be developed through
competitive Hebbian learning. The motor output map for the six muscles is shown in
Figure 6.5: shoulder extensor (E), shoulder flexor (F), elbow opener (O), elbow closer
(C), biarticular abductor (B), and biarticular adductor (D). Before learning, the weights
over the cortex were randomly distributed and showed no trend toward map
formation. After learning, in contrast, map representations have developed,
consisting of stripes and islands (Armentrout et al., 1994; Y. Chen & Reggia,
1996; Goodall et al., 1997). In Figure 6.5b, certain gray regions represent coactivation
with other muscles, that is, a muscle synergy.
In contrast to Reggia and his colleagues (Armentrout et al., 1994; Y. Chen &
Reggia, 1996; Goodall et al., 1997), the current model does not require proprioceptive
inputs to the motor cortex to develop a motor output map. This implies that the motor
output map can be obtained solely through competitive Hebbian learning between the motor
cortex and the motoneurons in the spinal cord. In contrast to Todorov (2000a) and
Guigon et al. (2007a), who assumed uniformly distributed random
weights, the current model forms map representations. This allows us to simulate a
stroke.
Figure 6.5. Motor output map for six muscles before training (a) and after training
(b), where white denotes strong connections and black denotes no connection between a
neuron in the motor cortex and the specified motoneuron in the spinal cord: shoulder
extensor (E), shoulder flexor (F), elbow opener (O), elbow closer (C), biarticular
abductor (B), and biarticular adductor (D). Each small box shows 20 by 20 set of neurons
on the motor cortex.
6.3.2. Motor input map
Reinforcement learning may improve the accuracy of a reaching movement with the
given motor output map. The motor input map is reflected in the activation pattern of
the primary motor cortex during a voluntary movement. For the eight different
movement directions after 500 pseudorandom trials each, Figure 6.6a shows the
activation pattern of the pyramidal neurons in the model averaged between 0 msec and
300 msec after movement onset. Although the movements have final endpoint errors, the
movement direction is fairly good (Figure 6.6b).
Figure 6.6. Averaged motor activation pattern for 8 different directions between 0
msec and 300 msec after movement initiation (a), where white denotes strong activation
(0.15) and black denotes no activation and each small box represents 20 by 20 neurons on
the motor cortex for 8 different directions, and (b) corresponding movements with their
velocity profile.
Todorov used the cosine tuning hypothesis (Todorov, 2002) to find the activation level of
neurons in the motor cortex (Todorov, 2000a). The cosine tuning hypothesis is a theoretically
derived concept of the optimal distribution of activation in the presence of signal-dependent
noise. With signal-dependent noise, a larger motor command carries
more noise. Thus the motor system should minimize the size of each motor command as
much as possible. To exert enough force with less noise, the brain would therefore
recruit several muscles with different force directions, each with a smaller motor command. If
this recruitment holds for all possible directions, a neuron’s activation level follows a
cosine curve (Todorov, 2002).
However, even without an explicit cosine tuning hypothesis (Todorov, 2000a, 2002),
directional coding eventually developed as the activation level became aligned to the
muscle synergy for a given movement direction. Guigon et al. (2007a) also obtained
directional coding without the cosine tuning hypothesis. Here, the motor input map does not
set the movement speed, so the time needed to reach the target varies (Figure 6.6b,
velocity profiles). The model may reproduce Evarts’ (1968) finding that larger
activation generates larger force: the mean activation level of the motor cortex neurons
for target #1 (0.0387) is higher than for the others (0.0176), so the exerted force while
moving toward target #1 is higher than for the others, resulting in a faster movement. As a
result, the directional tuning (Figure 6.6a) contains higher activation at 0 degrees for target
#2, target #4 and target #5.
Figure 6.7b provides a population coding diagram (Georgopoulos et al., 1986) for
the model. Thin blue lines represent individual neurons’ activation as the length of a
vector in the neuron’s preferred direction and thick red lines represent population vectors
for each direction. I note that this population diagram is not a sum of muscle synergies
with activation level. The population coding is aligned to the desired movement direction
only when the size of force generated by neurons on the motor cortex is identical.
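The population vector construction of Georgopoulos et al. (1986) can be sketched in Python as an activation-weighted sum of unit vectors along each neuron's preferred direction. The function name and the use of radians are assumptions for illustration, and the sketch does not reproduce how preferred directions were estimated in the model.

```python
import math

def population_vector(activations, preferred_dirs):
    """Sum each neuron's activation along its preferred direction (radians).

    Returns the x and y components of the population vector and its angle.
    """
    x = sum(a * math.cos(th) for a, th in zip(activations, preferred_dirs))
    y = sum(a * math.sin(th) for a, th in zip(activations, preferred_dirs))
    return x, y, math.atan2(y, x)
```

For a population with uniformly spaced preferred directions and cosine-shaped tuning around a common direction, the angle of the resulting vector coincides with that direction.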
Figure 6.7. Directional coding of selected neurons (a) and population vectors for
eight different directions (b), based on averaged activation between 0 msec and 300 msec
after movement initiation. In (a), for each of the eight directions, the directional tuning of the
four neurons with the largest activation is shown. Except for target #2, whose neurons are
maximally tuned to 0 degrees rather than 45 degrees, the directional tunings match the movement
directions. In (b), thin blue lines represent individual neurons’ activation along their
preferred directions and thick red lines represent the population vectors for each movement
direction, which indicate the movement direction fairly well.
6.3.3. Stroke rehabilitation
I simulated a stroke by setting the activation level of the neurons in the right half of the
motor cortex to zero. In the current study, therapy follows the same schedule as the one used
during development, namely pseudorandom trials over 8 targets, and only updates the motor input
map. Just after the artificial stroke, movements towards targets #5, #6 and #7 were strongly
affected, and movements towards targets #2 and #3 were moderately affected. The stroke
removed most of the shoulder flexor (F) and elbow opener (O) representations in the motor
output map (Figure 6.5b). Contracting these two muscles together moves the end point toward
target #6. Retraining, consisting of 200 trials for each direction, improves motor
performance using the remaining resources. Note that in the current study
competitive Hebbian learning of the motor output map is disabled, so there is no
reorganization (expansion or shrinkage) of the motor output map. Retraining
updates the motor input map accordingly. Compared with the activation just after the
stroke, the motor cortex is more strongly activated after retraining, generating more force in
line with Evarts (1968).
Figure 6.8. Motor performance just after stroke (a) and after retraining (b), without
reorganization of muscle synergies. The retraining consisted of 200 trials for each
direction. The stroke affects the movements whose muscle synergy patterns were
mostly removed. Retraining improves motor performance using the remaining
resources.
Figure 6.9. Motor cortex activation patterns just after stroke (a) and after
retraining (b), without reorganization of muscle synergies, where white denotes strong
activation (0.15), black denotes no activation, and each small box represents the 20 by 20
neurons of the motor cortex for one of the 8 directions. The retraining consisted of
200 trials for each direction. Retraining increases the activation driven by the current motor
input map. This increased activation is not accompanied by reorganization of the motor output
map; in other words, the increased activation does not indicate map expansion.
6.4. Discussion
The current model showed that the primary motor cortex can learn to encode muscle
synergies through competitive Hebbian learning and that aligning the movement direction
to these muscle synergy directions through reinforcement learning is then enough to
develop directional coding and population vectors. In contrast to previous analytical
models (Guigon et al., 2007b; Todorov, 2000a), both the motor output map and the motor
input map are developed through training. This emergence of directional coding in
neurons of the motor cortex is due to the nature of reinforcement learning. It is
easiest to understand with the simplest case of TD learning. If the reward decreases
linearly with the distance from the current position to the target, the change in reward for a
small movement is given by a cosine function of the direction difference,
$\cos(\theta_{target}-\theta_{current})$. When the discounting factor for
TD learning is zero, the learning rule (6.19) learns this cosine function of the direction
difference. This is why reinforcement learning developed directional tuning.
Detail explanation how the reinforcement learning adjust the motor input map is followed.
When a small exploratory noise on the motor cortex moves the end point towards the
target, more reward is given. The amount of reward change (TD error) then updates
the selection probability of the neuron that received the exploratory noise. The
more reward received, that is, the more directly the end point moves towards the target,
the higher the probability that the action is chosen. Eventually, exploratory noise covers
all possible directions and provides rewards that shape the directional tuning, where the
maximum action selection probability occurs when the neuron’s muscle synergy direction
is aligned with the movement direction.
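The cosine relationship described above can be checked numerically. The following is a minimal sketch, not the model's implementation: it assumes a two-dimensional end point, a reward equal to the negative distance to the target scaled by a hypothetical slope `k`, and a small exploratory step of hypothetical length `step`. Under these assumptions the reward change for a small step is approximately k * step * cos(θ_target - θ_current).

```python
import math

def reward(pos, target, k=1.0):
    """Reward that decreases linearly with distance to the target (k is a hypothetical slope)."""
    return -k * math.dist(pos, target)

def reward_change(theta_move, pos=(0.0, 0.0), target=(1.0, 0.0), step=1e-4, k=1.0):
    """Reward gained by one small step of length `step` in direction theta_move."""
    new_pos = (pos[0] + step * math.cos(theta_move),
               pos[1] + step * math.sin(theta_move))
    return reward(new_pos, target, k) - reward(pos, target, k)

def predicted_change(theta_move, pos=(0.0, 0.0), target=(1.0, 0.0), step=1e-4, k=1.0):
    """First-order prediction: k * step * cos(theta_target - theta_move)."""
    theta_target = math.atan2(target[1] - pos[1], target[0] - pos[0])
    return k * step * math.cos(theta_target - theta_move)

if __name__ == "__main__":
    for degrees in range(0, 360, 45):
        theta = math.radians(degrees)
        print(degrees, reward_change(theta), predicted_change(theta))
```

The two columns agree up to a second-order term in the step length, which is why a learner that tracks the immediate reward change (zero discounting) ends up with cosine-shaped tuning.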
The current model has two unsolved problems: 1) summing up the muscle synergies
linearly and 2) controlling a final posture through the inverse static model. Because
synaptic weights from neurons on the motor cortex to the motoneurons in the spinal cord
are always positive, summing up the muscle synergies is not simple. If the outputs
allowed negative values, the direction opposite to a given direction could be
denoted with a negative value, and the summation would be reasonably easy. However,
because a motoneuron does not allow negative activation, a simple integration of muscle
synergies would result in unbounded activation with very high coactivation. In the model,
because of the sigmoidal abstraction of motoneuron recruitment rates, all six motoneuron
activations would reach one, and there would be no movement at all. With biarticular
muscles, this summation becomes more complex. Here I approximated the summation of
muscle synergies at the level of motoneuron activation patterns, with decay.
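The saturation problem can be illustrated with a toy example. This is only a sketch under stated assumptions: the synergy weights are hypothetical, and the "decay" is modeled here as a leaky integrator, which is an assumption made for illustration and not necessarily the approximation used in the model.

```python
import math

def sigmoid(x):
    """Sigmoidal abstraction of a motoneuron recruitment rate."""
    return 1.0 / (1.0 + math.exp(-x))

def integrate(synergy, steps, decay=0.0):
    """Accumulate a nonnegative synergy over time; decay=0 is plain summation."""
    state = [0.0] * len(synergy)
    for _ in range(steps):
        # Leaky integration (assumed form): old state shrinks, new drive is added.
        state = [(1.0 - decay) * s + w for s, w in zip(state, synergy)]
    return [sigmoid(s) for s in state]

# Hypothetical nonnegative weights from one cortical neuron onto six motoneurons.
synergy = [0.8, 0.1, 0.5, 0.1, 0.3, 0.1]

plain = integrate(synergy, steps=200)             # every activation saturates near one
leaky = integrate(synergy, steps=200, decay=0.2)  # activations stay distinct

if __name__ == "__main__":
    print("plain:", [round(a, 3) for a in plain])
    print("leaky:", [round(a, 3) for a in leaky])
```

Without decay, all six activations converge to one (full coactivation), so the net joint torque carries no directional information; with decay, the relative pattern of the synergy is preserved.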
The current model does not achieve zero final end-point error, because the
center of kinematic coding is shared by all movements (Figure 6.4; note again that a
cross mark with number i denotes the starting position of movements for the ith target).
To maintain the hand at the target position, the muscles need to overcome their passive
forces, which can be achieved through 1) a feedback controller or 2) an inverse static
model. Scheidt and Ghez (2007) showed experimentally that there is a separate pathway
for inverse static control. In the current model, all the target locations share one
kinematic coding: whenever a target is reached, the kinematic coding is the same
for all targets. At the target location, the motor system should overcome the passive
forces of the arm to maintain the posture, and these passive forces depend on the
posture. Because the model uses a common kinematic coding, it cannot represent these
passive forces.
A computational model by Bissmarck et al. (2008) learned the motor input map
using reinforcement learning similar to that of the model in this study, where the action
pool is a set of predefined torques. Their model has a feedback controller that uses slow
visual feedback. After training, the motor control model learned through reinforcement
learning became dominant, because it has a shorter latency. However, precise convergence
to the target position was due to the slow visual feedback controller.
The model of Kambara et al. (2009) has a very similar structure to the model in this
study. The difference is that theirs does not have a motor cortex model and contains an
inverse static model to achieve zero final error. When the arm contains muscles, this
inverse static model would learn an equilibrium point associated with a certain
motoneuron activation pattern. They learned the inverse static model through a feedback
error learning framework (Kawato & Gomi, 1992). I note that after the inverse static
model is perfectly learned, the dynamic control, which is learned through reinforcement
learning, degenerates, because the motor commands from the inverse static model are
generally larger than the motor commands from the motor control model learned
through reinforcement learning. I also developed a way to compute the exact
motoneuron pattern needed to reach a given equilibrium point (not shown). When this
pattern is fed into the arm model, there is no need for a dynamic controller: with the
inverse static model alone, the initial direction is fairly well directed towards the
target and the movement converges to the target perfectly, although the movement itself
is not straight and does not have a bell-shaped velocity profile. So, if we have an
inverse static model, the motor cortex model becomes an auxiliary controller, and we
cannot have strong alignment between the movement direction and the muscle synergy
directions. Thus, in this study, we excluded fine control of the final posture.
Like the other models using reinforcement learning (Bissmarck et al., 2008;
Kambara et al., 2009), the current model does not control the speed of movement
explicitly. Instead, the coefficients in the reward function (6.13) affect the speed of
movement indirectly. When c_e is large, large activation of motoneurons is
penalized, so smaller activation of the motoneurons is preferred. Less
activated motoneurons exert weaker muscle forces, and the movement becomes
slower. This relationship can also be explained by Fitts' law. However, this does not
assure constant speed for every direction because of the biarticular muscles in the model,
in contrast to Guigon et al. (2007b), who optimized the movement while specifying the
movement duration following their ‘constant effort principle’.
A strength of the current model is that it has a map representation and supports
stroke rehabilitation studies. However, in this chapter, the rehabilitation study was only
half done because I disabled motor output map learning. Currently, even when half of the
motor cortex is lost, performance is restored fairly well (Figure 6.8.b). This implies
that the remaining muscle representations were enough to reverse the effects of stroke.
When a stroke affects the motor cortex more severely, so that the remaining muscle
representations are no longer enough to restore motor performance, reorganization of the
motor output map is needed. The movements just after stroke show unbalanced effects of
the stroke across directions (Beer et al., 2004): movements in certain directions were
still good, while movements in other directions deteriorated.
Concurrent learning of both maps requires two future studies: 1) the speed of learning,
and 2) an adaptive schedule of the rehabilitation condition. For concurrent learning, the
ratio between unsupervised learning and reinforcement learning is important, as shown in
Chapter 3 (for the ratio between unsupervised learning and supervised learning; see
Appendix B for details). The reorganization of the motor output map is related to the
frequency of the motoneuron patterns. When a certain motoneuron pattern occurs more
frequently in correlation with a certain motor cortex activation pattern, the motor output
map comes to represent that motoneuron pattern, that is, that muscle synergy, more. To
expose a certain motoneuron pattern more frequently, a stroke subject needs to try the
affected movement more often, which connects to an adaptive schedule. Alternatively, we
can ask a subject to try until the movement succeeds.
Here, more frequent trials of a certain action may lead to reorganization of the
motor output map, because the motoneuron pattern correlated with that action occurs
more frequently. More occurrences of a motoneuron pattern result in an
increase in the number of its map representations, implying map expansion. The model
may then explain the correlation between map expansion and skill learning. As stated
before, kinematic coding can follow an inverse magnification rule, with more neurons with
smaller receptive fields where the workspace is of greater concern. We may extend this
inverse magnification rule to the motor output map. If a motoneuron pattern occurs more
frequently, the number of neurons encoding that specific motoneuron pattern on
the motor output map increases, resulting in smaller receptive fields. Then, the motor
cortex can control the arm over a certain range more precisely, with more experts.
Chapter 7.
Conclusion
7.1. Summary
Following the literature review of physiological data on the motor cortex and its
peripheral regions, and computational models of the cortex and motor control in the first
two chapters, the next four chapters presented three distinct models.
In Chapter 3, the first model integrates simple bilateral motor cortical models with
higher-level coding and an action choice module. This study indicates that
rehabilitation of the cortical model and adjustment of action selection interact with each
other. I formulated an optimal dose hypothesis for constraint-induced therapy: there
exists an optimal length of therapy after which spontaneous arm use and motor performance
improve even after termination of therapy, because high spontaneous arm use facilitates
motor learning. If the dose duration reaches the optimal dose duration, then use in daily
life will improve the affected arm’s functionality and spontaneous arm use automatically.
In Chapter 4, using the model described in Chapter 3, a new simulation was
conducted to test the previous chapter’s hypothesis more directly against data
from the EXCITE trial (S. L. Wolf et al., 2006), which involved 169 stroke subjects and
2 years of arm use data, providing measures of changes in use and functionality with a
fixed length of therapy (immediate or delayed by 1 year). The simulation matched the
EXCITE trial qualitatively and showed an averaged optimal dose.
The second model, in Chapter 5, extended the Hoff-Arbib model to reproduce
variability in motor planning of detouring reach-to-grasp movements. Previous
models of reach-to-grasp action capture only the stereotypical behavior
patterns of a single unobstructed reach. In contrast, experimental data from an
interdisciplinary study at the University of Southern California (Tretriluxana, Gordon,
Arbib et al., 2007; Tretriluxana, Gordon, Fisher et al., 2007; Tretriluxana, Gordon, &
Winstein, 2007) explored right-handers' reaching to grasp an object while detouring to
avoid an obstacle, and showed that healthy adults do not exhibit the stereotypical
reach-to-grasp strategy in this case; indeed, subjects employed several different
strategies for controlling the reach and for reach-grasp coordination. I showed that
variability in reach-to-grasp coordination may be based on the different equifinality of
subschemas; i.e., evading the obstacle starts and ends together with preshaping vs.
preshaping starts after evading the obstacle ends. I also hypothesized that a virtual
target is used to plan detouring reaching movements, in order to reproduce the
experimental data with the extended Hoff-Arbib model.
The third model, in Chapter 6, is a model of motor cortex coding, investigating a
biologically plausible role of the cortex. This model attempts to answer the “muscle vs.
movement” debate, and also to provide insight into rehabilitation after stroke. I
hypothesize that the motor cortex encodes both muscle coding and movement coding
instead of encoding only one of them; I called this the “dual map” hypothesis. With a
biologically plausible cortical model, I controlled an arm with six muscles through
muscle synergies encoded in a motor output map (projection between neurons in the
motor cortex and motoneurons in the spinal cord) and a motor input map (projection
between premotor cortex and the neurons in the motor cortex) which activates neurons in
the motor cortex appropriately, aligning a given high-level movement direction with a
low-level muscle synergy direction.
7.2. Relationship of models
The three models are all different, but each is part of the motor control procedure
(Kawato et al., 1987) and focuses on a specific stage. Indeed, they have a hierarchical
relationship. In Chapters 3 and 4, the simplified motor cortex model assumed that there is
a system that controls the arm to move in the direction of a given population vector. The
model in Chapter 6 expands the simplified motor cortex model, which adjusted its preferred
directions using supervised learning and unsupervised learning, into a cortical model with
map formation using reinforcement learning and competitive Hebbian learning. The model in
Chapter 6 localized the unsupervised learning on the motor output map and the
reinforcement learning on the motor input map. In Chapters 3 and 4, the purpose of
the unsupervised learning is more precise movement, while the purpose of the supervised
learning is to minimize the error, which is similar to the purpose of the reinforcement
learning. Thus, we can view the previous model’s supervised learning as corresponding to
the update of the motor input map through reinforcement learning in Chapter 6. For this
relationship to hold, we need to assume that each neuron on the motor cortex generates the
same magnitude of muscle force in its preferred direction, an assumption I used to draw
Figure 6.7. In Chapters 3 and 4, I used unbalanced effects of the stroke across directions
(Beer et al., 2004). Because a stroke is considered a loss of muscle twitch map
representation (Nudo et al., 1996), the removal of neurons whose preferred directions fall
in a certain range, as a simulation of stroke, was not well justified. However, in
Chapter 6, we showed that the muscle map (the motor output map) is strongly correlated
with the movement direction (the motor input map). Thus, we can say that deterioration of
a certain muscle representation may lead to deterioration of a certain movement direction,
as shown in Figure 6.8a.
In Chapter 5, I presented an extended Hoff-Arbib model, which contains a reaching
module, a grasping module, and a coordination module. In Chapter 6, we focused on the
reaching module only. The Hoff-Arbib model (Hoff, 1992; Hoff & Arbib, 1993) does not
generate a motor command that controls muscles. Given a target location and
feedback signals of the current location, the Hoff-Arbib model moves a point mass using
the derivative of acceleration as the motor command. The Hoff-Arbib model associates the
difference between the target location and the current location with a motor
command that leads the point mass to follow a minimum jerk trajectory
(Flash & Hogan, 1985) in the absence of perturbations. In summary, the Hoff-Arbib
model is a feedback controller whose motor command acts as if
feedforward control signals were given, so long as the feedback signal remains small. Thus,
its structure is identical to that of an optimal feedback controller and of the model in
Chapter 6 (Figure 7.1). The model in Chapter 6 associates the difference between the
target location and the current location with a muscle command, while also minimizing the
sum of the motoneurons’ activation; this minimization may lead to a bell-shaped velocity
profile in the presence of signal-dependent noise (Harris & Wolpert, 1998).
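For reference, the minimum jerk trajectory for a point-to-point movement that starts and ends at rest has a well-known closed form, and its velocity profile is bell-shaped. The sketch below evaluates that closed form in one dimension; the function name and arguments are hypothetical, and it is not the feedback formulation of the Hoff-Arbib model itself.

```python
def minimum_jerk(x0, xf, duration, t):
    """Position and velocity at time t on a one-dimensional minimum jerk trajectory.

    Assumes the movement starts at x0 and ends at xf, with zero velocity and
    zero acceleration at both ends (Flash & Hogan, 1985).
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    shape = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    rate = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration
    return x0 + (xf - x0) * shape, (xf - x0) * rate

if __name__ == "__main__":
    for i in range(11):
        t = i / 10.0
        position, velocity = minimum_jerk(0.0, 0.3, 1.0, t)
        print(f"t={t:.1f}  x={position:.4f}  v={velocity:.4f}")
```

The velocity is zero at both ends, symmetric about the midpoint, and peaks at 1.875 times the average speed, which gives the bell shape referred to above.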
Figure 7.1. Relationship between models in chapter 5 and chapter 6.
7.3. Optimization with learnable components
The three models in this study have a non-linear interaction between the components
of motor control, which can be summarized as ‘optimization with learnable components.’
There are multiple low-level motor controllers, each of which maximizes motor performance
in some sense. A high-level motor controller, in turn, combines those low-level
controllers to maximize its own objective. For example, in the model of Chapters 3 and 4,
the motor cortices tried to minimize motor errors, while the action choice module simply
chose which one would be used for a given target. This structure requires two essential
characteristics of the system: optimization through reinforcement learning, and
competition between low-level controllers to recruit more neurons through unsupervised
learning. The system is sensitive to changes in the low-level motor controllers through
the reward. When a low-level motor controller performs worse due to stroke, the reward
signal carries the changes in the lower-level structures to the high-level
component, which is updated accordingly to maximize its own
objective. The change in the high-level component is crucial because an unselected
component degenerates due to unsupervised learning.
The first model has two components: an action choice module and a motor execution
module (a simplified motor cortex). While the action choice module maximizes the
success of the reach efficiently regardless of the choice of arm, the motor execution
module focuses on maximizing the performance of the selected arm. Thus, only the
selected arm has a chance to improve its motor performance. Ironically, when the
performance of one arm is worse or less efficient than that of the other after stroke,
that arm is selected less often. So, once the performance of the arm is degraded, it gets
worse still because the arm loses the chance to be trained in daily life. Because the
motor execution module is learnable, the action choice is affected. We may consider
maximizing the success of reach with efficiency as an optimization of the reward function
(Equation 3.4), which combines the reward for success and the reward for efficiency.
In Chapter 5, reach-to-grasp behavior may be separated into two main pathways: a
pathway for reaching and a ventral pathway for grasping. They are controllers of the
proximal and distal parts of the arm, respectively. They can be executed independently;
however, during a reach-to-grasp action, the two controllers must be coordinated. Chapter
5 showed that there are several different coordination patterns, because of differences in
grasping patterns. Because the Hoff-Arbib model represents only a perfectly learned
(optimized) reaching movement, it cannot have an imperfect controller. So, I
cannot apply ‘optimization with learnable components’ to this coordination variability
using the Hoff-Arbib model. Instead, I would consider a learnable reaching component
and a learnable grasping component, both of which maximize accuracy and precision. I
hypothesized that the coordination pattern is also related to a certain reward function,
and that coordination patterns depend on the mixing parameters of the reward function.
This idea follows from Fitts' law for the reaching component alone. Just as Fitts' law
determines the reaching movement speed from the precision requirement, we may
formulate a new trade-off law that includes both reaching and grasping. Besides the
trade-off between speed and precision, it would include 1) how much risk is taken, which
is related to the amplitude of deviation, 2) how precise the grasping is, and
3) how big the end-effector is. When the amplitude of deviation is larger, the reaching
movement can be faster, but the travel length is longer. When the grasping must be more
precise, the time required for enclosing is longer; then either the reaching movement
should be slower or enclosing should start earlier, as in the early preshape pattern. If
the aperture size is bigger, the possibility of colliding with the barrier increases, so
the movement should slow down; note that early preshape patterns have longer movement
times than the others. Conversely, if the aperture size is smaller, which means a smaller
end-effector and a lower probability of colliding with an obstacle, the movement can be
faster than when the aperture size is large; note that the no preshape and independent
preshape patterns have shorter movement times than the others, and their maximum grip
aperture is smaller than that of the others. I call this an extended Fitts' law, and it is
a target of post-thesis research.
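As a point of reference for the proposed extension, the classic Fitts' law for reaching alone can be sketched as follows. This uses Fitts' original formulation, MT = a + b log2(2D/W); the coefficients `a` and `b` are hypothetical placeholders that would normally be fitted to data, and the extended law itself is not specified here, so the sketch covers only the reaching trade-off.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time (s) under Fitts' law.

    distance: reach amplitude D; width: target width W (same units as D).
    a, b: hypothetical intercept (s) and slope (s/bit), normally fitted to data.
    """
    index_of_difficulty = math.log2(2.0 * distance / width)  # in bits
    return a + b * index_of_difficulty

if __name__ == "__main__":
    for width in (0.10, 0.05, 0.025):
        print(width, round(fitts_movement_time(0.2, width), 3))
```

Each halving of the target width adds one bit of difficulty and therefore a fixed increment `b` to the movement time; an extended law would add further terms for the detour amplitude, grasp precision, and aperture size discussed above.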
In Chapter 6, the motor input map is bootstrapped on the motor output map, and
both maps are learnable. Though the motor output map is not closely related to
motor performance, each neuron of the motor cortex is specialized to control the arm in a
certain direction. We may consider each neuron as an expert and the motor cortex as a
mixture of experts (Jacobs, Jordan, Nowlan, & Hinton, 1991). The motor input map optimizes
the utilization of the actions on the motor cortex for successful and efficient reaching
through reinforcement learning. When a stroke affects the motor cortex, part of the basis
actions, that is, the muscle synergies, is removed, and the motor input map also
deteriorates. When the motor input map is then updated, performance improves. However,
when the stroke affects too large an area of the motor cortex, the force exerted in a
certain direction decreases too much (in light of Evarts, 1968), and movements in that
direction cannot be improved. To improve motor performance further, a change
in the motor output map is required, that is, neural reorganization resulting in
expansion of the muscle twitch map (Nudo et al., 1996). While a change in the motor output
map affects the update of the motor input map directly, the motor input map does not
affect the reorganization of the motor output map directly. Instead, repeated trials in a
deteriorated direction increase the frequency of exposure to a certain motoneuron
pattern, correlated with the given target information. This correlation will reorganize
the motor output map.
This optimization through learnable components may make the learning procedure
look complex. However, I emphasize that it may be simpler than it appears,
because each component is updated according to its own objective. The higher-level
optimization learns how to combine the low-level components, while the low-level
components pursue their own purposes, as if they did not know of the existence of the
higher-level optimization. For example, in Chapter 4, while the motor execution module
improves motor performance under the condition that a certain arm is selected, the
action choice module learns which arm is used for a given target. In Chapter 5, while the
purpose of the reaching component and the grasping component is accurate and precise
action, the coordination controller combines them to maximize a certain objective,
without being concerned with improvement of the lower-level components. In Chapter 6,
while the motor output map tends to increase the precision of motor control, where more
neurons result in finer control (Reinkensmeyer et al., 2003; Reinkensmeyer et al.,
2002; Todorov, 2002), the motor input map focuses on accuracy and efficiency.
We may extend this idea to the perceptual level of “action-oriented perception”
(Arbib, 1972, 1989), in which perception is optimized to facilitate action. We
may think of a structure similar to the model of Chapter 4 in which there is a lower-level
component consisting of a motor controller paired with a certain type of perceived
information; I call this a percept-controller complex. The lower-level component tries to
improve motor performance as much as possible with the given perceived information.
However, if it performs poorly compared to the other parallel percept-controller
complexes, it is selected less often by the high-level action selector. Once it is no
longer selected, it diminishes because of competitive Hebbian learning such as that shown
in Chapter 6. I suggest that this is a developmental or evolutionary procedure prior to
Neisser’s (1976) action perception cycle. The action perception cycle may improve a
percept-controller complex.
7.4. Future work
In line with the computational model in Chapter 3, ‘learned non-use’ (Haider, Duque,
Hasenstaub, & McCormick, 2006) is still a controversial idea among physical therapists. To
confirm the learned non-use effect, I am developing an experimental setup called the
Bilateral Arm Reaching Test (BART) under the supervision of Dr. Schweighofer and with the
help of Dr. Winstein and her students. I analyzed data from both healthy and stroke
subjects with a second-order logistic regression. I also formalized the definition of
learned non-use in probabilistic notation and now await the data collection.
As for Chapter 5, the current data set (Tretriluxana, 2008) does not include different
conditions of detouring reach-to-grasp behavior. Thus, I could not derive rules for
planning a virtual target or different grasping patterns. To confirm the virtual target
hypothesis, we may need an experiment that disturbs the selection of the virtual target.
In relation to Section 7.3, I would like to develop an extended Fitts' law for this
detouring reach-to-grasp behavior, which may explain why different grasping patterns
exist, connecting grasping and reaching through the characteristic that a larger aperture
size slows down the reaching velocity. In addition to the healthy subjects’ data, the data
set also contains stroke subjects. I may apply the current analysis to the stroke subjects.
In Chapter 6, I used a kinematic coding as the input layer of the motor cortex.
Though this coding may reside in premotor cortex, the structure of this kinematic coding,
which is uniform over a certain range in 4 dimensions, may be unrealistic. I would like
to replace this layer with a cortical model similar to the motor cortex model. I have
tried to obtain the motor input map, which accounts for the correlation between
high-level movement direction coding and low-level force direction coding, through
supervised learning (error back-propagation) and through reinforcement learning.
Supervised learning is a non-local and biologically implausible learning rule (not
shown). With reinforcement learning, even though we can obtain a correlation between the
initial direction and the desired direction, I did not have final position control. We may
include the cerebellum for correction at the end of movement.
Though the current motor cortex model does not contain a distal vs. proximal
dissociation, it is in principle possible to obtain the dissociation with the model. The
distal vs. proximal map dissociation may be due to the independence of the two. Thus, if a
certain muscle pattern for the distal upper limb occurs with two different muscle
patterns for the proximal upper limb, then at the early stage, the two combined patterns
of distal and proximal representation move apart. At the later stage,
various proximal muscle patterns occurring with the fixed distal muscle pattern would
cancel out the plasticity between the neuron that encodes the distal muscle pattern and
the proximal motoneurons.
Bibliography
Alberts, J. L., Saling, M., & Stelmach, G. E. (2002). Alterations in transport path
differentially affect temporal and spatial movement parameters. Exp Brain Res,
143(4), 417-425.
Albus, J. S. (1975). Data Storage in the Cerebellar Model Articulation Controller
(CMAC). Journal of Dynamic Systems, Measurement and Control, American Soc.
of Mechanical Engineers.
Albus, J. S. (1975). A New Approach to Manipulator Control: The Cerebellar Model
Articulation Controller (CMAC). Journal of Dynamic Systems, Measurement and
Control, American Soc. of Mechanical Engineers.
Albus, J. S. (1979). Mechanisms of Planning and Problem Solving in the Brain.
Mathematical Biosciences(45), 247-293.
Arbib, M. A. (1972). The metaphorical brain; an introduction to cybernetics as artificial
intelligence and brain theory. New York: Wiley-Interscience.
Arbib, M. A. (1981). Perceptual structures and distributed motor control. In V. B.
Brooks (Ed.), Handbook of Physiology — The Nervous System II. Motor
Control (pp. 1449-1480). Bethesda, MD: American Physiological Society.
Arbib, M. A. (1989). The metaphorical brain 2: neural networks and beyond. New York,
N.Y.: Wiley.
Armentrout, S. L., Reggia, J. A., & Weinrich, M. (1994). A neural model of cortical map
reorganization following a focal lesion. Artif Intell Med, 6(5), 383-400.
Asanuma, H. (1989). The motor cortex. New York: Raven Press.
Baraduc, P., Guigon, E., & Burnod, Y. (2001). Recoding arm position to learn visuomotor
transformations. Cereb Cortex, 11(10), 906-917.
Barber, M. J., Clark, J. W., & Anderson, C. H. (2003). Neural representation of
probabilistic information. Neural Comput, 15(8), 1843-1864.
Barto, A. G. (1985). Learning by statistical cooperation of self-interested neuron-like
computing elements. Human Neurobiology(4), 360-375.
Barto, A. G., & Sutton, R. S. (1982). Simulation of anticipatory responses in classical
conditioning by a neuron-like adaptive element. Behav Brain Res, 4(3), 221-235.
Barto, A. G., Sutton, R. S., & Anderson, C. (1983). Neuronlike adaptive elements that can
solve difficult learning control problems. IEEE Transactions on Systems, Man, and
Cybernetics, 13, 834-846.
Battaglini, P. P., Muzur, A., Galletti, C., Skrap, M., Brovelli, A., & Fattori, P. (2002).
Effects of lesions to area V6A in monkeys. Exp Brain Res, 144(3), 419-422.
Beer, R. F., Dewald, J. P., Dawson, M. L., & Rymer, W. Z. (2004). Target-dependent
differences between free and constrained arm movements in chronic hemiparesis.
Exp Brain Res, 156(4), 458-470.
Bendahan, P., & Gorce, P. (2006). A neural network architecture to learn arm motion
planning in grasping tasks with obstacle avoidance. Robotica, 24(2), 197-203.
Bhushan, N., & Shadmehr, R. (1999). Computational nature of human adaptive control
during learning of reaching movements in force fields. Biol Cybern, 81(1), 39-60.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: New York:
Clarendon Press; Oxford University Press.
Bissmarck, F., Nakahara, H., Doya, K., & Hikosaka, O. (2004). Responding to Modalities
with Different Latencies. NIPS.
Bissmarck, F., Nakahara, H., Doya, K., & Hikosaka, O. (2008). Combining modalities
with different latencies for optimal motor control. J Cogn Neurosci, 20(11),
1966-1979.
Bizzi, E., Accornero, N., Chapple, W., & Hogan, N. (1984). Posture control and trajectory
formation during arm movement. J Neurosci, 4(11), 2738-2744.
Bogacz, R., & Gurney, K. (2007). The basal ganglia and cortex implement optimal
decision making between alternative actions. Neural Comput, 19(2), 442-477.
Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V., et al. (2001).
Action observation activates premotor and parietal areas in a somatotopic manner:
an fMRI study. Eur J Neurosci, 13(2), 400-404.
Bullock, D., & Grossberg, S. (1988). In dynamic patterns in complex systems. In Kelso
(Ed.). Jas Mandell, AJ and Shlesinger, MF (World Scientific Publishers,
Singapore).
Bullock, D., Grossberg, S., & Guenther, F. (1993). A self-organizing neural model of
motor equivalent reaching and tool use by a multijoint arm. J Cogn Neurosci, 5,
408-435.
Buneo, C. A., Jarvis, M. R., Batista, A. P., & Andersen, R. A. (2002). Direct visuomotor
transformations for reaching. Nature, 416(6881), 632-636.
Butefisch, C., Hummelsheim, H., Denzler, P., & Mauritz, K. H. (1995). Repetitive training
of isolated movements improves the outcome of motor rehabilitation of the centrally
paretic hand. J Neurol Sci, 130, 59-68.
Chae, J., Yang, G., Park, B. K., & Labatia, I. (2002). Muscle weakness and cocontraction
in upper limb hemiparesis: relationship to motor impairment and physical
disability. Neurorehabil Neural Repair, 16(3), 241-248.
Chen, S. Y., Han, C. E., Parikh, N., Lee, J., Lee, J. Y., Xu, E., et al. (2008). BART: A novel
laboratory-based instrument to quantify preferred limb use in patients after stroke.
Paper presented at the Society for Neuroscience Meeting, Washington, D.C.
Chen, Y. (1997). A motor control model based on self-organizing feature maps.
University of Maryland, College Park.
Chen, Y., & Reggia, J. A. (1996). Alignment of coexisting cortical maps in a motor
control model. Neural Comput, 8(4), 731-755.
Cheng, E. J., Brown, I. E., & Loeb, G. E. (2000). Virtual muscle: a computational
approach to understanding the effects of muscle properties on motor control. J
Neurosci Methods, 101(2), 117-130.
Chernjavsky, A., & Moody, J. (1990a). Note on development of modularity in simple
cortical models. In D. Touretzky (Ed.), Advances in Neural Information
Processing Systems (Vol. 2). Palo Alto: Morgan Kaufmann.
Chernjavsky, A., & Moody, J. (1990b). Spontaneous development of modularity in simple
cortical models. Neural Computation(2), 334-354.
Cho, S., & Reggia, J. A. (1994). Map formation in proprioceptive cortex. Int J Neural
Syst, 5(2), 87-101.
Cisek, P. (2006). Integrated neural processes for defining potential actions and deciding
between them: a computational model. J Neurosci, 26(38), 9761-9770.
Cisek, P., Grossberg, S., & Bullock, D. (1998). A cortico-spinal model of reaching and
proprioception under multiple task constraints. J Cogn Neurosci, 10(4), 425-444.
Clamann, H. P. (1969). Statistical analysis of motor unit firing patterns in human skeletal
muscle. Biophys J, 9, 1223-1251.
Conditt, M. A., Gandolfo, F., & Mussa-Ivaldi, F. A. (1997). The motor system does not
learn the dynamics of the arm by rote memorization of past experience. J
Neurophysiol, 78(1), 554-560.
Conner, J. M., Culberson, A., Packowski, C., Chiba, A. A., & Tuszynski, M. H. (2003).
Lesions of the Basal forebrain cholinergic system impair task acquisition and
abolish cortical plasticity associated with motor skill learning. Neuron, 38(5),
819-829.
Crick, F. (1989). The recent excitement about neural networks. Nature, 337(6203), 129-
132.
Cuijpers, R., Smeets, J., & Brenner, E. (2004). On the relation between object shape and
grasping kinematics. J Neurophysiol, 91(6), 2598-2606.
Dayan, P. (2003). Pattern formation and cortical maps. J Physiol Paris, 97(4-6),
475-489.
Desmurget, M., Prablanc, C., Arzi, M., Rossetti, Y., Paulignan, Y., & Urquizar, C. (1996).
Integrated control of hand transport and orientation during prehension movements.
Exp Brain Res, 110(2), 265-278.
Desmurget, M., Prablanc, C., Rossetti, Y., Arzi, M., Paulignan, Y., Urquizar, C., et al.
(1995). Postural and synergic control for three-dimensional movements of
reaching and grasping. J Neurophysiol, 74(2), 905-910.
Dobkin, B. (2005). Clinical practice. Rehabilitation after stroke. N Engl J Med, 352(16),
1677-1684.
Dominey, P., Arbib, M., & Joseph, J. (1995). A model of corticostriatal plasticity for
learning oculomotor associations and sequences. Journal of Cognitive
Neuroscience, 7(3), 311-336.
Donoghue, J. P. (1995). Plasticity of adult sensorimotor representations. Curr Opin
Neurobiol, 5(6), 749-754.
Donoghue, J. P., Leibovic, S., & Sanes, J. N. (1992). Organization of the forelimb area in
squirrel monkey motor cortex: representation of digit, wrist, and elbow muscles.
Exp Brain Res, 89, 1-19.
Douglas, R. J., & Martin, K. A. (2004). Neuronal circuits of the neocortex. Annu Rev
Neurosci, 27, 419-451.
Doya, K. (2000a). Complementary roles of basal ganglia and cerebellum in learning and
motor control. Curr Opin Neurobiol, 10(6), 732-739.
Doya, K. (2000b). Reinforcement learning in continuous time and space. Neural Comput,
12(1), 219-245.
Duncan, P. W., Wallace, D., Lai, S. M., Johnson, D., Embretson, S., & Laster, L. J. (1999).
The stroke impact scale version 2.0. Evaluation of reliability, validity, and
sensitivity to change. Stroke, 30(10), 2131-2140.
Evarts, E. V. (1966). Pyramidal tract activity associated with a conditioned hand
movement in the monkey. J Neurophysiol, 29, 1011-1027.
Evarts, E. V. (1968). Relation of pyramidal tract activity to force exerted during voluntary
movement. J Neurophysiol, 31(1), 14-27.
Fagg, A. H., & Arbib, M. A. (1998). Modeling parietal-premotor interactions in primate
control of grasping. Neural Netw, 11(7-8), 1277-1303.
Feldman, A. G. (1981). The composition of central programs subserving horizontal eye
movements in man. Biol Cybern, 42(2), 107-116.
Feldman, A. G. (1986). Once more on the equilibrium-point hypothesis (lambda model)
for motor control. J Mot Behav, 18(1), 17-54.
Fetz, E. E., & Cheney, P. D. (1980). Postspike facilitation of forelimb muscle activity by
primate corticomotoneuronal cells. J Neurophysiol, 44(4), 751-772.
Flash, T., & Hogan, N. (1985). The coordination of arm movements: an experimentally
confirmed mathematical model. J Neurosci, 5(7), 1688-1703.
Franklin, J. A. (1988). Refinement of robot motor skills through reinforcement learning.
Paper presented at the 27th Conference on Decision and Control, Austin,
Texas.
Fujiwara, T., Kasashima, Y., Osada, M., Muraoka, Y., Ito, M., Tsuji, T., et al. (2008).
Motor improvement and corticospinal modulation induced by hybrid assistive
neuromuscular dynamic stimulation (HANDS) therapy in patients with chronic
stroke. Clin Neurophysiol, 119(6), e82.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the
premotor cortex. Brain, 119(Pt 2), 593-609.
Georgopoulos, A. P. (1996). On the translation of directional motor cortical commands to
activation of muscles via spinal interneuronal systems. Brain Res Cogn Brain Res,
3(2), 151-155.
Georgopoulos, A. P., & Ashe, J. (2000). One motor cortex, two different views. Nat
Neurosci, 3(10), 963; author reply 964-965.
Georgopoulos, A. P., Kalaska, J. F., Caminiti, R., & Massey, J. T. (1982). On the relations
between the direction of two-dimensional arm movements and cell discharge in
primate motor cortex. J Neurosci, 2(11), 1527-1537.
Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population
coding of movement direction. Science, 233(4771), 1416-1419.
Gilbert, C. (1983). Microcircuitry of the visual cortex. Annu Rev Neurosci, 6, 217-247.
Gilbert, C., & Wiesel, T. (1983). Functional organization of the visual cortex. Prog Brain
Res, 58, 209-218.
Goodall, S., Reggia, J. A., Chen, Y., Ruppin, E., & Whitney, C. (1997). A computational
model of acute focal cortical lesions. Stroke, 28(1), 101-109.
Graziano, M. S., Cooke, D. F., & Taylor, C. S. (2000). Coding the location of the arm by
sight. Science, 290(5497), 1782-1786.
Graziano, M. S., Taylor, C. S., & Moore, T. (2002). Probing cortical function with
electrical stimulation. Nat Neurosci, 5(10), 921.
Graziano, M. S., Taylor, C. S., Moore, T., & Cooke, D. F. (2002). The cortical control of
movement revisited. Neuron, 36(3), 349-362.
Guigon, E., Baraduc, P., & Desmurget, M. (2007a). Coding of movement- and force-
related information in primate primary motor cortex: a computational approach.
Eur J Neurosci, 26(1), 250-260.
Guigon, E., Baraduc, P., & Desmurget, M. (2007b). Computational motor control:
redundancy and invariance. J Neurophysiol, 97(1), 331-347.
Haggard, P., & Wing, A. (1995). Coordinated responses following mechanical
perturbation of the arm during prehension. Exp Brain Res, 102(3), 483-494.
Haggard, P., & Wing, A. (1998). Coordination of hand aperture with the spatial path of
hand transport. Exp Brain Res, 118(2), 286-292.
Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical
network activity in vivo is generated through a dynamic balance of excitation and
inhibition. J Neurosci, 26(17), 4535-4545.
Han, C. E., Arbib, M., & Schweighofer, N. (2008). Stroke rehabilitation reaches a
threshold. PLoS Comput Biol, 4, e1000133.
Harris, C. M., & Wolpert, D. M. (1998). Signal-dependent noise determines motor
planning. Nature, 394(6695), 780-784.
Haruno, M., Wolpert, D. M., & Kawato, M. (2001). Mosaic model for sensorimotor
learning and control. Neural Comput, 13(10), 2201-2220.
Helmholtz, H. (1925). Physiological Optics (J. P. Southall, Trans. Vol. 3). Rochester, NY,
USA: Optical Society of America.
Herter, T. M., Kurtzer, I., Cabel, D. W., Haunts, K. A., & Scott, S. H. (2007).
Characterization of torque-related activity in primary motor cortex during a
multijoint postural task. J Neurophysiol, 97(4), 2887-2899.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural
computation: Perseus Books.
Hoff, B. (1992). A computational description of organization of human reaching and
prehension. University of Southern California, Los Angeles, CA.
Hoff, B., & Arbib, M. A. (1993). Models of Trajectory Formation and Temporal
Interaction of Reach and Grasp. J Mot Behav, 25(3), 175-192.
Hogan, N. (1984). An organizing principle for a class of voluntary movements. J
Neurosci, 4(11), 2745-2754.
Hogan, N. (1985). The mechanics of multi-joint posture and movement control. Biol
Cybern, 52(5), 315-331.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and
functional architecture in the cat's visual cortex. J Physiol, 160, 106-154.
Iberall, T., & Arbib, M. A. (1990). Schemas for the Control of Hand Movements: An
Essay on Cortical Localization. In M. A. Goodale (Ed.), Vision and Action: The
Control of Grasping (pp. 204-242): Ablex Publishing Corporation.
Ijspeert, J. A., Nakanishi, J., & Schaal, S. (2002). Movement imitation with nonlinear
dynamical systems in humanoid robots. Paper presented at the International
Conference on Robotics and Automation (ICRA2002).
Izawa, J., Kondo, T., & Ito, K. (2004). Biological arm motion through reinforcement
learning. Biol Cybern, 91(1), 10-22.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive Mixtures of
Local Experts. Neural Comput, 3(1), 79-87.
Jeannerod, M. (1981). Specialized channels for cognitive responses. Cognition, 10(1-3),
135-137.
Jeannerod, M., Paulignan, Y., & Weiss, P. (1998). Grasping an object: one movement,
several components. Novartis Found Symp, 218, 5-16; discussion 16-20.
Jones, E. G., & Wise, S. P. (1977). Size, laminar and columnar distribution of efferent
cells in the sensory-motor cortex of monkeys. J Comp Neurol, 175, 391-438.
Jones, K. E., Hamilton, A. F., & Wolpert, D. M. (2002). Sources of signal-dependent
noise during isometric force production. J Neurophysiol, 88(3), 1533-1544.
Jordan, M. I., & Rosenbaum, D. (1989). Action. Cambridge, MA: MIT Press.
Jordan, M. I., & Rumelhart, D. (1992). Forward models: Supervised learning with a distal
teacher. Cog Sci, 16, 307-354.
Kakei, S., Hoffman, D. S., & Strick, P. L. (1999). Muscle and movement representations
in the primary motor cortex. Science, 285(5436), 2136-2139.
Kalaska, J. F., Cohen, D. A., Hyde, M. L., & Prud'homme, M. (1989). A comparison of
movement direction-related versus load direction-related activity in primate motor
cortex, using a two-dimensional reaching task. J Neurosci, 9(6), 2080-2102.
Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems.
Transactions of the ASME Journal of Basic Engineering, 82(Series D), 35-45.
Kambara, H., Kim, K., Shin, D., Sato, M., & Koike, Y. (2009). Learning and generation
of goal-directed arm reaching from scratch. Neural Netw, 22(4), 348-361.
Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science (4th
ed.). New York: McGraw-Hill, Health Professions Division.
Katayama, M. (1993). A computational understanding of motor learning control using
neural internal models for a multi-joint arm. Unpublished doctoral dissertation,
University of Tokyo, Tokyo.
Katayama, M., & Kawato, M. (1993). Virtual trajectory and stiffness ellipse during
multijoint arm movement predicted by neural inverse models. Biol Cybern, 69(5-
6), 353-362.
Kawato, M., Furukawa, K., & Suzuki, R. (1987). A hierarchical neural-network model for
control and learning of voluntary movement. Biol Cybern, 57(3), 169-185.
Kawato, M., & Gomi, H. (1992). The cerebellum and VOR/OKR learning models. Trends
Neurosci, 15(11), 445-453.
Kawato, M., & Samejima, K. (2007). Efficient reinforcement learning: computational
theories, neuroscience and robotics. Curr Opin Neurobiol, 17(2), 205-212.
Kirkwood, P., Maier, M., & Lemon, R. (2002). Interspecies comparisons for the C3-C4
propriospinal system: unresolved issues. Adv Exp Med Biol, 508, 299-308.
Kitazawa, S., Kimura, T., & Yin, P. B. (1998). Cerebellar complex spikes encode both
destinations and errors in arm movements. Nature, 392(6675), 494-497.
Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor
cortex following motor skill learning. J Neurophysiol, 80(6), 3321-3325.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: the role of uncertainty in neural
coding and computation. Trends Neurosci, 27(12), 712-719.
Knutson, B., Taylor, J., Kaufman, M., Peterson, R., & Glover, G. (2005). Distributed
neural representation of expected value. J Neurosci, 25(19), 4806-4812.
Kohonen, T. (1973). A new model for randomly organized associative memory. Int J
Neurosci, 5(1), 27-29.
Kolmogorov, A. N., & Fomin, S. V. (1957). Elements of the theory of functions and
functional analysis. Rochester, NY: Graylock Press.
Kording, K. P., & Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning.
Nature, 427(6971), 244-247.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian
retina. J Neurophysiol, 16(1), 37-68.
Kuperstein, M. (1988a). An adaptive neural model for mapping invariant target position.
Behav Neurosci, 102(1), 148-162.
Kuperstein, M. (1988b). Adaptive visual-motor coordination in multijoint robots using
parallel architecture. IEEE Trans Neural Syst Rehabil Eng, 1595-1602.
Kuperstein, M. (1988c). Neural model of adaptive hand-eye coordination for single
postures. Science, 239(4845), 1308-1311.
Kurtzer, I., Herter, T. M., & Scott, S. H. (2006). Nonuniform distribution of reach-related
and torque-related activity in upper arm muscles and neurons of primary motor
cortex. J Neurophysiol, 96(6), 3220-3230.
Kwakkel, G., Wagenaar, R. C., Twisk, J. W., Lankhorst, G. J., & Koetsier, J. C. (1999). Intensity of
leg and arm training after primary middle-cerebral artery stroke: a randomized
trial. Lancet, 354, 191-196.
Lan, N. (2002). Stability analysis for postural control in a two-joint limb system. IEEE
Trans Neural Syst Rehabil Eng, 10(4), 249-259.
Lee, D., Port, N. L., Kruse, W., & Georgopoulos, A. P. (1998). Variability and correlated
noise in the discharge of neurons in motor and parietal areas of the primate cortex.
J Neurosci, 18(3), 1161-1170.
Lemon, R. (1988). The output map of the primate motor cortex. Trends Neurosci, 11(11),
501-506.
Lemon, R., Kirkwood, P., Maier, M., Nakajima, K., & Nathan, P. (2004). Direct and
indirect pathways for corticospinal control of upper limb motoneurons in the
primate. Prog Brain Res, 143, 263-279.
Li, C. S., Padoa-Schioppa, C., & Bizzi, E. (2001). Neuronal correlates of motor
performance and motor learning in the primary motor cortex of monkeys adapting
to an external force field. Neuron, 30(2), 593-607.
Li, W., Todorov, E., & Pan, X. (2004). Hierarchical optimal control of redundant
biomechanical systems. Conf Proc IEEE Eng Med Biol Soc, 6, 4618-4621.
Li, W., Todorov, E., & Pan, X. (2005). Hierarchical Feedback and Learning for Multi-
joint Arm Movement Control. Conf Proc IEEE Eng Med Biol Soc, 4, 4400-4403.
Liepert, J., Bauder, H., Wolfgang, H. R., Miltner, W. H., Taub, E., & Weiller, C. (2000).
Treatment-induced cortical reorganization after stroke in humans. Stroke, 31(6),
1210-1216.
Lo, C. C., & Wang, X. J. (2006). Cortico-basal ganglia circuit mechanism for a decision
threshold in reaction time tasks. Nat Neurosci, 9(7), 956-963.
Luft, A. R., & Hanley, D. F. (2006). Stroke recovery--moving in an EXCITE-ing
direction. JAMA, 296(17), 2141-2143.
Lukashin, A. V., Amirikian, B. R., & Georgopoulos, A. P. (1996). Neural computations
underlying the exertion of force: a model. Biol Cybern, 74(5), 469-478.
Lukashin, A. V., & Georgopoulos, A. P. (1993). A dynamical neural network model for
motor cortical activity during movement: population coding of movement
trajectories. Biol Cybern, 69(5-6), 517-524.
Mamolo, C. M., Roy, E. A., Bryden, P. J., & Rohr, L. E. (2005). The performance of left-
handed participants on a preferential reaching test. Brain Cogn, 57(2), 143-145.
Matthews, P. B. (1996). Relationship of firing intervals of human motor units to the
trajectory of post-spike after-hyperpolarization and synaptic noise. J Physiol,
492(Pt 2), 597-628.
Matthews, P. B. C. (1972). Mammalian muscle receptors and their central actions.
Baltimore: Williams & Wilkins.
Mayo, N. E., Wood-Dauphinee, S., Cote, R., Durcan, L., & Carlton, J. (2002). Activity,
participation, and quality of life 6 months poststroke. Arch Phys Med Rehabil,
83(8), 1035-1042.
Mel, B. (1991). A connectionist model may shed light on neural mechanisms for visually
guided reaching. J Cogn Neurosci, 3, 273-292.
Meulenbroek, R. G., Rosenbaum, D. A., Jansen, C., Vaughan, J., & Vogt, S. (2001).
Multijoint grasping movements. Simulated and observed effects of object location,
object size, and initial aperture. Exp Brain Res, 138(2), 219-234.
Miller, J. P., Jacobs, G. A., & Theunissen, F. E. (1991). Representation of sensory
information in the cricket cercal sensory system. I. Response properties of the
primary interneurons. J Neurophysiol, 66(5), 1680-1689.
Mink, J. W. (2003). The Basal Ganglia and involuntary movements: impaired inhibition
of competing motor patterns. Arch Neurol, 60(10), 1365-1368.
Miyai, I., Blau, A. D., Reding, M. J., & Volpe, B. T. (1997). Patients with stroke confined
to basal ganglia have diminished response to rehabilitation efforts. Neurology,
48(1), 95-101.
Mon-Williams, M., Tresilian, J. R., Coppard, V. L., & Carson, R. G. (2001). The effect of
obstacle position on reach-to-grasp movements. Exp Brain Res, 137(3-4), 497-501.
Moran, D. W., & Schwartz, A. B. (2000). One motor cortex, two different views. Nat
Neurosci, 3(10), 963; author reply 963-965.
Morasso, P. (1981). Spatial control of arm movements. Exp Brain Res, 42(2), 223-227.
Morasso, P. (1983). Three dimensional arm trajectories. Biol Cybern, 48(3), 187-194.
Mussa Ivaldi, F. A., Morasso, P., & Zaccaria, R. (1988). Kinematic networks. A
distributed model for representing and regularizing motor redundancy. Biol
Cybern, 60(1), 1-16.
Nakayama, H., Jorgensen, H. S., Raaschou, H. O., & Olsen, T. S. (1994). Compensation
in recovery of upper extremity function after stroke: the Copenhagen Stroke Study.
Arch Phys Med Rehabil, 75(8), 852-857.
Neisser, U. (1976). Cognition and reality: principles and implications of cognitive
psychology. San Francisco: W. H. Freeman.
Nelson, W. (1983). Physical principles for economies of skilled movements. Biol
Cybern, 46, 135-147.
Nudo, R. J., Plautz, E. J., & Frost, S. B. (2001). Role of adaptive plasticity in recovery of
function after damage to motor cortex. Muscle Nerve, 24(8), 1000-1019.
Nudo, R. J., Wise, B. M., SiFuentes, F., & Milliken, G. W. (1996). Neural substrates for
the effects of rehabilitative training on motor recovery after ischemic infarct.
Science, 272(5269), 1791-1794.
O'Doherty, J. P. (2004). Reward representations and reward-related learning in the human
brain: insights from neuroimaging. Curr Opin Neurobiol, 14(6), 769-776.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field
properties by learning a sparse code for natural images. Nature, 381(6583), 607-
609.
Oztop, E., & Arbib, M. A. (2002). Schema design and implementation of the grasp-
related mirror neuron system. Biol Cybern, 87(2), 116-140.
Paulignan, Y., Dufosse, M., Hugon, M., & Massion, J. (1989). Acquisition of co-
ordination between posture and movement in a bimanual task. Exp Brain Res,
77(2), 337-348.
Paulignan, Y., Frak, V. G., Toni, I., & Jeannerod, M. (1997). Influence of object position
and size on human prehension movements. Exp Brain Res, 114(2), 226-234.
Paulignan, Y., Jeannerod, M., MacKenzie, C., & Marteniuk, R. (1991). Selective
perturbation of visual input during prehension movements. 2. The effects of
changing object size. Exp Brain Res, 87(2), 407-420.
Paulignan, Y., MacKenzie, C., Marteniuk, R., & Jeannerod, M. (1990). The coupling of
arm and finger movements during prehension. Exp Brain Res, 79(2), 431-435.
Paulignan, Y., MacKenzie, C., Marteniuk, R., & Jeannerod, M. (1991). Selective
perturbation of visual input during prehension movements. 1. The effects of
changing object position. Exp Brain Res, 83(3), 502-512.
Paz, R., Boraud, T., Natan, C., Bergman, H., & Vaadia, E. (2003). Preparatory activity in
motor cortex reflects learning of local visuomotor skills. Nat Neurosci, 6(8), 882-
890.
Pearson, J. C., Finkel, L. H., & Edelman, G. M. (1987). Plasticity in the organization of
adult cerebral cortical maps: a computer simulation based on neuronal group
selection. J Neurosci, 7(12), 4209-4223.
Penfield, W., & Faulk, M. E., Jr. (1955). The insula; further observations on its function.
Brain, 78(4), 445-470.
Pesaran, B., Nelson, M. J., & Andersen, R. A. (2006). Dorsal premotor neurons encode
the relative position of the hand, eye, and goal during reach planning. Neuron,
51(1), 125-134.
Plautz, E. J., Milliken, G. W., & Nudo, R. J. (2000). Effects of repetitive motor training
on movement representations in adult squirrel monkeys: role of use versus
learning. Neurobiol Learn Mem, 74(1), 27-55.
Pouget, A., Dayan, P., & Zemel, R. S. (2003). Inference and computation with population
codes. Annu Rev Neurosci, 26, 381-410.
Rand, M. K., Shimansky, Y. P., Hossain, A. B., & Stelmach, G. E. (2008). Quantitative
model of transport-aperture coordination during reach-to-grasp movements. Exp
Brain Res, 188(2), 263-274.
Rand, M. K., Squire, L. M., & Stelmach, G. E. (2006). Effect of speed manipulation on
the control of aperture closure during reach-to-grasp movements. Exp Brain Res,
174(1), 74-85.
Rao, R. P. (2004). Bayesian computation in recurrent neural circuits. Neural Comput,
16(1), 1-38.
Reggia, J. A., D'Autrechy, C. L., Sutton III, G. G., & Weinrich, M. (1992). A Competitive
Distribution Theory of Neocortical Dynamics. Neural Computation, 4, 287-317.
Reinkensmeyer, D. J., Iobbi, M. G., Kahn, L. E., Kamper, D. G., & Takahashi, C. D.
(2003). Modeling reaching impairment after stroke using a population vector
model of movement control that incorporates neural firing-rate variability. Neural
Comput, 15(11), 2619-2642.
Reinkensmeyer, D. J., McKenna Cole, A., Kahn, L. E., & Kamper, D. G. (2002).
Directional control of reaching is preserved following mild/moderate stroke and
stochastically constrained following severe stroke. Exp Brain Res, 143(4), 525-
530.
Reynolds, J. N., & Wickens, J. R. (2002). Dopamine-dependent plasticity of
corticostriatal synapses. Neural Netw, 15(4-6), 507-521.
Rioult-Pedotti, M. S., Friedman, D., & Donoghue, J. P. (2000). Learning-induced LTP in
neocortex. Science, 290(5491), 533-536.
Rioult-Pedotti, M. S., Friedman, D., Hess, G., & Donoghue, J. P. (1998). Strengthening of
horizontal cortical connections following skill learning. Nat Neurosci, 1(3), 230-
234.
Rizzolatti, G., & Fadiga, L. (1998). Grasping objects and grasping action meanings: the
dual role of monkey rostroventral premotor cortex (area F5). Novartis Found
Symp, 218, 81-95; discussion 95-103.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the
recognition of motor actions. Brain Res Cogn Brain Res, 3(2), 131-141.
Roelfsema, P. R., & van Ooyen, A. (2005). Attention-gated reinforcement learning of
internal representations for classification. Neural Comput, 17(10), 2176-2214.
Rokni, U., Richardson, A. G., Bizzi, E., & Seung, H. S. (2007). Motor learning with
unstable neural representations. Neuron, 54(4), 653-666.
Rosenbaum, D. A., Engelbrecht, S. E., Bushe, M. M., & Loukopoulos, L. D. (1993a).
Knowledge Model for Selecting and Producing Reaching Movements. J Mot
Behav, 25(3), 217-227.
Rosenbaum, D. A., Engelbrecht, S. E., Bushe, M. M., & Loukopoulos, L. D. (1993b). A
model for reaching control. Acta Psychol (Amst), 82(1-3), 237-250.
Rosenbaum, D. A., Loukopoulos, L. D., Meulenbroek, R. G., Vaughan, J., & Engelbrecht,
S. E. (1995). Planning reaches by evaluating stored postures. Psychol Rev, 102(1),
28-67.
Rosenbaum, D. A., Meulenbroek, R. G., Vaughan, J., & Jansen, C. (1999). Coordination
of reaching and grasping by capitalizing on obstacle avoidance and other
constraints. Exp Brain Res, 128(1-2), 92-100.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning internal representations by
error propagation. In D. Rumelhart, J. McClelland & the PDP Research Group (Eds.),
Parallel Distributed Processing: Explorations in the Microstructure of Cognition
(Vol. 1: Foundations, pp. 318-362). Cambridge, MA: MIT Press.
Reprinted in Anderson and Rosenfeld (1988).
Sabes, P. N., & Jordan, M. I. (1997). Obstacle avoidance and a perturbation sensitivity
model for motor planning. J Neurosci, 17(18), 7119-7128.
Sabes, P. N., Jordan, M. I., & Wolpert, D. M. (1998). The role of inertial sensitivity in
motor planning. J Neurosci, 18(15), 5948-5957.
Sakamoto, T., Arissian, K., & Asanuma, H. (1989). Functional role of the sensory cortex
in learning motor skills in cats. Brain Res, 503(2), 258-264.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., Tanaka, Y., & Tsutsui, K. (1998). Neural
coding of 3D features of objects for hand action in the parietal cortex of the
monkey. Philos Trans R Soc Lond B Biol Sci, 353(1373), 1363-1373.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., Tsutsui, K., Tanaka, Y., et al. (1999).
Neural representation of three-dimensional features of manipulation objects with
stereopsis. Exp Brain Res, 128(1-2), 160-169.
Sakata, H., Taira, M., Murata, A., & Mine, S. (1995). Neural mechanisms of visual
guidance of hand action in the parietal cortex of the monkey. Cereb Cortex, 5(5),
429-438.
Saling, M., Alberts, J., Stelmach, G. E., & Bloedel, J. R. (1998). Reach-to-grasp
movements during obstacle avoidance. Exp Brain Res, 118(2), 251-258.
Samejima, K., Ueda, Y., Doya, K., & Kimura, M. (2005). Representation of action-
specific reward values in the striatum. Science, 310(5752), 1337-1340.
Sanes, J. N., Suner, S., Lando, J. F., & Donoghue, J. P. (1988). Rapid reorganization of
adult rat motor cortex somatic representation patterns after motor nerve injury.
Proc Natl Acad Sci U S A, 85(6), 2003-2007.
Schaal, S. (2003). Dynamic movement primitives - A framework for motor control in
humans and humanoid robots. Paper presented at the International
Symposium on Adaptive Motion of Animals and Machines.
Schaal, S., Ijspeert, A., & Billard, A. (2004). Computational approaches to motor learning
by imitation. In C. D. Frith & D. Wolpert (Eds.), The Neuroscience of Social Interaction
(Vol. 1431, pp. 199-218): Oxford University Press.
Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning Movement Primitives.
Paper presented at the International Symposium on Robotics Research
(ISRR2003).
Schafer, S., Berkelmann, B., & Schuppan, K. (1999). The contribution of muscle
afferents to kinaesthesia shown by vibration induced illusions of movement and
by the effects of paralysing joint afferents. Brain Res, 846, 210-218.
Scheidt, R. A., & Stoeckmann, T. (2007). Reach adaptation and final position control
amid environmental uncertainty after stroke. J Neurophysiol, 97(4), 2824-2836.
Schmidt, R. A., & Lee, T. D. (2005). Motor control and learning: A behavioral emphasis
(4th ed.). Champaign, IL: Human Kinetics.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J Neurophysiol, 80(1),
1-27.
Schultz, W., Tremblay, L., & Hollerman, J. R. (1998). Reward prediction in primate basal
ganglia and frontal cortex. Neuropharmacology, 37(4-5), 421-429.
Schweighofer, N., Arbib, M. A., & Kawato, M. (1998). Role of the cerebellum in
reaching movements in humans. I. Distributed inverse dynamics control. Eur J
Neurosci, 10(1), 86-94.
Schweighofer, N., Doya, K., & Lay, F. (2001). Unsupervised learning of granule cell
sparse codes enhances cerebellar adaptive control. Neuroscience, 103(1), 35-50.
Schweighofer, N., Han, C. E., Wolf, S. L., Arbib, M. A., & Winstein, C. J. (2009). A functional
threshold for long-term use of hand and arm function can be predicted:
predictions from a computational model and supporting data from the extremity
constraint-induced therapy evaluation (EXCITE) trial. Phys Ther, 89. In press.
Schweighofer, N., Shishida, K., Han, C. E., Okamoto, Y., Tanaka, S. C., Yamawaki, S., et
al. (2006). Humans can adopt optimal discounting strategy under real-time
constraints. PLoS Comput Biol, 2(11), e152.
Schweighofer, N., Spoelstra, J., Arbib, M. A., & Kawato, M. (1998). Role of the
cerebellum in reaching movements in humans. II. A neural model of the
intermediate cerebellum. Eur J Neurosci, 10(1), 95-105.
Scott, S. H. (2000a). Population vectors and motor cortex: neural coding or
epiphenomenon? Nat Neurosci, 3(4), 307-308.
Scott, S. H. (2000b). Reply to 'One motor cortex, two different views'. Nat Neurosci,
3(10), 964-965.
Scott, S. H., Gribble, P. L., Graham, K. M., & Cabel, D. W. (2001). Dissociation between
hand motion and population vectors from neural activity in motor cortex. Nature,
413(6852), 161-165.
Scott, S. H., & Kalaska, J. F. (1995). Changes in motor cortex activity during reaching
movements with similar hand paths but different arm postures. J Neurophysiol,
73(6), 2563-2567.
Scott, S. H., & Kalaska, J. F. (1997). Reaching movements with similar hand paths but
different arm orientations. I. Activity of individual cells in motor cortex. J
Neurophysiol, 77(2), 826-852.
Scott, S. H., Sergio, L. E., & Kalaska, J. F. (1997). Reaching movements with similar
hand paths but different arm orientations. II. Activity of individual cells in dorsal
premotor cortex and parietal area 5. J Neurophysiol, 78(5), 2413-2426.
Selemon, L. D., & Goldman-Rakic, P. S. (1988). Common cortical and subcortical targets
of the dorsolateral prefrontal and posterior parietal cortices in the rhesus monkey:
Evidence for a distributed neural network subserving spatially guided behavior. J
Neurosci, 8(11), 4049-4068.
Shadmehr, R., & Mussa-Ivaldi, F. A. (1994). Adaptive representation of dynamics during
learning of a motor task. J Neurosci, 14(5 Pt 2), 3208-3224.
Shadmehr, R., & Wise, S. P. (2005). The computational neurobiology of reaching and
pointing: a foundation for motor learning. Cambridge, MA: MIT Press.
Shibata, K., & Ito, K. (2003). Hidden Representation after Reinforcement Learning of
Hand Reaching Movement with Variable Link Length. Paper presented at the
IJCNN (Int'l Conf. on Neural Networks).
Shibata, K., Sugisaka, M., & Ito, K. (2000). Hand Reaching Movement Acquired through
Reinforcement Learning. Paper presented at the 2000 KACC (Korea Automatic
Control Conference).
Simmons, G., & Demiris, Y. (2004). Biologically inspired optimal robot arm control with
signal-dependent noise. Paper presented at the IEEE IROS 2004, Sendai, Japan.
Simmons, G., & Demiris, Y. (2005). Optimal robot arm control using the minimum
variance model. Journal of Robotic Systems, 22(11), 677-690.
Simmons, G., & Demiris, Y. (2006). Object Grasping using the Minimum Variance Model.
Biol Cybern, 94(5), 393-407.
Smeets, J., & Brenner, E. (1999). A new view on grasping. Motor Control, 3, 237-271.
Song, D., Lan, N., & Gordon, J. (2006). Simulated hand variability during multi-joint
arm posture control. Paper presented at the Society for Neuroscience 2006,
Atlanta, Georgia.
Spoelstra, J., Schweighofer, N., & Arbib, M. A. (2000). Cerebellar learning of accurate
predictive control for fast-reaching movements. Biol Cybern, 82(4), 321-333.
Sterr, A., Freivogel, S., & Schmalohr, D. (2002). Neurobehavioral aspects of recovery:
assessment of the learned nonuse phenomenon in hemiparetic adolescents. Arch
Phys Med Rehabil, 83(12), 1726-1731.
Sunderland, A., & Tuke, A. (2005). Neuroplasticity, learning and recovery after stroke: a
critical evaluation of constraint-induced therapy. Neuropsychol Rehabil, 15(2),
81-96.
Sutton III, G. G., Reggia, J. A., Armentrout, S. L., & D'Autrechy, C. L. (1994). Cortical
Map Reorganization as a Competitive Process. Neural Computation, 6, 1-13.
Sutton, R. S. (1988). Learning to predict by the methods of temporal difference. Machine
Learning, 3, 9-44.
Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. Paper
presented at the 12th International Conference on Machine Learning, San
Mateo, CA: Morgan Kaufmann.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction.
Cambridge, Mass.: MIT Press.
Taub, E., Miller, N. E., Novack, T. A., Cook, E. W., III, Fleming, W. C., Nepomuceno, C. S.,
Connell, J. S., & Crago, J. E. (1993). Technique to improve chronic motor deficit
after stroke. Arch Phys Med Rehabil, 74, 347-354.
Taub, E., & Uswatte, G. (2006). Constraint-Induced Movement therapy: answers and
questions after two decades of research. NeuroRehabilitation, 21(2), 93-95.
Taub, E., & Uswatte, G. (2003). Constraint-induced movement therapy: bridging from the
primate laboratory to the stroke rehabilitation laboratory. J Rehabil Med(41
Suppl), 34-40.
Taub, E., Uswatte, G., & Elbert, T. (2002). New treatments in neurorehabilitation founded
on basic research. Nat Rev Neurosci, 3(3), 228-236.
Taub, E., Uswatte, G., Mark, V. W., & Morris, D. M. (2006). The learned nonuse
phenomenon: implications for rehabilitation. Eura Medicophys, 42(3), 241-256.
Taub, E., Uswatte, G., & Morris, D. M. (2003). Improved motor recovery after stroke and
massive cortical reorganization following Constraint-Induced Movement therapy.
Phys Med Rehabil Clin N Am, 14(1 Suppl), S77-91, ix.
Theunissen, F. E., & Miller, J. P. (1991). Representation of sensory information in the
cricket cercal sensory system. II. Information theoretic calculation of system
accuracy and optimal tuning-curve widths of four primary interneurons. J
Neurophysiol, 66(5), 1690-1703.
Timmann, D., Stelmach, G. E., & Bloedel, J. R. (1996). Grasping component alterations
and limb transport. Exp Brain Res, 108(3), 486-492.
Todorov, E. (2000a). Direct cortical control of muscle activation in voluntary arm
movements: a model. Nat Neurosci, 3(4), 391-398.
Todorov, E. (2000b). Reply to 'One motor cortex, two different views'. Nat Neurosci,
3(10), 964.
Todorov, E. (2002). Cosine tuning minimizes motor errors. Neural Comput, 14(6), 1233-
1260.
Todorov, E. (2004). Optimality principles in sensorimotor control. Nat Neurosci, 7(9),
907-915.
Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the
noise characteristics of the sensorimotor system. Neural Comput, 17(5), 1084-
1108.
Todorov, E., & Jordan, M. I. (2002). Optimal feedback control as a theory of motor
coordination. Nat Neurosci, 5(11), 1226-1235.
Tresilian, J. R. (1998). Attention in action or obstruction of movement? A kinematic
analysis of avoidance behavior in prehension. Exp Brain Res, 120(3), 352-368.
Tretriluxana, J. (2008). Hemispheric specialization of reach-to-grasp
actions. University of Southern California, Los Angeles.
Tretriluxana, J., Gordon, J., Arbib, M. A., Fisher, B., & Winstein, C. J. (2007). Right
hemisphere specialization for transport-grasp coordination after stroke: Insight
from a barrier paradigm. Journal of Neurophysiology, submitted.
Tretriluxana, J., Gordon, J., Arbib, M. A., Fisher, B. E., & Winstein, C. J. (2009). Right
hemisphere specialization for transport-grasp coordination after stroke: Insight
from a barrier paradigm. Neurorehabilitation and Neural Repair, submitted (under
revision).
Tretriluxana, J., Gordon, J., Fisher, B., & Winstein, C. J. (2007). Hemispheric
specialization in reach-to-grasp control: perspective from stroke. Journal of
Neurophysiology, submitted.
Tretriluxana, J., Gordon, J., Fisher, B. E., & Winstein, C. J. (2009). Hemisphere Specific
Impairments in Reach-to-Grasp Control after Stroke: Effects of Object Size.
Neurorehabilitation and Neural Repair, doi:10.1177/1545968309332733
Tretriluxana, J., Gordon, J., & Winstein, C. J. (2004). Hemispheric asymmetry in reach-
to-grasp adaptation to external perturbation. Paper presented at the Society for
Neuroscience (SfN) 2004.
Tretriluxana, J., Gordon, J., & Winstein, C. J. (2007). Manual Asymmetries in grasp pre-
shaping and transport-grasp coordination. Journal of Neurophysiology, submitted.
Tretriluxana, J., Gordon, J., & Winstein, C. J. (2008). Manual Asymmetries in grasp pre-
shaping and transport-grasp coordination. Exp Brain Res, 188(2), 305-315.
Tretriluxana, J., Winstein, C. J., & Gordon, J. (2005). Reach-to-grasp coordination:
Manual asymmetry. Paper presented at the Society for Neuroscience 2005.
Ulloa, A., & Bullock, D. (2003). A neural network simulating human reach-grasp
coordination by continuous updating of vector positioning commands. Neural
Networks, 16, 1141-1160.
Uno, Y., Kawato, M., & Suzuki, R. (1989). Formation and control of optimal trajectory in
human multijoint arm movement. Minimum torque-change model. Biol Cybern,
61(2), 89-101.
Uno, Y., Suzuki, R., & Kawato, M. (1989). Minimum-muscle-tension-change model
which reproduces human arm movement. Paper presented at the 4th
Symposium on Biological and Physiological Engineering.
Uswatte, G., Taub, E., Morris, D., Light, K., & Thompson, P. A. (2006). The Motor
Activity Log-28: assessing daily use of the hemiparetic arm after stroke.
Neurology, 67(7), 1189-1194.
Uswatte, G., Taub, E., Morris, D., Vignolo, M., & McCulloch, K. (2005). Reliability and
validity of the upper-extremity Motor Activity Log-14 for measuring real-world
arm use. Stroke, 36(11), 2493-2496.
van Beers, R. J., Wolpert, D. M., & Haggard, P. (2002). When feeling is more important
than seeing in sensorimotor adaptation. Curr Biol, 12(10), 834-837.
Vaughan, J., Rosenbaum, D. A., & Meulenbroek, R. G. (2001). Planning reaching and
grasping movements: the problem of obstacle avoidance. Motor Control, 5(2),
116-135.
von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate
cortex. Kybernetik, 14(2), 85-100.
Widrow, B., & Stearns, S. D. (1985). Adaptive signal processing. Englewood Cliffs, N.J.:
Prentice-Hall.
Winstein, C., & Wolf, S. (2008). Task-oriented training to promote upper extremity
recovery. In J. Stein, R. Macko, C. Winstein & R. Zorowitz (Eds.), Stroke
recovery & rehabilitation (pp. 267-290). New York: Demos Medical.
Winstein, C. J., Miller, J. P., Blanton, S., Taub, E., Uswatte, G., Morris, D., et al. (2003).
Methods for a multisite randomized trial to investigate the effect of constraint-
induced movement therapy in improving upper extremity function among adults
recovering from a cerebrovascular stroke. Neurorehabil Neural Repair, 17(3),
137-152.
Winstein, C. J., Rose, D. K., Tan, S. M., Lewthwaite, R., Chui, H. C., & Azen, S. P.
(2004). A randomized controlled comparison of upper-extremity rehabilitation
strategies in acute stroke: A pilot study of immediate and long-term outcomes.
Arch Phys Med Rehabil, 85(4), 620-628.
Wolf, S., Blanton, S., Baer, H., Breshears, J., & Butler, A. J. (2002). Repetitive Task
Practice: A critical review of constraint induced therapy in stroke. The
Neurologist, 8, 325-338.
Wolf, S., Lecraw, D. E., Barton, L. A., & Jann, B. B. (1989). Forced use of hemiplegic upper
extremities to reverse the effect of learned nonuse among chronic stroke and
head-injured patients. Exp Neurol, 104, 104-132.
Wolf, S. L. (2007). Revisiting constraint-induced movement therapy: are we too smitten
with the mitten? Is all nonuse "learned"? and other quandaries. Phys Ther, 87(9),
1212-1223.
Wolf, S. L., Catlin, P. A., Ellis, M., Archer, A. L., Morgan, B., & Piacentino, A. (2001).
Assessing Wolf motor function test as outcome measure for research in patients
after stroke. Stroke, 32(7), 1635-1639.
Wolf, S. L., Thompson, P. A., Morris, D. M., Rose, D. K., Winstein, C. J., Taub, E., et al.
(2005). The EXCITE trial: attributes of the Wolf Motor Function Test in patients
with subacute stroke. Neurorehabil Neural Repair, 19(3), 194-205.
Wolf, S. L., Winstein, C. J., Miller, J. P., Taub, E., Uswatte, G., Morris, D., et al. (2006).
Effect of constraint-induced movement therapy on upper extremity function 3 to 9
months after stroke: the EXCITE randomized clinical trial. JAMA, 296(17), 2095-
2104.
Wolf, S. L., Winstein, C. J., Miller, J. P., Thompson, P. A., Taub, E., Uswatte, G., et al.
(2007). Retention of upper limb function in stroke survivors who have received
constraint-induced movement therapy: the EXCITE randomised trial. Lancet
Neurol.
Wolf, S. L., Winstein, C. J., Miller, J. P., Thompson, P. A., Taub, E., Uswatte, G., et al.
(2008). Retention of upper limb function in stroke survivors who have received
constraint-induced movement therapy: the EXCITE randomised trial. Lancet
Neurol, 7(1), 33-40.
Wolpert, D. M. (1997). Computational approaches to motor control. Trends Cogn. Sci.,
1(6), 209-216.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995a). Are arm trajectories planned in
kinematic or dynamic coordinates? An adaptation study. Exp Brain Res, 103(3),
460-470.
Wolpert, D. M., Ghahramani, Z., & Jordan, M. I. (1995b). An internal model for
sensorimotor integration. Science, 269(5232), 1880-1882.
Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for
motor control. Neural Netw, 11(7-8), 1317-1329.
Zajac, F. E., & Gordon, M. E. (1989). Determining muscle's force and action in multi-
articular movement. Exerc Sport Sci Rev, 17, 187-230.
Zemel, R. S., & Dayan, P. (1997). Combining probabilistic population codes. Paper
presented at the 15th International joint Conference in Artificial Intelligence.
Appendix A.
Learning rule derivation of a simplified motor cortex model
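The partial derivative of the cost function in equation (3.2) with respect to the
preferred direction of the ith neuron, $\theta_p^i$, is

$$
\frac{\delta E}{\delta \theta_p^i} =
\begin{cases}
(\theta_e - \theta_d)\,\dfrac{\delta \theta_e}{\delta \theta_p^i} - \lambda_0 \cos(\theta_d - \theta_p^i)\sin(\theta_d - \theta_p^i), & |\theta_d - \theta_p^i| < \pi/2 \\[2ex]
(\theta_e - \theta_d)\,\dfrac{\delta \theta_e}{\delta \theta_p^i} - 0, & |\theta_d - \theta_p^i| \ge \pi/2
\end{cases}
\tag{A.1}
$$

where $\lambda_0 = 2\lambda$. The second term applies where $|\theta_d - \theta_p^i| < \pi/2$, because $y$ is
differentiable in this range; otherwise the second term equals zero.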
Derivation of the first term in the right hand side of equation (A.1)
We used a geometrical approximation to $\delta\theta_e / \delta\theta_p^i$ (Figure A.1).
Figure A.1. Approximation of $\delta\theta_e / \delta\theta_p^i$. $V_i$ is a vector for the preferred direction $\theta_p^i$, and
$V_e$ a vector for the executed direction $\theta_e$.
A perturbation $\delta\theta_p^i$ of the preferred direction also perturbs the cell's vector $V_i$,
which lies along this direction and has magnitude $y_i$. This perturbation of the cell's
vector in turn perturbs the population vector $V_e$ along the executed direction. Because
the amount of perturbation in the individual vector, $|V_i|\,\delta\theta_p^i$, and the amount of
perturbation in the population vector, $|V_e|\,\delta\theta_e$, are equal,

$$
|V_i|\,\delta\theta_p^i = |V_e|\,\delta\theta_e
\;\Leftrightarrow\;
\frac{\delta\theta_e}{\delta\theta_p^i} = \frac{|V_i|}{|V_e|} = \frac{1}{|V_e|}\,y_i
\tag{A.2}
$$
Thus we approximate $\delta\theta_e / \delta\theta_p^i$ with $y_i$, the only available local (and thus biologically
plausible) information. $1/|V_e|$ is a scaling factor and can be absorbed into the learning
rate of supervised learning.
We can then obtain the supervised learning rule in equation (3.2).
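$$
(\theta_e - \theta_d)\,\frac{\delta\theta_e}{\delta\theta_p^i} \approx (\theta_e - \theta_d)\,\frac{y_i}{|V_e|}
\qquad \left(\because\ \frac{\delta\theta_e}{\delta\theta_p^i} \approx \frac{y_i}{|V_e|}\right)
\tag{A.3}
$$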
We verified both analytically and in simulations that the approximation
$\delta\theta_e / \delta\theta_p^i \approx y_i / |V_e|$
is appropriate (please contact the corresponding author to obtain the exact derivation and
corresponding simulation results).
Derivation of the second term in the right hand side of equation (A.1)
The second term can be approximated with

$$
\lambda_0 \cos(\theta_d - \theta_p^i)\sin(\theta_d - \theta_p^i) \approx \lambda_0\,(\theta_d - \theta_p^i)\,y_i
\qquad \left(\because\ y_i = \cos(\theta_d - \theta_p^i),\ \sin(x) \approx x\right)
\tag{A.4}
$$
Note that equation (A.4) is valid where $|\theta_d - \theta_p^i| < \pi/2$; otherwise this term is
equal to zero. The first-order Taylor expansion of the sine function is valid for $\theta_p^i \approx \theta_d$.
However, due to truncation of the neurons' activation rule (Equation (3.1)), if $\theta_p^i$ is far
from $\theta_d$, where the approximation of the sine function is invalid, $y_i$ is zero or near zero.
Thus, this approximation is valid for all directions. We verified in simulations that,
compared to a non-approximated equation, the maximum error is about 13%.
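The size of the sine approximation can be probed numerically. The sketch below is only an illustration under stated assumptions: it takes the activation to be a truncated cosine, y = max(0, cos(x)) with x the difference between desired and preferred direction (Equation (3.1) is not reproduced in this appendix), and it measures the pointwise absolute gap between cos(x)sin(x) and x·y. That gap is not necessarily the error measure behind the 13% figure quoted above. The function names are hypothetical.

```python
import math


def exact_term(x):
    """cos(x) * sin(x) where |x| < pi/2, zero elsewhere (assumed truncation)."""
    return math.cos(x) * math.sin(x) if abs(x) < math.pi / 2 else 0.0


def approx_term(x):
    """x * y with y = max(0, cos(x)): the sine replaced by its argument."""
    return x * max(0.0, math.cos(x))


def max_abs_gap(n=10001):
    """Largest pointwise gap between the two terms on a grid over [-pi, pi]."""
    xs = [-math.pi + 2.0 * math.pi * k / (n - 1) for k in range(n)]
    return max(abs(exact_term(x) - approx_term(x)) for x in xs)


if __name__ == "__main__":
    print("largest absolute gap:", max_abs_gap())
```

Near x = 0 the two terms agree closely, and beyond |x| = π/2 both vanish because the assumed activation is zero there, which is the argument made in the paragraph above.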
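In summary, the weight update rule is

$$
\begin{aligned}
\theta_p^i \leftarrow \theta_p^i - \alpha\,\frac{\delta E}{\delta\theta_p^i}
&\approx \theta_p^i - \alpha\,(\theta_e - \theta_d)\,\frac{y_i}{|V_e|} + \alpha\lambda_0\,(\theta_d - \theta_p^i)\,y_i \\
&= \theta_p^i + \alpha_{SL}\,(\theta_d - \theta_e)\,y_i + \alpha_{UL}\,(\theta_d - \theta_p^i)\,y_i
\end{aligned}
\tag{A.5}
$$

with $\alpha \ge 0$, $\alpha_{SL} = \alpha / |V_e|$ the learning rate of the supervised learning rule, and
$\alpha_{UL} = \alpha\lambda_0$ the learning rate of the unsupervised learning rule.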
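As a rough illustration of the update rule, the following sketch applies one step of it to a small population of cells. Several pieces are assumptions rather than the thesis implementation: the activation is taken to be a truncated cosine (Equation (3.1) is not shown in this appendix), the executed direction is taken as the angle of the sum of the cell vectors, angle differences are wrapped to [-π, π], and all function names and numeric values are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle difference to [-pi, pi] (an added convenience)."""
    return math.atan2(math.sin(angle), math.cos(angle))


def activation(theta_d, theta_p):
    """Assumed truncated-cosine activation of a cell with preferred direction theta_p."""
    return max(0.0, math.cos(theta_d - theta_p))


def executed_direction(theta_d, preferred):
    """Angle of the population vector: sum of cell vectors of magnitude y_i."""
    vx = sum(activation(theta_d, p) * math.cos(p) for p in preferred)
    vy = sum(activation(theta_d, p) * math.sin(p) for p in preferred)
    return math.atan2(vy, vx)


def update_preferred(preferred, theta_d, alpha_sl, alpha_ul):
    """One step: supervised term pulls along (theta_d - theta_e),
    unsupervised term pulls along (theta_d - theta_p), both scaled by y_i."""
    theta_e = executed_direction(theta_d, preferred)
    updated = []
    for p in preferred:
        y = activation(theta_d, p)
        step = alpha_sl * wrap(theta_d - theta_e) * y + alpha_ul * wrap(theta_d - p) * y
        updated.append(p + step)
    return updated


if __name__ == "__main__":
    cells = [0.0, 0.5, 2.5]  # hypothetical preferred directions (rad)
    print(update_preferred(cells, theta_d=0.3, alpha_sl=0.1, alpha_ul=0.05))
```

Cells whose assumed activation is zero for the desired direction (here the cell at 2.5 rad) are left unchanged, because both terms are multiplied by y_i.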
Appendix B.
Effects of learning process on bistability in use and motor performance
Figure B.1. Directional error, the normalized population vector (PV) and spontaneous
arm use after different durations of therapy followed by 0 free choice trials (immediate)
and 3000 free choice trials (follow-up) without supervised learning. Unlike in the full
model (see Figure 3.5), the bistable behavior is not present, as shown by the non-crossing
of the curves in the immediate and follow-up conditions.
Figure B.2. Directional error, the normalized population vector (PV) and spontaneous
arm use after different durations of therapy followed by 0 free choice trials (immediate)
and 3000 free choice trials (follow-up) without unsupervised learning. Unlike in the
full model (see Figure 3.5), the bistable behavior is not present, as shown by the
non-crossing of the curves in the immediate and follow-up conditions.
Figure B.3. Directional error, the normalized population vector (PV) and spontaneous
arm use after different durations of therapy followed by 0 free choice trials (immediate)
and 3000 free choice trials (follow-up) without reinforcement learning. Unlike in the
full model (see Figure 3.5), the bistable behavior is not present, as shown by the
non-crossing of the curves in the immediate and follow-up conditions. Here, we used the
same reinforcement learning rate (0.01) in the acute stroke phase (500 free choice trials
after lesioning) and then simulated without reinforcement learning. Because of supervised
and unsupervised learning, performance improved over time, but spontaneous arm use
was not affected.
Appendix C.
Trial numbers of each group, simulation parameters,
and histogram diagram for reach-to-grasp coordination model
Pattern in grasping      Trial numbers                                              Total number of trials

Fast preshape            C001-L: 2 7 8 11 14 15 17 19                               44
                         C001-R: 1 5 7 8 9 12
                         C004-L: 10 20
                         C006-L: 1 2 10 20
                         C006-R: 17
                         C014-L: 2 3 4 5 7 9 10 11 13 14 15 16 17 18 19 20
                         C014-R: 1 11 12 15 17
                         C018-L: 10 15

Early preshape           C001-L: 3 4 6 10                                           35
                         C002-L: 17
                         C006-L: 3 4 8 9 12 13 14 15 16 17 18 19
                         C006-R: 2 3 4 5 6 7 8 9 10 11 13 14 15 16 18 19 20
                         C018-L: 1

Double early preshape    C001-L: 1 5 9                                              43
                         C006-L: 6 7
                         C006-R: 1 12
                         C014-R: 10 13 18 19 20
                         C018-L: 2 3 4 5 6 7 8 9 12 13 17 18 19 20
                         C018-R: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Independent preshape     C001-L: 12 16 20                                           25
                         C001-R: 3 10 11 15 16 17 18 19 20
                         C002-R: 16 17 18
                         C004-L: 4 7
                         C006-L: 5 11
                         C014-L: 12
                         C014-R: 4 7 8 9 16

Late preshape            C001-L: 18                                                 40
                         C001-R: 2 4 6 14
                         C002-L: 7 8 12 16
                         C002-R: 1 2 4 5 6 8 9 13 20
                         C004-L: 1 2 3 5 6 7 8 9 10 11 13 14 16 18 19
                         C004-R: 3 5 6 12 17 18

Double late preshape     C001-L: 13                                                 53
                         C001-R: 13
                         C002-L: 1 2 3 4 5 6 9 10 11 13 14 15 18 19 20
                         C002-R: 3 7 10 11 12 14 15 19
                         C004-L: 4 12 15 17 20
                         C004-R: 1 2 8 9 11 13 14 15 16 19
                         C014-L: 1 6 8
                         C014-R: 2 3 5 6 14 16
                         C018-L: 11 14
                         C018-R: 1 19 20
Table C.1. Trial numbers classified into each group.
Module     Parameter name                       Fast preshape   Early preshape   Late preshape   Independent preshape

           subject # / hand / trial #           C014LB2T17      C006RB2T13       C004LB2T17      C001RB2T15

Reaching   speed control parameter, R           0.000007        0.000025         0.00002         0.000006
           distance to a virtual target         1.9 m           3.8 m            3.8 m           1.9 m
           initial direction                    12.46°          35.89°           27.29°          41.78°
           switch time                          76.5 msec       80.2 msec        90.0 msec       70.7 msec
           D                                    510 msec        668 msec         600 msec        442 msec

Grasping   timing of preshape initialization    0 msec          0 msec           180 msec        0 msec
           size of maximum grip aperture        2.69 cm         4.85 cm          2.57 cm         2.59 cm
           timing of maximum grip aperture      268 msec        190 msec         505 msec        330 msec
           final aperture                       2.4 cm          2.4 cm           1.64 cm         1.20 cm
Table C.2. Parameters used for each pattern.
[Figure C.1: four histogram panels, (a) EP+DEP (reaching), (b) EP+DEP (grasping), (c) LP+DLP (reaching), (d) LP+DLP (grasping); x axis: normalized time (0 to 1); y axis: number of trials (#); legend entries FVP, MDP, PI, MAP.]
Figure C.1. Histogram of t(FVP), t(MDP), t(PI) and t(MAP) for EP+DEP
patterns and LP+DLP patterns, where x axis denotes normalized time and y axis
denotes number of trials. In EP+DEP, MAP in grasping occurs just after FVP in
reaching (within 60 msec). In LP+DLP, PI in grasping occurs just after MDP in
reaching (within 25 msec). FVP=first velocity peak in reaching velocity profile,
MDP=maximally deviated point in reaching trajectory, PI=preshape initialization in
grasping aperture profile, MAP=maximum aperture peak in grasping aperture profile.
Appendix D.
A two-link arm model with six Hill-type muscles
Each muscle generates a tension that depends on the force-length characteristic (FL)
and the force-velocity characteristic (FV) (Lan, 2002; Zajac & Gordon, 1989):

$$ T = F_{max} \cdot u \cdot FL \cdot FV + T_p \tag{D.1} $$

where $F_{max}$ is the maximum force generated and $u$ is a motoneuron activation between
zero and one. FL, FV and $T_p$ are given by

$$ FL = \max\left(k \cdot L_m^2 - 2 \cdot k \cdot L_m + k + 1,\ 0\right) \tag{D.2} $$

$$ FV = \max\left(\frac{a}{1 + \exp(b \cdot V_m - c)},\ 0\right) \tag{D.3} $$

$$ T_p = \max\left(k_0 \cdot (l - l_m) - (b_0 + b_1 u) \cdot \frac{\partial l}{\partial t},\ 0\right) \tag{D.4} $$

where the normalized muscle length $L_m$ is the muscle length $l$ divided by the rest muscle
length $l_m$, and the normalized muscle contraction speed $V_m$ is the muscle contraction
speed divided by 5 times the rest muscle length $l_m$ (Lan, 2002). The rest muscle length is
the muscle length at the rest posture [0, π/2]. The muscle-wide parameters are k=-3.1888,
a=1.5, b=8, and c=log(2)/8. The other, muscle-specific parameters are shown in
Table D.1.
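The tension model of equations (D.1) to (D.4) can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function and argument names are hypothetical, lengths are assumed to be in metres, and the contraction speed and the length derivative ∂l/∂t are passed as separate arguments because the text does not state how their signs relate. With the constants as transcribed, FV evaluates to roughly 0.78 at zero speed; grouping the exponent as b·(V_m − c) would give exactly 1, but the source does not show that grouping, so the sketch follows the transcription.

```python
import math

# Muscle-wide constants quoted in the text.
K = -3.1888
A = 1.5
B = 8.0
C = math.log(2) / 8


def force_length(L_m):
    """Equation (D.2): parabola in normalized length, clipped at zero."""
    return max(K * L_m ** 2 - 2 * K * L_m + K + 1, 0.0)


def force_velocity(V_m):
    """Equation (D.3) as transcribed: sigmoid in normalized contraction speed."""
    return max(A / (1 + math.exp(B * V_m - C)), 0.0)


def passive_tension(l, l_dot, u, l_m, k0, b0, b1):
    """Equation (D.4): elastic term minus activation-dependent damping, clipped at zero."""
    return max(k0 * (l - l_m) - (b0 + b1 * u) * l_dot, 0.0)


def muscle_tension(u, l, l_dot, contraction_speed, f_max, l_m, k0, b0, b1):
    """Equation (D.1): active tension F_max * u * FL * FV plus passive tension."""
    L_m = l / l_m                        # normalized muscle length
    V_m = contraction_speed / (5 * l_m)  # normalized contraction speed
    active = f_max * u * force_length(L_m) * force_velocity(V_m)
    return active + passive_tension(l, l_dot, u, l_m, k0, b0, b1)


if __name__ == "__main__":
    # Shoulder extensor row of Table D.1, fully activated at rest length and rest.
    print(muscle_tension(u=1.0, l=0.22, l_dot=0.0, contraction_speed=0.0,
                         f_max=135.45, l_m=0.22, k0=3078.4, b0=500.0, b1=1000.0))
```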
Muscle                    Fmax (N)   lm (cm)   k0 (N/m)   b0 (Ns/m)   b1 (Ns/m)
shoulder extensor (E)     135.45     22        3078.4     500         1000
shoulder flexor (F)       135.45     22        3078.4     500         1000
elbow opener (O)          135.45     13        5209.6     500         1000
elbow closer (C)          135.45     13        5209.6     500         1000
biarticular biceps (B)    180.60     33        2736.4     500         1000
biarticular triceps (D)   180.60     33        2736.4     500         1000
Table D.1. Parameters of the six muscles: shoulder extensor (E), shoulder flexor (F),
elbow opener (O), elbow closer (C), biarticular biceps (B), and biarticular triceps (D).
Fmax and lm are based on Lan (2002); the passive-force parameters k0, b0, and b1 were
selected to match the shape of the FL-FV characteristics in Kambara et al.
(2009).
Muscle tension pulls on a link and rotates the associated joint. This relationship is
captured by the moment arm matrix, which transforms the generated tensions into joint
torques (equation D.5). Although the moment arm matrix varies with the arm configuration
(Lan, 2002; Spoelstra et al., 2000; Katayama, 1993), we simplified it to a pulley system
(Figure 6.2), keeping the moment arm matrix constant.
$$
\tau = A \cdot T, \qquad
A = \begin{bmatrix}
a_1 & -a_1 & 0 & 0 & a_3 & -a_3 \\
0 & 0 & a_2 & -a_2 & a_4 & -a_4
\end{bmatrix}
\tag{D.5}
$$

where $[a_1, a_2, a_3, a_4] = [4.0, 2.5, 2.8, 3.5]$ cm.
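A constant moment arm matrix makes the tension-to-torque step a fixed linear map. The sketch below is illustrative only: it assumes a 2 × 6 layout with rows (shoulder, elbow) and with muscle columns ordered as in Table D.1 (E, F, O, C, B, D), and the sign given to each muscle of an antagonist pair is likewise an assumption. Moment arms are converted from cm to metres so that torques come out in N·m. The function name is hypothetical.

```python
# Moment arms from the text, converted from cm to m.
A1, A2, A3, A4 = 0.040, 0.025, 0.028, 0.035

# Assumed layout: rows = (shoulder, elbow); columns = muscles E, F, O, C, B, D.
MOMENT_ARMS = [
    [A1, -A1, 0.0, 0.0, A3, -A3],   # shoulder: monoarticular pair + biarticular pair
    [0.0, 0.0, A2, -A2, A4, -A4],   # elbow: monoarticular pair + biarticular pair
]


def joint_torques(tensions):
    """Map six muscle tensions (N) to (shoulder, elbow) torques (N*m)."""
    if len(tensions) != 6:
        raise ValueError("expected six muscle tensions")
    return tuple(
        sum(arm * tension for arm, tension in zip(row, tensions))
        for row in MOMENT_ARMS
    )


if __name__ == "__main__":
    # 10 N in the first biarticular muscle acts on both joints.
    print(joint_torques([0.0, 0.0, 0.0, 0.0, 10.0, 0.0]))
```

Equal tensions within each antagonist pair cancel, so co-contraction produces no net torque in this simplified pulley arrangement.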
I used a two-link planar arm on a horizontal plane and omitted gravity compensation
in the arm dynamics. The arm dynamics are given by equation (D.6), where $m_j$, $l_{gj}$, $l_j$,
and $I_j$ denote the mass, the distance from the center of mass to the joint, the length,
and the rotary inertia of the jth link (Table D.2). Parameters and notation are borrowed
from Kambara et al. (2009).
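$$
\begin{aligned}
\tau_1 &= M_{11}\frac{\partial^2\theta_1}{\partial t^2} + M_{12}\frac{\partial^2\theta_2}{\partial t^2}
 + h_{122}\left(\frac{\partial\theta_2}{\partial t}\right)^2
 + 2h_{112}\left(\frac{\partial\theta_1}{\partial t}\right)\left(\frac{\partial\theta_2}{\partial t}\right) \\
\tau_2 &= M_{21}\frac{\partial^2\theta_1}{\partial t^2} + M_{22}\frac{\partial^2\theta_2}{\partial t^2}
 + h_{211}\left(\frac{\partial\theta_1}{\partial t}\right)^2
\end{aligned}
\tag{D.6}
$$

where

$$
\begin{aligned}
M_{11} &= I_1 + I_2 + 2 m_2 l_1 l_{g2}\cos(\theta_2) + m_2 l_1^2 \\
M_{12} &= M_{21} = I_2 + m_2 l_1 l_{g2}\cos(\theta_2) \\
M_{22} &= I_2 \\
h_{122} &= h_{112} = -h_{211} = -m_2 l_1 l_{g2}\sin(\theta_2)
\end{aligned}
\tag{D.7}
$$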
             Upper arm      Lower arm
m (kg)       1.59           1.44
l (m)        0.3            0.35
l_g (m)      0.18           0.21
I (kg m^2)   6.78 · 10^-2   7.99 · 10^-2
Table D.2. Parameters of two-link arm (Kambara et al., 2009).
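To show how the dynamics of equation (D.6) might be evaluated with the Table D.2 parameters, the sketch below computes the inertia and velocity-coupling terms, the torques for given accelerations, and the accelerations for given torques by solving the 2 × 2 system in closed form. It is a minimal illustration under stated assumptions, not the thesis simulation code; the function names are hypothetical and angles are in radians.

```python
import math

# Table D.2 parameters; only the values that enter the dynamics are needed.
M2 = 1.44               # mass of the lower arm (kg)
L1 = 0.3                # length of the upper arm (m)
LG2 = 0.21              # lower-arm center-of-mass distance from the elbow (m)
I1, I2 = 6.78e-2, 7.99e-2   # rotary inertias (kg m^2)


def dynamics_terms(theta2):
    """Return M11, M12 (= M21), M22 and h (= h122 = h112 = -h211)."""
    coupling = M2 * L1 * LG2
    m11 = I1 + I2 + 2 * coupling * math.cos(theta2) + M2 * L1 ** 2
    m12 = I2 + coupling * math.cos(theta2)
    m22 = I2
    h = -coupling * math.sin(theta2)
    return m11, m12, m22, h


def inverse_dynamics(theta2, dth1, dth2, ddth1, ddth2):
    """Joint torques needed to produce the given accelerations."""
    m11, m12, m22, h = dynamics_terms(theta2)
    tau1 = m11 * ddth1 + m12 * ddth2 + h * dth2 ** 2 + 2 * h * dth1 * dth2
    tau2 = m12 * ddth1 + m22 * ddth2 - h * dth1 ** 2   # h211 = -h
    return tau1, tau2


def forward_dynamics(theta2, dth1, dth2, tau1, tau2):
    """Joint accelerations produced by the given torques (2x2 solve)."""
    m11, m12, m22, h = dynamics_terms(theta2)
    r1 = tau1 - h * dth2 ** 2 - 2 * h * dth1 * dth2
    r2 = tau2 + h * dth1 ** 2
    det = m11 * m22 - m12 * m12
    return (m22 * r1 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det


if __name__ == "__main__":
    tau = inverse_dynamics(0.8, 0.5, -0.3, 1.2, -0.7)
    print(forward_dynamics(0.8, 0.5, -0.3, *tau))
```

Feeding the output of `inverse_dynamics` back into `forward_dynamics` should recover the original accelerations, which is a convenient consistency check on the signs of the coupling terms.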
Abstract
Reach-to-grasp action is a principal action in our daily lives
Asset Metadata
Creator
Han, Cheol (author)
Core Title
Modeling human reaching and grasping: cortex, rehabilitation and lateralization
School
Andrew and Erna Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Computer Science
Publication Date
03/11/2010
Defense Date
07/14/2009
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
individualization,motor control,motor cortex,neuronal coding,OAI-PMH Harvest,Rehabilitation,stroke
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Arbib, Michael A. (committee chair), Gordon, James (committee member), Schaal, Stefan (committee member), Schweighofer, Nicolas (committee member)
Creator Email
cheolhan@gmail.com,cheolhan@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2603
Unique identifier
UC1421261
Identifier
etd-Han-3183 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-262248 (legacy record id),usctheses-m2603 (legacy record id)
Legacy Identifier
etd-Han-3183.pdf
Dmrecord
262248
Document Type
Dissertation
Rights
Han, Cheol
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu