MODELING THE MIRROR SYSTEM IN ACTION OBSERVATION AND
EXECUTION
by
James Bonaiuto
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(NEUROSCIENCE)
August 2010
Copyright 2010 James Bonaiuto
Dedication
To my grandfathers – McCaughin and Bonaiuto
Acknowledgements
The time I spent at USC was one of the best periods of my life. I got to engage in
exciting research with Michael Arbib and travel to other countries to present my work
and learn from many interesting people. I am most thankful to Michael Arbib for being a
better advisor and mentor than I could have hoped for. I should also thank Erhan Oztop
for setting the stage for most of my research with his work, Mihail Bota for teaching me
about neuroinformatics, and Nicholas Schweighofer for teaching me about modeling.
I would also like to thank my thesis committee members Nina Bradley, Laurent Itti,
and Roberto Delgado as well as my qualification exam committee members Stefan
Schaal and Nayuta Yamashita. Nina Bradley offered many useful comments and
suggestions that helped shape my view of infant motor development. Without Laurent Itti
this work would not have been possible. I owe him a debt of gratitude for his support
early on and for the use of his computational resources.
I am very thankful to Ipke Wachsmuth for giving me the opportunity to visit the
Center for Interdisciplinary Research (ZiF) in Bielefeld, Germany as a junior fellow. This
was a particularly great period of my PhD studies where I met many great friends and
collaborators including Stefan Kopp, Scott Jordan, Gunther Knoblich, Jurgen Streek,
Kristinn Thorisson, Susan Duncan, Manuela Lenzen, Matthias Uhl, Matthias Weigert,
John Baressi, Bruno Galantucci, and Maggie Shifrar. I would especially like to thank Kristinn
Thorisson for an exciting collaboration, and Stefan Kopp for teaching me about 3d avatar
simulation – a topic that became very useful in my work.
I owe a debt of gratitude to Janet Wiles for allowing me to visit her lab in Brisbane,
Australia, and tolerating my excessive use of internet bandwidth. I learned a lot about the
Izhikevich neural model during this time which proved very valuable in my research. I
would like to thank Mark Wakabayashi for interesting discussions of neural simulators,
Peter Stratton for teaching me about Izhikevich neurons, and Chris Nolan and Michael
Milford for their friendship.
I must thank Clement Moulin-Frier for hosting Jill and me during our stay in Grenoble,
France. Thanks to Jean-Luc Schwartz and Marc Sato for allowing me to come and give a
talk and for stimulating conversation. I also thank Clement and his friends Guillame,
Jean, and Laurent for the wine and good times.
I owe a lot to Scott Frey for his support and collaboration. I should also thank
Stephane Jacobs for teaching me about TMS and Jolinda Smith and her family for
allowing me to stay at their place while I was in Oregon.
I thank Beth Fisher for her support and for training me in TMS. Thanks also to the
TMS lab members including Alice Lee, Jason Pong, Shailesh Kantak, and Erica Pitsch.
During my PhD I got to work with many good friends in the Hedco Neuroscience
Building. Thanks to the iLab crew - Nathan Mundhenk for all of his technical help and
good humor, Dave Berg for great suggestions and feedback and for fun outside the lab,
Lior Elazary for help with the 3d arm simulator, Farhan Baluch for help with genetic
algorithms, and Rand Voorhies and Nader Noori. I’d also like to thank my fellow members
of the USC Brain Project - Edina Rosta for her friendship and collaboration on MNS2
and ACQ, Anon Plangasprok and Ugur Demiryurek for their help with BODB, Cheol
Han for help with Matlab, Jinyong Lee for his friendship and introduction to Korean
BBQ, Rob Schuler, and Raymond Lee. Thanks to Katie Garrison for her collaboration on
our TMS project. I am also thankful to Laura Lopez, Gloria Wan, Vanessa Clark, and
Linda Bazilian for all of their help.
I owe a lot to some fantastic professors I had at Drexel University for preparing me
for PhD research. My best computer science professors were my algorithms professor Ali
Shoukofandeh, artificial intelligence professor David Shultz, and linear algebra professor
Herman Gollwitzer. I am greatly indebted to Steve Platek, who got me interested in
neuroscience and let me work in his lab. I also had some great bosses during my time in
Philadelphia – Erik Bergenholtz took a chance on me and gave me a job at HP where I
matured as a programmer, and Max Zilberman at CrossCurrent taught me to work like a
maniac.
I’d also like to thank my good friends from college that I don’t keep in touch with
enough – “Cheese” Mike Schachter (I wouldn’t have gotten into computational
neuroscience if it wasn’t for you!), Steve “Rooster” Kaplan, “Magic” Mike Martin, Ivan
Panyavin, Jesse “Spacemonkey” Spacco, and Shawn Riley. Also thanks to the Bradenton
Punks 34205 – Mike “Leroy” LeRoy, Andrew Craig, Bill “the Heart-Attack” Hartman,
Mike Hartman, Chris Flint, Jason Dickerson, Scotty Steiner, Andrew “Drewbie” Holmes,
Mike Simpson “Samson”, Craig “Huggy Bear” Rowlson, Ryan “Frank” Muldermans,
Josh Spade, “Nasty” Nate Knight, and Mike Brindisi. This work would have been much
harder without the help of Al “King Pappa” Johnson.
This thesis would not have been possible without my girlfriend Jillian Tromp who
tolerated me through the toughest times and offered her warmth, love, and support. The
biggest thanks goes to my family, Mary Anne, Robert, Amy, and especially my parents,
for all of their love.
Table of Contents
Dedication ii
Acknowledgements iii
List of Tables viii
List of Figures x
Abbreviations xviii
Chapter 1 - Introduction 1
1.1 Biological Background 2
1.1.1 Parietal Areas 3
1.1.2 Premotor Areas 11
1.1.3 Basal Ganglia 18
1.1.4 Hypothalamus 19
1.1.5 Human Mirror System 20
1.2 Related Models 25
1.2.1 Models of Grasping and the Mirror System 25
1.2.2 Action Selection Models 31
1.2.3 Synthetic Brain Imaging Approaches 32
Chapter 2 - Mirror System Model of Action Recognition 44
2.1 An Overview of the MNS2 Model 44
2.2 Methods 51
2.2.1 Reach and Grasp 51
2.2.2 Visual Analysis of Hand State 52
2.2.3 Action Recognition 53
2.2.4 Training 60
2.3 Simulation Results 62
2.3.1 Recurrent neural network performance 62
2.3.2 Lesioned network performance 65
2.3.3 Audio-Visual Mirror Neurons 70
2.3.4 Hidden Grasp Simulations 73
2.4 Discussion 83
2.4.1 Audio-Visual Mirror Neurons 85
2.4.2 Inferring Hidden Actions 86
Chapter 3 - Learning to Grasp and Extract Affordances 87
3.1 Methods 90
3.1.1 Integrated Learning of Grasping and Affordances 90
3.1.2 Reinforcement 91
3.1.3 Primary Motor Module – Reach and Grasp Generation 92
3.1.4 Parietal Module – Object Feature / Affordance Extraction 96
3.1.5 Premotor Module – Grasp Planning 106
3.1.6 Training 118
3.2 Results 121
3.2.1 AIP Representation 121
3.2.2 Grasp Training 125
3.3 Discussion 131
3.3.1 Predictions 133
3.3.2 Conclusion 136
Chapter 4 - Mirror Systems in Learning Sequential Action Production 137
4.1 Methods 142
4.1.1 System Overview 142
4.1.2 Simulation Protocol for Alstermark’s Cat 146
4.2 Results 149
4.2.1 Motor Program Reorganization in a Novel Environment 149
4.2.2 Motor Program Reorganization After a Lesion 152
4.2.3 Testing the Efficacy of the Mirror System 154
4.3 Discussion 157
4.3.1 Predictions 158
Chapter 5 - Synthetic Brain Imaging 159
5.1 Neural Basis of the BOLD Signal 160
5.2 Methods 162
5.2.1 A New Model of Synthetic Brain Imaging 162
5.2.2 Neural Model 164
5.2.3 Winner-Take-All Circuit 167
5.3 Results 169
5.3.1 Synthetic PET on a Model of Praxis 169
5.3.2 Synthetic fMRI on a Model of Motion Direction Discrimination 172
5.4 Discussion 180
Chapter 6 - Future Work 183
6.1 Inferring Hidden Actions 183
6.2 Inferring Intentions 184
6.3 Integration of MNS2 and ILGA 185
6.4 Skilled Grasping 186
6.5 Context-Dependent Grasps 189
6.6 Integration of MNS2, ILGA, and ACQ 191
6.7 Extensions to Synthetic Brain Imaging 192
6.8 Synthetic Brain Imaging in Analyzing Real Imaging Data 194
References 196
Appendix 224
List of Tables
Table 1-1 A meta-analysis of synthetic brain imaging studies in terms of the
mechanisms included: neural model (LI=leaky integrator, SU=sigmoidal
units, NaLI=sodium concentration leaky integrator, GF=gamma function,
MFA=mean field approximation, DI=decaying impulse, LIF=leaky integrate-and-fire,
PSP=postsynaptic potential, CM=compartmental model,
NM=neural mass model, IZ=Izhikevich neuron), synaptic model
(CBK=conductance-based kinetic model), neurovascular coupling signal
(WI=sum of absolute value of connection weight times input, Na/K=ATP
consumption by Na/K pump, FA=field activity, SI=sum of absolute value of
synaptic currents, SSC=sum of synaptic conductances, NP=number PSPs,
TCC=transmembrane capacitive currents, NAS=number of active synapses),
rCBF generation, BOLD signal generation, temporal smoothing, adjacent
voxel crosstalk, neural noise, and network connection variability (see Table
A-1 for the full meta-analysis). The present study is analyzed in the last row. 42
Table 4-1 Set of relevant actions with preconditions and effects. 147
Table A-1 A meta-analysis of synthetic brain imaging studies in terms of the
mechanisms included: neural model (LI=leaky integrator, SU=sigmoidal
units, NaLI=sodium concentration leaky integrator, GF=gamma function,
MFA=mean field approximation, DI=decaying impulse, LIF=leaky integrate-and-fire,
PSP=postsynaptic potential, CM=compartmental model,
NM=neural mass model, IZ=Izhikevich neuron), synaptic model
(CBK=conductance-based kinetic model), neurovascular coupling signal
(WI=sum of absolute value of connection weight times input, Na/K=ATP
consumption by Na/K pump, FA=field activity, SI=sum of absolute value of
synaptic currents, SC=sum of synaptic conductances, NP=number PSPs,
TCC=transmembrane capacitive currents, NAS=number of active synapses),
rCBF generation, BOLD signal generation, temporal smoothing, oxygen
metabolism (O2), glucose metabolism, adjacent voxel crosstalk, neural noise,
scanner noise, network connection variability, scan repetition time (TR), and
scanner field strength (B0). The present study is analyzed in the last row. 224
Table A-2 External input layer → Hidden layer weights 229
Table A-3 Recurrent input layer → Hidden layer weights 230
Table A-4 Hidden layer → Output layer weights 230
Table A-5 Recurrent output layer → Recurrent input layer weights 232
Table A-6 Audio network external output layer → Main layer external output
layer weights 232
Table A-7 Parameters of the arm/hand model 236
Table A-8 Parameters of the ILGA model 239
Table A-9 DNF parameters 241
Table A-10 Target angles for the finger and thumb joints for the preshape and
enclose phase of side, precision, tripod, and power grasps. 244
Table A-11 Set of relevant actions with preconditions and effects. 249
Table A-12 Imitation model summary 254
Table A-13 Imitation model populations 254
Table A-14 Imitation model connectivity 255
Table A-15 Imitation model neural implementation 256
Table A-16 Imitation model synapse implementation 256
Table A-17 Imitation model input 256
Table A-18 Imitation model parameters 257
Table A-19 RDMDD model summary 257
Table A-20 RDMDD model populations 258
Table A-21 RDMDD model connectivity 258
Table A-22 RDMDD model neural implementation 259
Table A-23 RDMDD model synapse implementation 259
Table A-24 RDMDD model input 259
Table A-25 RDMDD model parameters 260
Table A-26 Neural model parameters 261
List of Figures
Figure 1-1 Left: Dorsal view (from Tsutsui et al., 2005). Right: Lateral view
(from Sakata et al., 1997). 6
Figure 1-2 The technique used in many early synthetic fMRI studies. The total
absolute synaptic activity of all neurons in the network is convolved with a
hemodynamic response function to produce a simulated BOLD response. 38
Figure 1-3 Friston et al.'s (2000) extended balloon model with Zheng et al.'s
(2002) capillary model extension (modified from Zheng et al., 2002). 39
Figure 2-1 The components of the hand state (a(t), o1(t), o2(t), o3(t), o4(t), d(t),
v(t)). (from Oztop and Arbib, 2002) 45
Figure 2-2 The main recurrent network at the heart of the MNS2 model. The
visual and recurrent input layers are fully connected to the hidden layer,
which is fully connected to the external and recurrent output layers. The
recurrent output layer is fully connected to the recurrent input layer and the
output of the audio recurrent network is fully connected to the external
output layer. The external output layer corresponds to F5 mirror neurons
while the target pattern is generated by F5 canonical neuron activity. 46
Figure 2-3 System diagram for the MNS2 model (updating the MNS model of
Oztop and Arbib, 2002). The main recurrent network, shown more explicitly
in Figure 2-2, models the areas 7b and F5 mirror, shown here in the blue
rectangle, by the activity of its hidden and external output layers,
respectively. The audio recurrent network models the nonprimary auditory
cortex. The grey rectangles enclose portions of the model unique to MNS2.
The orange rectangles enclose the schemas relating to a portion of the FARS model
(Fagg and Arbib, 1998) of visually directed grasping of an object. This
includes F5 canonical neurons, which in the MNS2 model provide the F5
mirror neurons with a target pattern of activity. 51
Figure 2-4 Activation of the model's external output units to partially hidden
grasps with and without dynamic remapping and using the upper arm, elbow,
or forearm position to perform the remapping (plus: precision grasp output
unit, cross: side grasp output unit, asterisk: power grasp output unit). In each
set of plots, the plot to the right of the grasp figure shows the network
activity during a fully visible grasp, while the plots below each grasp figure
show the network activity using (columns from left to right) the upper arm,
elbow, or forearm for dynamic remapping and without dynamic remapping
for hidden grasps where the hand disappears behind the screen after (rows
from top to bottom) 5, 6, 7, 8, 9, or 10 time steps. Using the position of the
upper arm, elbow, or forearm to dynamically remap the working memory
representation of the hand location allows extrapolation of the hand state
trajectory. The use of the forearm for dynamic remapping provides a slightly
better approximation to the output activity pattern during a visible grasp than
using the upper arm or elbow. 60
Figure 2-5 Top Left: Example of a generated precision grasp. The squares in all
grasp figures denote the wrist trajectory. Top Right: External output unit
activation for this grasp (plus: precision grasp output, cross: side grasp
output, asterisk: power grasp output). The black vertical lines indicate the
time step in which the hand first contacts the object. The grasp is correctly
recognized as a precision grasp well before the hand contacts the object.
Middle Left: Example of a generated side grasp. Middle Right: External
output unit activation for this grasp (plus: precision grasp output, cross: side
grasp output, asterisk: power grasp output). The grasp is correctly recognized
as a side grasp. Bottom Left: Example of a generated power grasp. Bottom
Right: External output unit activation for this grasp (plus: precision grasp
output, cross: side grasp output, asterisk: power grasp output). The power
grasp is correctly recognized well before the hand contacts the object. 63
Figure 2-6 Network activity of (from left to right) the network's recurrent input
(plus: recurrent input unit 0, cross: recurrent input unit 1, asterisk: recurrent
input unit 2, square: recurrent input unit 3, filled square: recurrent input unit
4), external visual input (plus: aper1, cross: ang1, asterisk: ang2, square:
speed, filled square: dist, circle: axisdisp1, filled circle: axisdisp2), hidden
(plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2, square:
hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled circle:
hidden unit 6, triangle: hidden unit 7, filled triangle: hidden unit 8,
upside-down triangle: hidden unit 9, filled upside-down triangle: hidden unit 10,
diamond: hidden unit 11, filled diamond: hidden unit 12, pentagon: hidden
unit 13, filled pentagon: hidden unit 14), external output (plus: precision
grasp unit, cross: side grasp unit, asterisk: power grasp unit), and recurrent
output layers (plus: recurrent output unit 0, cross: recurrent output unit 1,
asterisk: recurrent output unit 2, square: recurrent output unit 3, filled square:
recurrent output unit 4) during the precision (top row), side (middle row),
and power (bottom row) grasps shown in Figure 2-5. 64
Figure 2-7 Left: Generated grasps. Middle: Recurrent network output unit
activation for each grasp (plus: precision grasp unit, cross: side grasp unit,
asterisk: power grasp unit). The network successfully recognizes all three
grasps. Right: Output unit activation of network with lesioned recurrent
connections for each grasp (plus: precision grasp unit, cross: side grasp unit,
asterisk: power grasp unit). No output unit reaches a significant level of
activity for any grasp. 66
Figure 2-8 Network activity of (columns from left to right) the network's recurrent
input (plus: recurrent input unit 0, cross: recurrent input unit 1, asterisk:
recurrent input unit 2, square: recurrent input unit 3, filled square: recurrent
input unit 4), external visual input (plus: aper1, cross: ang1, asterisk: ang2,
square: speed, filled square: dist, circle: axisdisp1, filled circle: axisdisp2),
hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2,
square: hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled
circle: hidden unit 6, triangle: hidden unit 7, filled triangle: hidden unit 8,
upside-down triangle: hidden unit 9, filled upside-down triangle: hidden unit
10, diamond: hidden unit 11, filled diamond: hidden unit 12, pentagon:
hidden unit 13, filled pentagon: hidden unit 14), external output (plus:
precision grasp unit, cross: side grasp unit, asterisk: power grasp unit), and
recurrent output layers (plus: recurrent output unit 0, cross: recurrent output
unit 1, asterisk: recurrent output unit 2, square: recurrent output unit 3, filled
square: recurrent output unit 4) during the (rows from top to bottom)
precision, side, and power grasps shown in Figure 2-7 with and without
recurrent connections. 68
Figure 2-9 Activation of three discriminative hidden units for each type of grasp
in the intact network (left column) and the same network with lesioned
recurrent connections (right column) (dark line: power grasp, light gray line:
precision grasp, medium gray line: side grasp). 69
Figure 2-10 Activation of three indiscriminative hidden units for each type of
grasp in the intact network (left column) and the same network with lesioned
recurrent connections (right column) (dark line: power grasp, light gray line:
precision grasp, medium gray line: side grasp). 70
Figure 2-11 Left: Activation of the external output layer of the model's audio
recurrent network when presented with the output of the Lyon Passive Ear
model for (from top to bottom) a slapping sound, no sound, a slapping sound,
and a wood cracking sound (plus: slap output unit, cross: wood output unit,
asterisk: paper output unit). Middle: Activation of the external output layer
of the model's main recurrent network when presented with a power grasp
sequence containing (from top to bottom) visual and congruent audio, visual
only, audio only, and visual and incongruent audio information (plus:
precision grasp output unit, cross: side grasp output unit, asterisk: power
grasp output unit). The black vertical lines indicate the time step at which the
hand made contact with the object and the sound information is input into the
auditory network. Waveforms of the slapping and wood cracking sounds are
shown at the bottom of the Auditory and Auditory + Visual - Incongruent
Sound displays, respectively. Right: Activation of an audiovisual mirror
neuron responding to (from top to bottom) the visual and audio components,
visual component alone, and audio component alone of a peanut-breaking
action. A waveform of the peanut breaking sound is shown at the bottom of
the audio alone condition display. (reproduced from Kohler et al., 2002
copyright 2002, AAAS) 72
Figure 2-12 Left: Mirror neuron activation for (from top to bottom) visible
pantomimed grasp, visible grasp, partially hidden grasp, and partially hidden
pantomimed grasp conditions. All grasps were power grasps (reproduced
from Umilta et al., 2001 with permission from Elsevier) Right: Activation of
the model's external output units under the same conditions (plus: precision
grasp output unit, cross: side grasp output unit, asterisk: power grasp output
unit). The black vertical lines indicate the time step at which the hand was no
longer visible to the network. The only output unit showing a significant
level of activity in any plot is the one encoding power grasps. 74
Figure 2-13 Activation of the model's external output units to partially hidden
grasps with and without dynamic remapping (plus: precision grasp output
unit, cross: side grasp output unit, asterisk: power grasp output unit). The
plot directly to the right of the grasp diagram shows the network's response
to a fully visible grasp. Each response in the remaining plots is to the same
generated grasp with the time step of the hand's disappearance varying from
5 to 15. In each pair of columns, the left column shows the network's
response to a hidden grasp with dynamic remapping, while the right column
shows the network's response to the same hidden grasp without dynamic
remapping. The black vertical lines indicate the time step at which the hand
was no longer visible to the network. 76
Figure 2-14 Network activity of (columns from left to right) the network's
recurrent input (plus: recurrent input unit 0, cross: recurrent input unit 1,
asterisk: recurrent input unit 2, square: recurrent input unit 3, filled square:
recurrent input unit 4), external visual input (plus: aper1, cross: ang1,
asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled
circle: axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk:
hidden unit 2, square: hidden unit 3, filled square: hidden unit 4, circle:
hidden unit 5, filled circle: hidden unit 6, triangle: hidden unit 7, filled
triangle: hidden unit 8, upside-down triangle: hidden unit 9, filled
upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond:
hidden unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14),
external output (plus: precision grasp unit, cross: side grasp unit, asterisk:
power grasp unit), and recurrent output layers (plus: recurrent output unit 0,
cross: recurrent output unit 1, asterisk: recurrent output unit 2, square:
recurrent output unit 3, filled square: recurrent output unit 4) for (rows from
top to bottom) the fully visible grasp, the time step 5 hidden grasp with
dynamic remapping, and the time step 5 hidden grasp without dynamic
remapping shown in Figure 2-13. 78
Figure 2-15 The hidden units with the greatest similarity in their activation
patterns during the fully visible grasp and partially hidden grasp with
dynamic remapping conditions and the greatest difference between their
activation patterns during the partially hidden grasp with and without
dynamic remapping conditions, are the anti-grasp indiscriminative hidden
units 3 (top) and 10 (bottom). Each line shows the unit's activity during
different conditions (dark line: visible grasp, light gray line: hidden grasp
with dynamic remapping, medium gray line: hidden grasp without dynamic
remapping). 79
Figure 2-16 Activation of the model's external output units to partially hidden
reaches that overshoot the object with and without dynamic remapping (plus:
precision grasp output unit, cross: side grasp output unit, asterisk: power
grasp output unit). The plot to the right of the grasp diagram shows the
network's response to a fully visible grasp. Each response in the remaining
plots is to the same generated grasp with the time step of the hand's
disappearance varying from 10 to 19. In each pair of columns, the left
column shows the network's response to a hidden grasp with dynamic
remapping, while the right column shows the network's response to the
same hidden grasp without dynamic remapping. The black vertical lines
indicate the time step at which the hand was no longer visible to the network. 80
Figure 2-17 Network activity of (columns from left to right) the network's
recurrent input (plus: recurrent input unit 0, cross: recurrent input unit 1,
asterisk: recurrent input unit 2, square: recurrent input unit 3, filled square:
recurrent input unit 4), external visual input (plus: aper1, cross: ang1,
asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled
circle: axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk:
hidden unit 2, square: hidden unit 3, filled square: hidden unit 4, circle:
hidden unit 5, filled circle: hidden unit 6, triangle: hidden unit 7, filled
triangle: hidden unit 8, upside-down triangle: hidden unit 9, filled
upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond:
hidden unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14),
external output (plus: precision grasp unit, cross: side grasp unit, asterisk:
power grasp unit), and recurrent output layers (plus: recurrent output unit 0,
cross: recurrent output unit 1, asterisk: recurrent output unit 2, square:
recurrent output unit 3, filled square: recurrent output unit 4) for (rows from
top to bottom) the fully visible regular grasp, the fully visible overshot grasp,
the time step 15 hidden overshot grasp with dynamic remapping, and the
time step 15 hidden overshot grasp without dynamic remapping shown in
Figure 2-16. 81
Figure 2-18 Two anti-grasp indiscriminative hidden units - hidden units 3 (top)
and 10 (bottom), show a very similar pattern of activity during the visible
regular grasp and hidden overshot grasp without dynamic remapping
conditions and during the visible overshot grasp and hidden overshot grasp
with dynamic remapping conditions (dark gray line: visible regular grasp,
light gray line: visible overshot grasp, black line: hidden overshot grasp with
dynamic remapping, medium gray line: hidden overshot grasp without
dynamic remapping). 83
Figure 3-1 An overview of the ILGA model. Connections modifiable by
reinforcement learning are shown in red. The parietal regions LIP and V6a
provide the premotor region F2 with object position information to plan the
reach. V6a and the cIPS populations project to AIP, which projects to the
signal-related populations of the other premotor regions. Each premotor
region selects a value for the parameter it encodes and projects to the primary
motor region F1 which controls the movement. Grasp feedback is returned to
somatosensory area S1 which provides the reinforcement signal to the model
and somatosensory feedback to F1. Each execution-related premotor
population additionally receives tonic inhibition (not shown) that is
released when a go signal is detected. 89
Figure 3-2 The wrist, reach, and grasp motor controllers. Each uses population
decoders to decode reach and grasp parameter values from premotor inputs
and set joint angle targets for PD controllers which move the limbs by
applying torque to the joints. The reach motor controller combines the
shoulder-centered object position, object-centered reach offset, and current
wrist position to compute a wrist error vector. The error vector is used to set
goal values for dynamic motor primitives, which generate a reach trajectory
for the wrist. An inverse arm kinematics module computes target joint angles
for each target wrist position. The grasp motor controller contains dynamic
motor primitives for the preshape and enclose phases that are triggered by
reach and tactile events. These dynamic motor primitives generate
normalized trajectories for each virtual finger that are converted into target
joint angles by VF→real finger mapping modules. 94
Figure 3-3 The object primitive variables represented in each parietal region for
the handle (top row) and head (bottom row) of a hammer. The area V6a
represents the shoulder-centered direction of the center of the primitive, φs
and θs, LIP represents the shoulder-centered distance, ρs, and cIPS
represents the object primitive’s orientation, ox, oy, oz, size, sx, sy, sz, and
orientation of surface normal vectors (n1, n2, and n3 in this case). 97
Figure 3-4 The reach and grasp parameters encoded by the premotor cortex. The
blue circle denotes the planned reach offset point. Area F2 encodes the
shoulder-centered object position, φs, θs, ρs, F7 encodes the object-centered
reach offset, φo, θo, ρo, F5 encodes the VF combination and maximum
aperture used for the grasp, and F2/F5 encodes the wrist orientation wrx,
wry, wrz. 108
Figure 3-5 AIP activation for different objects in various locations and
orientations. Each panel shows a third-person view (left) and AIP activation
(right). 122
Figure 3-6 Top: Object specificity statistics for the AIP population during training
(solid=maximum PI, dashed=mean PI, dotted=minimum PI). Bottom:
Numbers of highly (solid), moderately (dashed), and non- (dotted) object
specific neurons throughout training. 124
Figure 3-7 Stable grasps generated by the model of different objects with various
positions and orientations. Each panel shows a third-person view on the left
and the model’s first-person view on the right. 125
Figure 3-8 A series of frames showing the progression of a precision pinch of a
flat plate generated by the model. At 2s after object presentation hand
preshaping has already begun. The enclose phase is triggered at 2.3s and the
object is first contacted at 2.4s. 127
Figure 3-9 Firing rates of neurons in the cIPS AOS-cylinder, AIP, F7-direction,
and F5 populations during grasps to the same object in the same location, but
with different orientations. Activity in other populations was not
significantly different during each grasp (due to using the same object and
locations as the target) and is therefore not shown. 129
Figure 3-10 Firing rates of neurons in AIP (left column) and F5 (middle column)
while grasping a cylinder with a precision pinch (top row), plate with a
precision pinch (second row), plate with a tripod grasp (third row), and plate
with a power grasp (bottom row). 131
Figure 4-1 The experimental setup used in Alstermark’s experiments. A
horizontal tube containing food is facing the cat and the cat must reach into
the tube with its paw to extract the food. (A-E): A cat able to grasp the food
with its paw. (F-J): A cat unable to grasp the food with its paw eventually
learns to rake it from the tube and grasps it with its mouth (reproduced from
Alstermark et al., 1981 with permission of the author). 141
Figure 4-2 A simplified version of the ACQ system. 143
Figure 4-3 A) The original motor program for eating a piece of food initially in a
horizontal tube. B) The motor program that describes the behavior that is
learned after the Grasp-Paw motor schema is lesioned. 150
Figure 4-4 A) The mean desirability connection weights for each action after
training. The error bars show the standard deviation. B) The mean
desirability of each motor schema after lesioning the Grasp-Paw motor
schema and retraining the network. 152
Figure 4-5 Mean number of trials until the first successful trial after lesion of the
Grasp-Paw motor schema for each number of irrelevant actions tested (0-
100). Solid: The model with reinforcement based on successful intended and
apparent actions (mirror system). Dashed: The alternate model version with
reinforcement based solely on successful intended actions (no mirror
system). The error bars denote the standard error. 155
Figure 4-6 Mean number of trials until recovery (the first 4 out of 5 intentionally
successful trials) after lesion of the Grasp-Paw motor schema for each
number of irrelevant actions tested (0-100). Solid: The model with
reinforcement based on successful intended and apparent actions (mirror
system). Dashed: The alternate model version with reinforcement based
solely on successful intended actions (no mirror system). The error bars
denote the standard error. 156
Figure 5-1 Left: The membrane potential (mV) of the pyramidal neurons and
inhibitory interneurons in a one-dimensional winner-take-all network for 5s
after two conflicting inputs are applied. While excitatory and inhibitory
neurons are shown in separate layers to lay bare the mathematical structure
of the model, the two types of neuron co-occur in each voxel. Right: The
corresponding firing rate (Hz) of the pyramidal neurons of this network. 168
Figure 5-2 The firing rates (Hz) of the pyramidal cells in the input praxicon (top
row) and output praxicon (bottom row) networks after application of low
intensity / low contrast (left column) and high intensity / high contrast (right
column) inputs. The hemodynamic response of each network is shown in the
middle column during the familiar (blue) and novel (red) conditions. 171
Figure 5-3 The firing rate of the pyramidal populations of (rows, top to bottom):
MT, LIP, and FEF during the (columns, left to right): 3.2%, 12.8%, and
51.2% stimulus coherence level conditions when the net direction of dot
motion is to the right. The solid white lines denote stimulus onset and the
dashed white lines denote the time of response. 175
Figure 5-4 The mean response time (left) and accuracy (right) as a function of
motion strength. The error bars denote standard error. The solid curves show
the fitted psychometric functions (A′ = 0.6, k = 18, t_R = 0.29, ln(L) = 3.2). 176
Figure 5-5 The response time (top) and accuracy behavioral measures during the
control (solid), MT stimulation (dashed) and LIP stimulation (dotted)
simulations for each stimulus strength tested. The curves show the fitted
chronometric and psychometric functions. The error bars denote standard
error. 178
Figure 5-6 The percent change in fMRI response amplitude across multiple
coherence levels and linear functions fit to the data for MT (solid line), LIP
(dashed line), and FEF (dotted line). The error bars denote standard error. 180
Figure 6-1 AIP activation (top row) when the model is presented with a cylinder
(left column), rectangular prism (middle column), and a cylinder and
rectangular prism combined in a hammer (right column). 191
Figure A-1 Unless specified, each joint has 1 DOF. The simulated arm/hand has a
total of 22 DOFs. 234
Abbreviations
7a, parietal area 7a; area PG
7b, parietal area 7b; area PF
7ip, parietal area 7ip
7m, mesial part of area 7; area PGm
ACQ, Augmented Competitive Queuing
AIP, anterior intraparietal area; part of area 7
AIT, anterior inferotemporal cortex
AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid
AOS, axis-orientation selective
ATP, adenosine-5'-triphosphate
BA, Brodmann area
BMU, best-matching unit
BOLD, blood oxygenation level-dependent
BPTT, back propagation through time
Ca, calcium
CBF, cerebral blood flow
CBK, conductance-based kinetic model
CIP, caudal intraparietal area
cIPS, caudal intraparietal sulcus; part of area 7
Cl, chlorine
CM, compartmental model
CMA, cingulate motor area
CQ, competitive queuing
DI, decaying impulse
dlPFC, dorsolateral prefrontal cortex
DMP, dynamic motor primitive
DNF, dynamic neural field
DOF, degree of freedom
EBA, extra-striate body area
ECD, equivalent current dipole
EEG, electroencephalography
F1, frontal area 1, primary motor cortex; M1
F2, frontal area 2, caudal part of dorsal premotor area; PMdc
F2d, F2 dimple region
F2vr, F2 ventrorostral region
F3, frontal area 3, supplementary motor area; SMA
F4, frontal area 4, caudal part of ventral premotor area; PMvc
F5, frontal area 5, rostral part of ventral premotor area; PMvr
F5ab, bank of the arcuate sulcus in area F5
F5c, cortical convexity of the arcuate sulcus in area F5
F6, frontal area 6, pre-supplementary motor area; pre-SMA
F7, frontal area 7, rostral part of dorsal premotor area; PMdr
FA, field activity
FARS, Fagg-Arbib-Rizzolatti-Sakata model
FEF, frontal eye fields
fMRI, functional magnetic resonance imaging
GABA, gamma-aminobutyric acid
GAEM, Grasp Affordance Extraction Model
GF, gamma function
GP, globus pallidus
GPe, external segment of globus pallidus
GPi, internal segment of globus pallidus
H, hydrogen
HRF, hemodynamic response function
ILGA, Integrated Learning of Grasps and Affordances model
ILGM, Infant Learning to Grasp Model
IPL, inferior parietal lobule
IPS, intraparietal sulcus
IT, inferotemporal cortex
IZ, Izhikevich neuron
K, potassium
LFP, local field potential
LI, leaky integrator
LIF, leaky integrate-and-fire
LIP, lateral intraparietal area, part of area 7
LO, lateral occipital cortex
LOP, lateral occipital parietal area
M1, motor area 1, primary motor cortex
MDP, Markov decision process
MDS, multi-dimensional scaling
MEG, magnetoencephalography
MFA, mean field approximation
Mg, magnesium
MIP, medial intraparietal area; part of area 5/7
MMRL, multiple model-based reinforcement learning
MNS, mirror neuron system
MRI, magnetic resonance imaging
MST, medial superior temporal area; part of STS
MT, middle temporal area; part of STS
MUA, multi-unit activity
Na, sodium
NaLI, sodium concentration leaky integrator
NAS, number of active synapses
NM, neural mass model
NMDA, N-methyl-D-aspartic acid
NO, nitric oxide
NP, number of post-synaptic potentials
NSL, Neural Simulation Language
ODE, Open Dynamics Engine
PCA, principal component analysis
PD, proportional-derivative
PE, parietal area PE; part of area 5
PEa, parietal area PEa; part of area 5
PEc, parietal area PEc; part of area 5
PET, positron emission tomography
PF, parietal area PF, part of area 7
PFC, prefrontal cortex
PG, parietal area PG, part of area 7
PGm, medial PG
PI, preference index
PIP, posterior intraparietal area
PMdc, caudal region of dorsal premotor cortex
PMdr, rostral region of dorsal premotor cortex
PMvc, caudal region of ventral premotor cortex
PMvr, rostral region of ventral premotor cortex
PO, parieto-occipital cortex
pre-SMA, pre-supplementary motor area
PSC, postsynaptic current
PSP, postsynaptic potential
rCBF, regional cerebral blood flow
RDMDD, random dot motion direction discrimination
RPS, rock-paper-scissors
RT, response time
S1, first somatosensory cortex, SI
S2, second somatosensory cortex, SII
SC, superior colliculus
SD, standard deviation
SEF, supplementary eye fields
SEM, standard error of mean
SI, sum of absolute value of synaptic currents
SMA, supplementary motor area
SN, substantia nigra
SNc, substantia nigra pars compacta
SNr, substantia nigra pars reticulata
SOFM, self-organizing feature map
SOS, surface orientation selective
SPL, superior parietal lobule
SSC, sum of synaptic conductances
STN, subthalamic nucleus
STS, superior temporal sulcus
SU, sigmoidal units
TCC, transmembrane capacitive currents
TD, temporal difference
TMS, transcranial magnetic stimulation
TR, repetition time
V1, visual area 1
V3A, visual area 3A
V4, visual area 4
V6A, visual area 6A, area 19
VF, virtual finger
VIP, ventral intraparietal area, part of area 5/7
VTA, ventral tegmental area
WAV, waveform audio file format
WI, sum of absolute value of connection weight times input
WLA, winner-lose-all
WTA, winner-take-all
Abstract
Both premotor and parietal cortex of the macaque brain contain mirror neurons, each
of which fires vigorously both when the monkey executes a certain limited set of actions
and when the monkey observes another individual perform a similar action. Turning to the
human, we must rely on brain imaging rather than single-neuron recording. The goals of
this thesis are to a) develop biologically plausible models of the mirror system and its
interactions with other brain regions in grasp observation and execution, b) suggest a new
role for the mirror system in self-observation and feedback-based learning, and c) present
an extension of synthetic brain imaging that allows computational models to address
monkey and human data.
Chapter 1 - Introduction
Both premotor area F5 and parietal area PF of the macaque brain contain mirror neurons,
each of which fires vigorously both when the monkey executes a certain limited set of
actions and when the monkey observes another individual perform a similar action. By contrast,
canonical neurons in F5 fire vigorously when the monkey executes certain actions but not
when it observes the actions of others. Turning to the human, we must rely on brain imaging
rather than single-neuron recording. Imaging data show that the human brain contains mirror
systems in both frontal and parietal lobes, namely regions that show high activation both
when a human performs a manual action and when the human observes a manual action, but
not when the human simply observes an object. It is widely assumed that such mirror regions
contain mirror neurons, based on similarities between the human and macaque brain.
We have developed three models involving the mirror system and related regions that are
constrained by neuroanatomy and neurophysiology and pose challenges to experimentalists
in the form of testable hypotheses. The organization of the thesis is as follows:
In this chapter, the literature on the brain regions involved in and related to the mirror
system, reaching, and grasping is reviewed, along with related models.
Chapter 2 presents the Mirror Neuron System (MNS2) model, which builds upon a
previous model of the mirror system with a more biologically plausible implementation and
the inclusion of other systems whose interaction with mirror neurons explains new
experimental data.
Chapter 3 develops the Integrated Learning of Grasps and Affordances (ILGA) model,
which gives an account of how F5 canonical and AIP visual neurons, which provide input to
the MNS2 model, simultaneously gain their properties through development.
Chapter 4 shows how systems like ILGA and MNS2 can be deployed and used in a
system called Augmented Competitive Queuing (ACQ) to chain together sequences of
actions and rapidly reorganize these sequences in the face of disruption.
Chapter 5 presents a new version of synthetic brain imaging, a method of averaging or
otherwise transforming neural network activity to generate simulated brain imaging signals.
This chapter develops two examples that show how computational models such as these can
bridge the gap between monkey neurophysiology and human brain imaging.
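As a rough illustration of the synthetic brain imaging idea described above (and not the implementation used in this thesis), the sketch below sums the absolute synaptic currents of a simulated population and convolves the result with a canonical double-gamma hemodynamic response function to yield a BOLD-like signal. The HRF parameters and the toy activation are illustrative assumptions.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, valid for t >= 0."""
    return t ** (shape - 1) * np.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def hrf(t):
    """Canonical double-gamma HRF (peak ~6 s, undershoot ~16 s; common defaults)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def synthetic_bold(syn_currents, dt=0.1):
    """One variant of synthetic brain imaging: sum the absolute synaptic
    currents over all neurons in a region, then convolve with the HRF."""
    regional = np.abs(syn_currents).sum(axis=1)   # total |current| per time step
    kernel = hrf(np.arange(0.0, 30.0, dt))        # 30 s of HRF support
    return np.convolve(regional, kernel)[: len(regional)] * dt

# Toy example: 50 neurons briefly active 10 s into a 60 s simulation.
rng = np.random.default_rng(0)
currents = np.zeros((600, 50))                    # 60 s at dt = 0.1 s
currents[100:110] = rng.random((10, 50))
bold = synthetic_bold(currents)                   # peaks several seconds later
```

Because the HRF is slow and delayed, the synthetic signal peaks seconds after the brief burst of neural activity, which is what allows such simulations to be compared with fMRI time courses.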
Chapter 6 concludes the thesis by discussing possibilities for future research directions.
1.1 Biological Background
Since the distinction was made between the dorsal and ventral visual streams
(Ungerleider and Mishkin, 1982), the dorsal stream has been further subdivided into the
dorsal-medial and dorsal-ventral streams (Rizzolatti and Matelli, 2003). It has been suggested
that the dorsal-medial stream, involving superior parietal and intraparietal regions and the
dorsal premotor cortex, controls reaching while the dorsal-ventral stream, including inferior
parietal and intraparietal regions and the ventral premotor cortex, controls grasping
(Jeannerod et al., 1995; Wise et al., 1997). The main regions involved in this thesis are
therefore the intraparietal sulcus and premotor cortex, but also include the superior temporal
sulcus, inferior parietal lobule, and basal ganglia. The intraparietal sulcus and inferior
parietal lobule are involved in the representation and transformation of sensory inputs in
order to plan movements. Various premotor regions use these parietal representations of
action affordances in order to plan behaviors such as reaching and grasping. The superior
temporal sulcus is implicated in biological motion processing and provides input to regions
in the inferior parietal lobule. The basal ganglia are thought to control action selection by
disinhibiting cortical representations of action.
1.1.1 Parietal Areas
There is converging evidence (Jeannerod et al., 1995) that affordance extraction is
accomplished by posterior parietal cortex (Sakata et al., 1998), while the inferior parietal
cortex has been found to be involved in action recognition in the macaque (Fogassi et al.,
2005; Rizzolatti et al., 1996a).
1.1.1.a Lateral Intraparietal Area (LIP)
The area LIP shows reliable and robust responses to visual stimulation (Andersen et al.,
1985; Colby and Duhamel, 1996). The major sources of input to LIP come from the
extrastriate visual cortex (Blatt et al., 1990; Bullier et al., 1996; Colby et al., 1988; Felleman
and Van Essen, 1991; Seltzer and Pandya, 1986). It projects to the premotor cortex (Cavada
and Goldman-Rakic, 1989), including the dorsal premotor cortex (Tanne-Gariepy et al.,
2002).
It has been suggested that the spatial dimensions of potential targets such as direction and
distance are processed independently in parallel (Battaglia-Mayer et al., 2003). In support of
this idea, direction and distance reach errors dissociate (Gordon et al., 1994; Soechting and
Flanders, 1989) and distance information decays faster than direction information in working
memory (McIntyre et al., 1998). Stimulus depth is largely indicated by disparity signals and
accommodative cues which modulate activity in area LIP (Ferraina et al., 2002; Gnadt and
Beyer, 1998). More directly, it has been shown that LIP neurons encode three-dimensional
distance in an egocentric reference frame (Gnadt and Mays, 1995). Individual cells have
broad response profiles centered on their preferred depth, and the region is thus capable of
providing a population code of egocentric target distance. While neurons in the LIP can be
selective for shape, the region is not as selective as regions in the ventral stream such as the
anterior inferotemporal cortex (AIT, Lehky and Sereno, 2007). It has been implicated in
visual target memory (Gnadt, 1988) and motor intentions (Snyder et al., 1997, 2000) and it
has been suggested that it represents salient spatial locations (Colby and Goldberg, 1999).
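The population-coding claim above (broadly tuned cells centered on preferred depths, read out across the population) can be made concrete with a minimal sketch. The Gaussian tuning curves, cell count, and center-of-mass decoder here are illustrative assumptions, not a model fitted to LIP data.

```python
import numpy as np

# Hypothetical LIP-like population: broad Gaussian tuning curves,
# each centered on a cell's preferred egocentric depth (in cm).
preferred = np.linspace(10, 100, 19)   # preferred depths of 19 cells
sigma = 15.0                           # broad tuning, as described for LIP

def population_response(depth):
    """Firing rates of all cells for a target at the given depth."""
    return np.exp(-((depth - preferred) ** 2) / (2 * sigma ** 2))

def decode_depth(rates):
    """Center-of-mass readout of target depth from the population."""
    return np.sum(rates * preferred) / np.sum(rates)

rates = population_response(42.0)
estimate = decode_depth(rates)         # close to the true 42 cm
```

Even though no single cell signals depth precisely, the weighted average over broadly tuned cells recovers the target distance, which is the sense in which the region can "provide a population code of egocentric target distance."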
1.1.1.b Area V6a
Area V6a contains mostly visual cells (Galletti et al., 1997; Rizzolatti et al., 1998) that
are modulated by somatosensory stimulation (Breveglieri et al., 2002; Fattori et al., 2005).
The visual receptive fields of cells in V6a cover the whole visual field and represent each portion
of it multiple times (Galletti et al., 1993; Galletti et al., 1999). So-called real-position cells
are able to encode the spatial location of objects in the visual scene in at least head-centered
coordinates, with visual receptive fields that remain anchored despite eye movements
(Galletti et al., 1993). Intermingled with real-position cells are retinotopic cells, whose visual
receptive fields shift with gaze, suggesting that the region is involved in converting
coordinates from retinotopic to head- or body-centered reference frames (Galletti et al., 1993).
Many neurons in V6a only respond when the arm is directed toward a particular region of
space (Fattori et al., 2005; Galletti et al., 1997). Lesions of the region result in misreaching
with the contralateral arm (Battaglini et al., 2002). It has thus been suggested that the
region is involved in encoding the direction of movement required to bring the arm to
potential reach targets (Galletti et al., 2003; Rizzolatti et al., 1998). Area V6a receives input
from central and peripheral visual field representations in V6 (Shipp et al., 1998) and projects
to the premotor areas F2 (Luppino et al., 2005; Matelli et al., 1998; Shipp et al., 1998) and F7
(Caminiti et al., 1999; Luppino et al., 2005; Marconi et al., 2001; Shipp et al., 1998).
1.1.1.c Caudal Intraparietal Sulcus (cIPS)
The caudal intraparietal sulcus (cIPS) is a region located in the caudal part of the lateral
bank and fundus of the intraparietal sulcus (Shikata et al., 1996). It was originally referred to
as the posterior intraparietal area (PIP) by Colby et al., (1988), and probably overlaps the
lateral occipital parietal area (LOP) of Lewis and Van Essen (2000). It is located between area
V3a and the lateral intraparietal sulcus (LIP), being cytoarchitectonically distinct from both
and heavily myelinated (Lewis and Van Essen, 2000).
The cIPS receives input mainly from V3a (Figure 1-1), whose neurons are sensitive to
binocular disparity and have small, retinotopic receptive fields (Sakata et al., 2005), and
projects primarily to the anterior intraparietal sulcus (AIP, Nakamura et al., 2001). The
supragranular layers of cIPS project to the granule layer of AIP (Borra et al., 2007;
Nakamura et al., 2001), and the projections from V3a neurons terminate in the vicinity of the
cIPS neurons that project to AIP (Nakamura et al., 2001).
Figure 1-1 Left: Dorsal view (from Tsutsui et al., 2005). Right: Lateral view (from Sakata et
al., 1997).
Neurons in area cIPS have large receptive fields (10-30 degrees in diameter) with no
retinotopic organization (Tsutsui et al., 2005). The region is activated in fMRI experiments
on macaque monkeys by observation of 3D objects with different shape cues (texture, motion
parallax, shading, Sereno et al., 2002) and a random-dot stereogram with static and moving
disparity depth cues (Tsao et al., 2003). Two functional classes of neurons in area cIPS have
been described: surface orientation selective (SOS) neurons that are selective to the
orientation of flat surfaces, and axis orientation selective (AOS) neurons that respond best to
an elongated object whose principal axis is oriented in a particular direction. Both types of
neurons respond best to binocular stimuli (Sakata et al., 1997) and are spatially intermingled
(Nakamura et al., 2001). Muscimol-induced inactivation of this region disrupts performance
on a delayed match-to-sample task with oriented surfaces using perspective and disparity
cues (Tsutsui et al., 2001; Tsutsui et al., 2005). Both types of cells include some neurons that
are selective for the object’s dimensions (Kusunoki et al., 1993; Sakata et al., 1998).
AOS cells are tuned to the 3D orientation of the longitudinal axis of long and thin stimuli
(Sakata and Taira, 1994; Sakata et al., 1997; Sakata et al., 1999). These cells prefer bars tilted
in the vertical, horizontal, or sagittal planes (Sakata et al., 1998; Sakata et al., 1999). Some
are selective for shape (rectangular versus cylindrical), and probably represent surface
curvature (Sakata et al., 2005). Their discharge rate increases monotonically with object
length and their width response curve is monotonically decreasing in the 2-32cm range
(Kusunoki et al., 1993; Sakata et al., 1998). It is thought that these cells integrate orientation
and width disparity cues to represent principal axis orientation (Sakata et al., 1998).
SOS cells are tuned to the surface orientation in depth of flat and broad objects (Sakata et
al., 1997; Sakata et al., 1998; Sakata et al., 1999; Shikata et al., 1996). These cells respond to
a combination of monocular and binocular depth cues (texture and disparity gradient cues) in
representing surface orientation (Sakata et al., 2005). Neurons sensitive to multiple depth
cues are widely distributed and spatially intermingled with those sensitive to only one depth
cue (Tsutsui et al., 2005). The response magnitude of these cells decreased and
increased with object width and length, respectively (Sakata et al., 1998; Shikata et al., 1996).
In an event-related fMRI study, Shikata et al. (2001) found that both cIPS and AIP were
activated during a surface orientation discrimination task, but that cIPS was activated to a
larger degree. In a follow-up study, Shikata et al. (2003) found that the cIPS was active
during surface orientation discrimination, but not during the adjustment of finger positions to
match orientation. Sakata et al. (1997) suggest that area cIPS extracts 3D object shape
features and projects to AIP.
1.1.1.d Anterior Intraparietal Area (AIP)
The anterior intraparietal area AIP is located on the lateral bank of the anterior
intraparietal sulcus and contains visually responsive neurons selective for 3D features of
objects, motor dominant neurons that only respond during grasping, and visuomotor neurons
that are activated by grasping and modulated by sight of the object (Sakata et al., 1998). The
region receives its main input from area cIPS (Nakamura et al., 2001), but also receives input
from V6a (Shipp et al., 1998) and projects most strongly to the premotor region F5 (Borra et
al., 2007).
Affordances are defined by Gibson (1966) as opportunities for action that are directly
perceivable without recourse to higher-level cognitive functions. It has been widely
suggested that area AIP is involved in the extraction of affordances for grasping (Fagg and
Arbib, 1998; Nakamura et al., 2001; Sakata et al., 1998). In monkeys it has been found that
neurons in the anterior intraparietal sulcus (AIP) are responsive to 3D features of objects
relevant for manipulation (Murata et al., 2000) such as shape, size, and orientation, regardless
of their location (Taira et al., 1990). More recently, Gardner et al. (2007) showed that these
neurons are maximally active during hand preshaping as the hand approaches the object;
their activity peaks upon object contact and declines thereafter.
A series of imaging experiments with humans have identified a homologous region in the
anterior intraparietal sulcus that also seems to be involved in manual affordance extraction.
Similar to the visually dominant neurons in macaque area AIP, this region is active during
visually guided grasping and simple observation of graspable objects without the intent to
grasp them (Binkofski et al., 1998; Chao and Martin, 2000; Faillenot et al., 1997). Virtual
lesions of this area have been shown to disrupt the online control of grasping in the face of
perturbation (Tunik et al., 2005). Interestingly, a role for AIP in online control of grasping in
the monkey was predicted on the basis of cells in the region with visual-motor and motor
dominant properties (Sakata et al., 1995). However the only muscimol inactivation study on
the region to date found only a deficit in hand preshaping (Gallese et al., 1994), likely
because the region was inactivated prior to the start of the trial. Converging evidence from
multiple experimental methods strongly suggests that the human area AIP has a high degree
of homology with the corresponding region in the monkey (Culham and Valyear, 2006).
1.1.1.e Inferior Parietal Lobule (IPL)
Previous studies had shown that many IPL neurons discharge both when the monkey
performs a given motor act and when it observes a similar motor act done by another
individual (“parietal mirror neurons”; Fogassi et al., 2005). Fogassi et al. (2005) recorded
165 neurons from the rostral sector of IPL in two monkeys, all of which were active in
association with hand grasping movements. There were two main conditions: (i) The
monkey, starting from a fixed position, reached and grasped a piece of food located in front
of it, brought it to the mouth, and ate the food. (ii) The monkey reached and grasped a
piece of food, located as described above, and placed it into a container. It was rewarded
after correct accomplishment of the task with an even more desirable piece of food, so it
quickly learned not to eat the less desirable piece of food when the container was present. In
initial experiments, the container was located on the table near the initial position of the food
to be grasped. In later experiments, the container was placed on the monkey’s shoulder so
that the arm movements would be as similar as possible in bringing the food to the mouth
and bringing the food to the container.
While some neurons discharged with the same strength regardless of the motor act that
followed grasping, the large majority was influenced by the subsequent motor act. For
example, “grasp-to-eat” neurons discharged during grasping only when grasping was
followed by bringing the food to the mouth whereas “grasp-to-place” neurons discharged
very strongly during grasping only when grasping was followed by placing. All neurons that
discharged best during grasping-to-place in the basic condition maintained the same
selectivity when placing was done in the container located near the mouth, in spite of the
different movement kinematics in the two conditions. The main factor that determined the
discharge intensity was, the authors conclude, the goal of the action and not the kinematics.
Further, Fogassi et al. (2005) examined 41 mirror neurons active for grasping, i.e., all
discharged both during grasping observation and grasping execution. These neurons were
then recorded in a visual task in which the experimenter performed, in front of the monkey,
the same actions that the monkey did in the motor task, e.g. grasping to eat and grasping to
place. A crucial fact is that the experimenter performed the grasp to place if and only if the
container was present. Some neurons discharged with the same strength regardless of the
motor act following the observed grasping. The majority of neurons, however, were
differentially activated depending on whether the observed grasping was followed by
bringing to the mouth or by placing. For a typical example, their unit 87 discharged
vigorously when the monkey observed the experimenter grasping a piece of food, provided
that he subsequently brought the food to his mouth. In contrast, the neuron’s discharge was
much weaker when, after grasping an object, the experimenter placed it into the container.
Three quarters of the mirror neurons in their sample were influenced by the final goal of the
action, with a majority of these being grasp-to-eat neurons. For all neurons the selectivity
remained the same regardless of whether food or a geometrical object was grasped, although
8 neurons selective for the observation of grasping to eat did discharge more strongly in the
placing condition when the grasped object was food.
1.1.2 Premotor Areas
The premotor cortex has been implicated in the control of both proximal and distal
manual actions (canonical neurons, Rizzolatti et al., 1988) and in action recognition (mirror
neurons, Rizzolatti et al., 1996a).
1.1.2.a F2
Within the premotor cortex, the caudal portion F2 most likely codes reach movements in
an arm-centered reference frame (Caminiti et al., 1991; Cisek and Kalaska, 2002; Rizzolatti
et al., 1998). The region was first defined by Matelli et al. (1985) and appears to correspond
to the dorsal caudal premotor area (PMdc) defined by Wise et al. (1997). Area F2 was later
subdivided into the F2 dimple (F2d) and ventrorostral (F2vr) subregions (Matelli et al.,
1998). Area F2 is located just rostral to the leg and arm representation in area F1 and extends
rostrally 2-3mm in front of the genu of the arcuate sulcus and laterally to the spur of the
arcuate sulcus (Fogassi et al., 1999). It contains an arm field lateral to the superior precentral
dimple (Dum and Strick, 1991; Godschalk et al., 1995; He et al., 1993; Kurata, 1989). Visual
inputs to area F2 come mainly from the superior parietal lobe (Caminiti et al., 1996; Johnson
et al., 1993). The subregion F2vr receives projections from area V6A (part of the
parieto-occipital area, PO; Marconi et al., 2001; Matelli et al., 1998; Shipp et al., 1998; Shipp and
Zeki, 1995) and the lateral intraparietal area LIP (Tanne-Gariepy et al., 2002). Weak input
comes from the dorsolateral prefrontal cortex (dlPFC, Luppino et al., 2003; Wang et al.,
2002). Thalamic inputs to F2 come mainly from the VPLo/VLc complex which is the target
of the basal ganglia’s “motor circuit” (Matelli and Luppino, 1996). The main output of F2
projects to F1 (Dum and Strick, 2005), but it also projects weakly back to V6a (Caminiti et
al., 1999).
The region contains two broad categories of cells: movement-related cells that discharge
on movement onset and signal- and set-related cells that show anticipatory activity prior to
the start of movement (Cisek and Kalaska, 2002; Kurata, 1994; Wise et al., 1997). The region
contains a rostro-caudal gradient of cell types with signal-related cells found predominantly
in F2vr and execution-related cells located in F2d, the caudal portion adjacent to F1 (Johnson
et al., 1996; Tanne et al., 1995). Signal-related cells make up 43% of F2 neurons and respond to
the visual target for reaching (Weinrich and Wise, 1982). Execution-related cells have
changes in activity that are synchronized with the onset of movement (Weinrich and Wise,
1982). Some execution-related cells are only active after the Go signal and these are more
common caudally (Crammond and Kalaska, 2000). This categorization of cells seems to
correspond to a similar modality-based classification used by Fogassi et al. (1999) and Raos
et al. (2004) which describes cells as purely motor, visually modulated, or visuomotor.
Purely motor cells are not affected by object presentation or visual feedback of the hand,
visually modulated cells discharge differentially when reaching in the light vs. dark, and
visuomotor cells discharge during object fixation without movement. Most visually
modulated or visuomotor cells are in F2vr (Fogassi et al., 1999), and therefore likely
correspond to the signal-related cells described by Cisek & Kalaska (2002).
Many of the cells in F2 are directionally tuned and their population activity appears to
encode a vector representing the direction of arm movement and not the position of the end
target (Caminiti et al., 1991; Weinrich and Wise, 1982). Most cells are sensitive to amplitude
and direction, with very few cells sensitive to only amplitude (Fu et al., 1993; Messier and
Kalaska, 2000). Most cells are additionally sensitive to speed (Churchland et al., 2006).
However, muscimol inactivation increases directional errors when conditional cues
are presented, while amplitude and velocity remain unchanged (Kurata, 1994).
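The population-vector idea mentioned above (directionally tuned cells whose activity jointly encodes movement direction) can be sketched in the Georgopoulos style: each cell contributes a unit vector along its preferred direction, weighted by its rate relative to baseline. The cosine tuning, cell count, and rate parameters below are illustrative assumptions, not values fitted to F2 recordings.

```python
import numpy as np

# Hypothetical directionally tuned population: cosine tuning with
# preferred directions spread uniformly around the circle.
preferred = np.linspace(0, 2 * np.pi, 16, endpoint=False)
baseline, gain = 10.0, 8.0              # illustrative rates (Hz)

def rates_for(direction):
    """Cosine-tuned firing rates for a movement in the given direction (rad)."""
    return baseline + gain * np.cos(direction - preferred)

def population_vector(rates):
    """Sum each cell's preferred-direction unit vector, weighted by its
    rate relative to baseline; the angle of the sum estimates direction."""
    w = rates - baseline
    x = np.sum(w * np.cos(preferred))
    y = np.sum(w * np.sin(preferred))
    return np.arctan2(y, x) % (2 * np.pi)

decoded = population_vector(rates_for(np.pi / 3))   # recovers ~pi/3
```

With exact cosine tuning and uniformly spaced preferred directions, the readout recovers the movement direction exactly; with noisy, broadly tuned real cells it yields the kind of direction (but not endpoint) code described for F2.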
In addition to reach target selection, the dorsal premotor cortex is implicated in wrist
movements (Kurata, 1993; Riehle and Requin, 1989). Neurons have been described in area
F2 that become active in relation to specific orientations of visual stimuli and to
corresponding hand-wrist movements (Raos et al., 2004). That same paper showed that 66%
of grasp neurons in F2 were highly selective for grasp type and that 72% were highly
selective for wrist orientation.
1.1.2.b F5
The ventral premotor area F5 lies in the arcuate sulcus. It is subdivided into F5ab, the
bank of the arcuate sulcus which predominantly contains canonical neurons, and F5c, the
cortical convexity of the arcuate sulcus which predominantly contains mirror neurons
(Matelli et al., 1985). Neurons in F5ab were found that respond to the sight of graspable
objects and during grasp performance. Many of these neurons fire in association
with specific types of manual action, such as precision grip, finger prehension, and whole-hand
prehension (Rizzolatti et al., 1988), as well as tearing and holding. Some neurons in F5
discharge only during the last part of grasping; others start to fire during the phase in which
the hand opens and continue to discharge during the phase when the hand closes; finally a
few discharge predominantly in the phase in which the hand opens. Grasping appears, therefore,
to be coded by the joint activity of populations of neurons, each controlling different phases
of the motor act. Raos et al. (2006) found that “grip posture rather than object shape”
determined F5 activity, and that F5 neurons selective for both grip type and wrist orientation
maintained this selectivity when grasping in the dark. Reversible inactivation of the bank of
the arcuate sulcus disrupts hand preshape before grasping, but not reaching (Fogassi et al.,
2001). Simultaneous recording from F5 and F1 showed that F5 neurons were selective for
grasp type and phase, while an F1 neuron might be active for different phases of different
grasps (Umilta et al., 2007). This suggests that F5 neurons encode a high-level representation
of the grasp motor schema while F1 neurons (or, at least, some of them) encode the
component movements or components of a population code for muscle activity of each grasp
phase.
The premotor cortex could influence motor activity through projections to the primary
motor cortex (M1) or by directly projecting to the spinal cord. In support of the first scheme,
Shimazu et al. (2004) found no corticospinal output given stimulation of F5 alone. However,
they found that when F5 was stimulated before stimulating M1, the corticospinal output
normally seen from stimulating M1 was greatly enhanced. Cattaneo et al. (2005) used TMS
to show the increase in the excitability in cortical inputs to the motor cortex before grasping
movements is specific to those motor cortex neurons that innervate muscles involved in
grasping and is object-specific. F5 thus modulates grasp-related outputs from M1 rather than
triggering activity in M1.
Work on PMv outside of Rizzolatti’s group has looked at its role in transformation from
visual to motor coordinates using experimental paradigms that dissociate the two. Kakei et al.
(2001, 2003) recorded from F5 and F4 during a wrist movement task with the wrist starting
in different positions, allowing them to dissociate muscle activity, direction of joint
movement (intrinsic coordinates), and the direction of the hand movement in space (extrinsic
coordinates). They found three types of neurons: those that coded the direction of movement in
space (extrinsic-like), those that encoded the direction of movement in space but whose
magnitude of response depended on the forearm posture (gain modulated extrinsic-like), and
those whose activity covaried with muscle activity (muscle-like). Most directionally-tuned
PMv neurons were extrinsic-like (81%) or gain modulated extrinsic-like (12%), while M1
contained more nearly equal numbers of each type. Based on these findings, they propose a model
where projections from PMv to M1 perform the transformation from extrinsic to intrinsic
coordinate frames. Kurata & Hoshi (2002) recorded from PMv (F5 and F4) and M1 during a
visually guided reaching task after the animals had been trained with prisms that shifted the
visual field left or right. They found that PMv mainly contained neurons whose activity
depended on the target location in the visual field with or without the prisms, while M1
contained these neurons as well as neurons sensitive to target location in motor coordinates
only and neurons whose activity differed between the visual and motor coordinate conditions. This is consistent with
the distribution of extrinsic-, gain modulated extrinsic-, and muscle-like neurons in PMv and
M1 described by Kakei et al. (2001, 2003). In a similar experimental setup involving
manipulation of visual information, Ochiai et al. (2005) recorded from PMv (F4 and F5)
neurons while monkeys captured a target presented on a video display using a video
representation of their own hand movement. The video display either presented the hand
normally or inverted horizontally in order to dissociate its movement from the monkey’s
physical hand movements. PMv activity reflected the motion of the controlled image rather
than the physical motion of the hand, supporting the proposed role of PMv as an early stage
in extrinsic-to-intrinsic coordinate frame transformation. Furthermore, half of the direction-selective PMv neurons were also selective for which side of the video hand was used to
contact the object regardless of starting position, suggesting that these visuomotor
transformations are body part specific.
The findings of the tool-use experiments performed by Umilta et al. (2008) allow these
lines of research to be reconciled. They trained monkeys to use normal or escargot pliers.
The former translate a power grasp into a precision pinch, while the latter translate hand
opening into a precision pinch (reversing the relationship between the movement of the hand
and that of the end of the tool). The question that they ask is, "When an object is grasped by a
tool instead of the hand, will the cortical motor neurons code the movement of the hand or
the distal goal achieved by the tool?" However, both of these must be encoded somewhere,
so a more appropriate question is "When an object is grasped by a tool instead of the hand,
which cortical motor neurons code the movement of the hand and which code the distal goal
achieved by the tool?” They found that F5 neurons encoded the movement of the end-effector (the tip of the pliers), while F1 contained similar neurons as well as those that
encoded the movement of the hand. The authors interpret these findings as evidence that F5
and F1 contain “goal-selective” neurons. However, this language is misleading; considered in
light of the previous work on PMv, the results seem consistent with F5 and F1 containing
differing proportions of neurons encoding movements in extrinsic and intrinsic coordinates. The
use of the tool changes this mapping, similar to how it is changed when using the prisms or
viewing a video representation of one’s hand as in Kakei et al. (2001, 2003) and Ochiai et
al.’s (2005) experiments. What is interesting, and not addressed by this study, is how F5
learns the new transformations necessary for successful tool use, how attention shifts to the
tip of the tool as the end-effector, and how contextual cues like tool identity allow F5 to
switch between different transformations. None of the studies to date have been able to
address these questions, since they recorded from F5 only after training with the tool, prisms, or
video display.
1.1.2.c F7
The rostral portion of the dorsal premotor cortex, area F7 (approximately equal to PMdr,
Wise et al., 1997), can be separated into the dorsorostral supplementary eye field (SEF) and a
lesser-known ventral region. The SEF is known to contain neurons which encode space in an
object-centered reference frame (Olson and Gettner, 1995), but the region is primarily implicated in the
control of eye movements. While the properties of ventral F7 are not well-known, it does
contain neurons related to arm movements (Fujii et al., 1996; Fujii et al., 2002), receives the
same thalamic input as the arm region of F6, and receives input from the same region of the
superior temporal sulcus that projects to F2vr (Luppino et al., 2001).
1.1.3 Basal Ganglia
The basal ganglia contain neurons that show movement- and pre-movement-related
activity in monkeys performing an elbow flexion-extension task (Alexander and Crutcher,
1990; Crutcher and Alexander, 1990; Jaeger et al., 1993). Within the basal ganglia, the
striatum and subthalamic nucleus (STN) receive topographic projections from pre-SMA and
SMA (Inase et al., 1996; Parthasarathy et al., 1992). Excitation of the STN increases tonic
inhibition of the thalamus (Hamada and DeLong, 1992), while inhibition of the globus pallidus
internal segment (GPi) releases it from inhibition (Mink and Thach, 1991; Mitchell et al., 1987).
The main output nuclei of the basal ganglia are the internal segment of the globus
pallidus (GPi) and the substantia nigra pars reticulata (SNr), which exert a GABAergic
inhibitory influence on the thalamus (Ueki et al., 1977; Uno and Yoshida, 1975). It is
generally accepted that the basal ganglia tonically inhibit unselected actions and release
inhibition on selected actions. It may thus be regarded as a Winner-Lose-All (WLA) network:
output cells encoding the winner have the lowest firing rate, thus disinhibiting cells
downstream to initiate the winning action.
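This disinhibition scheme can be illustrated with a toy sketch (the rate values, the linear subtraction, and the gating rule are all invustrative inventions for this sketch, not taken from any published basal ganglia model):

```python
import numpy as np

def basal_ganglia_wla(striatal_input):
    """Toy winner-lose-all: the most strongly supported action yields
    the LOWEST pallidal output, disinhibiting its thalamic target."""
    striatal_input = np.asarray(striatal_input, dtype=float)
    tonic = 1.0                                  # tonic GPi/SNr inhibitory output
    gpi_output = tonic - striatal_input          # focused striatal inhibition of GPi
    gpi_output = np.clip(gpi_output, 0.0, None)  # firing rates are non-negative
    thalamic_gating = 1.0 - gpi_output           # low GPi output -> thalamus released
    return gpi_output, np.clip(thalamic_gating, 0.0, 1.0)

gpi, thal = basal_ganglia_wla([0.2, 0.9, 0.4])   # action 1 most strongly supported
# The "winning" channel (index 1) has the lowest GPi rate and the
# highest thalamic gating value.
```

Note that, in contrast to a conventional winner-take-all readout, the selected channel here is signaled by the lowest output rate.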
It has been hypothesized that the firing rates of dopamine neurons in the midbrain
represent reward probability and uncertainty (Fiorillo et al., 2003), reward prediction error
(Waelti et al., 2001) and that they modify long-term action selection policy through the
modulation of basal ganglia synapses (Morris et al., 2006). They seem to implement
reinforcement learning using prediction error and transfer responses from primary rewards to
reward-predictors (Schultz, 1998). TD learning utilizes reward prediction error to propagate a
reinforcement signal back to the earliest predictor of reward. The dopamine response to
reward prediction error appears to perform the same function as the TD reinforcement error
(Schultz, 1998). Barto (1995) models the basal ganglia as an adaptive actor-critic
architecture, where the critic computes the reward prediction error and on the basis of this
error modifies connection weights in the actor by reinforcement. The critic corresponds to
dopaminergic neurons in the substantia nigra pars compacta (SNc) and ventral tegmental area
(VTA), while the actor represents the striatum and the STN. The dopaminergic nigrostriatal
pathway is well established, and the dopaminergic neurons of both SNc and VTA have been
found to also directly modulate STN neurons (Cragg et al., 2004; Hassani et al., 1997). Smith
& Bolam (1990) suggest that the limbic striatum performs reward prediction in the adaptive
critic.
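Barto's actor-critic scheme can be sketched with a minimal TD-learning example (the two-state environment, learning rates, and softmax action selection below are illustrative choices, not details of Barto's 1995 model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
V = np.zeros(n_states)                     # critic: state values ("ventral striatum")
policy = np.zeros((n_states, n_actions))   # actor: action preferences ("dorsal striatum/STN")
alpha, beta, gamma = 0.1, 0.1, 0.9

def step(state, action):
    # toy environment: only action 0 taken in state 0 is rewarded
    reward = 1.0 if (state == 0 and action == 0) else 0.0
    return rng.integers(n_states), reward

state = 0
for _ in range(2000):
    probs = np.exp(policy[state]); probs /= probs.sum()   # softmax action selection
    action = rng.choice(n_actions, p=probs)
    next_state, reward = step(state, action)
    delta = reward + gamma * V[next_state] - V[state]     # dopamine-like TD error
    V[state] += alpha * delta                              # critic update
    policy[state, action] += beta * delta                  # actor update, gated by delta
    state = next_state
# After training, the actor prefers action 0 in state 0.
```

The single scalar `delta` plays the role ascribed to the dopamine signal: it trains the critic's value predictions and simultaneously reinforces the actor's action selection policy.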
Dopamine targets in the striatum appear to be the sites of value and policy learning (Suri
and Schultz, 2001). The striatum is separated into a dorsal division connected with motor and
association cortices and a ventral division which receives inputs from limbic cortex. It is
commonly suggested that these divisions represent actor and critic components respectively.
Joel and Weiner (2000) suggest that the VTA dopaminergic projections to the ventral
striatum facilitate value function learning, while those from the SNc to the dorsal striatum
facilitate policy learning. This is supported by the fact that there is little difference in firing in VTA
and SNc (Schultz et al., 1993).
1.1.4 Hypothalamus
The lateral hypothalamus contains multiple regions that seem to encode homeostatic
variables that define the internal state (Clark et al., 1991; Winn, 1995) and projects to the
SNc, VTA, and ventral striatum (Brog et al., 1993; Fadel and Deutch, 2002; Saper and
Loewy, 1982). The accumbens shell of the ventral striatum is reciprocally connected with the
lateral hypothalamus and has been called a “sensory sentinel” or “visceral striatum” (Kelley,
2004; Stratford and Kelley, 1999). Indeed it has been found that food deprivation can
influence the magnitude of dopamine release in the ventral striatum (Ahn and Phillips, 1999;
Wilson et al., 1995) and that sexual satiety is signaled by serotonin from the lateral
hypothalamus to the ventral striatum, which reduces dopamine levels (Lorrain et al., 1999).
Hypothalamic-midline thalamic-striatal projections carry internal state information to
cholinergic interneurons of the dorsal striatum and are thought to modulate dorsal striatal
output neurons (Kelley et al., 2005).
1.1.5 Human Mirror System
Although we lack data on single neurons of the human brain, we do have brain imaging
data (PET and fMRI) showing that the human brain contains mirror regions in both frontal
and parietal lobes, namely regions that show high activation both when a human performs a
manual action and when the human observes a manual action, but not when the human
simply observes an object (Buccino et al., 2001; Grafton et al., 1996; Iacoboni et al., 1999;
Rizzolatti et al., 1996b). It is widely assumed that such mirror regions contain mirror
neurons, based on similarities between the human and macaque brain. But it must be stressed
that humans have capacities denied to monkeys. Mirror regions in the human can be
activated when the human subject imitates an action, or even just imagines it.
While single unit recording studies of the monkey mirror system show action
selectivity (Gallese et al., 1996), studies of the human mirror system have generally failed to
do so due to methodological restrictions. Most human mirror system studies use an imitation
paradigm with the logic that mirror areas will be more active during action imitation than
observation and execution (Buccino et al., 2004; Chaminade et al., 2005; Chiavarino et al.,
2007; Decety et al., 2002; Decety et al., 1997; Iacoboni et al., 1999; Koski et al., 2002;
Rumiati et al., 2005; Vogt et al., 2005). It is likely that such a paradigm does capture mirror
system activation; however, large regions of cortex and many subcortical structures are
typically identified as having mirror properties (based solely on the criterion of being more
active during imitation than during observation or execution). Other processes such as working
memory, motivation, and response selection may be involved in all three conditions and
more taxed in the imitation condition, but the regions supporting them would hardly be called mirror regions.
It is also possible that mirror neurons simply are not more active during imitation than during
observation and execution.
An important difference between the human and monkey mirror systems is that the
monkey mirror system seems to encode action goals, abstracting away from the movement
details. This focus on action goals is evident in the monkey mirror system’s lack of response
to the observation of intransitive actions. In contrast, the human mirror system is able to
encode intransitive actions in a somatotopic manner (Buccino et al., 2001). Although the
human mirror system is thought to encode movement details, the specifics of the
transformation from a visual to a motor representation have not yet been elucidated.
This review examines two recent fMRI studies of the human mirror system that attempt to
remedy these deficits. Dinstein et al. (2007) used an fMRI adaptation protocol with a rock-
paper-scissors (RPS) game to attempt to pinpoint movement-specific human mirror regions.
Shmuelof and Zohary (2006) presented video clips of object manipulation performed in
different visual hemifields and by different hands in an attempt to tease apart the shift from
visual (hemifield specific) to motor (contralateral limb specific) representations. Their results
are compared with a previous fMRI adaptation study (Shmuelof and Zohary, 2005) and
accord well with the results of Dinstein et al. (2007).
The first study (Dinstein et al., 2007) utilizes the technique of fMRI adaptation
(response suppression). This technique is based on single unit primate studies which indicate
that sensory neurons selective for an attribute reduce their firing rate when their preferred
stimulus is repeatedly presented. Two complementary tasks were performed by subjects in
this study – a RPS task and an imitation task. In the RPS task, subjects played the rock-
paper-scissors game against an opponent, but could not see their own hands. During the
course of the game, subjects sometimes repeatedly observed, or repeatedly executed, or
observed then executed or executed then observed the same gesture (the rock, paper, or
scissors gesture). The authors compared movement repeat trials (rock preceded by rock,
production or observation) with nonrepeat trials (rock preceded by paper, production or
observation) and looked for an adaptation effect (repetition suppression) in the BOLD
response. The idea is that mirror areas will show adaptation after both repeated observation
and repeated execution trials, and may show crossmodal adaptation (repetition suppression
when executing the same action just observed or observing the same action just executed). In
order to compare the results of this study with the published results of previous studies, an
imitation task was also utilized where subjects observed, executed, or imitated the rock,
paper, and scissors gestures.
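The repeat/nonrepeat contrast at the heart of this design can be sketched with synthetic data (the response amplitudes below are invented; only the comparison itself mirrors the study's logic):

```python
import numpy as np

rng = np.random.default_rng(1)

def adaptation_index(repeat_trials, nonrepeat_trials):
    """Positive index = repetition suppression (nonrepeat > repeat)."""
    return float(np.mean(nonrepeat_trials) - np.mean(repeat_trials))

# Synthetic BOLD amplitudes for one adapting region: responses are
# suppressed when the same gesture repeats.
repeat = rng.normal(loc=0.8, scale=0.1, size=40)      # e.g. rock preceded by rock
nonrepeat = rng.normal(loc=1.0, scale=0.1, size=40)   # e.g. rock preceded by paper
idx = adaptation_index(repeat, nonrepeat)  # ~0.2 for this synthetic region
```

On this logic, a candidate mirror region is one whose index is positive for both repeated observation and repeated execution, and crossmodal adaptation would require a positive index when the repeat spans the two modalities.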
In the RPS experiment, an overlap of adaptation effects (adaptation during
observation and during movement) was found in the anterior inferior frontal sulcus, ventral
premotor, anterior intraparietal sulcus, superior intraparietal sulcus, and posterior
intraparietal sulcus. Adjacent (but not overlapping) adaptation effects during movement and
observation repetition trials were found in the lateral occipital cortex (LO). No areas were
found to exhibit crossmodal adaptation. The same regions found to show overlapping
adaptation effects were also found to be active during the imitation task. However, activation
in the imitation task was far more widespread. Some additional cortical areas were more active
during imitation but showed no movement selectivity, such as the cingulate motor area, early
visual areas, primary motor and somatosensory areas, and the posterior sylvian fissure. This
further pushes the point that a mirror system alone is not sufficient for imitation and that it
must be embedded in a larger network.
An unexpected finding was the adjacent (although not overlapping) observation and
execution adaptation effects in LO. However, given that the activation may include the so-
called extrastriate body area (EBA, Downing et al., 2001; Downing et al., 2006; Saxe et al.,
2006), this suggests that it may reflect motor imagery. Another unexpected finding was a
lack of areas showing adaptation for crossmodal repetition. Three possible explanations for
this mentioned by the authors are that mirror neurons do not adapt to crossmodal repetition,
crossmodal adaptation occurs on a longer timescale than that studied, or that the adaptation
effects seen are due to presynaptic mechanisms. Adaptation actually corresponds to a number
of phenomena, including presynaptic neurotransmitter depletion, postsynaptic receptor
trafficking, postsynaptic receptor desensitization, and hyperpolarization leading to spike
frequency adaptation (Zucker and Regehr, 2002). For adaptation due to postsynaptic
mechanisms, the neuron adapts regardless of the input source; however, for adaptation due to
presynaptic mechanisms the neuron adapts to a particular input. If the adaptation seen in
these experiments is due to presynaptic mechanisms, then crossmodal adaptation would not
be observed.
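This distinction can be made concrete with a toy model of a neuron driven by two input channels, one "visual" and one "motor" (the fatigue rule and its parameter are invented for illustration):

```python
def respond(history, mode="presynaptic", fatigue=0.3):
    """Responses of a neuron to a sequence of inputs 'A' (e.g. visual) or
    'B' (e.g. motor). Presynaptic adaptation depletes each input channel
    separately; postsynaptic adaptation fatigues the neuron as a whole."""
    channel_use = {"A": 0, "B": 0}
    total_use = 0
    responses = []
    for inp in history:
        if mode == "presynaptic":
            gain = 1.0 / (1.0 + fatigue * channel_use[inp])  # input-specific depletion
        else:
            gain = 1.0 / (1.0 + fatigue * total_use)          # input-general fatigue
        responses.append(gain)
        channel_use[inp] += 1
        total_use += 1
    return responses

# Crossmodal repetition: observe (A) then execute (B) the same gesture.
pre = respond(["A", "B"], mode="presynaptic")    # second response NOT suppressed
post = respond(["A", "B"], mode="postsynaptic")  # second response IS suppressed
```

Under the presynaptic rule the motor channel is untouched by the preceding visual input, so no crossmodal suppression appears even though the same neuron responds in both modalities.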
The second study, conducted by Shmuelof and Zohary (2006), tested subjects during
manual observation and execution tasks. The observation task consisted of video clips of
object manipulation using the right or left hand in either the right or left visual field. The
execution task was intended to investigate the motor properties of the areas identified by the
observation task and involved execution of actions with various body parts (hands, ankles,
tongue) without visual feedback.
They found that activation in the transverse occipital sulcus, occipitotemporal cortex and
CIP, and superior parietal lobule depended on the visual field in which the stimulus
appeared, while AIP and postcentral gyrus activation was specific to the identity of the hand
observed (right or left). Assuming that AIP representations are computed from more
posterior input, some visual-to-motor transformation must have occurred between the caudal
and anterior parietal activation. Interestingly the anterior intraparietal cortex also exhibited
contralateral specificity during action execution (without visual feedback), suggesting that its
activation in the observation task reflects an internal motor representation of observed
movement. A post-experimental analysis attempted to further illustrate the mirror properties
of the anterior intraparietal sulcus by looking for adaptation effects in repeated trials. AIP
showed repetition suppression during repeated grasp observation, but not during repeated object
observation. The area showing this effect overlaps with AIP activation in a previous study of
grasp viewing adaptation by the same authors (Shmuelof and Zohary, 2005).
The authors suggest (consistent with a conceptual model derived from macaque
electrophysiology, Sakata et al., 1995) that AIP is involved in integrating an efference copy
of motor commands with incoming visual input to guide hand action, predicting the outcome
of the current motor action. It was found that while the left AIP cortex was most strongly activated
during viewing contralateral hand actions, there was a convergence of hand identity and
visual field effects in right AIP. It is commonly reported that meaningless intransitive
gestures are either bilaterally controlled or right-lateralized, while meaningful gestures
(transitive and symbolic) rely on a left-lateralized network (Decety et al., 1997; Hermsdorfer et al.,
2001; Rumiati et al., 2005; Tanaka et al., 2001). Since the right parietal cortex appears to
integrate visual and motor representations (more so than the left AIP), these findings suggest
that perhaps bilateral meaningless gesture activation is due to necessary visuospatial
processing when aspects of an observed action cannot be mapped onto motor representations.
1.2 Related Models
1.2.1 Models of Grasping and the Mirror System
Previous related models of infant motor development include Berthier’s (1996) model of
learning to reach and the Infant Learning to Grasp Model (ILGM, Oztop et al., 2004). The
thread common to both of these models is reinforcement-based learning of intrinsically
motivated goal-directed actions based on exploratory movements, or motor babbling:
movements are generated erratically in response to a target and the mechanisms generating
the movements are modified via reinforcement. ILGM modeled early grasp learning, with
limited visual processing in that the affordance extraction module only represented the
presence, position, or orientation of an object. Only motor parameters specifying the
approach to the object and grasp enclosure speed were learned. In the ILGM model, all fingers
were extended in the initial “preshape” portion of the grasp to a maximal aperture. Initially,
the enclosure was triggered by the palmar reflex upon object contact. The final grip
configuration was entirely determined by the enclosure and was not prespecified in any way.
The FARS model of primate grasping incorporated neurophysiological, connectivity, and
behavioral data from the macaque to form an integrated model of parieto-premotor
interactions in transforming visual object information into motor commands for grasping.
This model was centered on the anterior intraparietal sulcus (AIP), a region thought to be
involved in representation of affordances for grasping (Murata et al., 2000), and the premotor
region F5, a region associated with the control of goal-directed manual actions (Rizzolatti et
al., 1987).
In the FARS model, ventral premotor region F5 contains populations of neurons that are
selective for each grasp type (precision pinch, power grasp, etc.). Within each population,
subpopulations are selective for each phase of the grasp. In the behavioral protocol used in
the FARS simulations, this includes Set, Extension, Flexion, Hold, and Release; however
only Set, Extension and Flexion are considered here. Neurons in each Set subpopulation
excite neurons in the same subpopulation and inhibit those neurons in Set subpopulations
selective for other grasps. This connectivity implements a winner-take-all dynamic that
selects which grasp to perform based on affordance input from the anterior intraparietal area
AIP. The grasp is controlled by activating F5 populations that encode each phase of the grasp
(although some neurons might be active across more than one consecutive phase). Feedback
projections from F5 neurons to AIP modulate its activity according to the grasp phase.
In FARS, the secondary somatosensory cortex (SII) detects when the hand aperture
reaches the predicted maximal value for the planned grasp and triggers grasp enclosure by
activating the appropriate Flexion subpopulation in F5. ILGM includes detection of object
contact with the palm in SI that automatically triggers enclosure. These represent
feedforward and naïve feedback strategies, respectively. However a more sophisticated
feedback strategy might be used where the detection of the hand approaching the object is
used to trigger the grasp enclosure. This is still a feedback strategy in that the relative
position of hand and target object is used to trigger the enclosure, but it does so before the hand
contacts the object, unlike the naïve feedback strategy.
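The three triggering strategies can be contrasted schematically (the signal names and the distance threshold below are illustrative placeholders, not parameters of FARS or ILGM):

```python
def trigger_enclose(strategy, aperture, predicted_max_aperture,
                    hand_object_distance, palm_contact):
    """Decide when the Flexion (enclosure) phase should begin."""
    if strategy == "feedforward":          # FARS/SII: aperture reaches predicted maximum
        return aperture >= predicted_max_aperture
    if strategy == "naive_feedback":       # ILGM/SI: palmar contact triggers enclosure
        return palm_contact
    if strategy == "predictive_feedback":  # hand nearing object triggers enclosure
        return hand_object_distance < 0.05  # metres; illustrative threshold
    raise ValueError(strategy)

# Hand still approaching: aperture at the predicted maximum, no contact yet.
assert trigger_enclose("feedforward", 0.09, 0.09, 0.12, False)
assert not trigger_enclose("naive_feedback", 0.09, 0.09, 0.12, False)
assert trigger_enclose("predictive_feedback", 0.09, 0.09, 0.04, False)
```

The sketch makes the contrast explicit: the feedforward strategy fires on an internal prediction, the naïve feedback strategy waits for contact, and the predictive feedback strategy uses the relative position of hand and object to anticipate contact.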
The original FARS model did not address the generation of reaching movements, but
suggested that the ventral intraparietal area (VIP) and ventral premotor area F4 control them.
Indeed the two regions are connected (Luppino et al., 1999) and VIP does represent the
location of objects in peripersonal space (Colby et al., 1993). However the consensus
emerging from more recent data is that the dorsal premotor cortex, specifically the caudal
region F2 controls the selection of reach targets. On the basis of connectivity, it has been
suggested that VIP could bridge between motion-sensitive middle temporal cortex and reach-
related areas like V6a/MIP by visually monitoring the hand trajectory in space (Caminiti et
al., 1996). If so, F4 receptive field properties might be used to trigger the grasp enclosure as
the hand approaches the object.
Most related grasping models focus on learning inverse kinematics for the hand, selecting
contact points on the object’s surface, and developing feedback-based control of the hand.
Most models plan the grasp in terms of kinematics, but at least one model stresses control of
grasp forces. While many models use trial-and-error learning, they are not developmental
models like ILGM in the sense that they begin learning at a phase corresponding to grasping
development in infants 9 months and older. Some models are based on neurophysiological
data, but many are built purely using machine learning and robotics techniques.
Two models that stress learning inverse kinematics transformations for the hand are those
of Molina-Vilaplana et al. (2007) and Rezzoug and Gorce (Gorce and Rezzoug, 2004; 2003).
Molina-Vilaplana et al.’s (2007) model first learns the inverse kinematics functions of the
fingers, and then learns to associate object properties with grasp postures (a function they
relate to AIP/F5 functionality). The model first learns inverse kinematics for the thumb,
index, and middle fingers so that it knows the relationships between finger motor commands
and their sensory consequences (proprioceptive and visual), then learns to associate object
features with grasp postures with a local network called GRASP. The input to GRASP is a 7-
dimensional vector encoding object shape (cube, sphere or cylinder), object dimensions, and
whether to grasp with two or three fingers. Similarly, Rezzoug and Gorce (Gorce and
Rezzoug, 2004; 2003) have a model with a modular architecture that first learns inverse
kinematics for the fingers using backpropagation, then learns hand configuration for grasping
using reinforcement. The model then learns finger postures given the desired position of the
fingertip.
Several models plan grasps in terms of finger contact points on the object. Molina-
Vilaplana et al. (2007) use heuristics to select contact points on the object surface for the
fingers. Rezzoug & Gorce’s (Gorce and Rezzoug, 2004; 2003) model learns contact locations
for each finger. Kamon, Flash et al. (1996) split the problem of grasp learning into two
problems: choosing grasping points and predicting the quality of the grasp. Each grasp type
has a certain set of location and quality parameters used to select grasp locations and predict
quality, which are supplied beforehand as task-specific knowledge. Grasp quality is predicted
by the angles between the fingers and the surface normals at the contact points, and the
distance between the opposition axis and the center of mass. They suggest that the grasp
specification and evaluation modules run in an alternating manner until a suitable grasp is
selected. Faldella, Fringuelli et al. (1993) present an interesting model where a neural
mechanism for matching object geometry to hand shape interacts with a symbolic rule-based
expert system. The symbolic system performs some geometric analysis such as identifying
curvature type, selects candidate hand contact positions, identifies symmetric situations (to
reduce neural module input), and ranks the selected grasp according to task constraints.
Multilayer perceptrons trained using backpropagation were used to determine potential
grasps based on geometric information.
The Mirror Neuron System (MNS) model (Oztop and Arbib, 2002) of the monkey mirror
system was designed to associate activity in canonical neurons encoding the type of grasp
with visual input encoding the trajectory of a hand relative to an observed object. The
canonical neurons were modeled as an array of neurons whose activity determined the type
of grasp executed.
Two models focus on the role of F5 mirror neurons in feedback control of grasping.
Metta, Sandini et al. (2006) present a model where F5 canonical and mirror neurons form an
internal model with mirror neurons as part of a feedback loop in action control. In this model,
F5 is part of a larger circuit involving various parietal areas, STS, and other premotor and
frontal areas. As in FARS, motor program representations in F5 are activated on the basis of
object features represented in AIP and contextual signals. Inverse models are represented in
projections from visual areas to the cerebellum, which compares them with a delayed output
from F5 mirror neurons, which implement forward models. They note that these models and
object affordances can be learnt by trial-and-error. During action observation, visual
information goes to the inverse models and the F5 mirror forward models in parallel, with the
F5 canonical neurons filtering the F5 mirror responses. Maistros and Hayes (2004) present
work inspired by FARS where mirror and canonical neurons are involved in grasping, with
mirror neurons providing feedback for fine control. Their mirror system component consists
of a SOFM connected to a set of motor schemas. The system operates in a learning mode,
in which no output is produced, and a recall mode. The motor system in this model is a simple
inverse model. In imitation experiments, a noisy set of observed joint angles is used as input
to the SOFM of the mirror system. Motor schemas receive a series of joint angle postures
through their linked SOFM node.
For Grupen and Coelho (2000), grasping is primarily a force domain task, emphasizing
force closure around an object over form (of the hand) closure. Their model used closed-loop
control, utilizing tactile feedback to reposition contact forces based on models of interaction
between the contacts and object surface. A Markov Decision Process (MDP) framework (Q-
learning, specifically) is used to select a sequence of controllers to maximize object stability
without knowing the object’s identity, geometry, or pose. This model is complementary to
those that focus on visually-based kinematic grasp planning in that it learns to use haptic
feedback in order to reposition contact forces to stabilize the object.
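The use of Q-learning to select a sequence of controllers can be sketched abstractly (the stability states, controller names, and reward structure below are invented stand-ins for Grupen and Coelho's actual haptic state space):

```python
import random
random.seed(0)

states = ["unstable", "partial", "stable"]          # coarse grasp-stability states
controllers = ["reposition_thumb", "reposition_fingers", "squeeze"]
Q = {(s, c): 0.0 for s in states for c in controllers}
alpha, gamma, epsilon = 0.2, 0.9, 0.1

def simulate(state, controller):
    """Toy haptic environment: squeezing from a partial grasp stabilizes
    the object; repositioning from an unstable grasp improves it."""
    if state == "partial" and controller == "squeeze":
        return "stable", 1.0
    if state == "unstable" and controller.startswith("reposition"):
        return "partial", 0.0
    return state, 0.0

for _ in range(3000):
    state = "unstable"
    while state != "stable":
        if random.random() < epsilon:                         # explore
            controller = random.choice(controllers)
        else:                                                  # exploit
            controller = max(controllers, key=lambda c: Q[(state, c)])
        nxt, reward = simulate(state, controller)
        best_next = max(Q[(nxt, c)] for c in controllers)
        Q[(state, controller)] += alpha * (reward + gamma * best_next
                                           - Q[(state, controller)])
        state = nxt
# Learned policy: reposition from the unstable state, then squeeze.
```

As in the original approach, the controller sequence is learned from reward alone, without any model of the object's identity, geometry, or pose.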
1.2.2 Action Selection Models
Fu & Anderson (2006) present a model that similarly uses a winner-take-all process to
select among actions based on their TD-learned value. Their model selects among
productions, which are similar to motor schemas in that they specify a set of preconditions
and a set of effects. Fu & Anderson’s model simulates extrinsic reward-driven learning with
little or no involvement of higher-level cognitive processes.
Cooper & Shallice (2000) model the control of routine sequential action with a
hierarchical schema network, in which action schemas excite goal nodes, which in turn excite
lower-level action schemas. This structure continues down until the base-level action
schemas are reached. Thus, each action schema defines a conjunctive list of subgoals and
specific preconditions for its activation. Each base-level action schema specifies a list of
object arguments and resource requirements (such as a hand). Each goal node defines a
disjunctive list of action schemas for achieving that goal. In this model, schemas may be
activated either by environmental stimuli or by top-down influence from higher-level
schemas. Symbolic flags are used to gate activation from higher- to lower-level schemas such
that only action schemas whose goal has not been achieved, but whose preconditions are met
may be activated. Each schema also has a recurrent self-projection that can be switched to be
either excitatory or inhibitory, depending on whether or not the schema’s goal has been
accomplished. Schemas with the same parent goal node, and those that share the same
subgoal nodes, compete for activation via lateral inhibition. This model includes object
representations that contain a vector of activation values for various functions the object can
be employed for. Objects compete within functional roles via lateral inhibition to be selected
as the arguments to the selected action schema.
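The competition among sibling schemas can be sketched as a simple interactive-activation process; the update rule and constants below are illustrative, not Cooper & Shallice's actual equations:

```python
# Sketch of schema competition via lateral inhibition in the spirit of Cooper &
# Shallice (2000); the update rule and constants are illustrative.
def compete(activations, n_steps=50, self_exc=0.1, inhib=0.2, rate=0.5):
    a = list(activations)
    for _ in range(n_steps):
        total = sum(a)
        # each schema excites itself and is inhibited by its rivals' activity,
        # with activations clipped to [0, 1]
        a = [min(1.0, max(0.0, ai + rate * (self_exc * ai - inhib * (total - ai))))
             for ai in a]
    return a

# Two schemas share a parent goal; the more strongly cued one wins outright.
final = compete([0.6, 0.4])
winner = final.index(max(final))
```

A small initial advantage is amplified until one schema saturates and its rival is suppressed to zero, which is the qualitative behavior lateral inhibition is meant to provide.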
Botvinick & Plaut (2004) model the control of routine sequential action with an Elman-
type recurrent neural network. In this model the network learns hierarchically structured
sequences through backpropagation (BP) training to develop hidden layer representations that are contextually
shaded in a manner appropriate to the given task. In this model, object selection is a learned
part of the sequence, with fixate actions used to select objects as targets of following actions.
While Botvinick & Plaut seem to present their work as a refutation of Cooper & Shallice
(2000) and hierarchical models of action selection in general, I see it as a further
specification of the implementation of hierarchical models, with some necessary
restructuring. While Botvinick & Plaut’s model addresses learning, it generates only the
sequences of actions it was trained on. Although the network was trained on sequences where
some steps may be executed in variable order or substituted for others, the sequences of
action it generates simply replicate those it has learned with the frequency that they appeared
in the training set.
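The contextual shading an Elman-type network provides can be sketched with a single forward pass; the weights here are random rather than trained, so the example only shows the mechanism: fed-back hidden state makes identical inputs yield different outputs at different sequence positions.

```python
import numpy as np

# Sketch of one forward pass of an Elman-type network: the previous hidden
# state is fed back as context, so identical inputs at different sequence
# positions produce different outputs. Weights are random, not trained.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 4
W_ih = rng.normal(0.0, 0.5, (n_hid, n_in))
W_ch = rng.normal(0.0, 0.5, (n_hid, n_hid))   # context (previous hidden) weights
W_ho = rng.normal(0.0, 0.5, (n_out, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_sequence(inputs):
    context = np.zeros(n_hid)
    outputs = []
    for x in inputs:
        hidden = sigmoid(W_ih @ x + W_ch @ context)
        outputs.append(sigmoid(W_ho @ hidden))
        context = hidden   # contextually shaded state carried forward
    return outputs

token = np.eye(n_in)[0]          # the same action token, presented three times
outs = run_sequence([token, token, token])
```

In Botvinick & Plaut's model this position-dependence is what lets one network produce hierarchically structured sequences without explicit goal nodes.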
1.2.3 Synthetic Brain Imaging Approaches
Synthetic brain imaging was developed in order to bridge the gap between
neurophysiological studies of monkeys and brain imaging studies on humans. It is used with
computational models based on detailed neural spiking data from individual brain regions in
order to compare the model activity with the global measures of brain activity given by
neuroimaging experiments using techniques such as PET or fMRI that measure the vascular
response to neural activation. The results of such a comparison can then be used to reinterpret
imaging data in computational terms and to validate and update the model. In order to guide
a comparison of previous synthetic brain imaging approaches, we have identified what we
consider to be the main issues involved:
• Neural and synaptic model
• Neurovascular coupling mechanism
• Vascular signal generation
• Neural localization
• Scanner simulation
• Data normalization
1.2.3.a Neural and Synaptic Model
The choice of neural and synaptic model depends on the data that the overall model is
intended to address. Early synthetic brain imaging approaches used simple firing rate neural
models (Arbib et al., 1995) while later models used mean field approximations (Corchs and
Deco, 2002, 2004), leaky integrate-and-fire models (Deco et al., 2004), and compartmental
models (Riera et al., 2006). Poznanski & Riera (2006) have argued for the need for analytic
models of local networks based on ionic cable theory and volume conductor modeling of the
neuropil; however it is not immediately clear how to simulate large-scale networks in this
manner.
A suggestion of the earliest synthetic brain imaging study (Arbib et al., 1995) which was
used by later approaches (Riera et al., 2006; Tagamets and Horwitz, 1998) was that the
hemodynamic response reflected synaptic activity rather than the spiking output of a region.
This makes the synaptic model chosen an important component of any synthetic brain
imaging technique. Synaptic models in previous approaches have ranged from the absolute
values of connection weights multiplied by presynaptic firing rate (Arbib et al., 1995;
Tagamets and Horwitz, 1998) to simple models of synaptic conductances for basic receptor
types (AMPA, NMDA, GABA-A, GABA-B; Deco et al., 2004).
1.2.3.b Neurovascular Coupling Mechanism
Early approaches to modeling the neurovascular coupling mechanism focused on the
metabolic basis for the BOLD signal (Jueptner and Weiller, 1995). Based on the reasoning
that increased synaptic activity resulted in increased neural metabolism with a consequent
increase in local blood flow, these models integrated the total synaptic activity in a region in
order to compute a qualitative measure of rCBF (Arbib et al., 1995; Tagamets and Horwitz,
1998). In order to compute synaptic activity, y, these models summed the product of each
neuron's input with the absolute value of its connection weight (to include the effects of
inhibitory synapses on rCBF):

$y(t) = \sum_{m} \sum_{n} \left| w_{n \rightarrow m} \right| z_{n}(t)$

where the outer sum is over each neuron, m, in a region, $z_{n}(t)$ is the output of presynaptic
neuron n at time t, and $w_{n \rightarrow m}$ is the weight of the projection from neuron n to neuron m.
Similarly, later models used the sum of the absolute values of synaptic currents (Deco et al.,
2004):

$y(t) = \sum_{m} \left( \left| I_{m}^{AMPA}(t) \right| + \left| I_{m}^{NMDA}(t) \right| + \left| I_{m}^{GABAa}(t) \right| + \left| I_{m}^{GABAb}(t) \right| \right)$

where $I_{m}^{n}(t)$ is the synaptic current mediated by receptor type n in neuron m at time t.
However this is not a pure measure of synaptic activity since each synaptic current
approaches zero as the membrane potential approaches its associated reversal potential.
Therefore, Izhikevich & Edelman (2008) use the sum of synaptic conductances:

$y(t) = \sum_{m} \left( g_{AMPA} s_{m}^{AMPA}(t) + g_{NMDA} s_{m}^{NMDA}(t) + g_{GABAa} s_{m}^{GABAa}(t) + g_{GABAb} s_{m}^{GABAb}(t) \right)$

where $g_{n}$ is the maximal conductance of receptor type n, and $s_{m}^{n}(t)$ is the fraction of open
receptors of type n on neuron m at time t. More detailed studies have modeled neural
metabolism, including glucose and oxygen (Sotero et al., 2009) and ATP consumption
(Aubert et al., 2007), but these issues will not be considered here. To date, no studies have
compared how sensitive the overall rCBF or BOLD predictions are to these alternatives, but
they likely yield qualitatively similar results.
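The three alternative coupling signals can be sketched side by side for a toy population; all weights, currents, and conductances below are made-up numbers for illustration:

```python
import numpy as np

# Toy sketch of the three coupling signals: weight-based (Arbib et al., 1995),
# current-based (Deco et al., 2004), and conductance-based (Izhikevich &
# Edelman, 2008). All weights, currents, and conductances are made-up numbers.
def y_weight_based(W, z):
    """Sum over neurons of |w| times presynaptic firing rate."""
    return float(np.sum(np.abs(W) @ z))

def y_current_based(currents):
    """Sum of the absolute values of synaptic currents."""
    return float(np.sum(np.abs(currents)))

def y_conductance_based(g_max, s):
    """Sum of maximal conductance times the fraction of open receptors."""
    return float(np.sum(g_max[np.newaxis, :] * s))

W = np.array([[0.5, -0.3], [-0.2, 0.4]])         # 2 neurons x 2 inputs
z = np.array([10.0, 20.0])                       # presynaptic rates
currents = np.array([[1.2, -0.8], [0.5, -0.1]])  # per neuron, per receptor type
g_max = np.array([1.0, 0.5])                     # per receptor type
s = np.array([[0.2, 0.1], [0.4, 0.0]])           # open fraction per neuron/receptor
```

Note that only the conductance-based measure avoids the reversal-potential problem, since it does not vanish when the membrane potential approaches a receptor's reversal potential.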
It is now thought that several mechanisms coexist to regulate blood flow in response to
neural activity including the neuron-astrocyte pathway (Koehler et al., 2006), vasomotor
GABAergic interneurons (Cauli et al., 2004), and nitric oxide (NO) diffusion (Metea and
Newman, 2006). It is therefore becoming increasingly popular to use a generic blood flow-
inducing signal that subsumes these different mechanisms as input to a vascular model, as
suggested by Friston et al (2000):
$\frac{di}{dt} = \varepsilon y(t) - \frac{i}{\tau_{i}} - \frac{f_{in} - 1}{\tau_{f}}$

where i is the decaying blood flow-inducing signal, y is some measure of synaptic or neural
activity, ε is the efficacy of the neurovascular coupling, $f_{in}$ is the induced blood flow
providing autoregulatory feedback, and $\tau_{i}$ and $\tau_{f}$ are the time constants of the signal
decay and autoregulatory feedback, respectively. The time constant values are based on the
frequency of vasomotor constrictions and the diffusion rate of NO. Riera et al. (2006)
introduced a delay in signal generation, $\tau_{h}$, and a baseline value, $y_{0}$:

$\frac{di}{dt} = \varepsilon \left( y(t - \tau_{h}) - y_{0} \right) - \frac{i}{\tau_{i}} - \frac{f_{in} - 1}{\tau_{f}}$
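The flow-inducing signal can be integrated numerically; below is a minimal forward-Euler sketch, closing the loop with df/dt = i (the signal driving blood flow) as in Friston et al. (2000). The parameter values are illustrative, not fitted:

```python
# Forward-Euler sketch of the generic flow-inducing signal, closing the loop
# with df/dt = i as in Friston et al. (2000). Parameter values are illustrative.
def simulate_flow(y, dt=0.01, eps=0.5, tau_i=1.25, tau_f=2.5):
    i, f = 0.0, 1.0              # signal at rest; flow normalized to baseline 1
    f_trace = []
    for y_t in y:
        di = eps * y_t - i / tau_i - (f - 1.0) / tau_f
        i += dt * di
        f += dt * i              # the signal drives blood flow
        f_trace.append(f)
    return f_trace

dt = 0.01
y = [1.0] * int(1.0 / dt) + [0.0] * int(9.0 / dt)   # 1 s burst, then 9 s rest
f_trace = simulate_flow(y, dt=dt)
```

A burst of synaptic activity produces a flow increase above baseline that decays back with the damped-oscillator dynamics implied by the two time constants.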
Poznanski and Riera (2006) review synthetic brain imaging approaches and argue for the
need to model networks of astrocytes connected via gap-junctions and connected to the
vascular system. While we agree in principle, it is not currently practical to simulate large
networks of multiple brain regions at this level of detail and in almost all cases the data
needed to construct such models is not available. However, it is possible to gain significant
insight into the large scale organization of the brain by modeling at a coarser grain if the
model is constrained by enough experimental data. Synthetic brain imaging on the FARS
model of primate control of grasping (Fagg and Arbib, 1998) predicted the influence of PFC
on the anterior intraparietal area AIP ten years before it was verified anatomically (Borra et
al., 2007) and functionally (Baumann et al., 2009).
1.2.3.c Vascular Signal Generation
The first synthetic brain imaging approaches (Arbib et al., 1995; Tagamets and Horwitz,
1998) were applied to PET data since PET measures rCBF and does not include some of the
nonlinearities of the BOLD signal which also involves changes in blood volume and
oxygenation. Synthetic fMRI studies began to convolve measures of synaptic activity with
functions representing the hemodynamic response in order to simulate the signal smoothing
and delay seen in the BOLD signal (Figure 1-2). At the same time, models of the BOLD
response to increased blood flow were developed which became collectively known as the
“hemodynamics approach” and were used in later synthetic brain imaging techniques to
apply computational models to fMRI data (Babajani et al., 2005; Babajani and Soltanian-
Zadeh, 2006; Riera et al., 2006; Riera et al., 2004b; Sotero et al., 2009).
The earliest applications of synthetic brain imaging to fMRI used basically the same
technique as that used for PET, but modified the integration time in order to account for the
slice acquisition time of fMRI scanners (Arbib et al., 2000; Horwitz and Tagamets, 1999).
Horwitz and Tagamets (1999) additionally convolved the total synaptic activity with a
Poisson function in order to simulate hemodynamic delay. Deco, Rolls and Horwitz (2004)
used an analytic hemodynamic response function (HRF) from Glover (1999), which was more
accurate than any of the HRFs used in synthetic fMRI until then. Although the parameters of
this function can be fitted to experimental data for each subject, Deco et al. used the
parameters from Glover (1999) for a canonical HRF.
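The convolution approach can be sketched with a double-gamma HRF in the spirit of Glover (1999); the shape parameters below are commonly used canonical values, not fitted to any subject:

```python
import numpy as np
from math import gamma

# Sketch of the convolution approach with a double-gamma HRF in the spirit of
# Glover (1999); the shape parameters are common canonical values, not fitted.
def double_gamma_hrf(t, peak=6.0, under=16.0, ratio=6.0):
    return (t ** (peak - 1) * np.exp(-t) / gamma(peak)
            - t ** (under - 1) * np.exp(-t) / (ratio * gamma(under)))

dt = 0.1
hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))

activity = np.zeros(400)   # 40 s of total synaptic activity at 0.1 s steps
activity[50:60] = 1.0      # a 1 s burst starting at t = 5 s

# Convolve and truncate: a smoothed, delayed BOLD-like response with undershoot.
bold = np.convolve(activity, hrf)[:len(activity)] * dt
```

The convolution delays and smooths the burst of synaptic activity and produces the post-stimulus undershoot characteristic of the BOLD response.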
Figure 1-2 The technique used in many early synthetic fMRI studies. The total absolute
synaptic activity of all neurons in the network is convolved with a hemodynamic response
function to produce a simulated BOLD response.
The hemodynamics approach utilizes biomechanical models of vasculature that
incorporate changes in blood oxygenation and volume. These models originate in Buxton et
al.’s (1998) “balloon model”, which has blood flow as its sole input and outputs a BOLD
signal consisting of extra- and intra-vascular components (Figure 1-3). The main feature of
this model is that it is able to reproduce the transients of the BOLD response such as the
initial dip and post-undershoot. Friston et al.’s (2000) extension of the balloon model embeds
it in a regional cerebral blood flow (rCBF) model of how a flow-inducing signal (described
in the section, Neurovascular Coupling Mechanism) causes changes in blood flow. This was
a linear dynamical model, assuming that the relationship between the signal and rCBF is
linear and that nonlinearity comes in during the transformation from rCBF to the BOLD
response.
Figure 1-3 Friston et al.'s (2000) extended balloon model with Zheng et al.'s (2002) capillary
model extension (modified from Zheng et al., 2002).
The portion of the balloon model which describes how blood flow affects oxygen delivery
to tissue (originally developed by Buxton and Frank, 1997) was updated by Zheng et al.
(2002). The original balloon model applied only to steady-state conditions; Zheng et al.'s
model includes a compartment for tissue oxygen (later extended to a total of three
compartments by Zheng et al., 2005) and a full capillary model with dynamics, in order to be
applicable to transient as well as steady-state conditions.
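The core balloon-model dynamics can be sketched with forward-Euler integration; the parameter values below are standard illustrative choices, not fitted to data:

```python
# Forward-Euler sketch of the balloon model (Buxton et al., 1998) as embedded
# by Friston et al. (2000): inflow f drives venous volume v and deoxyhemoglobin
# content q, which determine the BOLD signal. Parameter values are illustrative.
def balloon_bold(f_in, dt=0.01, tau0=1.0, alpha=0.32, E0=0.4, V0=0.02):
    k1, k2, k3 = 7.0 * E0, 2.0, 2.0 * E0 - 0.2
    v, q = 1.0, 1.0                            # normalized to baseline
    bold = []
    for f in f_in:
        E = 1.0 - (1.0 - E0) ** (1.0 / f)      # oxygen extraction fraction
        f_out = v ** (1.0 / alpha)             # outflow from the venous balloon
        dv = (f - f_out) / tau0
        dq = (f * E / E0 - f_out * q / v) / tau0
        v += dt * dv
        q += dt * dq
        bold.append(V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v)))
    return bold

dt = 0.01
f_in = [1.4] * int(5 / dt) + [1.0] * int(10 / dt)   # 40% flow increase for 5 s
bold = balloon_bold(f_in, dt=dt)
```

A sustained flow increase washes out deoxyhemoglobin faster than it is produced, raising the BOLD signal, which then returns toward baseline when flow does.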
1.2.3.d Neural Localization
Most synthetic brain imaging studies on large-scale neural models simply group all of the
neurons in each modeled region together in order to generate the simulated vascular
response. To date, only one study has localized simulated neurons in voxels obtained from
human MRI data (Izhikevich and Edelman, 2008). Only two studies have simulated the
crosstalk in the neurovascular coupling mechanism from adjacent voxels (Babajani et al.,
2005; Babajani and Soltanian-Zadeh, 2006), but neither of these was conducted on a large-
scale neural model. Note, however, that the volume conductance model of the neuropil
suggested by Poznanski and Riera (2006) would address this voxel crosstalk issue.
1.2.3.e Scanner Simulation
The properties of the scanning equipment itself are increasingly being taken into account
in synthetic brain imaging models, a crucial component for quantitative comparison with
experimental data. Most models integrate neural activity over the slice acquisition time of an
fMRI machine (Arbib et al., 2002; Corchs and Deco, 2002, 2004; Husain et al., 2004). Some
models additionally sample the simulated BOLD signal according to the repetition time (TR)
of the scanner (Horwitz et al., 2005; Husain et al., 2004; Lee et al., 2006). The Balloon model
includes parameter values for the BOLD signal equation that can be adjusted according to the
field strength of the scanner (Buxton et al., 1998; Friston et al., 2000). While most models
include noise in the neural model (Corchs and Deco, 2002, 2004; Deco et al., 2004; Husain et
al., 2004; Riera et al., 2004b; Tagamets and Horwitz, 2000), only a few include scanner noise
(Babajani et al., 2005; Riera et al., 2004b).
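The scanner-related steps can be sketched as repetition-time sampling plus additive Gaussian scanner noise; the TR, noise level, and test signal below are illustrative:

```python
import numpy as np

# Sketch of scanner-side steps: sample the simulated BOLD signal once per
# repetition time (TR) and add Gaussian scanner noise. TR and noise level are
# illustrative.
def scan(bold, dt, tr=2.0, noise_sd=0.001, seed=0):
    rng = np.random.default_rng(seed)
    volumes = np.asarray(bold)[::int(round(tr / dt))]   # one volume per TR
    return volumes + rng.normal(0.0, noise_sd, size=volumes.shape)

dt = 0.01
bold = 0.02 * np.sin(np.linspace(0.0, 2.0 * np.pi, 2000))   # 20 s of signal
volumes = scan(bold, dt)
```

Twenty seconds of simulated signal at a 2 s TR yields ten noisy volumes, the form in which model output would be compared to scanner data.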
1.2.3.f Data Normalization
The first use of synthetic PET (Arbib et al., 1995) integrated the total absolute synaptic
activity in a region over the duration of the scan for two tasks. The percent increase in this
signal for one task compared to the other is taken as the percent increase in rCBF (Arbib et
al., 2002 offer relative synaptic activity as a more robust measure of synaptic activity).
Tagamets and Horwitz (2000) attempt to address the problem that synthetic brain imaging
simulations can only qualitatively and not quantitatively predict the results of actual brain
imaging experiments. To account for the different values of baseline activity and different
scales of rCBF and synaptic activity, they normalized experimental and simulated PET data
to a common reference area like V1.
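The reference-region normalization can be sketched as follows; the regions and signal values are invented for illustration:

```python
# Sketch of reference-region normalization in the spirit of Tagamets and
# Horwitz (2000): divide each region's signal by a common reference area (V1)
# so simulated and experimental values share a scale. Numbers are invented.
def normalize_to_reference(signals, reference="V1"):
    ref = signals[reference]
    return {region: value / ref for region, value in signals.items()}

simulated = {"V1": 2.0, "F5": 3.0, "AIP": 1.0}         # arbitrary model units
experimental = {"V1": 50.0, "F5": 75.0, "AIP": 25.0}   # rCBF-like units

sim_norm = normalize_to_reference(simulated)
exp_norm = normalize_to_reference(experimental)
```

After normalization the two datasets are directly comparable even though their raw units and baselines differ.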
A problem that confronts any large-scale synthetic brain imaging model, but hasn’t yet
been addressed, is that of the relative number of activated neurons in each region. The
number of activated neurons affects the simulated hemodynamic response in any synthetic
brain imaging model developed thus far. This means that in any model that lumps neurons in
each region together, each modeled region must contain the same proportion of neurons
activated by the task as the real brain region. However, the technique of localization of model
neurons in voxels from MRI data (Izhikevich and Edelman, 2008) could be extended using a
probabilistic brain atlas (Mazziotta et al., 2001) and detailed electrophysiological data in
order to approach this problem.
Table 1-1 A meta-analysis of synthetic brain imaging studies in terms of the mechanisms
included: neural model (LI=leaky integrator, SU=sigmoidal units, NaLI=sodium
concentration leaky integrator, GF=gamma function, MFA=mean field approximation,
DI=decaying impulse, LIF=leaky integrate-and fire, PSP=postsynaptic potential,
CM=compartmental model, NM=neural mass model, IZ=Izhikevich neuron), synaptic model
(CBK=conductance-based kinetic model), neurovascular coupling signal (WI=sum of
absolute value of connection weight times input, Na/K=ATP consumption by Na/K pump,
FA=field activity, SI=sum of absolute value of synaptic currents, SSC=sum of synaptic
conductances, NP=number PSPs, TCC=transmembrane capacitive currents, NAS=number of
active synapses), rCBF generation, BOLD signal generation, temporal smoothing, adjacent
voxel crosstalk, neural noise, and network connection variability (see Table A-1 for the full
meta-analysis). The present study is analyzed in the last row.
Study: neural model, synaptic model, neurovascular coupling signal; followed by the
mechanisms marked (rCBF, BOLD, temporal smoothing, voxel crosstalk, neural noise,
connection variability)
Arbib et al (1995): LI, WI; x
Tagamets & Horwitz (1998): SU, WI; x, x
Horwitz & Tagamets (1999): SU, WI; x, x, x
Friston et al (2000): x, x, x
Arbib et al (2000): LI, WI; x, x
Tagamets & Horwitz (2000): SU, WI; x, x
Aubert et al (2001): NaLI, Na/K
Tagamets & Horwitz (2001): SU, WI; x, x
Mechelli et al (2001): GF; x, x, x
Arbib et al (2002): LI, WI; x
Husain et al (2002): SU, WI; x
Corchs & Deco (2002): MFA, FA; x, x
Almeida & Stetter (2002): MFA, WI; x
Aubert & Costalat (2002): NaLI, Na/K; x, x, x
Buxton et al (2004): DI; x, x, x
Corchs & Deco (2004): MFA, FA; x, x, x
Husain et al (2004): SU, WI; x, x, x
Riera et al (2004b): x, x, x, x
Deco et al (2004): LIF, CBK, SI; x, x, x
Babajani et al (2005): PSP, NP; x, x, x, x, x
Horwitz et al (2005): SU, WI; x, x, x, x
Riera et al (2005): x, x, x, x
Chadderdon & Sporns (2006): SU, WI; x, x, x
Lee et al (2006): SU, WI; x, x, x
Riera et al (2006): CM, TCC; x, x, x, x
Babajani & Soltanian-Zadeh (2006): NM, NP; x, x, x, x
Winder et al (2007): SU, WI; x, x, x
Sotero & Trujillo-Barreto (2008): NM, NAS; x, x, x
Izhikevich & Edelman (2008): IZ, CBK, SSC; x
Babajani-Feremi et al (2008): PSP, NP; x, x, x, x, x
Sotero et al (2009): NM, NAS; x, x, x, x
current study: IZ, CBK, SSC; x, x, x, x
Chapter 2 - Mirror System Model of Action Recognition
MNS2 is a new version of the original MNS model, using a recurrent architecture that is
biologically more plausible. Moreover, MNS2 extends the
capacity of the model to address data on audio-visual mirror neurons (Kohler et al., 2002)
and on the response of mirror neurons when the target object was recently visible but currently
hidden (Umilta et al., 2001).
2.1 An Overview of the MNS2 Model
The MNS model (Oztop and Arbib, 2002) of the monkey mirror system was designed to
associate activity in canonical neurons encoding the type of grasp with visual input encoding
the trajectory of a hand relative to an observed object. The canonical neurons were modeled
as an array of neurons whose activity determined the type of grasp executed. The learning
mechanism was a feedforward backpropagation network of units with one hidden layer.
Hand state information for a grasp - the relation of hand to object - was represented as a 7
dimensional trajectory encoding certain key relations (Figure 2-1). The major drawback of
the MNS model was its treatment of the trajectory. At each time point, the initial segment of
the trajectory up to that time was fitted by a cubic spline, and then sampled at 30 times
spanning the segment to produce a 210 dimensional input vector to the network. In this way,
the temporal representation of hand state was pre-processed such that it could be encoded in
a spatial representation for input into the feedforward network. The network was trained on a
set of "self-performed" grasps using back propagation. This system was shown to correctly
classify observed grasps, often before the hand contacted the object. In addition, the network
yielded neurophysiological predictions concerning the time course of mirror neuron
activation and activity during the resolution of an ambiguous grasp.
Figure 2-1 The components of the hand state (a(t), o1(t), o2(t), o3(t), o4(t), d(t), v(t)). (from
Oztop and Arbib, 2002)
However, the unnatural coding required for the hand-state trajectory (relating hand and
object) led us to look for a model that could process the time series of hand-object
relationships without extensive recoding. We thus turned to recurrent networks. These
networks have been shown to be computationally powerful and useful for reducing problem
dimensionality (Jones, 1992). We investigated the use of a Jordan-type recurrent network
(Jordan, 1986) to classify grasps based on the temporal sequence of hand state information -
but now without the input pre-processing of the MNS model (see the right-hand loop in
Figure 2-2). The raw 7 dimensional hand state vector (Figure 2-1) is simply input to the
network at each time step. The network was again trained on a set of self-generated grasps,
but this time using backpropagation through time (Werbos, 1990). We show that this system
also correctly classifies different types of grasps, often before the feedforward
implementation does, and preserves the neurophysiological predictions made by the model.
Moreover, we have extended the model in a fashion consistent with available data on the
macaque brain to explain a range of further experimental data.
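The Jordan-type recurrent step can be sketched as follows; the layer sizes are illustrative and the weights are random rather than trained by backpropagation through time:

```python
import numpy as np

# Sketch of a Jordan-type recurrent pass: the previous output is fed back
# through a recurrent input layer alongside the raw 7-dimensional hand state.
# Layer sizes are illustrative and the weights are random, not trained by BPTT.
rng = np.random.default_rng(0)
n_hand, n_hid, n_grasp = 7, 12, 3   # hand state dims, hidden units, grasp types
W_in = rng.normal(0.0, 0.5, (n_hid, n_hand + n_grasp))
W_out = rng.normal(0.0, 0.5, (n_grasp, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify_trajectory(hand_states):
    out = np.zeros(n_grasp)            # recurrent (state) input starts at rest
    for h in hand_states:
        x = np.concatenate([h, out])   # raw hand state plus fed-back output
        out = sigmoid(W_out @ sigmoid(W_in @ x))
    return out

trajectory = rng.random((30, n_hand))        # a fake 30-step hand state trajectory
activity = classify_trajectory(trajectory)   # graded grasp-type activations
```

Because the output is fed back at each step, the classification can accumulate evidence over the whole trajectory without any spatial re-encoding of time.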
Figure 2-2 The main recurrent network at the heart of the MNS2 model. The visual and
recurrent input layers are fully connected to the hidden layer, which is fully connected to the
external and recurrent output layers. The recurrent output layer is fully connected to the
recurrent input layer and the output of the audio recurrent network is fully connected to the
external output layer. The external output layer corresponds to F5 mirror neurons while the
target pattern is generated by F5 canonical neuron activity.
The new model also addresses the fact that natural actions typically involve an audio as
well as a visual component. The audio properties of mirror neurons are of major interest
because they may have been crucial in the transition from gesture to vocal articulation in the
evolution of language (Arbib, 2005). Kohler et al. (2002) found that some of the mirror
neurons in area F5 of the macaque premotor cortex that are responsive for the observation of
actions associated with characteristic noises (such as peanut breaking and paper ripping) are
just as responsive for the sounds of these actions. Area F5 is located in the ventro-rostral
portion of area 6 in the caudal inferior arcuate sulcus (Rizzolatti et al., 1996a). Audio
information reaches inferior caudal arcuate cortex via direct connections from the auditory
cortex (Deacon, 1992) and reaches area 6 via indirect connections through area 8 (Arikuni et
al., 1988; Romanski et al., 1999). The macaque nonprimary auditory cortex has been found
to respond to complex sounds (Rauschecker et al., 1995) while the primary auditory cortex
was found to have a tonotopic organization (Morel et al., 1993). It thus seems that auditory
input is extensively pre-processed by the time it reaches premotor cortex.
We model this auditory pre-processing with a two-tiered mechanism. We used Lyon's
Passive Ear model (Lyon, 1982) to calculate auditory nerve firing probabilities given a
particular sound. The normalized output of this model is then input to another Jordan-type
recurrent network, separately trained to recognize particular sounds. The output layer of this
audio recurrent neural network is directly and fully connected to the output layer of the main
recurrent neural network, corresponding to a direct connection from auditory cortex to F5.
These connection weights are modified using Hebbian learning. In this way, any sound that is
consistently perceived during the course of an executed action becomes associated with that
action and incorporated into its representation. This type of audio information is inherently
actor-invariant and this allows the monkey to recognize that another individual is performing
that action when the associated sound is heard. In all actions tested by Kohler et al. (2002),
the sound was associated with the final phase of the action. Mirror neurons responsive to
audio and visual stimuli were thus found to be active later during audio only conditions than
conditions with a visual component. The activation of these neurons during conditions with
only audio information was confined to the duration of the audio input.
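The Hebbian association between the audio network's output and the F5 mirror output layer can be sketched as follows; the unit counts, learning rate, and co-activity pattern are illustrative:

```python
import numpy as np

# Sketch of the Hebbian audio-to-mirror association: weights grow between
# co-active audio-output and F5-mirror units during execution, so the sound
# alone can later drive the action's mirror units. Sizes and rates are
# illustrative.
def hebbian_update(W, audio_out, mirror_out, lr=0.1):
    return W + lr * np.outer(mirror_out, audio_out)

n_audio, n_mirror = 4, 3
W = np.zeros((n_mirror, n_audio))

# During an executed noisy action, audio unit 2 and mirror unit 1 are co-active.
audio_out = np.array([0.0, 0.0, 1.0, 0.0])
mirror_out = np.array([0.0, 1.0, 0.0])
for _ in range(10):
    W = hebbian_update(W, audio_out, mirror_out)

response = W @ audio_out   # hearing the sound alone drives the associated unit
```

After repeated co-activation, presenting the sound alone most strongly activates the mirror unit for the associated action, the actor-invariant behavior described above.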
Another challenge for our new model is provided by the finding of Umilta et al. (2001)
that mirror neurons in the macaque monkey can recognize an action if they have seen the
target object which was then hidden, but cannot recognize the action lacking current or recent
input on the affordances and location of the object. In their experiments, each monkey was
shown an object that was then obscured by a screen. When the monkey observed the
experimenter reaching behind the screen to grasp the object, the same mirror neurons were
activated that responded to a visible grasp to the same object. The same neuron does not
respond to a reach when no object is visible, or if the human reaches behind a screen that was
not previously shown to conceal an object. To recognize that another individual is executing
a grasping action even when the goal and final component of that action is hidden, the
monkey must possess a working memory trace of the object that the action is directed
towards. While the work of Umilta et al. (2001) did not address the duration of working
memory traces, the neurons responding to partially hidden grasps showed a significant level
of activity up to approximately 1000-1500 msec after the hand disappeared behind the
screen. Because our simulations only tested grasps with a 450-750 msec long hidden
component, we did not address working memory decay in this model.
To model the ability to recognize grasps even when the final stage of object contact is
hidden and must be inferred, we have hypothesized that the brain contains two working
memories, one for the shape and position of the hand and the other for the relevant
affordance of the object and its location. Brodmann Area (BA) 46 is typically associated with
working memory (Courtney et al., 1998; D'Esposito et al., 1998; McCarthy et al., 1994), and
in this model is hypothesized to be the location of the hand and object working memories. In
this model, visual information about the hand is provided to working memory through the
superior temporal sulcus (STS) and the working memory relays this information to areas 7a
and 7b when the hand is not visible. Visual information about the object reaches object
working memory from AIP and the medial, lateral, and ventral intraparietal areas
(MIP/LIP/VIP). These projections are supported by the work of Seltzer & Pandya (1989),
who found multiple connections from different areas of STS to area 46, Neal et al.
(1990), who found a connection from area 46 to areas 7a and 7b, and Schall et al. (1995),
who found connections between LIP and area 46. Friedman & Goldman-Rakic (1994) have
found activity in both the dorsolateral prefrontal cortex and inferior parietal cortex (areas 7A,
7B, 7IP, and 7M) of the macaque monkey during spatial working memory tasks. Our model
predicts that this sustained activity is due to reciprocal connections with working memory in
area 46.
We use dynamic remapping to extrapolate the observed grasp trajectory once the hand
disappears behind the screen. Dynamic remapping is a process whereby perceptual
representations are updated based on generated motor commands, or related perceptual
information. In the present model, at each time step that the screen obscures the hand, the
representation of the movement of the still-visible forearm in STS is used to update the
working memory representation in area 46 of the hand position. In this way, if the model
observes an object that is then hidden by a screen, and then observes a grasp that disappears
behind that screen, the hand trajectory will be extrapolated - and if it appears to end at the
remembered object location then the grasp will be recognized.
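The remapping step can be sketched as velocity-based extrapolation of the remembered hand position; all positions and velocities below are illustrative:

```python
# Sketch of dynamic remapping: while the hand is hidden, its remembered
# position is extrapolated from the observed forearm velocity at each step.
# All positions and velocities are illustrative.
def remap(position, velocity, dt):
    return [p + v * dt for p, v in zip(position, velocity)]

hand_wm = [0.0, 0.0, 0.0]        # last seen hand position (working memory)
object_pos = [0.3, 0.0, 0.0]     # remembered object location
velocity = [0.1, 0.0, 0.0]       # observed forearm velocity

dt = 0.1
for _ in range(30):              # 3 s of movement behind the screen
    hand_wm = remap(hand_wm, velocity, dt)

# The extrapolated trajectory ends at the remembered object location, so the
# hidden grasp can be recognized.
distance = sum((h - o) ** 2 for h, o in zip(hand_wm, object_pos)) ** 0.5
```

If the extrapolated hand position fails to reach the remembered object location, or no object was stored, the hidden grasp is not recognized.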
The system diagram is shown in Figure 2-3. Auditory information about actions reaches
the F5 mirror neurons via the auditory cortex. When available externally, visual information
about the hand (shape and position) and object (features, affordances, and position) is input
into the Hand-Object Spatial Relation Analysis and Object Affordance-Hand State
Association schemas and into Hand- and Object- Working Memory, respectively. When the
visual information about the hand is not available externally, the Hand Working Memory
trace is input into the Hand - Object Spatial Relation Analysis and Object Affordance-Hand
State Association schemas and likewise for the Object Working Memory trace when visual
information about the object is not available externally. Detection of arm movement in STS
is used to dynamically remap the Hand Working Memory representation of the hand position.
Figure 2-3 System diagram for the MNS2 model (updating the MNS model of Oztop and
Arbib, 2002). The main recurrent network, shown more explicitly in Figure 2-2, models the
areas 7b and F5 mirror, shown here in the blue rectangle, by the activity of its hidden and
external output layers, respectively. The audio recurrent network models the nonprimary
auditory cortex. The grey rectangles enclose portions of the model unique to MNS2. The
orange rectangles enclose the schemas relating to a portion of the FARS model (Fagg and
Arbib, 1998) of visually directed grasping of an object. This includes F5 canonical neurons,
which in the MNS2 model provide the F5 mirror neurons with a target pattern of activity.
2.2 Methods
2.2.1 Reach and Grasp
We used the multi-joint 3D kinematics simulator of Oztop & Arbib (2002) to plan a grasp
and reach trajectory and execute it in a simulated 3D world. This simulator is a non-neural
implementation of the FARS model of primate grasping (Fagg and Arbib, 1998) that controls
a virtual 19 degrees of freedom (DOF) arm/hand and performs realistic grasps. Grasps are
planned by determining the points of desired contact of fingers on the object (based on the
type of grasp: power, precision, or side) and then finding the required arm/hand joint
configuration to produce this grasp (the inverse kinematics problem). The final arm/hand
joint configuration is determined by gradient descent with noise and the grasp trajectory is
then generated by warping time with a cubic spline. The parameters of this spline are derived
from empirical studies to fit the natural reach-to-grasp aperture and velocity profile. This
simulator was used to generate realistic grasps to train and test the model.
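The inverse kinematics step can be sketched for a toy two-link planar arm, using gradient descent with added noise in the spirit of the simulator; the link lengths, rates, and target are illustrative, not the simulator's actual 19-DOF formulation:

```python
import math, random

# Toy sketch of the inverse kinematics step: gradient descent with added noise
# on the joint angles of a 2-link planar arm, pulling the fingertip to a
# desired contact point. Link lengths, rates, and the target are illustrative.
L1, L2 = 0.3, 0.25

def forward(q1, q2):
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def solve_ik(target, steps=2000, lr=0.5, noise=1e-3, seed=0):
    random.seed(seed)
    q1, q2 = 0.1, 0.1
    for _ in range(steps):
        x, y = forward(q1, q2)
        ex, ey = x - target[0], y - target[1]
        # gradient of the squared position error (Jacobian transpose), plus noise
        dq1 = (ex * (-L1 * math.sin(q1) - L2 * math.sin(q1 + q2))
               + ey * (L1 * math.cos(q1) + L2 * math.cos(q1 + q2)))
        dq2 = ex * (-L2 * math.sin(q1 + q2)) + ey * (L2 * math.cos(q1 + q2))
        q1 -= lr * dq1 + random.gauss(0.0, noise)
        q2 -= lr * dq2 + random.gauss(0.0, noise)
    return q1, q2

target = (0.35, 0.2)
q1, q2 = solve_ik(target)
err = math.hypot(*(c - t for c, t in zip(forward(q1, q2), target)))
```

A full trajectory would then be produced by time-warping the path to this final configuration with a cubic spline fitted to empirical aperture and velocity profiles.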
2.2.2 Visual Analysis of Hand State
The visual information input into the network for grasp recognition is the trajectory of a
7-dimensional vector encoding hand-object relations (the hand state, see Figure 2-1). As for
Oztop & Arbib (2002), this information is calculated from the joint configuration of the
simulated arm/hand and 3D object. The components of the hand state are
a(t): aperture of the virtual fingers involved in grasping,
o1(t): cosine of the angle between the object opposition axis and the (index finger tip -
thumb tip) vector,
o2(t): cosine of the angle between the object opposition axis and the (index finger
knuckle - thumb tip) vector,
o3(t), o4(t): angles between the thumb and the side of the hand, and between the thumb
and the inner surface of the palm,
d(t): distance from the wrist to the target at time t, and
v(t): tangential velocity of the wrist.
The hand state is calculated in an object-centered framework, allowing self-generated and
observed grasps to evoke similar hand state trajectories. The units of the network's input
layer (see Figure 2-2, visual input layer) encode the hand state with their activation values
(corresponding to firing rate), which are bounded by 0.0 and 1.0. In order to translate the raw
hand state values into the firing rates of the network's input layer units, each hand state
element is normalized according to its maximum possible value. At each time that the hand
and object are visible, this hand state is calculated directly from the simulated arm/hand and
object and applied to the input layer of the main recurrent neural network. The object and
hand attributes needed to calculate the hand state are stored in working memory. At each
time that the hand or the object is invisible, the hand state is calculated from the working
memory and input to the main recurrent neural network. The typical duration of grasps
generated by the simulator to objects at approximately arm's length was 32-34 simulation
time steps. Bennett & Castiello (1994) found the mean duration of a reach-to-grasp movement
to a target object 35 cm away to be 844 msec (SD=140) for healthy control subjects. Rand et
al. (2000) found the mean duration of a 40 cm reach-to-grasp movement to be 784.4 msec
(SEM=67.75) for healthy controls. Therefore, each time step of a simulated grasp
corresponds to approximately 25-30 msec. After the completion of each grasp, the visual
hand state was calculated for 10 more time steps, simulating observation of a static grasp for
250-300 msec.
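The translation into input-unit firing rates can be sketched as below; the per-element maxima are assumed placeholder values for illustration, not the simulator's actual calibration.

```python
import numpy as np

# Illustrative per-element maxima for normalization (aperture, o1, o2, o3,
# o4, distance, velocity); assumed placeholder values, not the simulator's
# actual calibration.
HAND_STATE_MAX = np.array([10.0, 1.0, 1.0, np.pi, np.pi, 50.0, 5.0])

def hand_state_to_firing_rates(hand_state):
    """Normalize each hand state element by its maximum possible value so
    the visual input layer's activations (firing rates) lie in [0, 1]."""
    rates = np.asarray(hand_state, float) / HAND_STATE_MAX
    return np.clip(rates, 0.0, 1.0)
```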
2.2.3 Action Recognition
Grasps are recognized from audio and visual input into a recurrent neural network
augmented with working memory. The visual input is the 7 dimensional hand state, and the
auditory input is a 3 dimensional vector distinguishing different sounds. The network output
is a 3 dimensional vector, each element of which encodes a type of grasp (power, precision,
or side). The most active element in the network's output layer indicates the grasp
classification.
2.2.3.a Main Recurrent Network Setup
We used a Jordan-type recurrent network (Jordan, 1986) containing 7 external input
units, 5 recurrent input units, 15 hidden units, 3 external output units, and 5 recurrent output
units. Each layer is fully connected to the layer above it, and the recurrent output units are
fully connected with the recurrent input units (see Figure 2-2). The learning algorithm used is
backpropagation through time (Werbos, 1990). Backpropagation through time (BPTT) is a
learning method for recurrent neural networks expanding on the backpropagation learning
method for feedforward networks (Rumelhart et al., 1986). In BPTT, the network is
"unfolded" for a number of time steps L into a large feedforward network with connections
between copies of the network replacing the recurrent connections. After running the
network forward for L time steps, the output layer error is propagated backwards "through
time" along the unfolded network. We used a value for L equal to the length of the entire
sequence.
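A minimal sketch of a Jordan-type network trained with BPTT unfolded over the full sequence. The layer sizes mirror those stated for the main network, but the initialization, learning rate, and toy training loop are illustrative assumptions, not the thesis's exact training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class JordanBPTT:
    """Jordan-type recurrent net: recurrent outputs feed back to recurrent
    inputs at the next time step; trained by unfolding over the whole
    sequence and backpropagating the output error through time."""

    def __init__(self, n_ext_in=7, n_rec=5, n_hid=15, n_ext_out=3, lr=0.2):
        self.W1 = rng.normal(0.0, 0.5, (n_hid, n_ext_in + n_rec + 1))  # +bias
        self.W2 = rng.normal(0.0, 0.5, (n_ext_out + n_rec, n_hid + 1))
        self.n_rec, self.n_ext_out, self.lr = n_rec, n_ext_out, lr

    def forward(self, seq):
        """Run the network forward, caching activities for BPTT."""
        rec = np.zeros(self.n_rec)
        self.cache, ext_outs = [], []
        for x in seq:
            inp = np.concatenate([x, rec, [1.0]])
            hid = sigmoid(self.W1 @ inp)
            out = sigmoid(self.W2 @ np.concatenate([hid, [1.0]]))
            self.cache.append((inp, hid, out))
            rec = out[self.n_ext_out:]          # recurrent outputs feed back
            ext_outs.append(out[:self.n_ext_out])
        return np.array(ext_outs)

    def bptt(self, targets):
        """Propagate output error backwards 'through time' along the unfold."""
        dW1, dW2 = np.zeros_like(self.W1), np.zeros_like(self.W2)
        d_rec = np.zeros(self.n_rec)            # error flowing back in time
        for t in reversed(range(len(self.cache))):
            inp, hid, out = self.cache[t]
            err = np.concatenate([out[:self.n_ext_out] - targets[t], d_rec])
            d_out = err * out * (1.0 - out)
            dW2 += np.outer(d_out, np.concatenate([hid, [1.0]]))
            d_hid = (self.W2[:, :-1].T @ d_out) * hid * (1.0 - hid)
            dW1 += np.outer(d_hid, inp)
            d_rec = self.W1[:, -1 - self.n_rec:-1].T @ d_hid  # to rec inputs
        self.W1 -= self.lr * dW1
        self.W2 -= self.lr * dW2

# Toy usage: drive the external outputs toward a fixed target pattern.
net = JordanBPTT()
seq = [rng.random(7) for _ in range(10)]
targets = np.tile([1.0, 0.0, 0.0], (10, 1))
loss_before = float(np.mean((net.forward(seq) - targets) ** 2))
for _ in range(300):
    net.forward(seq)
    net.bptt(targets)
loss_after = float(np.mean((net.forward(seq) - targets) ** 2))
```

Unfolding over the whole sequence corresponds to the choice of L equal to the sequence length described above.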
2.2.3.b Audio Input
Upon object contact, power and precision grasps were accompanied by slapping and
wood cracking sounds respectively. The WAV files containing these sounds were
preprocessed by Lyon's Passive Ear model (Lyon, 1982) and a recurrent neural network
modeling the auditory cortex. Lyon's model describes how the cochlea and hair cells
transduce acoustic energy into action potentials. The implementation of Lyon's Passive Ear
model was provided by Auditory Toolbox ver. 2 (Slaney, 1998), and we used a sample rate
of 10,000 and a decimation factor of 100. This produced sequences of 71-dimensional
vectors representing auditory nerve firing probabilities along the cochlea for the duration of
each sound. The normalized output of this model was then applied to the input units of the
audio recurrent network. The audio recurrent network contained 71 external input units, 5
recurrent input units, 20 hidden units, 3 external output units, and 5 recurrent output units. As
in the main recurrent network, each layer is fully connected to the layer above it, the
recurrent output units are fully connected to the recurrent input units, and the network
connection weights were modified using BPTT. Starting from the time step of hand contact
for each type of grasp, the sound associated with that type of grasp was processed by the
model. During this time the visual hand state input was static, corresponding to the
observation of a maintained grasp. The time step duration of the main network is 15 times
longer than that of Lyon's Passive Ear model. Therefore, the activity of the auditory network
output layer was averaged over 15 time steps before being propagated to the external output
layer of the main network (see Figure 2-2).
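The temporal alignment between the two time scales can be sketched as a block average; the array layout assumed here (one row of output activity per fine time step) is an illustrative convention.

```python
import numpy as np

def average_audio_output(audio_out, factor=15):
    """Average the audio network's output over `factor` fine time steps so
    it aligns with the main network's time step, which is 15 times longer
    than that of Lyon's Passive Ear model."""
    audio_out = np.asarray(audio_out, float)
    n_coarse = len(audio_out) // factor
    trimmed = audio_out[: n_coarse * factor]
    return trimmed.reshape(n_coarse, factor, -1).mean(axis=1)
```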
The weights of the audio network connections to the external output layer of the main
network (labeled Hebbian Learning in Figure 2-2) were modified using a Hebbian learning
rule.
    W_ij = W_ij + η1 A_j(t) MR_i(t)

where A_j(t) is the activity of the jth audio network external output unit and MR_i(t) is the
activity of the ith main network external output unit at time t, W_ij is the weight of the
connection between A_j and MR_i, and η1 is the Hebbian learning rate. The value used in our
simulations for η1 was 0.01. These connections were then normalized using the sum of
connection weights to each output unit. This sum was multiplied by the constant 10.0 to
serve as a scaling factor so that normalization bounds connection weights by 0.0 and 10.0.

    W_ij = 10 W_ij / Σ(j=1 to 3) W_ij
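This associative update and its normalization can be sketched as below; the weight-matrix orientation (rows index main-network output units i, columns index audio-network output units j) is an assumed convention.

```python
import numpy as np

def hebbian_update(W, audio_out, main_out, eta=0.01, scale=10.0):
    """One Hebbian increment (delta W_ij = eta1 * A_j(t) * MR_i(t)) followed
    by the sum normalization from the text: each row is divided by its sum
    and rescaled by 10.0, bounding the weights by 0.0 and 10.0."""
    W = W + eta * np.outer(main_out, audio_out)
    return scale * W / W.sum(axis=1, keepdims=True)
```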
2.2.3.c Working Memory and Dynamic Remapping
Working memory was implemented as arrays holding 3d coordinates and scalar values
for attributes of both the object and the hand, while dynamic remapping was used to
extrapolate the observed grasp trajectory once the hand disappears behind the screen.
The following values were stored in the object working memory array:
c1(t): 3d coordinates of the object's center
x(t): 3d vector representing its affordance opposition axis orientation
The following values were stored in the hand working memory array:
v1(t),v2(t): 3d coordinates of the tips of the virtual fingers involved in grasping along the
stipulated opposition axis.
c2(t): 3d coordinates of the center of the (index finger tip - thumb tip) aperture
w(t), w(t-1): 3d coordinates of the wrist, and time-delayed 3d coordinates of the wrist
ft(t): the (index finger tip - thumb tip) vector
kt(t): the (index finger metacarpophalangeal joint - thumb tip) vector
th(t): 3d coordinates of the metacarpophalangeal joint of the thumb
thb(t): 3d coordinates of the carpometacarpal joint of the thumb
s(t): 3d coordinates of the side of the hand
p(t): 3d coordinates of the center of the palm
d(0): the distance between the wrist and the target at t=0
No attempt was made to provide neural models for working memory (various alternatives
are available in the literature; see Durstewitz et al., 2000, for one review). Instead, for each
time step that the object or hand was visible, the above-mentioned attributes were stored in
their respective object or hand working memory array. When either the hand or the object
were invisible, the values held in working memory for the invisible entity (hand, object, or
both hand and object) were used to compute the hand state for network input. The hand state
elements were calculated from working memory as follows:
    a(t) = ||v1(t) - v2(t)||

    o1(t) = ft(t) · x(t) / (||ft(t)|| ||x(t)||)

    o2(t) = kt(t) · x(t) / (||kt(t)|| ||x(t)||)

    o3(t) = arccos[ (th(t) - thb(t)) · (s(t) - thb(t)) / (||th(t) - thb(t)|| ||s(t) - thb(t)||) ]

    o4(t) = arccos[ (th(t) - thb(t)) · (p(t) - thb(t)) / (||th(t) - thb(t)|| ||p(t) - thb(t)||) ]

    d(t) = ||c1(t) - c2(t)||

    v(t) = ||w(t) - w(t-1)|| / d(0)
As in the fully visible condition, these hand state values were normalized and encoded in
the network's input layer on the range bounded by 0.0 and 1.0. When the hand or object was
invisible and there was no working memory trace of their attributes, the units encoding the
network hand state input were set to 0.0 (their minimal firing rate).
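The hand state computation from the stored attributes can be sketched as below. The dict containers keyed by the attribute names from the text (x, c1; v1, v2, c2, w, w_prev, ft, kt, th, thb, s, p) are an illustrative layout; the thesis stores the same attributes in working memory arrays.

```python
import numpy as np

def cos_angle(u, v):
    """Cosine of the angle between two 3d vectors (clipped for arccos safety)."""
    c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.clip(c, -1.0, 1.0))

def hand_state_from_wm(obj_wm, hand_wm, d0):
    """Compute the 7-element hand state from working memory traces."""
    a = np.linalg.norm(hand_wm["v1"] - hand_wm["v2"])      # aperture
    o1 = cos_angle(hand_wm["ft"], obj_wm["x"])             # opposition axis vs tips
    o2 = cos_angle(hand_wm["kt"], obj_wm["x"])
    thumb = hand_wm["th"] - hand_wm["thb"]
    o3 = np.arccos(cos_angle(thumb, hand_wm["s"] - hand_wm["thb"]))
    o4 = np.arccos(cos_angle(thumb, hand_wm["p"] - hand_wm["thb"]))
    d = np.linalg.norm(obj_wm["c1"] - hand_wm["c2"])       # hand-target distance
    v = np.linalg.norm(hand_wm["w"] - hand_wm["w_prev"]) / d0
    return np.array([a, o1, o2, o3, o4, d, v])
```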
Dynamic remapping was carried out on the working memory representation of the wrist
position and index finger-thumb aperture center position in each time step that the hand was
not visible. This served to update the working memory representation of the hand position by
extrapolating the hand's trajectory based on the trajectory of the still-visible arm. The wrist
position and index finger-thumb aperture center position working memory traces were
displaced by the same magnitude and direction as the change in forearm position from the
previous step in a manner similar to that used by Dominey & Arbib (1992) to remap saccade
target locations. This was accomplished by calculating the difference in position of a point on
the arm between the two latest time steps, and using this value to update the working
memory representations:
    WM_wrist = WM_wrist + (r(t) - r(t-1))

    WM_aper = WM_aper + (r(t) - r(t-1))

where WM_wrist is the working memory representation of the wrist's 3 dimensional
coordinates, WM_aper is the working memory representation of the 3 dimensional coordinates
of the center of the index finger-thumb tip aperture, and r(t) is the 3 dimensional coordinates
of the point on the arm used for remapping at time t. We ran a series of simulation
experiments to determine the best position along the arm to use for remapping in which we
tested a point on the upper arm halfway between the elbow and the shoulder, the elbow, and
a point on the forearm halfway between the elbow and the wrist (see Figure 2-4). The best
performance came from using the center of the forearm, and this was used for r(t) in all
subsequent simulation experiments. This is not so surprising; since the forearm is the closest
to the hand of all the points tested, its trajectory would most accurately reflect the hand's
trajectory. The elbow trajectory is sometimes misleading, as the forearm can rotate about the
elbow without any elbow translation. This can cause the working memory representation of
the hand position to remain static, although the hand is indeed moving.
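The remapping update itself is a single displacement applied to both working memory traces; a minimal sketch, where r_t and r_prev stand for the current and previous positions of the reference point on the arm (the forearm midpoint in our simulations):

```python
import numpy as np

def remap_hand_wm(wm_wrist, wm_aper, r_t, r_prev):
    """Displace the working memory traces of the wrist and aperture-center
    positions by the arm reference point's displacement since the previous
    time step, extrapolating the hidden hand's trajectory."""
    delta = np.asarray(r_t, float) - np.asarray(r_prev, float)
    return wm_wrist + delta, wm_aper + delta
```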
Figure 2-4 Activation of the model's external output units to partially hidden grasps with and
without dynamic remapping and using the upper arm, elbow, or forearm position to perform
the remapping (plus: precision grasp output unit, cross: side grasp output unit, asterisk: power
grasp output unit). In each set of plots, the plot to the right of the grasp figure shows the
network activity during a fully visible grasp, while the plots below each grasp figure show the
network activity using (columns from left to right) the upper arm, elbow, or forearm for
dynamic remapping and without dynamic remapping for hidden grasps where the hand
disappears behind the screen after (rows from top to bottom) 5, 6, 7, 8, 9, or 10 time steps.
Using the position of the upper arm, elbow, or forearm to dynamically remap the working
memory representation of the hand location allows extrapolation of the hand state trajectory.
The use of the forearm for dynamic remapping provides a slightly better approximation than
using the upper arm or elbow to the output activity pattern during a visible grasp.
2.2.4 Training
The training set for the audio recurrent network consisted of the output of the Lyon
Passive Ear model to 3 WAV files of different sounds (slapping, wood cracking, and paper
ripping). This network was trained using backpropagation through time until all sounds were
correctly identified prior to training the main recurrent network. After training, this network
was able to unambiguously identify each sound very early in its duration. The mechanisms of
environmental sound recognition are outside the scope of this paper, and thus an analysis of
the dynamics of the trained auditory recurrent network is not presented here.
The training set for the main recurrent network was constructed by making the simulator
perform various grasps in the following way. The objects used were a cube of varying size
(a generic cube scaled by a random factor between 0.5 and 1.5), a disk (approximated
as a thin prism, scaled randomly by a factor between 0.75 and 1.5), and a ball (approximated
as a dodecahedron, scaled randomly by a factor between 0.75 and 1.5). In the training set
formation, a certain object always received a certain grasp (either power, side, or precision)
and power and precision grasps were associated with a distinct sound at the grasp completion
(slapping for power, and wood cracking for precision).
The object locations were chosen from a portion of the surface of a sphere centered on
the simulated arm's shoulder joint. The portion was defined by bounding the longitude and
latitude lines on the sphere's surface by -45° and 45°. During the generation of training data,
this portion of the sphere's surface was traversed in increments of 10°. Thus the simulator
made 9×9=81 grasps per object. Unsuccessful grasp attempts were identified as those in
which the resulting trajectory did not bring the hand in contact with the object; these were
discarded from the training set. For each successful grasp, one negative example was added
to the training set to stress the importance of the distance to the target: the target location
was perturbed while the grasp was repeated to the original target position.
The main recurrent neural network was trained with this set using backpropagation
through time until 95% of the grasps in the set were correctly identified. The training signal
for each grasp was determined by a 3-dimensional vector distinguishing each grasp type and
was applied to the network at each time step after the first five. The connections from the
audio recurrent network output layer to the main recurrent network output layer were
simultaneously trained using Hebbian association.
2.3 Simulation Results
2.3.1 Recurrent neural network performance
After training, the recurrent network was able to efficiently classify grasps given the hand
state trajectory. Most grasps are initially ambiguous, but are eventually resolved by the
network, often well before the hand makes contact with the object. Figure 2-5 shows
examples of precision, side, and power grasps generated by the simulator and the time course
of the network's output unit activity for each grasp. In these simulations there was no
auditory component to the grasps. The results of these simulations show that the recurrent
neural network functions just as effectively in grasp recognition as the feedforward
implementation developed by Oztop & Arbib (2002).
Figure 2-5 Top Left: Example of a generated precision grasp. The squares in all grasp figures
denote the wrist trajectory. Top Right: External output unit activation for this grasp (plus:
precision grasp output, cross: side grasp output, asterisk: power grasp output). The black
vertical lines indicate the time step in which the hand first contacts the object. The grasp is
correctly recognized as a precision grasp well before the hand contacts the object. Middle
Left: Example of a generated side grasp. Middle Right: External output unit activation for this
grasp (plus: precision grasp output, cross: side grasp output, asterisk: power grasp output).
The grasp is correctly recognized as a side grasp. Bottom Left: Example of a generated power
grasp. Bottom Right: External output unit activation for this grasp (plus: precision grasp
output, cross: side grasp output, asterisk: power grasp output). The power grasp is correctly
recognized well before the hand contacts the object.
Figure 2-6 displays the response of each layer of the network to the grasps shown in
Figure 2-5. For all three types of grasps, the recurrent output and recurrent input layers show
a very similar pattern of activity. This suggests that the network does not utilize the recurrent
unit activity in performing grasp classification, but perhaps in monitoring grasp progression.
The dip in the activity of the power output unit at time step 12 during observation of a power
grasp is caused by a reversal in the direction of change of the activity level of input unit
o3(t). This reflects the changes in one of the 2 degrees of freedom of the carpometacarpal
joint of the thumb needed to form a power grasp hand configuration from the initial hand
configuration of this particular grasp. This dip in power output unit activity is not seen in
other power grasps generated from different initial joint configurations.
Figure 2-6 Network activity of (from left to right) the network's recurrent input (plus:
recurrent input unit 0, cross: recurrent input unit 1, asterisk: recurrent input unit 2, square:
recurrent input unit 3, filled square: recurrent input unit 4), external visual input (plus: aper1,
cross: ang1, asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled circle:
axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2, square:
hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled circle: hidden unit 6,
triangle: hidden unit 7, filled triangle: hidden unit 8, upside-down triangle: hidden unit 9,
filled upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond: hidden
unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14), external output (plus:
precision grasp unit, cross: side grasp unit, asterisk: power grasp unit), and recurrent output
layers (plus: recurrent output unit 0, cross: recurrent output unit 1, asterisk: recurrent output
unit 2, square: recurrent output unit 3, filled square: recurrent output unit 4) during the
precision (top row), side (middle row), and power (bottom row) grasps shown in Figure 2-5.
2.3.2 Lesioned network performance
To determine the necessity of a recurrent network, we attempted to train a feedforward
network with the same hidden layer size on the last 10 steps of each grasp in the training set.
This network was able to successfully classify grasps in their final stages, but was unable to
generalize enough to "predict" the classification of a grasp early in its trajectory. This
indicates that the spatial-to-temporal transformation carried out on hand state input in the
previous MNS model (Oztop and Arbib, 2002) was necessary for early grasp recognition by
a feedforward network. To further investigate the importance of the recurrent connections in
the recurrent network we tested the effectiveness of the network in grasp recognition after
lesioning these connections (see Figure 2-7). Without the recurrent connections, the network
was unable to recognize any grasps at all, indicating that the recurrent unit activity
significantly contributes to the correct operation of this particular trained network.
Figure 2-7 Left: Generated grasps. Middle: Recurrent network output unit activation for each
grasp (plus: precision grasp unit, cross: side grasp unit, asterisk: power grasp unit). The
network successfully recognizes all three grasps. Right: Output unit activation of network
with lesioned recurrent connections for each grasp (plus: precision grasp unit, cross: side
grasp unit, asterisk: power grasp unit). No output unit reaches a significant level of activity for
any grasp.
Further analysis of the network activity (see Figure 2-8) yields more insight into the
function of the hidden units and recurrent connections after training. The hidden units can be
classified into two groups based on their activity during the three types of grasps:
"discriminative" hidden units whose final activity levels seem to classify different types of
grasps in the recurrent condition (hidden units 0, 2, 6, 7, 8, 9, 11, and 12; see Figure 2-9) and
"indiscriminative" hidden units whose final activity levels are the same or very similar for
each type of grasp in the recurrent condition (hidden units 1, 3, 4, 5, 10, 13, and 14; see
Figure 2-10). The discriminative hidden units can be further divided into two groups: "fully
67
discriminative" hidden units whose final activity levels are very different for each of the
three types of grasps (hidden units 2, 6, 7, 12, and 8), and "partially discriminative" hidden
units whose final activity levels only differentiate one type of grasp from the other two types
(hidden units 0, 9, and 11). The indiscriminative hidden units can also be divided into two
groups: "grasp indiscriminative" hidden units whose activity levels remain static at 1.0 or
increase to 1.0 as each type of grasp progresses (hidden units 4, 5, and 13), and "anti-grasp
indiscriminative" hidden units whose activity levels decrease to 0.0 as each type of grasp
progresses (hidden units 1, 3, 10, and 14). It should be noted that these are broad
categorizations, and that the activity pattern of any one hidden unit may not fall entirely
within one group. However, these groups do seem to be a fairly accurate description of the
clustering of activation patterns of the hidden units in the network and aid in understanding
the network's operation.
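The broad grouping by final activity levels can be sketched as an automated check; the tolerance threshold here is an illustrative assumption, as the thesis's grouping was done by inspecting the activation patterns.

```python
import numpy as np

def classify_hidden_units(final_acts, tol=0.1):
    """Broadly sort hidden units by their final activity levels across the
    three grasp types (rows of `final_acts`: power, precision, side; columns:
    hidden units). A unit is 'indiscriminative' when its final activities for
    all three grasps agree to within `tol`, else 'discriminative'; the
    threshold is illustrative, not the analysis's exact criterion."""
    spread = final_acts.max(axis=0) - final_acts.min(axis=0)
    return list(np.where(spread < tol, "indiscriminative", "discriminative"))
```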
Figure 2-8 Network activity of (columns from left to right) the network's recurrent input (plus:
recurrent input unit 0, cross: recurrent input unit 1, asterisk: recurrent input unit 2, square:
recurrent input unit 3, filled square: recurrent input unit 4), external visual input (plus: aper1,
cross: ang1, asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled circle:
axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2, square:
hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled circle: hidden unit 6,
triangle: hidden unit 7, filled triangle: hidden unit 8, upside-down triangle: hidden unit 9,
filled upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond: hidden
unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14), external output (plus:
precision grasp unit, cross: side grasp unit, asterisk: power grasp unit), and recurrent output
layers (plus: recurrent output unit 0, cross: recurrent output unit 1, asterisk: recurrent output
unit 2, square: recurrent output unit 3, filled square: recurrent output unit 4) during the (rows
from top to bottom) precision, side, and power grasps shown in Figure 2-7 with and without
recurrent connections.
Figure 2-9 Activation of three discriminative hidden units for each type of grasp in the intact
network (left column) and the same network with lesioned recurrent connections (right
column) (dark line: power grasp, light gray line: precision grasp, medium gray line: side
grasp).
The activity patterns of most discriminative hidden units remain qualitatively similar in
the lesioned recurrent condition (see Figure 2-9); however, the activity of every
indiscriminative hidden unit is entirely dependent on the activity of the recurrent input layer,
as shown by their reversal in activity levels when the recurrent connections are lesioned (see
Figure 2-10). This and the fact that the hidden units broadly classified as indiscriminative
have stronger connections to the recurrent input and output layers than those classified as
discriminative indicate that the recurrent units monitor the progress of a grasp and determine
whether or not a grasp occurs at all. Each grasp indiscriminative hidden unit is slightly
excited by the distance input unit, inhibited by the velocity input unit, and excited by the
entire recurrent input layer. Each of these hidden units also inhibits the entire recurrent
output layer and their connection weights to the output layer combine to exert a net excitation
on each external output unit. The anti-grasp indiscriminative hidden units tend to be more
strongly excited by the distance input unit and inhibited by the entire recurrent input layer.
The connection weights of these hidden units to the external output layer form a net
inhibition on each external output unit. The information from the grasp/anti-grasp
indiscriminative hidden units seems to modulate the external output layer so that
classification is only performed on grasps that contact the object.
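This gating role can be illustrated with a minimal numerical sketch (all weights and values here are hypothetical, chosen for illustration rather than taken from the trained network): an "anti-grasp" unit excited by the hand-object distance input and inhibited by the velocity input vetoes the external output, so grasp evidence only produces a classification when the hand is actually closing on the object.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output_activation(grasp_evidence, distance, velocity):
    """Classification output gated by a hypothetical 'anti-grasp' unit.

    The anti-grasp unit is excited by hand-object distance and inhibited
    by wrist velocity; it in turn exerts net inhibition on the external
    output unit carrying the grasp evidence.
    """
    anti_grasp = sigmoid(4.0 * distance - 2.0 * velocity - 1.0)
    return sigmoid(6.0 * grasp_evidence - 8.0 * anti_grasp - 1.0)

# near the object the gate opens and the evidence drives the output
near = output_activation(grasp_evidence=1.0, distance=0.05, velocity=0.2)
# far from the object the anti-grasp unit vetoes the classification
far = output_activation(grasp_evidence=1.0, distance=1.0, velocity=0.2)
```

With these toy weights the same evidence yields a strong output near the object and a suppressed one far from it, mirroring the modulation described above.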
Figure 2-10 Activation of three indiscriminative hidden units for each type of grasp in the
intact network (left column) and the same network with lesioned recurrent connections (right
column) (dark line: power grasp, light gray line: precision grasp, medium gray line: side
grasp).
2.3.3 Audio-Visual Mirror Neurons
We tested the performance of the network in classifying an observed grasp under audio
only, visual only, audio-visual with congruent sounds, and audio-visual with incongruent
sound input conditions. Under each condition the simulated grasp and object were the same,
but the arm and object's visibility, the action's audibility, and the sound associated with the
action varied. The sound presented with the action was the same as that in training for the
congruent sound condition, while the sound associated with a different grasp in the training
set was presented with the action in the incongruent sound condition. In the audio-visual with
congruent sound input condition, the audio input served to slightly strengthen the output
activity. In the audio-visual with incongruent sound input condition, the incongruent audio
input was not strong enough to overcome the correct output unit activation induced by the
visual input. In the audio input only condition, the network correctly identified the grasp type
associated with the sound. In this condition, output unit activity was confined to the duration
of the audio input activity (see Figure 2-11).
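This pattern of results is consistent with a simple weighted combination of modalities in which visual evidence outweighs audio. The sketch below uses hypothetical weights and evidence vectors (not the trained network's): a congruent sound slightly strengthens the correct output, an incongruent sound fails to overturn it, and sound alone still selects the right grasp.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical modality weights: visual evidence outweighs audio, so an
# incongruent sound cannot overturn a visually identified grasp
W_VIS, W_AUD, BIAS = 3.0, 1.0, -2.0

def grasp_output(visual, audio):
    """visual, audio: evidence for (precision, side, power) grasps."""
    return [sigmoid(W_VIS * v + W_AUD * a + BIAS) for v, a in zip(visual, audio)]

power_seen  = [0.0, 0.0, 1.0]   # visual input during a power grasp
power_sound = [0.0, 0.0, 1.0]   # congruent sound
side_sound  = [0.0, 1.0, 0.0]   # incongruent sound

congruent   = grasp_output(power_seen, power_sound)
incongruent = grasp_output(power_seen, side_sound)
audio_only  = grasp_output([0.0, 0.0, 0.0], power_sound)
```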
Figure 2-11 Left: Activation of the external output layer of the model's audio recurrent
network when presented with the output of the Lyon Passive Ear model for (from top to
bottom) a slapping sound, no sound, a slapping sound, and a wood cracking sound (plus: slap
output unit, cross: wood output unit, asterisk: paper output unit). Middle: Activation of the
external output layer of the model's main recurrent network when presented with a power
grasp sequence containing (from top to bottom) visual and congruent audio, visual only, audio
only, and visual and incongruent audio information (plus: precision grasp output unit, cross:
side grasp output unit, asterisk: power grasp output unit). The black vertical lines indicate the
time step at which the hand made contact with the object and the sound information is input
into the auditory network. Waveforms of the slapping and wood cracking sounds are shown at
the bottom of the Auditory and Auditory + Visual - Incongruent Sound displays, respectively.
Right: Activation of an audiovisual mirror neuron responding to (from top to bottom) the
visual and audio components, visual component alone, and audio component alone of a
peanut-breaking action. A waveform of the peanut breaking sound is shown at the bottom of
the audio alone condition display. (reproduced from Kohler et al., 2002 copyright 2002,
AAAS)
Although the incongruent audiovisual action condition tested in these simulation
experiments was not tested in the experiments of Kohler et al. (2002), there is a well-known
phenomenon called the "McGurk effect" that is similar to this condition. In the McGurk
effect, the observation of a spoken syllable of speech influences auditory judgments of a
different syllable (McGurk and MacDonald, 1976). In the classic example, when presented with
the sight of someone articulating the syllable "ga" together with the sound of someone articulating
the syllable "ba", what is perceived is neither "ga" nor "ba", but "da". This suggests a
crossmodal population encoding based on a set of articulators, rather than a winner-take-all,
competitive strategy in speech perception. In the general case, however, there may be
multiple objects and actors in a given scene and the problem is how to assign different
multimodal cues to different perceived objects and actions - the "binding problem"
(Treisman, 1996). If the cues from different modalities are typically associated with very
different objects or actions (as is the case with peanut breaking and paper tearing),
conflicting cue information from different modalities may be bound to different entities or
actions. If this were the case, the sound of a peanut breaking and the sight of a paper tearing
action would elicit the perception of both actions, without competitive inhibition of one or
the other.
2.3.4 Hidden Grasp Simulations
To simulate a hidden grasp, the object was visible to the network for the first 5 time steps
of the grasp and was then set to invisible. The hand was visible for the first 15 time steps, and
was then set to invisible as it reached to grasp the object behind the screen (Figure 2-12, third
row). In each case, a working memory trace was maintained after the visual input was no
longer visible. When either the hand or object was invisible, the hand state network input was
calculated from the working memory trace. During these simulations, no auditory
information was presented to the network. The initial presentation of the object allowed its
position and affordance information to be stored in working memory. The object and hand
working memory traces utilizing dynamic remapping to update the hand position were
sufficient for the network to correctly recognize a hidden grasp (see Figure 2-12).
Pantomimed grasps (Figure 2-12, top) were simulated by setting the hand visible and object
invisible for the whole grasp. To simulate a hidden pantomimed grasp (Figure 2-12, bottom
row), the hand was visible during the same time periods as in the hidden grasp, but the object
was set invisible for the whole grasp. Neither of these methods allowed a trace of the object
location and affordance information to be stored in working memory, and therefore the
network did not respond to either pantomimed grasp condition, in accord with the
experimental data.
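The working-memory-plus-remapping scheme can be sketched in a few lines (assumed mechanics, reduced to one dimension; the model itself operates on the full hand state): while the hand is visible its position and velocity are stored, and once it disappears the stored position is extrapolated with the last observed velocity, so the hand-object distance input keeps evolving instead of freezing.

```python
def hand_distance_trace(positions, object_pos, visible_until, remap=True):
    """Return the hand-object distance input at each time step.

    positions: true 1-D hand positions per time step; the hand is visible
    only for t < visible_until. With remap=True the memorized position is
    extrapolated with the last observed velocity; without it, the memory
    (and hence the distance input) freezes at its last visible value.
    """
    trace = []
    mem_pos = positions[0]
    mem_vel = 0.0
    for t, p in enumerate(positions):
        if t < visible_until:
            mem_vel = p - mem_pos   # last observed velocity
            mem_pos = p
        elif remap:
            mem_pos += mem_vel      # extrapolate the hidden hand position
        trace.append(abs(object_pos - mem_pos))
    return trace

positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
with_remap = hand_distance_trace(positions, 1.0, visible_until=5)
without    = hand_distance_trace(positions, 1.0, visible_until=5, remap=False)
```

With remapping the distance input continues to shrink toward zero after occlusion, approximating the fully visible trace; without it the input freezes at its last visible value.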
Figure 2-12 Left: Mirror neuron activation for (from top to bottom) visible pantomimed grasp,
visible grasp, partially hidden grasp, and partially hidden pantomimed grasp conditions. All
grasps were power grasps (reproduced from Umilta et al., 2001 with permission from
Elsevier) Right: Activation of the model's external output units under the same conditions
(plus: precision grasp output unit, cross: side grasp output unit, asterisk: power grasp output
unit). The black vertical lines indicate the time step at which the hand was no longer visible to
the network. The only output unit showing a significant level of activity in any plot is the one
encoding power grasps.
To investigate the dynamics of the interaction between the hand state working memory
and the recurrent network (see Figure 2-13), we gradually decreased the number of time steps
that the hand was visible during the grasp. We compared the results of this procedure with
and without utilization of dynamic remapping. As the time step that the hand "disappeared
behind the screen" increased, the network gradually approximated the output pattern it gave
for the same grasp when fully visible. This was the case with and without dynamic
remapping. Without dynamic remapping, however, the activity of the power grasp output
unit became static once the hand disappeared. Dynamic remapping allowed the hand
trajectory to be extrapolated so that the power grasp output unit reached its maximum
activation level in a time course similar to that in the fully visible grasp condition, yielding a
much better approximation of its activity.
Figure 2-13 Activation of the model's external output units to partially hidden grasps with and
without dynamic remapping (plus: precision grasp output unit, cross: side grasp output unit,
asterisk: power grasp output unit). The plot directly to the right of the grasp diagram shows
the network's response to a fully visible grasp. Each response in the remaining plots is to the
same generated grasp with the time step of the hand's disappearance varying from 5 to 15. In
each pair of columns, the left column shows the network's response to a hidden grasp with
dynamic remapping, while the right column shows the network's response to the same hidden
grasp without dynamic remapping. The black vertical lines indicate the time step at which the
hand was no longer visible to the network.
The dependence of the network on the activity of the input units encoding the distance
between the hand and the object and the wrist velocity is illustrated in Figure 2-14. The
difference in the activity of the input layer between the visible grasp and the hidden grasp
with dynamic remapping conditions is that the activity of every input unit except for the
distance and velocity units becomes static at the time step that the hand disappears (time step
5 in this figure) in the hidden grasp with dynamic remapping condition. Dynamic remapping
successfully extrapolates the activity of the distance and velocity units to closely approximate
their pattern of activity in the visible grasp condition. The difference in the activity of the
input layer between the hidden grasp with dynamic remapping and hidden grasp without
dynamic remapping conditions is that the activity of the distance and velocity units also
becomes static at the time step that the hand becomes obscured (again, time step 5 in this
figure). This small difference in input patterns causes drastic changes in the activation
patterns of the hidden, output, and recurrent layers, illustrating the importance of the distance
and velocity input units on the operation of the hidden and recurrent layers.
Figure 2-14 Network activity of (columns from left to right) the network's recurrent input
(plus: recurrent input unit 0, cross: recurrent input unit 1, asterisk: recurrent input unit 2,
square: recurrent input unit 3, filled square: recurrent input unit 4), external visual input (plus:
aper1, cross: ang1, asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled
circle: axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2,
square: hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled circle: hidden
unit 6, triangle: hidden unit 7, filled triangle: hidden unit 8, upside-down triangle: hidden unit
9, filled upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond:
hidden unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14), external output
(plus: precision grasp unit, cross: side grasp unit, asterisk: power grasp unit), and recurrent
output layers (plus: recurrent output unit 0, cross: recurrent output unit 1, asterisk: recurrent
output unit 2, square: recurrent output unit 3, filled square: recurrent output unit 4) for (rows
from top to bottom) the fully visible grasp, the time step 5 hidden grasp with dynamic
remapping, and the time step 5 hidden grasp without dynamic remapping shown in Figure
2-13.
Figure 2-15 displays the hidden units that show both the greatest similarity in activation
patterns during the visible and hidden with dynamic remapping conditions and the greatest
difference in activation patterns during the hidden with and without dynamic remapping
conditions. These are the anti-grasp indiscriminative hidden units 3 and 10. These hidden
units are strongly excited by the distance input unit and slightly excited by the velocity input
unit. Hidden unit 10 inhibits the entire external output layer, especially the power output unit.
Its static firing rate of 0.9 in the hidden grasp without dynamic remapping condition signals
that no grasp is taking place.
Figure 2-15 The hidden units with the greatest similarity in their activation patterns during the
fully visible grasp and partially hidden grasp with dynamic remapping conditions and the
greatest difference between their activation patterns during the partially hidden grasp with and
without dynamic remapping conditions, are the anti-grasp indiscriminative hidden units 3
(top) and 10 (bottom). Each line shows the unit's activity during different conditions (dark
line: visible grasp, light gray line: hidden grasp with dynamic remapping, medium gray line:
hidden grasp without dynamic remapping).
We next simulated an unsuccessful hidden grasp in which the hand reaches for the object
behind a screen, but in fact overshoots the object (see Figure 2-16, network activity in Figure
2-17). This simulation was intended to determine the effectiveness of the dynamic remapping
of hand state working memory in successfully recognizing hidden grasps and discriminating
them from hidden reaches without object contact. The network's response to a fully visible
reach that overshoots the object initially resembles its response to a fully visible reach and
grasp, but then the output activation level declines as the hand passes the object. With
dynamic remapping, this output pattern is maintained for a hidden overshooting reach even
as the hand disappearance time step is pushed back until only 30 percent of the grasp is
visible. Without dynamic remapping, the output pattern becomes static as the hand
disappears, falsely predicting a successful grasp.
Figure 2-16 Activation of the model's external output units to partially hidden reaches that
overshoot the object with and without dynamic remapping (plus: precision grasp output unit,
cross: side grasp output unit, asterisk: power grasp output unit). The plot to the right of the
grasp diagram shows the network's response to a fully visible grasp. Each response in the
remaining plots is to the same generated grasp with the time step of the hand's disappearance
varying from 10 to 19. In each pair of columns, the left column shows the network's response
to a hidden grasp with dynamic remapping, while the right column shows the network's
response to the same hidden grasp without dynamic remapping. The black vertical lines
indicate the time step at which the hand was no longer visible to the network.
Figure 2-17 Network activity of (columns from left to right) the network's recurrent input
(plus: recurrent input unit 0, cross: recurrent input unit 1, asterisk: recurrent input unit 2,
square: recurrent input unit 3, filled square: recurrent input unit 4), external visual input (plus:
aper1, cross: ang1, asterisk: ang2, square: speed, filled square: dist, circle: axisdisp1, filled
circle: axisdisp2), hidden (plus: hidden unit 0, cross: hidden unit 1, asterisk: hidden unit 2,
square: hidden unit 3, filled square: hidden unit 4, circle: hidden unit 5, filled circle: hidden
unit 6, triangle: hidden unit 7, filled triangle: hidden unit 8, upside-down triangle: hidden unit
9, filled upside-down triangle: hidden unit 10, diamond: hidden unit 11, filled diamond:
hidden unit 12, pentagon: hidden unit 13, filled pentagon: hidden unit 14), external output
(plus: precision grasp unit, cross: side grasp unit, asterisk: power grasp unit), and recurrent
output layers (plus: recurrent output unit 0, cross: recurrent output unit 1, asterisk: recurrent
output unit 2, square: recurrent output unit 3, filled square: recurrent output unit 4) for (rows
from top to bottom) the fully visible regular grasp, the fully visible overshot grasp, the time
step 15 hidden overshot grasp with dynamic remapping, and the time step 15 hidden overshot
grasp without dynamic remapping shown in Figure 2-16.
The same two anti-grasp indiscriminative hidden units (hidden units 3 and 10) that allow
recognition of hidden grasps when the distance input is dynamically remapped (see Figure
2-15), also allow the network to correctly respond to a hidden grasp that in fact overshoots
the object without actually contacting it (see Figure 2-18). Both units are strongly excited by
the distance input unit, so their activity levels fall as the hand approaches the object. During
the visible overshot grasp, the activity level of the distance unit begins to increase as hand
passes the object and continues to move past it, exciting the anti-grasp indiscriminative
hidden units. This part of the grasp is obscured in the hidden overshot grasp conditions.
Dynamic remapping shifts the activation level of the distance input unit, allowing it to excite
the anti-grasp indiscriminative hidden units 3 and 10, which in turn inhibit the external
output layer, especially the power output unit. Without dynamic remapping, the distance
input unit remains near 0.0, the anti-grasp indiscriminative hidden units 3 and 10 are not
sufficiently activated, and the hidden overshot grasp is incorrectly identified as a successful
power grasp.
Figure 2-18 Two anti-grasp indiscriminative hidden units - hidden units 3 (top) and 10
(bottom), show a very similar pattern of activity during the visible regular grasp and hidden
overshot grasp without dynamic remapping conditions and during the visible overshot grasp
and hidden overshot grasp with dynamic remapping conditions (dark gray line: visible regular
grasp, light gray line: visible overshot grasp, black line: hidden overshot grasp with dynamic
remapping, medium gray line: hidden overshot grasp without dynamic remapping).
2.4 Discussion
We have shown that the monkey grasp-related mirror system can be modeled as a
recurrent artificial neural network trained on the observation of self-generated grasps. The
addition of audio input, working memory, and dynamic remapping gives the network more
flexibility in action recognition, allowing it to correctly recognize an action when its final
component is hidden, or from its characteristic sound alone if it has one.
The ability to recognize invisible actions may allow primates to effectively monitor the
actions and infer the intentions of their peers in crowded, partially occluded environments.
We chose the number of hidden units in the main network based on unpublished
simulation experiments in which we varied the hidden layer size from 5 to 20. Numbers of
hidden units above 20 provided no increase in performance. We finally settled on a value of
15 as a compromise between grasp classification performance and minimal hidden layer size
(to reduce overfitting during learning).
The response profiles of the hidden units of the trained network clustered into two
groups: those that responded similarly for each grasp type, and those whose activation level
discriminated between grasp types. These response profiles, which we have labeled
discriminative and indiscriminative, are reminiscent of the "strictly-congruent" and
"broadly-congruent" mirror neurons described by Gallese et al. (1996). The fact that some hidden layer
units were tuned for particular grasps may relate to the discovery by Fogassi et al. (1998) of
mirror neurons in area 7b of the macaque inferior parietal cortex (the same brain region
represented by our network's hidden layer). Whereas the hidden units in our network have
some predictive capability for the currently observed action (some show differential
activation for various grasp types well before the grasp is completed), the firing of some
parietal mirror neurons described by Fogassi et al. predicted the action to be executed after
the observed action. Future extensions to this work might address these data through the addition
of lateral connections between hidden layer units, with weights modified perhaps through
some variant of reinforcement learning.
It was interesting that the recurrent unit activity did not appear to be utilized in grasp
classification, but rather in grasp detection. These units showed similar patterns of activity in
3 trained instances of the network, each with different numbers of hidden units. This suggests
that this may be a general phenomenon and not a peculiarity of this particular set of trained
weights. We expected the state to be important in distinguishing various types of grasps, but
in hindsight it seems that this is not so surprising, since the types of actions we considered -
power, precision, and side grasps become more distinctive, rather than less, as their
trajectories unfold. Recognition of more complex actions with ambiguous later stages will
more likely require the use of recurrent connections.
2.4.1 Audio-Visual Mirror Neurons
We have shown that a recurrent neural network associating observed hand-object relation
trajectories with motor programs can incorporate signals from other modalities by Hebbian
association. This allows the monkey mirror neuron system to function as a multi-modal,
actor-invariant representation of action, rather than a simple associator of visual and motor
signals. It has been argued that Broca's area is the human homologue of area F5 in the
macaque (Rizzolatti and Arbib, 1998) and that human language arose from a gesture based
system which was later augmented with vocalization (Arbib, 2005). These multi-modal
mirror neurons may have allowed arbitrary vocalizations to become associated with
communicative gestures, facilitating the emergence of a speech-based language from a
system of manual gestures. If this is indeed the case, the development of audio-visual mirror
neurons may have implications for the recognition of communicative actions and ground the
multi-modality of language. Arbib (2005), Fogassi & Ferrari (2004) and MacNeilage &
Davis (2005) provide further analysis of mirror neurons in relation to the debate on whether
or not gesture (protosign) provided scaffolding for the evolution of (proto)speech.
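The Hebbian association of audio signals with mirror outputs can be sketched with a toy rule of the form dw = eta * post * pre (an illustrative formulation, not the network's actual training procedure): sounds presented while the visual/motor pathway activates a mirror output become attached to that output, after which the sound alone recalls it.

```python
def hebbian_train(audio_patterns, mirror_targets, eta=0.5, epochs=4):
    """Accumulate dw = eta * post * pre over paired presentations."""
    n_in, n_out = len(audio_patterns[0]), len(mirror_targets[0])
    w = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for pre, post in zip(audio_patterns, mirror_targets):
            for i in range(n_out):
                for j in range(n_in):
                    w[i][j] += eta * post[i] * pre[j]
    return w

def recall(w, audio):
    """Drive the mirror outputs from sound alone."""
    return [sum(wij * aj for wij, aj in zip(row, audio)) for row in w]

# two distinct action sounds paired with the mirror outputs that are
# active while the corresponding actions are executed
sounds  = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
w = hebbian_train(sounds, targets)
resp = recall(w, sounds[0])   # after learning, the first sound alone
                              # selectively drives the first mirror output
```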
2.4.2 Inferring Hidden Actions
The results of the MNS2 simulations show that the addition of a working memory with
dynamic remapping of its representations is sufficient to infer the result of an action whose
final distal component is hidden. In this model we have used external, hand-coded working
memory and dynamic remapping modules. However, previous modeling (Dominey and
Arbib, 1992; Fagg and Arbib, 1998) has shown that the same functionality exhibited by the
working memory module can be produced by thalamo-cortical and cortico-cortical loops and
that of dynamic remapping by shifting activation in a grid of neurons. While the inclusion of
these mechanisms would add to the biological realism of the current model, we feel that the
functionality of the core network would remain unchanged, and thus that our implementation
rests on valid simplifying assumptions.
Chapter 3 - Learning to Grasp and Extract Affordances
The notion of affordances as directly perceivable opportunities for action (Gibson, 1966)
has been used to interpret the activity of certain parietal neurons as encoding affordances for
grasping (Fagg and Arbib, 1998). While computational models of infant grasp learning
(Oztop et al., 2004) and affordance learning (Oztop et al., 2006a) have been developed that
work in a staged fashion, no existing model learns affordance extraction and
grasp motor programs simultaneously. This model follows from a suggestion of Arbib et al.
(2009) and implements a dual learning system that simultaneously learns both grasp
affordances and motor parameters for planning grasps using trial-and-error reinforcement
learning. As in the Infant Learning to Grasp Model (ILGM, Oztop et al., 2004) we model a
stage of infant development prior to the onset of sophisticated visual processing of hand-
object relations, but like the FARS model (Fagg and Arbib, 1998) we assume that certain
premotor neurons activate neural populations in primary motor cortex that synergistically
control different combinations of fingers. The task of the model is to learn a) “affordances”,
representations of object features that indicate paired surfaces by which the object can be grasped, and
b) motor parameters that can be used to successfully grasp objects based on these
representations.
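The trial-and-error scheme can be caricatured as a stochastic parameter search. Everything in the sketch below is hypothetical: one preference weight per candidate value of a single grasp parameter (say, maximum aperture), epsilon-greedy exploration, and a deterministic success signal standing in for the somatosensory reward.

```python
import random

def train_parameter(best_index, n_values=5, eta=0.2, trials=2000, seed=1):
    """Learn which candidate value of one grasp parameter succeeds.

    w[k] tracks the expected success of candidate k; actions are sampled
    epsilon-greedily and nudged toward the binary grasp outcome.
    """
    rng = random.Random(seed)
    w = [0.0] * n_values
    for _ in range(trials):
        if rng.random() < 0.1:                      # explore
            k = rng.randrange(n_values)
        else:                                       # exploit current best
            k = max(range(n_values), key=lambda i: w[i])
        reward = 1.0 if k == best_index else 0.0    # did the grasp succeed?
        w[k] += eta * (reward - w[k])               # move toward the outcome
    return w

# suppose only aperture candidate 3 fits the object
w = train_parameter(best_index=3)
```

After training, the weight for the successful candidate dominates, so the planner would preshape with that value; the actual model learns several such parameters (and the affordance representation) concurrently.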
Newborn infants aim their arm movements toward fixated objects (von Hofsten, 1982).
These early arm movements have been related to the development of object-directed reaching
(Bhat et al., 2005), leading to grasping (Bhat and Galloway, 2006), the development of which
continues throughout childhood (Kuhtz-Buschbeck et al., 1998). Grasping in development
seems to involve visual information increasingly in preprogramming the grasp (Ashmead et
al., 1993; Lasky, 1977; Lockman et al., 1984; Newell et al., 1993; von Hofsten and
Ronnqvist, 1988; Witherington, 2005). The present chapter introduces a new model ILGA
(Integrated Learning of Grasps and Affordances) which models the way in which affordance
extraction and grasp specification may be adapted simultaneously. It models the
developmental transition to hand preshape based on visual information (Schettino et al.,
2003; von Hofsten and Ronnqvist, 1988; Witherington, 2005) and utilizes the “virtual finger
hypothesis” for hand control during grasping. The virtual finger hypothesis states that
grasping involves the assignment of real fingers to so-called virtual fingers (VFs) or force
applicators (Arbib, 1985). For example in a power grasp, one virtual finger might be the
thumb and the other might be the palm. The task of grasping is then to preshape the hand
according to the selected virtual fingers and the size of the object (Lyons, 1985) and bring the
opposition axis of the virtual fingers into alignment with the selected object surface
opposition axis (grasp affordance). Experimental evidence consistent with this hypothesis,
also known as hierarchical control of prehension synergies, has been found (Smeets and
Brenner, 2001; Winges, 2005; Zatsiorsky and Latash, 2004).
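The virtual finger hypothesis lends itself to a small geometric sketch (illustrative quantities only, not the model's controller): the grasp is planned as an aperture between two virtual fingers scaled to object size, plus a rotation that brings the VF opposition axis into alignment with the object's surface opposition axis.

```python
import math

def preshape_aperture(object_width, margin=0.02):
    """Open the two virtual fingers slightly wider than the object."""
    return object_width + 2.0 * margin

def alignment_error(hand_axis, object_axis):
    """Angle in radians between the virtual-finger opposition axis and the
    object's surface opposition axis (both unit 2-D vectors). Opposition
    axes are undirected, so the error is folded into [0, pi/2]."""
    dot = abs(hand_axis[0] * object_axis[0] + hand_axis[1] * object_axis[1])
    return math.acos(min(1.0, dot))

ap  = preshape_aperture(0.06)                    # 6 cm object, 2 cm margins
err = alignment_error((1.0, 0.0), (0.0, 1.0))    # axes perpendicular
ok  = alignment_error((0.0, 1.0), (0.0, -1.0))   # anti-parallel = aligned
```

A controller in this spirit would drive `err` to zero while closing the aperture from `ap` onto the object.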
Since the distinction was made between the dorsal and ventral visual streams, the dorsal
stream has been further subdivided into the dorsal-medial and dorsal-ventral streams
(Rizzolatti and Matelli, 2003). It has been suggested that the dorsal-medial stream, involving
superior parietal and intraparietal regions and the dorsal premotor cortex, controls reaching
while the dorsal-ventral stream, including inferior parietal and intraparietal regions and the
ventral premotor cortex, controls grasping (Jeannerod et al., 1995; but see Mon-Williams and
McIntosh, 2000 for a contrary view; Wise et al., 1997). The main regions of the dorsal-medial stream seem to include the lateral intraparietal area (LIP) and area V6a in the parietal
cortex, and area F2 in the dorsal premotor cortex. The spatial dimensions of potential targets
such as direction and distance are likely processed independently in parallel (Battaglia-Mayer
et al., 2003). In support of this idea, direction and distance reach errors dissociate (Gordon et
al., 1994; Soechting and Flanders, 1989) and distance information decays faster than
direction information in working memory (McIntyre et al., 1998). Our model therefore
dissociates the representation of direction and distance, but notes data suggesting that many
neurons are modulated by a combination of both variables (Fu et al., 1993; Messier and
Kalaska, 2000).
[Figure 3-1 block diagram: object feature extraction (LIP, V6a, cIPS), affordance extraction (AIP), reach planning (F2, F7), grasp planning (F5, F2/F5), motor control (F1/spinal cord), and somatosensory feedback (S1)]
Figure 3-1 An overview of the ILGA model. Connections modifiable by reinforcement
learning are shown in red. The parietal regions LIP and V6a provide the premotor region F2
with object position information to plan the reach. V6a and the cIPS populations project to
AIP, which projects to the signal-related populations of the other premotor regions. Each
premotor region selects a value for the parameter it encodes and projects to the primary motor
region F1 which controls the movement. Grasp feedback is returned to somatosensory area S1
which provides the reinforcement signal to the model and somatosensory feedback to F1.
Each execution-related premotor population additionally receives tonic inhibition (not shown) that is released when a go signal is detected.
3.1 Methods
The simulation environment was composed of 1) the Neural Simulation Language (NSL,
Weitzenfeld et al., 2002) simulator interfaced with the Open Dynamics Engine (ODE,
http://www.ode.org) for physics simulation and Java3d for visualization, 2) a new model of
the primate arm and hand, and 3) the implementation of the ILGA model.
3.1.1 Integrated Learning of Grasping and Affordances
In ILGA the AIP module receives basic object information such as location, size, shape,
and orientation in the form of population codes from the regions V6a/MIP and cIPS (Figure
3-1). Neurons in this module activate dynamic neural fields (DNFs; Amari, 1977; Erlhagen
and Schoner, 2002) in the premotor cortex that select parameters for grasping such as reach
offset, wrist rotation, grasp type, and maximum aperture. DNFs utilize cooperation and
competition between neurons depending on their preferred stimulus values. In their most
basic form, DNFs implement a winner-take-all (WTA) process, resulting in a population
code centered on the cell with the highest mean input. More generally, Amari & Arbib
(1977) show how a stereopsis model can enforce cooperation between a set of WTAs so that
the result encodes a surface. In ILGA, we use one-, two-, and three-dimensional DNFs as
WTA networks to select grasp parameters. Due to noise in each layer a random grasp plan
can be generated with a small probability. Reinforcement learning is used to modify the
connections to and from AIP, resulting in representations in AIP that combine features of the
object relevant for grasping it, and connection weights between AIP and premotor cortex that
result in selection of motor parameters appropriate for grasping it. Positive reinforcement is
given by the realization of a stable grasp of the target object and negative reinforcement is
given for grasps that do not contact the object or are unstable enough to allow the object to
slip from the hand. The result is that AIP neurons are shaped to provide “better” affordance
input for the premotor cortex, which in turn expands the repertoire of grasp actions providing
more data points for AIP learning. When this dual learning system stabilizes, the model is
endowed with a set of affordance extraction and robust grasp planning mechanisms.
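The reward-driven shaping of the AIP-to-premotor connections can be illustrated with a toy example. This is a deliberately minimal sketch: a noisy winner-take-all stands in for the DNF competition, and a simple reward-modulated Hebbian rule stands in for ILGA's actual learning scheme; all constants and the toy task are assumptions for illustration.

```python
import numpy as np

# Toy reward-modulated selection and weight update; not ILGA's exact rule.
rng = np.random.default_rng(0)
n_features, n_params = 8, 5
W = 0.01 * rng.standard_normal((n_params, n_features))

def select_param(features, noise=0.5):
    """Noisy winner-take-all over premotor units (stands in for the DNF)."""
    drive = W @ features + noise * rng.standard_normal(n_params)
    return int(np.argmax(drive))

def update(features, action, reward, lr=0.1):
    """Strengthen (or weaken) the input weights of the selected unit."""
    W[action] += lr * reward * features

# Toy task: parameter 2 is the only one rewarded for this feature vector.
features = np.zeros(n_features); features[3] = 1.0
for _ in range(500):
    a = select_param(features)
    update(features, a, 1.0 if a == 2 else -0.2)
```

After training, the unit whose selection was rewarded dominates the competition for this feature vector, mirroring how stable grasps bias future parameter selection in the model.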
3.1.2 Reinforcement
The ILGM model posited an intrinsic “joy of grasping” as the reward stimulus generated
from sensory feedback resulting from the stable grasp of an object. We use the same signal to
train the connection weights in this model using reinforcement learning (Sutton and Barto,
1998). However, this model uses a more realistic physics simulator than the ILGM model,
taking into account not only kinematics but also dynamics. This makes motor control a much
more difficult task, but simplifies grasp stability evaluation (see below). Another
consequence is that since the object can be moved, hand-object collision can knock the object
out of reach, making successful grasps much less likely to occur by chance during trial-and-
error learning. In order to increase the probability of successful grasps we pre-train the
connection weights that determine the direction of hand approach to the object and the wrist
orientation using a more basic reinforcement signal, what may be called the “joy of palm
contact”. After pre-training these connection weights the majority of attempted grasps make
at least transient palm contact with the object, increasing the number of stable grasps during
training.
The ILGM model used a kinematic simulator that did not handle dynamics and therefore
had to use a complex scheme that evaluated the final hand configuration to evaluate grasp
stability. Since we use a physics simulator that handles rigid-body dynamics including
friction, grasp stability evaluation is much simpler in this model than in ILGM. We ran all
simulations reported here with gravity turned off in order to simplify control of the arm and
hand. Even without gravity, the physics simulation will cause the object to slip from the
hand’s grasp if the grasp is unstable. The simulator informs the model of the list of contact
points between the hand and the object. If two contact points are achieved that are connected
by a vector that passes through the object, and these contact points are maintained for 2
seconds of simulation time, the grasp is declared successful. Note that none of the other
modeled regions have any notion of contact points – the grasp is planned and controlled in an
open-loop manner. However, contact point feedback could be used to learn internal models
for feedback-based grasp control (see Discussion).
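The contact-based success test can be sketched as follows. This is a hedged illustration: the thesis's "vector passes through the object" criterion is approximated here by closeness of the contact segment to the object centroid, and the time step and tolerance are assumed values.

```python
import numpy as np

def opposing_contacts(p1, p2, centroid, tol=0.9):
    """Heuristic test that the segment between two contact points passes near
    the object centroid, i.e. the contacts oppose each other across the object
    (an assumption standing in for an exact point-in-object test)."""
    seg = p2 - p1
    if np.dot(seg, seg) < 1e-12:
        return False
    to_c = centroid - p1
    t = np.dot(to_c, seg) / np.dot(seg, seg)    # projection of centroid onto segment
    if not 0.0 < t < 1.0:
        return False
    dist = np.linalg.norm(to_c - t * seg)       # centroid-to-segment distance
    return bool(dist < tol * np.linalg.norm(seg) / 2)

def grasp_succeeded(contact_history, centroid, dt=0.01, hold_time=2.0):
    """contact_history: per-step lists of contact points from the simulator.
    The grasp is declared stable if some opposing contact pair is maintained
    for hold_time seconds of simulated time."""
    needed = int(hold_time / dt)
    run = 0
    for contacts in contact_history:
        ok = any(opposing_contacts(a, b, centroid)
                 for i, a in enumerate(contacts) for b in contacts[i + 1:])
        run = run + 1 if ok else 0
        if run >= needed:
            return True
    return False
```

Because the physics simulator supplies the contact points, the evaluation needs no analysis of the final hand configuration, as the text notes.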
3.1.3 Primary Motor Module – Reach and Grasp Generation
The Primary Motor module decodes the motor parameters for the reach and grasp from
the activities of the premotor populations and directs the actual movement by setting the joint
angle targets of proportional-derivative (PD) controllers for each degree of freedom (DOF).
While this module is intended to correspond to primary motor area F1 and the spinal cord, it
should be noted that the grasp generation process is non-biological, and is intended simply to
achieve the planned grasp end state in order to evaluate the success of the chosen parameter
values in producing a stable grasp.
The wrist rotation, reach, and grasp components of the movement are handled by separate
controllers (Figure 3-2). Rather than model the coordination of the reach and grasp
components in detail, we couple them simply by starting the preshape phase of the grasp
once the reach target has been determined and triggering the enclose phase once the hand
reaches a certain distance from the object or achieves palm contact. Palm or inner thumb
contact is also used as a signal to stop the reach controller at the current wrist position. Each
controller receives input from premotor execution-related populations that is decoded using
the center-of-mass technique (Wu et al., 2002), transforms this input in some way along with
some proprioceptive or tactile feedback, and sets joint angle targets for PD controllers that
apply torque to each joint. The simplest controller, the wrist rotation controller, decodes the
target wrist angles from its premotor input and passes them to the wrist PD controllers.
The reach controller couples a trajectory planning mechanism (dynamic motor primitives,
DMPs) with an inverse arm kinematics module. Given a reach target location, the reach
planner uses DMPs (Schaal and Schweighofer, 2005) to generate a trajectory of desired wrist
locations to reach it starting from the current wrist position. DMPs can generate arbitrary
trajectories and dynamically adapt to new goals. They are defined by the following
differential equation:
$$\dot{v} = K(g - x) - Dv - K(g - x_0)\,u + K\,\frac{\sum_i \psi_i(u)\,c_i}{\sum_i \psi_i(u)}\,u$$
where x is the current value of the controlled variable (the position of the wrist in this case), x_0 is the initial value (the starting position of the end-effector), v is the current target velocity of the variable, the c_i are equilibrium points of linear acceleration fields with nonlinear basis functions ψ_i, g is the goal value (the target reach position), K and D are gain and damping parameters, and u is a phase variable which can be used to scale the duration of the movement. DMPs therefore generate a trajectory from x_0 to g that can be straight or parameterized to take any arbitrary path. In the reach module, the output of the DMP, x̂(t), is used as an actual target position for the wrist that is input to the inverse kinematics controller at each time step.
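The dynamics above can be sketched with a simple numerical integrator. This is a hedged illustration: the phase dynamics (exponential decay of u from 1 toward 0) and all gains and basis parameters are assumptions consistent with standard DMP formulations, not values taken from the thesis.

```python
import numpy as np

# Minimal one-dimensional DMP integrator in the spirit of the equation above;
# a_u, K, D, and the basis parameters are illustrative choices.
def dmp_rollout(x0, g, c, centers, widths, K=100.0, D=20.0, a_u=4.0,
                dt=0.001, T=1.0):
    x, v, u = x0, 0.0, 1.0
    traj = [x]
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (u - centers) ** 2)   # nonlinear basis functions
        f = psi @ c / (psi.sum() + 1e-10)            # weighted equilibrium points c_i
        vdot = K * (g - x) - D * v - K * (g - x0) * u + K * f * u
        x += v * dt
        v += vdot * dt
        u += -a_u * u * dt                           # phase variable: 1 -> 0
        traj.append(x)
    return np.array(traj)
```

With all c_i set to zero the primitive converges smoothly from x_0 to the goal g; nonzero c_i shape the path taken, which is how arbitrary reach trajectories can be generated.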
[Figure 3-2 block diagram: the wrist, reach, and grasp controllers, each running from premotor execution-related activity through population decoders, dynamic motor primitives, and PD controllers to joint torques]
Figure 3-2 The wrist, reach, and grasp motor controllers. Each uses population decoders to
decode reach and grasp parameter values from premotor inputs and set joint angle targets for
PD controllers which move the limbs by applying torque to the joints. The reach motor
controller combines the shoulder-centered object position, object-centered reach offset, and
current wrist position to compute a wrist error vector. The error vector is used to set goal
values for dynamic motor primitives, which generate a reach trajectory for the wrist. An
inverse arm kinematics module computes target joint angles for each target wrist position.
The grasp motor controller contains dynamic motor primitives for the preshape and enclose
phases that are triggered by reach and tactile events. These dynamic motor primitives generate
normalized trajectories for each virtual finger that are converted into target joint angles by
VF→real finger mapping modules.
Given a desired wrist location, the inverse arm kinematics controller computes the
required wrist displacement and then uses the pseudo-inverse of the Jacobian to compute the
required joint rotations. The body's Jacobian matrix describes how changes in shoulder (θ_1, θ_2, θ_3) and elbow (θ_4) angles result in changes in the wrist's 3D position (x, y, z):

$$\begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{pmatrix} = \mathbf{J} \begin{pmatrix} \dot{\theta}_1 \\ \dot{\theta}_2 \\ \dot{\theta}_3 \\ \dot{\theta}_4 \end{pmatrix}$$
The inverse of the Jacobian matrix then describes how much each joint must rotate in
order to effect a desired wrist displacement. In general the Jacobian is not invertible, so we
use the pseudo-inverse:
$$\mathbf{J}^{+} = \mathbf{J}^{T}\left(\mathbf{J}\mathbf{J}^{T}\right)^{-1}$$
Each required joint rotation is used to input the target joint angle into the PD controller for
that DOF:
$$\hat{\boldsymbol{\theta}} = \boldsymbol{\theta} + \mathbf{J}^{+} \begin{pmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{pmatrix}$$

where ẋ, ẏ, ż describe the desired wrist displacement.
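The pseudo-inverse update can be illustrated on a toy redundant arm. A planar 3-link arm stands in here for the thesis's 4-DOF arm model, and the link lengths and step gain are arbitrary assumptions; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse used in the equations above.

```python
import numpy as np

L = np.array([0.3, 0.25, 0.2])   # link lengths of an assumed toy 3-link planar arm

def fk(theta):
    """Forward kinematics: planar wrist position for joint angles theta."""
    a = np.cumsum(theta)
    return np.array([np.sum(L * np.cos(a)), np.sum(L * np.sin(a))])

def jacobian(theta):
    """2x3 Jacobian relating joint velocities to wrist velocity."""
    a = np.cumsum(theta)
    J = np.zeros((2, 3))
    for j in range(3):
        J[0, j] = -np.sum(L[j:] * np.sin(a[j:]))
        J[1, j] = np.sum(L[j:] * np.cos(a[j:]))
    return J

def ik_step(theta, target, gain=0.5):
    """theta_hat = theta + J+ @ (desired wrist displacement)."""
    dx = target - fk(theta)
    return theta + gain * np.linalg.pinv(jacobian(theta)) @ dx
```

Iterating `ik_step` drives the wrist toward the target along the minimum-norm joint-velocity direction, which is the property that makes the pseudo-inverse a natural choice for a redundant arm.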
The grasp controller similarly uses DMPs to control the timing of the preshape and
enclosure phases, but here the DMP output is interpreted as a normalized timing signal,
rather than a physical target value. This is accomplished by setting the input to the DMP at 0
and the target to 1, and using its output at each time step to interpolate between the current
finger joint angles and the final target angles in order to generate targets for the PD
controllers. The preshape DMP is triggered as soon as a reach target is set, while the enclose
DMP is triggered once the wrist reaches a certain threshold distance from the object, κ, or
once the palm contacts the object, whichever happens first. Depending on the output of the
preshape and enclose DMPs, the controller translates the selected virtual finger combination
and maximum aperture into final target joint angles for each finger. Each virtual finger
combination is associated with a preshape hand configuration with certain finger angles that
can be modulated by the maximum aperture parameter, and a set of fingers to control during
the enclose phase. The possible virtual finger combinations define the following grasps:
precision pinch (index finger and thumb extended then enclosed), tripod grasp (index and
middle fingers and thumb extended then enclosed), whole hand prehension (all fingers and
thumb extended then enclosed), and side grasps (all fingers enclosed and thumb extended
then enclosed).
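The virtual-finger combinations listed above can be captured in a lookup table. The grasp names follow the text, but the dictionary structure, the finger names, and the linear interpolation below are illustrative stand-ins for the model's actual preshape configurations and DMP-timed enclose phase.

```python
# Hypothetical mapping from virtual-finger combination to the real fingers
# preshaped and enclosed; names and structure are illustrative assumptions.
VF_GRASPS = {
    "precision_pinch": {"preshape": ["thumb", "index"],
                        "enclose": ["thumb", "index"]},
    "tripod":          {"preshape": ["thumb", "index", "middle"],
                        "enclose": ["thumb", "index", "middle"]},
    "whole_hand":      {"preshape": ["thumb", "index", "middle", "ring", "pinky"],
                        "enclose": ["thumb", "index", "middle", "ring", "pinky"]},
    "side":            {"preshape": ["thumb"],
                        "enclose": ["thumb", "index", "middle", "ring", "pinky"]},
}

def enclose_targets(grasp, t, start, final):
    """Interpolate joint targets for the enclose phase, using the normalized
    DMP timing signal t in [0, 1], for each finger the grasp's VFs control."""
    fingers = VF_GRASPS[grasp]["enclose"]
    return {f: start[f] + t * (final[f] - start[f]) for f in fingers}
```

As the DMP's normalized output t runs from 0 to 1, the controlled fingers sweep from their current angles to the grasp's final target angles, matching the interpolation scheme described above.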
3.1.4 Parietal Module – Object Feature / Affordance Extraction
The populations in the Object Feature / Affordance Extraction module are based on
findings from a series of primate single-unit recording studies (Galletti et al., 2003; Sakata et
al., 1998) in parietal cortex. These experiments found that neurons in the anterior
intraparietal sulcus (AIP) are responsive to 3D features of objects relevant for manipulation
(Murata et al., 2000). Neurons in the caudal intraparietal sulcus (cIPS) are selective to objects
and their surfaces at preferred orientations. Subsets of these neurons have been described as
axis-orientation-selective and surface-orientation-selective (Taira et al., 1990). It has been
suggested that the region V6a is involved in encoding the direction of movement required to
bring the arm to potential reach targets (Galletti et al., 2003; Rizzolatti et al., 1998). The
lateral bank of the intraparietal sulcus contains an area known as the lateral intraparietal area
(LIP) that is typically associated with eye movements, but may additionally integrate retinal
and extraretinal information to encode the egocentric target distance in three-dimensional
space (Genovesio and Ferraina, 2004). Both cIPS and V6a project to AIP (Nakamura et al.,
2001; Shipp et al., 1998). These studies suggest that areas cIPS, LIP and V6a extract object
features and location and that V6a and cIPS project this information to AIP for grasp
affordance extraction (Sakata et al., 1998).
Figure 3-3 The object primitive variables represented in each parietal region for the handle
(top row) and head (bottom row) of a hammer. The area V6A represents the shoulder-centered
direction of the center of the primitive, φs and θs, LIP represents the shoulder-centered
distance, ρs, and cIPS represents the object primitive’s orientation, ox, oy, oz, size, sx, sy, sz,
and orientation of surface normal vectors (n1, n2, and n3 in this case).
The neurophysiological experiments upon which the model parietal regions are based
used simple object primitives (cube, cylinder, sphere, etc.) as stimuli. However more
complex objects such as hammers or coffee cups contain multiple affordances. A model of
the ventral visual stream known as geon theory has been proposed that recognizes complex
objects in terms of object primitives, or geons (Biederman, 1987). Geon theory suggests that
three-dimensional objects are decomposed into object primitives and recognized by
analyzing their relative position and size. However, in geon theory the metric details of each
primitive are not represented. We suggest that the dorsal visual stream similarly analyzes
objects in terms of a set of primitives, but accurately represents the metrics of each primitive
(Figure 3-3).
3.1.4.a LIP
The model region LIP represents the shoulder-centered distance to each object primitive
as a population code and provides this information to the premotor module F2 for
programming the reach. The area LIP shows reliable and robust responses to visual
stimulation (Andersen et al., 1985; Colby and Duhamel, 1996). The major sources of input to
LIP come from the extrastriate visual cortex (Blatt et al., 1990; Bullier et al., 1996; Colby et
al., 1988; Felleman and Van Essen, 1991; Seltzer and Pandya, 1986). It projects to the
premotor cortex (Cavada and Goldman-Rakic, 1989), including the dorsal premotor cortex
(Tanne-Gariepy et al., 2002). Stimulus depth is largely indicated by disparity signals and
accommodative cues which modulate activity in area LIP (Ferraina et al., 2002; Gnadt and
Beyer, 1998). More directly, it has been shown that LIP neurons encode three-dimensional
distance in an egocentric reference frame (Gnadt and Mays, 1995). Individual cells have
broad response profiles centered on their preferred depth, and the region is thus capable of
providing a population code of egocentric target distance.
We model LIP as a one-dimensional population code, with each unit having a preferred distance, d̂_s(i), uniformly distributed between 0 and 3 m. The activity of each LIP unit, i, at time t is given by a Gaussian population code over the unit's preferred distance and the actual shoulder-centered distance, d_s, to each object primitive, p:

$$\mathbf{LIP}(i,t) = \sum_{p} \exp\left(-\frac{\left(\hat{d}_s(i) - d_s(p,t)\right)^2}{2\sigma_{LIP}^2}\right) + \varepsilon_{LIP}$$

where σ_LIP is the population code width and ε_LIP is a noise term.
3.1.4.b V6a
The model region V6a represents the shoulder-centered direction of each object primitive
in spherical coordinates as a two-dimensional population code. The module provides this
information to parietal area AIP for affordance extraction and premotor area F2 for
programming the reach. Area V6a contains mostly visual cells (Galletti et al., 1997;
Rizzolatti et al., 1998) that are modulated by somatosensory stimulation (Breveglieri et al.,
2002; Fattori et al., 2005). The visual receptive fields of cells in V6a cover the whole visual
field and represent each portion of it multiple times (Galletti et al., 1993; Galletti et al.,
1999). So-called real-position cells are able to encode spatial location of objects in the visual
scene with visual receptive fields that remain anchored despite eye movements (Galletti et
al., 1993). Intermingled with real-position cells are retinotopic cells, whose visual receptive
fields shift with gaze, suggesting that the region is involved in converting coordinates from
retinotopic to head- or body-centered reference frames (Galletti et al., 1993). Many neurons
in V6a only respond when the arm is directed toward a particular region of space (Fattori et
al., 2005; Galletti et al., 1997). Lesions of the region result in misreaching with the
contralateral arm (Battaglini et al., 2002). It has thus been suggested that the region is
involved in encoding the direction of movement in at least head-centered coordinates,
required to bring the arm to potential reach targets (Galletti et al., 2003; Rizzolatti et al.,
1998). Area V6a receives input from central and peripheral visual field representations in V6
(Shipp et al., 1998) and projects to the premotor area F2 (Luppino et al., 2005; Matelli et al.,
1998; Shipp et al., 1998).
We model the region as a two-dimensional population code with each unit selective for both particular azimuth and elevation values of the shoulder-centered object direction. Each unit of the population, i, j, has preferred angles, θ̂_s, φ̂_s, with θ̂_s uniformly distributed between 0 and π, and φ̂_s uniformly distributed between -π and 0. The activity of each unit is given by:

$$\mathbf{V6A}(i,j,t) = \sum_{p} \exp\left(-\frac{\left(\hat{\theta}_s(i,j) - \theta_s(p,t)\right)^2 + \left(\hat{\phi}_s(i,j) - \phi_s(p,t)\right)^2}{2\sigma_{V6A}^2}\right) + \varepsilon_{V6A}$$

where θ_s(p, t) is the azimuth angle and φ_s(p, t) is the elevation angle of object primitive p at time t in a shoulder-centered reference frame, σ_V6A is the population code width, and ε_V6A is a noise term.
3.1.4.c cIPS
The model region cIPS contains three populations that represent the object orientation,
size, and visible surface normal vectors as three-dimensional population codes. The caudal
intraparietal sulcus (cIPS) is a region located in the caudal part of the lateral bank and fundus of the intraparietal sulcus (Shikata et al., 1996). It was originally referred to as the posterior
intraparietal area (PIP) by Colby et al., (1988), and probably overlaps the lateral occipital
parietal area (LOP) of Lewis and Van Essen, (2000). The cIPS receives input mainly from
V3a (Figure 3-1), whose neurons are sensitive to binocular disparity and have small,
retinotopic receptive fields (Sakata et al., 2005), and projects primarily to the anterior intraparietal area (AIP) (Borra et al., 2007; Nakamura et al., 2001); the projections from V3a neurons terminate in the vicinity of the cIPS neurons that project to AIP (Nakamura et al., 2001).
Neurons in area cIPS have large receptive fields (10-30 degrees in diameter) with no
retinotopic organization (Tsutsui et al., 2005). Two functional classes of neurons in area cIPS
have been described: surface orientation selective (SOS) neurons that are selective to the
orientation of flat surfaces, and axis orientation selective (AOS) neurons that respond best to
an elongated object whose principal axis is oriented in a particular direction. Both types of
neurons respond best to binocular stimuli (Sakata et al., 1997) and are spatially intermingled
(Nakamura et al., 2001). Muscimol induced inactivation of this region disrupts performance
on a delayed match-to-sample task with oriented surfaces using perspective and disparity
cues (Tsutsui et al., 2001; Tsutsui et al., 2005). Both types of cells include some neurons that
are selective for the object’s dimensions (Kusunoki et al., 1993; Sakata et al., 1998). Again,
these neurons have only been tested with simple objects. We suggest that they actually
encode the features of object primitives that comprise complex objects. To simplify the
model, we include AOS and SOS cells as well as one population that encodes the size of each
object primitive. This size population, S, represents the size of an object primitive p in each dimension (s_x, s_y, s_z), with the activity of each unit, i, j, k, given by:

$$\mathbf{S}(i,j,k,t) = \sum_{p} \exp\left(-\frac{\left(\hat{s}_x(i,j,k) - s_x(p,t)\right)^2 + \left(\hat{s}_y(i,j,k) - s_y(p,t)\right)^2 + \left(\hat{s}_z(i,j,k) - s_z(p,t)\right)^2}{2\sigma_{cIPS}^2}\right) + \varepsilon_{cIPS}$$

where σ_cIPS is the population code width, and ε_cIPS is a noise term.
Axis orientation selective (AOS) cells prefer bars tilted in the vertical, horizontal, or
saggital planes (Sakata et al., 1998; Sakata et al., 1999). Some are selective for shape
(rectangular versus cylindrical), and probably represent surface curvature (Sakata et al.,
2005). Their discharge rate increases monotonically with object length and their width
response curve is monotonically decreasing in the 2-32cm range. It is thought that these cells
integrate orientation and width disparity cues to represent principal axis orientation (Sakata et
al., 1998).
We model AOS cells as two subpopulations – one selective for rectangular and one for cylindrical objects. Each subpopulation is a three-dimensional population code, with each neuron, i, j, k, selective for a combination of the components of the object's main axis orientation, o_x, o_y, and o_z. When the object is cylindrical, the activity of each unit in the cylindrical AOS population is given by:

$$\mathbf{CYL}(i,j,k,t) = \sum_{p} \exp\left(-\frac{\left(\hat{o}_x(i,j,k) - o_x(p,t)\right)^2 + \left(\hat{o}_y(i,j,k) - o_y(p,t)\right)^2 + \left(\hat{o}_z(i,j,k) - o_z(p,t)\right)^2}{2\sigma_{cIPS}^2}\right) + \varepsilon_{cIPS}$$

where ô_x(i, j, k), ô_y(i, j, k), and ô_z(i, j, k) are the preferred orientations of the unit i, j, k in the x, y, and z dimensions. When the object is not cylindrical, each unit's activity in the cylindrical AOS population is given by the noise term. Given a rectangular object, the activity of each unit in the rectangular AOS population is defined by:

$$\mathbf{RECT}(i,j,k,t) = \sum_{p} \exp\left(-\frac{\left(\hat{o}_x(i,j,k) - o_x(p,t)\right)^2 + \left(\hat{o}_y(i,j,k) - o_y(p,t)\right)^2 + \left(\hat{o}_z(i,j,k) - o_z(p,t)\right)^2}{2\sigma_{cIPS}^2}\right) + \varepsilon_{cIPS}$$

Similarly, the noise term determines the activation of these units when the object is not rectangular.
Surface orientation selective (SOS) cells are tuned to the surface orientation in depth of
flat and broad objects (Sakata et al., 1997; Sakata et al., 1998; Sakata et al., 1999; Shikata et
al., 1996). These cells respond to a combination of monocular and binocular depth cues
(texture and disparity gradient cues) in representing surface orientation (Sakata et al., 2005).
Neurons sensitive to multiple depth cues are widely distributed and spatially intermingled
with those sensitive to only one depth cue (Tsutsui et al., 2005). We model the SOS population as a noisy three-dimensional Gaussian population code over the normal vector (n_x, n_y, n_z) of each visible surface, f, of a rectangular object primitive, p:

$$\mathbf{SOS}(i,j,k,t) = \sum_{p}\sum_{f} \exp\left(-\frac{\left(\hat{n}_x(i,j,k) - n_x(p,f,t)\right)^2 + \left(\hat{n}_y(i,j,k) - n_y(p,f,t)\right)^2 + \left(\hat{n}_z(i,j,k) - n_z(p,f,t)\right)^2}{2\sigma_{cIPS}^2}\right) + \varepsilon_{cIPS}$$

Each unit has a preferred value in each dimension (n̂_x, n̂_y, n̂_z) uniformly distributed between -1 and 1. If the object is not rectangular, the activity of each unit in these populations is determined by the noise term.
3.1.4.d AIP
The anterior intraparietal area AIP is located on the lateral bank of the anterior
intraparietal sulcus and contains visually responsive neurons selective for 3D features of
objects, motor dominant neurons that only respond during grasping, and visuomotor neurons
that are activated by grasping and modulated by sight of the object (Sakata et al., 1998). The
region receives its main input from area cIPS (Nakamura et al., 2001), but also receives input
from V6a (Shipp et al., 1998) and projects most strongly to the premotor region F5 (Borra et
al., 2007).
We model AIP as a self-organizing feature map (SOFM) with its learning rate modulated
by a global reinforcement signal. SOFMs use unsupervised learning to map a high-
dimensional vector space onto a lower-dimensional space. The resulting map preserves
topological relationships between vectors (i.e. similar vectors in the high dimensional input
space are mapped onto nearby vectors in the low dimensional output space) and identifies
dimensions in the input space with the highest variance. They are similar to other
dimensionality reduction methods such as multi-dimensional scaling (MDS) and principal
components analysis (PCA; Yin, 2008) and it has been argued that a similar mechanism
organizes representations of high dimensional spaces in the cerebral cortex (Durbin and
Mitchison, 1990). Our addition of modulation by a reinforcement signal causes the network
to preferentially represent input vectors that occur before receiving a positive reward signal, a
method comparable to the reinforcement driven dimensionality reduction model (Bar-Gad et
al., 2003). The result of training is a network that can extract combinations of object features
that afford successful grasps (affordances), and can generalize to objects it has never seen
before but nonetheless elicit activation patterns overlapping those generated by objects in the
training set.
The AIP module receives input from each population of V6a and cIPS. While the
distance to the object is important for parameterizing the reach as well as the coordination of
the reach and grasp movements, it is not important for specifying the grasp itself, and
therefore we do not model a connection between LIP and AIP. The SOFM is a toroidal grid
of 40×40 units, with the input vector, I, constructed by concatenating the activity vectors of
each V6a and cIPS population into one vector, which is then normalized. The activity of each
AIP unit with indices i and j is given by:
$$\mathbf{AIP}(i,j,t) = \mathbf{W}_{AIP}(i,j,t) \cdot \mathbf{I}(t) + \varepsilon_{AIP}$$

The weights W_AIP are initialized to small random values. Weight training uses a form of
competitive learning. Given the input vector, I, the AIP unit with the most similar weight
vector is determined as the best matching unit (BMU). The similarity metric we used was the
Euclidean distance between the vectors. The weights of the BMU and all units within its
“neighborhood” are adjusted in the direction of the input vector:
$$\mathbf{W}_{AIP}(i,j,t+1) = \mathbf{W}_{AIP}(i,j,t) + \Theta(i,j,T)\,\alpha(t,T)\left(\mathbf{I}(t) - \mathbf{W}_{AIP}(i,j,t)\right)$$
where Θ is the reinforcement-dependent neighborhood function, T is the current training
epoch, and α is the reinforcement-dependent learning rate. The neighborhood function is a
Gaussian function over the Euclidean distance, β, between the indices of the neuron i, j and
those of the BMU. The Gaussian is truncated at a certain radius, r, that also defines its
spread:
$$\Theta(i,j,t) = \begin{cases} \exp\left(-\dfrac{\beta^2}{2\,r(T)^2}\right) & \text{if } \beta < r(T) \\ 0 & \text{otherwise} \end{cases}$$
The radius of the neighborhood function shrinks over the duration of training, but is
expanded by the global reinforcement signal, rs(t):
$$r(t,T) = r_0\, e^{-T/\lambda} + rs(t)$$

where r_0 is the initial radius and the parameter λ determines the rate at which the radius
decreases. The learning rate also decreases over the duration of training:
$$\alpha(t,T) = \alpha_0\, e^{-T/\lambda} + rs(t)$$

where α_0 is the initial learning rate. At the beginning of training the neighborhood is broad
and the learning rate is high. This causes the self-organization to take place on the global
scale. As training progresses and the learning rate decreases, the weights converge to a local estimate of the training input vectors. The modulation of the learning rate and neighborhood
function by the global reinforcement signal ensures that input vectors that are used to plan
stable grasps become represented by more units in the SOFM at the expense of input vectors
that result in failed grasps.
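The reinforcement-modulated SOFM update described above can be sketched compactly. For brevity this toy version uses a small non-toroidal grid (the thesis uses a 40×40 toroidal grid), and all sizes and decay constants are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a reinforcement-modulated SOFM; grid size, input
# dimension, and constants are illustrative, and the grid is not toroidal.
class RLSOFM:
    def __init__(self, grid=10, dim=20, r0=5.0, alpha0=0.5, lam=50.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((grid, grid, dim))
        self.grid, self.r0, self.alpha0, self.lam = grid, r0, alpha0, lam

    def train_step(self, I, epoch, rs=0.0):
        # Best matching unit (BMU) by Euclidean distance to the input vector.
        d = np.linalg.norm(self.W - I, axis=2)
        bi, bj = np.unravel_index(np.argmin(d), d.shape)
        # Radius and learning rate decay with epoch, expanded by reward rs.
        r = self.r0 * np.exp(-epoch / self.lam) + rs
        alpha = self.alpha0 * np.exp(-epoch / self.lam) + rs
        # Truncated Gaussian neighborhood around the BMU.
        ii, jj = np.meshgrid(np.arange(self.grid), np.arange(self.grid),
                             indexing="ij")
        beta = np.sqrt((ii - bi) ** 2 + (jj - bj) ** 2)
        theta = np.where(beta < r, np.exp(-beta ** 2 / (2 * r ** 2)), 0.0)
        # Move neighborhood weights toward the input vector.
        self.W += alpha * theta[:, :, None] * (I - self.W)
        return bi, bj
```

Passing a positive `rs` on rewarded trials widens the neighborhood and raises the learning rate for that update, so inputs that precede stable grasps claim more map territory, as the text describes.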
3.1.5 Premotor Module – Grasp Planning
The Premotor module contains several subpopulations based on various regions of dorsal
and ventral premotor cortex including F2, F5, and F7. Each of these populations selects grasp
motor parameters based on input from AIP, LIP, and V6a. While studies of parietal
representation of object location test the encoding of the center of a visual target, reach-to-grasp movements direct the wrist to some region offset from this point so that the hand may
contact the object’s affordances appropriately. The ILGM model suggested that a portion of
the premotor cortex encodes an object-centered representation of the wrist offset. This vector
was combined with a shoulder-centered representation of the object center in order to
compute a reach target. Along with affordance information, the selected offset and grasp type
influence the selection of the wrist orientation.
In ILGA, areas F2 and F7 are mainly involved with specifying the reach, with F2
selecting the center of the object primitive to reach to and area F7 selecting an object-
centered offset from that center for the reach target (Figure 3-4). Area F5 selects the grasp
type and maximal aperture, and F2/F5 select the wrist orientation. All the motor parameters
handled by the premotor cortex modules in both ILGM and ILGA are kinematic parameters
– kinetics is completely ignored by these modules and handled entirely by the primary motor
module.
Although there is evidence supporting roles for the primary motor cortex in both
kinematic and kinetic encoding (Kalaska, 2009), it has been shown that the evidence in favor
of kinematic encoding could be an epiphenomenon of multidimensional muscle force coding
(Todorov, 2000). Lesions of the premotor cortex result in deficits in movement kinematics
(Freund, 1990; Gallese et al., 1994). Neurons in dorsal and ventral premotor cortex have been
found that correlate with movement dynamics variables (Xiao et al., 2006), but here we make
the simplifying assumption that the premotor cortex specifies movement kinematic variables
which are translated into muscle forces by the primary motor cortex and spinal cord.
Figure 3-4 The reach and grasp parameters encoded by the premotor cortex. The blue circle
denotes the planned reach offset point. Area F2 encodes the shoulder-centered object position
(φs, θs, ρs), F7 encodes the object-centered reach offset (φo, θo, ρo), F5 encodes the VF
combination and maximum aperture used for the grasp, and F2/F5 encodes the wrist
orientation (wrx, wry, wrz).
Each region in the Premotor module contains one population of signal-related cells and
one population of execution-related cells. Execution-related cells discharge on movement
onset while signal-related cells show anticipatory activity prior to the start of movement.
These broad categories of cells have been found in several premotor areas (Cisek and
Kalaska, 2002; Kurata, 1994; Wise et al., 1997). In this model each signal- and execution-
related population is simulated as a DNF. Signal-related cells receive external input and
project topologically to execution-related cells with hard-wired, fixed connections.
Execution-related cells additionally receive tonic inhibition which is released when the go
signal is detected, ensuring that the movement does not begin until the signal is observed.
The tonic inhibitory input to each execution-related population, GP, was set to 10 before the
go signal was detected and 0 once it appeared. Therefore signal-related cells in ILGA plan
the movement, while the activation of execution-related cells triggers its onset.
Reinforcement learning is applied to the afferent connection weights of the signal-related
cells, using the activity of the corresponding execution-related population as an eligibility
trace. The basal ganglia is typically implicated in disinhibition of planned movements
(Kropotov and Etlinger, 1999) and reinforcement learning (Barto, 1995). While models of
the basal ganglia exist that could provide the tonic inhibition and reinforcement signals in
ILGA (Gurney et al., 2001), we simply provide these inputs procedurally. Eligibility traces
are commonly used in reinforcement learning in order to assign credit to the appropriate
connection weight for delayed reward (Singh and Sutton, 1996). This is typically a decaying
copy of the activated neurons, but since the delay between signal-related cell activity and the
achievement of a stable grasp that elicits a reward can be quite long, we use the activity of
corresponding execution-related cells as the eligibility trace.
3.1.5.a F2
Within the premotor cortex, the caudal portion F2 most likely codes reach movements in
a shoulder-centered reference frame (Caminiti et al., 1991; Cisek and Kalaska, 2002;
Rizzolatti et al., 1998). Many of the cells in F2 have broad directional tuning and their
population activity appears to encode a vector representing the direction of arm movement
and not the position of the end target (Caminiti et al., 1991; Weinrich and Wise, 1982). The
region was first defined by Matelli et al. (1985), and was later subdivided into the F2 dimple
(F2d) and ventrorostral (F2vr) subregions (Matelli et al., 1998). Area F2 is located just rostral
to the leg and arm representation in area F1 and extends rostrally 2-3mm in front of the genu
of the arcuate sulcus and laterally to the spur of the arcuate sulcus (Fogassi et al., 1999). It
contains an arm field lateral to the superior precentral dimple (Dum and Strick, 1991;
Godschalk et al., 1995; He et al., 1993; Kurata, 1989).
The region contains a rostro-caudal gradient of cell types with signal-related cells found
predominantly in F2vr and execution-related cells located in F2d, the caudal portion adjacent
to F1 (Johnson et al., 1996; Tanne et al., 1995). Signal-related cells make up 43% of F2 neurons
and respond to the visual target for reaching (Weinrich and Wise, 1982). Execution-related
cells have changes in activity that are synchronized with the onset of movement (Weinrich
and Wise, 1982). Some execution-related cells are only active after the Go signal and these
are more common caudally (Crammond and Kalaska, 2000). This categorization of cells
seems to correspond to a similar modality-based classification used by Fogassi et al. (1999)
and Raos et al. (2004) which describes cells as purely motor, visually modulated, or
visuomotor. Purely motor cells are not affected by object presentation or visual feedback of
the hand, visually modulated cells discharge differentially when reaching in the light vs.
dark, and visuomotor cells discharge during object fixation without movement. Most
visually modulated or visuomotor cells are in F2vr (Fogassi et al., 1999), and therefore likely
correspond to the signal-related cells described by Cisek & Kalaska (2002). Our model thus
subdivides F2 into rostral and caudal regions (F2vr and F2d, respectively) and simplifies the
distribution of cell types by confining signal-related cells to the rostral region and execution-
related cells to the caudal region.
Most cells in F2 are sensitive to amplitude and direction, with very few cells sensitive to
only amplitude (Fu et al., 1993; Messier and Kalaska, 2000). However, muscimol
inactivation caused increases in directional errors when conditional cues were presented, but
amplitude and velocity were unchanged (Kurata, 1994). Neurons in the dorsal premotor
cortex have more recently been shown to encode the relative position of the eye, hand, and
goal (Pesaran et al., 2006), but we do not vary the eye position in these simulations and this
influence is therefore constant. We thus decode the output of F2d as a population code with
each cell having a preferred spherical coordinate in a shoulder-centered reference frame.
Note that the issue of which reference frame is used in the reach circuit is still debated.
Visual inputs to area F2 come mainly from the superior parietal lobe (Caminiti et al.,
1996; Johnson et al., 1993). The subregion F2vr receives projections from area V6A (part of
the parietal-occipital area, PO, Shipp and Zeki, 1995), the medial intraparietal area MIP
(Marconi et al., 2001; Matelli et al., 1998; Shipp et al., 1998), and the lateral intraparietal
area LIP (Tanne-Gariepy et al., 2002). The main output of F2 projects to F1 (Dum and Strick,
2005).
We model the F2vr region as two DNFs encoding the shoulder-centered direction and
distance of the target object in spherical coordinates (F2vrDIR - two-dimensional direction,
F2vrRAD - radius). The input to each DNF is given by:
    IN_F2vrDIR(t) = V6A(t) · W_V6A→F2 + ε_F2
    IN_F2vrRAD(t) = LIP(t) · W_LIP→F2 + ε_F2

where the matrices W_V6A→F2 and W_LIP→F2 define the weights of the projections from V6a to
F2, and LIP to F2, respectively. Since we assume that reaching ability has already developed,
these weights are not subject to learning and are set according to the following rule:

    W(i, j) = 3 I(i, j)
where I is the identity matrix. This results in F2vr faithfully selecting the center of the object
(as signaled by V6a and LIP) as the position from which to calculate the final target for the
wrist using the object-centered reach offset. The F2d region is similarly modeled as two
DNFs that each receive excitatory input from F2vr and tonic inhibitory input, GP:
    IN_F2dDIR(t) = F2vrDIR(t) · W_F2→F2 + GP(t) + ε_F2
    IN_F2dRAD(t) = F2vrRAD(t) · W_F2→F2 + GP(t) + ε_F2

The weight matrix W_F2→F2 and other weight matrices between signal- and execution-related
premotor populations were not subject to learning and were set as follows:

    W(i, j) = 2 I(i, j)
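The signal-to-execution gating shared by these premotor populations can be illustrated with a small sketch (our own simplification; GP is treated here as a negative additive term of magnitude 10 before the go signal, consistent with its inhibitory role):

```python
import numpy as np

def execution_input(signal_act, go_detected, w_gain=2.0, gp_level=10.0,
                    noise_sd=0.0, rng=None):
    """Input to an execution-related DNF: topographic drive from the
    signal-related population (W = 2I), tonic inhibition GP that is
    released when the go signal is detected, plus Gaussian noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    gp = 0.0 if go_detected else -gp_level   # inhibitory before go
    noise = rng.normal(0.0, noise_sd, np.shape(signal_act))
    return w_gain * np.asarray(signal_act) + gp + noise
```

Before the go signal, even a fully planned movement (signal-related activity at its maximum) yields net-negative execution input; releasing GP lets the planned peak drive the execution-related field and trigger movement onset.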
3.1.5.b F7
While there does not appear to be direct evidence for a population of premotor neurons
encoding an object-centered reach offset, there is some suggestion that such a representation
does exist and may be located in the dorsal premotor cortex. The rostral portion of the dorsal
premotor cortex, area F7 (approximately equal to PMdr, Wise et al., 1997), can be separated
into the dorsorostral supplementary eye field (SEF) and a lesser-known ventral region. The
SEF is known to contain neurons which encode space in an object-centered reference frame
(Olson and Gettner, 1995), but the region is implicated in control of eye movements. While
the properties of ventral F7 are not well-known, it does contain neurons related to arm
movements (Fujii et al., 1996; Fujii et al., 2002), receives the same thalamic input as the arm
region of F6, and receives input from the same region of the superior temporal sulcus that
projects to F2vr (Luppino et al., 2001). The ventral portion of F7 may therefore be a likely
candidate for the location of a population of neurons encoding reach targets in an object-
centered frame of reference.
We model F7 as a signal- and execution-related population. The signal-related population
consists of two DNFs encoding the object-centered reach offset in spherical coordinates
(F7sDIR - azimuth and elevation, F7sRAD - radius).
The input to each signal-related DNF is given by:
    IN_F7sDIR(t) = AIP(t) · W_AIP→F7DIR(t) + F2vrDIR(t) · W_F2→F7(t) + ε_F7
    IN_F7sRAD(t) = AIP(t) · W_AIP→F7RAD(t) + ε_F7
The execution-related population also contains two DNFs, each corresponding to one DNF in
the signal-related population. The input to each execution-related DNF is given by:
    IN_F7eDIR(t) = F7sDIR(t) · W_F7→F7 + GP(t) + ε_F7
    IN_F7eRAD(t) = F7sRAD(t) · W_F7→F7 + GP(t) + ε_F7
The W_F7→F7 connection weights were set just as the W_F2→F2 weights. The connection
weights W_AIP→F7DIR and W_AIP→F7RAD were initialized to small random values and subject to
learning using a variant of the REINFORCE rule (Sutton and Barto, 1998), which is Hebbian
for positive reward values and anti-Hebbian for negative ones:
    W_AIP→F7DIR(a, b, i, j, t+1) = W_AIP→F7DIR(a, b, i, j, t) + α_F7 rs(t) AIP(a, b, t) F7eDIR(i, j, t)
    W_AIP→F7RAD(a, b, i, t+1) = W_AIP→F7RAD(a, b, i, t) + α_F7 rs(t) AIP(a, b, t) F7eRAD(i, t)
    W_F2→F7(a, b, i, j, t+1) = W_F2→F7(a, b, i, j, t) + α_F7 rs(t) F2dDIR(a, b, t) F7eDIR(i, j, t)
The outputs of the execution-related populations are used as the eligibility traces since in
general, the object may not be visible at the end of the grasp and signal-related cells may not
be active anymore.
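These update rules all share one three-factor form, which can be sketched as follows (a hypothetical helper; the names are illustrative, not from the model code):

```python
import numpy as np

def reinforce_update(W, presyn, eligibility, rs, alpha):
    """Variant REINFORCE update: Hebbian for rs > 0, anti-Hebbian for
    rs < 0. The eligibility term is the execution-related activity,
    which bridges the delay between planning and delayed reward."""
    return W + alpha * rs * np.outer(presyn, eligibility)
```

A successful grasp (rs > 0) strengthens connections from active AIP units to the execution-related units that carried the plan; a failed grasp weakens the same connections.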
3.1.5.c F5
Many neurons in premotor area F5 fire in association with specific types of manual
action, such as precision grip, finger prehension, and whole hand prehension (Rizzolatti et
al., 1988) as well as tearing and holding. Some neurons in F5 discharge only during the last
part of grasping; others start to fire during the phase in which the hand opens and continue to
discharge during the phase when the hand closes; finally a few discharge prevalently in the
phase in which the hand opens. Grasping appears, therefore, to be coded by the joint activity
of populations of neurons, each controlling different phases of the motor act. Raos et al.
(2006) found that F5 neurons selective for both grip type and wrist orientation maintained
this selectivity when grasping in the dark. Simultaneous recording from F5 and F1 showed
that F5 neurons were selective for grasp type and phase, while an F1 neuron might be active
for different phases of different grasps (Umilta et al., 2007). This suggests that F5 neurons
encode a high-level representation of the grasp motor schema while F1 neurons (or, at least,
some of them) encode the component movements or components of a population code for
muscle activity of each grasp phase.
We model F5 as a signal- and execution-related population, each containing a one-
dimensional DNF for each VF combination with neurons in each DNF selective for
maximum grasp aperture. In this module, in addition to the WTA dynamic within DNFs,
every unit in a DNF laterally inhibits every other unit in the other DNFs, so that inter-DNF
competition selects a VF combination, while intra-DNF competition selects a maximum
aperture. The possible VF combinations are index finger pad-thumb pad (precision grasp),
index+middle finger pads-thumb pad (tripod grasp), inner fingers-palm (power grasp), and
thumb pad-side of index finger (side grasp). The maximal aperture is encoded as a
normalized value from 0 to 1 that is transformed into target finger joint angles by the grasp
motor controller. The inputs to each signal-related population, F5sPREC, F5sTRI,
F5sPOW, F5sSIDE for the precision, tripod, power, and side grasps, respectively, are given
by:
    IN_F5sPREC(t) = AIP(t) · W_AIP→F5PREC(t) − W_F5s→F5s Σᵢ [F5sTRI(i, t) + F5sPOW(i, t) + F5sSIDE(i, t)] + ε_F5
    IN_F5sTRI(t) = AIP(t) · W_AIP→F5TRI(t) − W_F5s→F5s Σᵢ [F5sPREC(i, t) + F5sPOW(i, t) + F5sSIDE(i, t)] + ε_F5
    IN_F5sPOW(t) = AIP(t) · W_AIP→F5POW(t) − W_F5s→F5s Σᵢ [F5sPREC(i, t) + F5sTRI(i, t) + F5sSIDE(i, t)] + ε_F5
    IN_F5sSIDE(t) = AIP(t) · W_AIP→F5SIDE(t) − W_F5s→F5s Σᵢ [F5sPREC(i, t) + F5sTRI(i, t) + F5sPOW(i, t)] + ε_F5

where W_F5s→F5s is the inhibitory connection weight between DNFs, set to 0.25 in these
simulations. The inputs to the execution-related populations, F5ePREC, F5eTRI, F5ePOW,
F5eSIDE, are given by:
    IN_F5ePREC(t) = F5sPREC(t) · W_F5s→F5e + GP(t) + ε_F5
    IN_F5eTRI(t) = F5sTRI(t) · W_F5s→F5e + GP(t) + ε_F5
    IN_F5ePOW(t) = F5sPOW(t) · W_F5s→F5e + GP(t) + ε_F5
    IN_F5eSIDE(t) = F5sSIDE(t) · W_F5s→F5e + GP(t) + ε_F5
The W_F5s→F5e weights were set just as the W_F2→F2 and W_F7→F7 weights, and the connection
weights W_AIP→F5PREC, W_AIP→F5TRI, W_AIP→F5POW, and W_AIP→F5SIDE were initialized to small
random values and subject to learning using the following rule:

    W_AIP→F5PREC(a, b, i, t+1) = W_AIP→F5PREC(a, b, i, t) + α_F5 rs(t) AIP(a, b, t) F5ePREC(i, t)
    W_AIP→F5TRI(a, b, i, t+1) = W_AIP→F5TRI(a, b, i, t) + α_F5 rs(t) AIP(a, b, t) F5eTRI(i, t)
    W_AIP→F5POW(a, b, i, t+1) = W_AIP→F5POW(a, b, i, t) + α_F5 rs(t) AIP(a, b, t) F5ePOW(i, t)
    W_AIP→F5SIDE(a, b, i, t+1) = W_AIP→F5SIDE(a, b, i, t) + α_F5 rs(t) AIP(a, b, t) F5eSIDE(i, t)
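The two levels of competition in the F5 module (inter-DNF competition for grasp type, intra-DNF competition for aperture) can be sketched with a toy relaxation (our simplification of the DNF dynamics; the names and constants are illustrative):

```python
import numpy as np

def f5_compete(inputs, w_inter=0.25, steps=200, dt=0.1):
    """Toy competition among 1-D fields, one per VF combination.
    Every unit inhibits all units of the other fields (w_inter), so one
    field wins (the grasp type); the winning field's peak location
    encodes the normalized maximum aperture."""
    acts = {k: np.asarray(v, dtype=float).copy() for k, v in inputs.items()}
    for _ in range(steps):
        sums = {k: a.sum() for k, a in acts.items()}
        total = sum(sums.values())
        for k, a in acts.items():
            inhib = w_inter * (total - sums[k])            # inter-DNF inhibition
            a += dt * (np.asarray(inputs[k]) - inhib - a)  # leaky relaxation
            np.clip(a, 0.0, None, out=a)                   # rectification
    winner = max(acts, key=lambda k: acts[k].max())
    aperture = float(np.argmax(acts[winner])) / (len(acts[winner]) - 1)
    return winner, aperture
```

A field with a strong, peaked AIP drive suppresses the uniformly weakly driven fields to zero, after which its own activity relaxes back to its input and its peak is read out as the aperture.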
3.1.5.d Wrist Rotation
Infants starting at 7 months old begin to pre-orient their hands to match an object’s
affordances when reaching for that object (Witherington, 2005). By 9 months old, infants are
skilled at hand pre-orientation and adjustment, increasing reach and grasp efficiency
(Morrongiello and Rocca, 1989). Neurons have been described in area F2 that become active
in relation to specific orientations of visual stimuli and to corresponding hand-wrist
movements (Raos et al., 2004). That same paper showed that 66% of grasp neurons in F2
were highly selective for grasp type and that 72% were highly selective for wrist orientation.
In addition to reach target selection, the dorsal premotor cortex is implicated in wrist
movements (Kurata, 1993; Riehle and Requin, 1989). Raos et al. (2006) show that F5
neurons combine selectivity for grip type and wrist orientation, and that 21 out of the 38 they
tested for wrist orientation selectivity showed high selectivity for a particular orientation. The
most plausible hypothesis that reconciles these findings is that the dorsal premotor cortex is
involved in coding reach direction and the ventral premotor cortex is involved in coding
grasps, and that interconnections between F2 and F5 (Marconi et al., 2001) allow the two
regions to converge on a wrist orientation appropriate for the selected reach direction and
grasp type.
We model the F2/F5 wrist rotation network as a signal- and execution-related population,
similarly to the other premotor modules. The signal-related population contains a three-
dimensional DNF, with each unit selective for a combination of the angles of the DOFs of
the wrist within its joint angle limits. The input to the signal-related DNF is given by:
    IN_WRs(t) = AIP(t) · W_AIP→WR(t) + F7sDIR(t) · W_F7→WR(t)
              + F5sPREC(t) · W_F5PREC→WR(t) + F5sTRI(t) · W_F5TRI→WR(t)
              + F5sPOW(t) · W_F5POW→WR(t) + F5sSIDE(t) · W_F5SIDE→WR(t) + ε_WR
Note that the signal-related F2/F5 DNF receives input from the F7sDIR population encoding
the direction of the reach offset, but it does not get reach offset radius information from the
F7sRAD population since the wrist rotation should not depend on the offset radius. The
execution-related population also contains a three-dimensional DNF, with its input given by:
    IN_WRe(t) = WRs(t) · W_WR→WR + GP(t) + ε_WR
The weights W_WR→WR were set just like the corresponding weights in the F2, F5 and F7
modules, and the weights of the afferent connections of the signal-related population were
updated using the REINFORCE learning rule:
    W_AIP→WR(a, b, i, j, k, t+1) = W_AIP→WR(a, b, i, j, k, t) + α_WR rs(t) AIP(a, b, t) WRe(i, j, k, t)
    W_F7→WR(a, b, i, j, k, t+1) = W_F7→WR(a, b, i, j, k, t) + α_WR rs(t) F7eDIR(a, b, t) WRe(i, j, k, t)
    W_F5PREC→WR(a, i, j, k, t+1) = W_F5PREC→WR(a, i, j, k, t) + α_WR rs(t) F5ePREC(a, t) WRe(i, j, k, t)
    W_F5TRI→WR(a, i, j, k, t+1) = W_F5TRI→WR(a, i, j, k, t) + α_WR rs(t) F5eTRI(a, t) WRe(i, j, k, t)
    W_F5POW→WR(a, i, j, k, t+1) = W_F5POW→WR(a, i, j, k, t) + α_WR rs(t) F5ePOW(a, t) WRe(i, j, k, t)
    W_F5SIDE→WR(a, i, j, k, t+1) = W_F5SIDE→WR(a, i, j, k, t) + α_WR rs(t) F5eSIDE(a, t) WRe(i, j, k, t)
3.1.6 Training
Each training trial was run for 5s with a 1ms time step. All training simulations used the
same protocol in which at 0.5s a green object appeared in the model’s field of view. Different
objects were presented (cubes, rectangular prisms, cylinders, spheres, and flat plates) at
random orientations and locations. The model input was obtained by getting the object’s
shape, color, size, orientation, and position from the physics simulator. At 1s into the
simulation the object turned red, triggering the release of inhibition from the execution-
related premotor populations by setting the input GP to each of these populations to 0. In the
last five time steps of each trial (corresponding to 5 ms of simulation time) rs(t) is set to
DA_success if the grasp is successful, and DA_fail if not. Although the bulk of learning in this
model is done simultaneously in all layers, we still found it necessary to use a somewhat
staged learning approach to bootstrap the system. During AIP pretraining trials no
movements were attempted and rs(t) was always equal to 0. During the wrist rotation
pretraining trials, rs(t) was set to DA_success/4 if palm contact was achieved at all.
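The staged reward schedule can be summarized in one function (a sketch; the stage labels, the placeholder magnitudes of DA_success and DA_fail, and the zero reward for missed palm contact during pretraining are our assumptions where the text is silent):

```python
def reward_signal(stage, step, trial_steps, outcome,
                  da_success=1.0, da_fail=-0.2):
    """Global reinforcement signal rs(t). Nonzero only during the last
    five time steps of a trial; da_success and da_fail are illustrative
    placeholder values."""
    if step < trial_steps - 5:
        return 0.0
    if stage == 'aip_pretraining':
        return 0.0                  # no movements attempted, rs(t) = 0
    if stage == 'wrist_pretraining':
        # any palm contact is rewarded at DA_success / 4
        return da_success / 4.0 if outcome == 'palm_contact' else 0.0
    # full grasp training: only stable grasps are positively reinforced
    return da_success if outcome == 'stable_grasp' else da_fail
```

With a 5 s trial at a 1 ms time step, rs(t) is nonzero only on steps 4995-4999, matching the "last five time steps" description above.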
3.1.6.a AIP Pretraining
We found that at the start of training, the neurons in AIP did not have sufficient activity
rates to drive the premotor populations. This resulted in approximately 1,000 trials in which
no grasps were attempted, but the connection weights into AIP were slowly modified
according to the SOFM weight-adjustment rule described above. Eventually large numbers of AIP
units were significantly activated in widespread overlapping representations sufficient to
activate premotor populations. This period may correspond to a period of visual experience
before the development of skilled reaching in which no grasps are attempted and visual
regions are shaped by unsupervised learning mechanisms. Although we assume the existence
of a pretrained reaching circuit, this period could correspond to a period of motor babbling in
which internal models of the arm are learned (Bullock et al., 1993).
3.1.6.b Wrist Rotation Pretraining
The added realism of our simulator compared to that of ILGM comes at the price of a
much lower probability of successfully grasping an object with random motor parameters.
This makes the task of learning much more difficult. To surmount this problem, we used a
period of pretraining in which any palm contact was rewarded and only the connection
weights between F7 and F2/F5 were modified. At the end of pretraining, the system could at
least orient the hand in the correct direction to make finger or palm contact with the object at
various locations and orientations, similar to the automatic wrist orienting mechanism used in
some ILGM simulations. This period of training may also correspond to the motor babbling
period of infant development modeled by Kuperstein (1988). However, whereas Kuperstein
modeled control of a multijoint arm to bring the end-effector to a target location, we model
learning to orient the wrist to make hand contact. This period of training therefore
corresponds to infant development from 7-9 months where infants learn to pre-orient their
wrist in response to an object’s affordances (Morrongiello and Rocca, 1989; Witherington,
2005).
3.1.6.c Grasp Training
After pretraining, the system was trained for 100,000 trials with different objects at
random locations and orientations. The object type, position, and orientation were changed
every 6 trials since each trial lasted 5s and infants will repeatedly reach to a novel object for
at least 30s before habituation (von Hofsten and Spelke, 1985). During these training trials
only stable grasps were positively reinforced and all modifiable connection weights (red
arrows in Figure 3-1) were subject to learning.
3.2 Results
After training, the model was able to generate stable grasps of each object type at various
locations and orientations. The representation in AIP allowed the model to generalize across
object properties enough to successfully grasp objects in novel configurations in a few
attempts. Here we demonstrate the ability of the model to generate successful grasps, analyze
the learned representations in AIP, and generate predictions for future experiments.
3.2.1 AIP Representation
Each object at each location and orientation in the training set elicited a slightly different
activation pattern in the cIPS and V6a populations. However, after AIP pretraining, objects
with similar features elicited similar, overlapping patterns of activation in this region (Figure
3-5). This is an inherent property of SOFMs and is what allows the model to
successfully grasp novel objects in familiar locations and orientations (see Grasp Training,
below).
Figure 3-5 AIP activation for different objects in various locations and orientations. Each
panel shows a third-person view (left) and AIP activation (right).
Murata et al. (2000) tested the response of AIP neurons to the sight of various types of
objects and found that 25/55 visually responsive AIP neurons were highly object selective,
responding strongly to one particular object and weakly to all others. They used
multidimensional scaling (MDS) to look at how moderately object selective neurons encode
the similarity of objects. MDS is a method of reducing a high dimensional input space into a
lower dimensional space while preserving topological relations between vectors. It was found
that moderately object selective AIP neurons respond to common combinations of geometric
features shared by similar objects such as shape, size and/or orientation. However, note that
the objects used in this experiment were not natural, complex objects, but geometric
primitives. We suggest that AIP codes combinations of features of these object primitives
and that given a complex object, AIP neurons selective for features of each of its primitives
will be activated. When we use the term object selectivity we therefore are actually referring
to object primitive selectivity.
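For readers unfamiliar with the technique, classical MDS can be reproduced in a few lines (a generic NumPy implementation of Torgerson's method, not necessarily the exact variant used by Murata et al.):

```python
import numpy as np

def classical_mds(X, k=2):
    """Classical (Torgerson) MDS: embed the rows of X into k dimensions
    while preserving their pairwise Euclidean distances as well as
    possible."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]    # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Applied to mean firing-rate vectors of moderately selective neurons, responses to objects sharing shape, size, or orientation land near each other in the low-dimensional embedding.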
We did not include any explicit encoding of object shape in the inputs to AIP (although
the CYL and RECT populations selectively respond to features of particularly shaped objects).
However, we found that after training AIP contained a mixture of highly, moderately, and
weakly object selective neurons (Figure 3-6). To characterize a neuron's object preference, we
used the same technique as Raos et al. (2006) where the object specificity of a neuron is
defined as a preference index (PI):

    PI = (n − Σᵢ (rᵢ / r_pref)) / (n − 1)

where n is the number of objects, rᵢ is the mean activity of the neuron for object i, and r_pref is
the mean activity for the preferred object in the current epoch. This measure can range from
0 to 1, with 0 meaning the neuron responds equally to all objects, and 1 indicating activity for
only one object. We classified neurons as highly object selective if they had a PI greater than
.75, moderately object selective if they had a PI between .25 and .5, and non-object selective
if they had a PI less than .25. The PI of each neuron was evaluated in blocks of 500 trials
throughout the entire training period.
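The PI computation and the classification used here can be written directly (a sketch; note that the classification bands leave PI values between .5 and .75 unclassified, as in the text):

```python
import numpy as np

def preference_index(rates):
    """PI = (n - sum_i r_i / r_pref) / (n - 1): 0 means equal responses
    to all n objects, 1 means a response to only one object."""
    rates = np.asarray(rates, dtype=float)
    n = rates.size
    return (n - np.sum(rates / rates.max())) / (n - 1)

def classify(pi):
    """Selectivity classes from the text (the .5-.75 band is unused)."""
    if pi > 0.75:
        return 'highly selective'
    if 0.25 <= pi <= 0.5:
        return 'moderately selective'
    if pi < 0.25:
        return 'non-selective'
    return 'unclassified'
```

For example, a neuron responding to exactly one of four objects yields PI = (4 − 1)/3 = 1, while equal responses to all four yield PI = 0.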
Figure 3-6 Top: Object specificity statistics for the AIP population during training
(solid=maximum PI, dashed=mean PI, dotted=minimum PI). Bottom: Numbers of highly
(solid), moderately (dashed), and non- (dotted) object specific neurons throughout training.
We found that at the start of training, all AIP neurons were moderately object selective
(Figure 3-6), responding to combinations of features shared by several objects. After about
500 trials, a small number of neurons became non-object selective and simply responded to
the presence of any object. By the end of the AIP pretraining period, some neurons became
highly object selective, some reaching a maximum PI near 1.0, indicating they only
responded to specific combinations of features that signaled a particular object. This
selectivity was maintained throughout the entire training period, even after the model began
to attempt grasps.
3.2.2 Grasp Training
For each object shape and configuration, the model replicated the results of the ILGM
model, which used a simplified AIP model that signaled the presence, location, or orientation
of an object in different simulations. As in ILGM, the learned connection weights between
the AIP module and the premotor populations encoded grasp parameters most likely to result
in the performance of a stable grasp. The model was able to generate stable grasps of each
object tested in various positions with different orientations (Figure 3-7), and because of its
learned affordance representation could generalize these grasp plans to novel objects.
Figure 3-7 Stable grasps of different objects at various positions and orientations, generated
by the model. Each panel shows a third-person view on the left and the model's first-person
view on the right.
ILGM showed that even without hand preshaping and virtual finger enclosure, precision
pinches could result from trial-and-error reinforcement learning, although not often. We have
replicated this result with these features even though the more realistic physics simulator we
used is less forgiving in evaluating grasp stability. We therefore believe that the development
of precision pinching coincides with the development of feedback-based, skilled grasping
(see Discussion). A sequence of frames from a precision pinch generated by the model
during training is shown in Figure 3-8. Hand preshaping has already begun at 2s after object
presentation and the enclosure phase is triggered at 2.3s. The thumb first makes contact with
the object at 2.4s and the index finger finally contacts the other side of the object’s surface at
2.6s. At this point the object is slightly rotated between the thumb and forefinger since their
opposition axis was not exactly aligned with the orientation of the object (note the slight
change in object orientation from 2.7s to 2.9s). However, the axis was aligned closely enough
to stabilize the object, aided by friction, and the resulting grasp was judged stable.
Figure 3-8 A series of frames showing the progression of a precision pinch of a flat plate
generated by the model. At 2s after object presentation hand preshaping has already begun.
The enclose phase is triggered at 2.3s and the object is first contacted at 2.4s.
The learned AIP representation allowed the model to successfully grasp objects in
various orientations and positions by preserving those features represented in cIPS and V6a
that are essential for programming the grasp. Activity in cIPS, AIP and premotor cortex is
shown in Figure 3-9 during four grasps of the same size cylinder at the same location
(sometimes the object is displaced by the hand after grasping), but with varying orientations.
The cIPS AOS cylinder population encodes the three-dimensional orientation of the main
axis of the cylinder as a noisy population code. Based on this representation and those in the
other cIPS populations and V6a, the AIP module forms a distributed representation of
combinations of the object’s features that are important for grasping. AIP neurons selective
for the object shape and/or position are active during each grasp trial, resulting in highly
similar patterns of activity in AIP. However, there are some AIP neurons selective for the
orientation of the object and this causes the patterns of AIP activity to be slightly different
depending on the object’s orientation in each trial. These neurons bias motor parameter
selection in the premotor populations, resulting in the selection of parameters appropriate for
the current object’s orientation. The differences in F5 activity during each grasp trial are due
to noise – any of the encoded grasp types and maximum apertures would work for grasping
the cylinder. However, the differences in F7 and F2-F5 activity encode the different approach
angles and wrist rotations that must be used to successfully grasp the cylinder at each
orientation.
Figure 3-9 Firing rates of neurons in the cIPS AOS-cylinder, AIP, F7-direction, and F5
populations during grasps to the same object in the same location, but with different
orientations. Activity in other populations was not significantly different during each grasp
(due to using the same object and location as the target) and is therefore not shown.
In this model, F5 neurons encode grasp types according to the model design, since their
activity is decoded in order to perform the grasp. In contrast, AIP neurons come to represent
affordances - combinations of object features that signal the possibility for grasping. A
comparison of AIP and F5 activity while grasping different objects with different types of
grasps is shown in Figure 3-10. The first two rows show AIP and F5 firing rates while
grasping two different objects with the same grasp – a precision pinch. The F5
representations are overlapping during the two grasps, but the AIP activation patterns are
completely different. This is because the inputs to AIP from cIPS are extremely different for
the two objects. The last two rows in Figure 3-10 show AIP and F5 activity while grasping
the same object with two different types of grasps. In this case the AIP representation is
almost exactly the same for the two grasps, but the F5 activation pattern is completely
different. AIP neurons in this model therefore do not specify the type of grasp, but represent
an affordance that can be acted on using several types of grasps.
Figure 3-10 Firing rates of neurons in AIP (left column) and F5 (middle column) while
grasping a cylinder with a precision pinch (top row), plate with a precision pinch (second
row), plate with a tripod grasp (third row), and plate with a power grasp (bottom row).
3.3 Discussion
We have shown that the current model not only explains the development of F5 canonical
neurons controlling grasping, as did ILGM, but that it also gives an account of the
development of visual neurons in area AIP. In our simulations, highly-object specific neurons
developed in this region as the result of unsupervised learning before grasps were even
attempted. This specificity was maintained even as grasps were performed. The result was a
mixture of AIP neurons that only fire for particular objects, and those that are activated by
combinations of features shared by several objects. This allows the model to respond to novel
objects in positions and orientations similar to ones it has already successfully grasped.
Parietal representations in both ILGM and ILGA used population codes to represent object
features, but ILGA represented multiple object features at once and combined them in a
representation of an affordance. ILGM’s premotor module used a probabilistic coding
followed by a rewriting of activity as a population code, while ILGA uses a more realistic
noisy WTA process. In ILGM the wrist rotation, object-centered reach offset, and hand
enclosure rate are selected by the premotor module, but ILGA comes closer to FARS in
including the grasp type and maximum aperture in addition to wrist rotation and object-
centered reach offset.
Oztop et al. (2006a) presented a model of AIP related to ILGM and ILGA, known as the
Grasp Affordance Extraction Model (GAEM). Like ILGA, GAEM uses a SOFM to model
visual-dominant AIP neurons. However GAEM uses backpropagation, a biologically
implausible learning rule, to shape the connections between AIP and F5, while ILGA uses
reinforcement learning. The inputs and outputs to GAEM are also less biologically plausible
than in ILGA. In GAEM, the input to AIP was a depth map, and the F5 representation was a
set of hand joint angles. This is inconsistent with the neurophysiological data showing
representation of object geometric properties in cIPS (Sakata et al., 1998) and its projection
to AIP (Nakamura et al., 2001), as well as data showing that F5 neurons are tuned to a
particular grasp rather than the specific posture of the hand at any point during the grasp
(Umilta et al., 2007). Another shortcoming of GAEM is that it was trained offline on the log
data from ILGM simulations. Nevertheless, the basic result of GAEM was that it showed
how AIP could extract higher-level information from simpler visual inputs and map them
onto hand postures resulting in stable grasps. In this sense, the results of ILGA are similar,
showing how AIP can combine simpler visual input into higher-level affordance
representations and map them onto motor parameters that will result in stable grasps. ILGA
goes beyond GAEM in using more biologically plausible representations and learning rules
and in learning affordance extraction and grasp planning simultaneously.
3.3.1 Predictions
Using biologically plausible learning rules and inputs, we have shown that ILGA can
learn to represent affordances for grasping and to select motor parameters appropriate to act
on them. This model makes several testable predictions concerning a) the encoding of object
features in area AIP, b) shifts in AIP activation during learning, c) the response of parietal
regions to complex objects with multiple affordances, and d) the existence of an object-
centered spatial representation for reach-to-grasp movements.
One of the main issues hindering a clear interpretation of the encoding used in area AIP is
the lack of studies which systematically vary object parameters. It has been shown that many
AIP neurons are selective for combinations of object size, shape, and orientation (Murata et
al., 2000; Taira et al., 1990); however, as pointed out by Oztop et al. (2006a), it is not known
whether or not AIP neurons encode the quantities of geometric properties. Experiments are
needed which test the response of visual-dominant AIP neurons to objects with
systematically varied dimensions, rotations, and positions.
Like GAEM, ILGA reproduces experimental data showing that most AIP neurons are
moderately object-selective, showing responses to multiple objects (Murata et al., 2000).
Since ILGA is a developmental model, it allows us to go beyond GAEM and available
experimental data, and predict that AIP neurons are initially selective for object features, and
become object-selective early in development, before grasping has developed. In ILGA, the
global reinforcement signal elicited by successful grasps modulates the rate of unsupervised
learning occurring in AIP. This causes neurons in AIP to preferentially encode features of
objects that can be successfully grasped, resulting in a representation of grasp affordances
rather than strictly geometric features. To our knowledge, no studies have looked for shifts in
AIP activation during the course of grasp learning. This model predicts that if grasps of
certain objects are disrupted (through local muscimol injection or physical perturbation), AIP
cell selectivity will shift, with more cortical representation eventually given to objects and
features of other objects that are successfully grasped. For example, the selectivity of any AIP
neurons that prefer round, elongated objects should shift over many trials if grasps of
cylinders are repeatedly disrupted by spatially perturbing the target object.
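The reward-modulated unsupervised learning described above could be sketched as a Kohonen-style update whose learning rate is gated by the global reinforcement signal. This is a minimal one-dimensional sketch under assumed parameter values, not the ILGA implementation itself.

```python
import numpy as np

def som_reward_update(weights, x, reward, base_lr=0.1, sigma=1.0):
    """One reward-modulated Kohonen update on a one-dimensional map.

    weights: (n_units, n_features) array updated in place; x: input
    feature vector; reward in [0, 1] scales the learning rate, so the
    map comes to over-represent features of objects that were
    successfully grasped. Parameters are illustrative, not ILGA's.
    """
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    idx = np.arange(weights.shape[0])
    neighborhood = np.exp(-((idx - winner) ** 2) / (2 * sigma ** 2))
    # unsupervised move toward x, gated by the global reinforcement signal
    weights += (reward * base_lr) * neighborhood[:, None] * (x - weights)
    return winner
```

With reward held at zero the map does not move at all, which is the mechanism behind the predicted selectivity shift when grasps of particular objects are repeatedly disrupted.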
To date, most studies of grasping use geometric primitives as stimuli rather than
complex, natural objects (Murata et al., 1997; Murata et al., 2000; Raos et al., 2006; Umilta
et al., 2007). While this is desirable from the standpoint of experimental control and data
analysis, it hinders an understanding of the role of area AIP in natural grasping tasks with
complex objects that can be grasped in several different ways. ILGA suggests that such
objects are represented as a collection of object primitives whose affordances are represented
by AIP. This idea could be tested by constructing pseudo-complex objects out of object
primitives such as the hammer in Figure 6-1 made from a cylinder and rectangular prism.
This model predicts that AIP neurons selective for an object primitive will still be active
when observing a complex object that contains that primitive. Such an experiment should
utilize eye-tracking to determine if the focus of attention modulates affordance representation
in AIP. In an instructed task where a visual cue indicates which part of the object should be
grasped to receive a reward, this model predicts that the relative firing rates of AIP neurons
selective for each object component will predict the grasp choice.
While we are not aware of any data showing the existence of an object-centered reference
frame for reaching, we found it necessary to use such a representation in order to plan the
direction of the hand’s approach to the object. One possible reason that such a representation
has not been found is that most experiments use either a pure reaching, wrist rotation, or
naturalistic grasping task. Once ILGA has been trained, the reach offset direction is highly
correlated with the wrist rotation so that the hand will approach the object with the correct
orientation for grasping. Therefore selectivity for object-centered offset directions cannot be
experimentally demonstrated without trials in which the offset direction is held constant
while the wrist rotation is varied. This is similar to the situation in the interpretation of motor
cortex activity, where it has been shown that intrinsic and extrinsic variables, as well as
kinematic and kinetic variables, are highly correlated during a commonly used experimental reaching task (Chan
and Moran, 2006). In order to determine if a region encodes the object-centered reach offset
independent of wrist rotation, a reaching task must be used in which the subject must reach to
a target object from different directions with varying wrist orientations. On the basis of its
object-centered representation for saccades and arm-related activity in its ventral portion, we
predict an object-centered spatial representation in ventral F7.
3.3.2 Conclusion
ILGA is the only developmental model of grasping to date that simultaneously learns to
extract affordances from object features and select motor parameters to successfully grasp
them. We have shown that the model develops distributed representations in area AIP similar
to those reported in the experimental literature and can use these representations to generalize
grasp plans to objects of varying sizes and at different orientations and positions. Finally we
presented several neurophysiologically testable predictions made by the model and discussed
ways in which it could be extended to handle context-dependent grasping of complex objects
and skilled manipulation.
Chapter 4 - Mirror Systems in Learning Sequential Action Production
Classically, the firing of mirror neurons has been associated with the execution of certain
actions and the observation of more-or-less similar actions (di Pellegrino et al., 1992).
However, we suggest that rapid reorganization of motor skills can benefit from a novel role
of the mirror system – assessing the success of the intended action and recognizing that what
was intended as one action may in execution look like another action.
In macaque experiments, the set of actions studied is well delineated so that each action
can be unambiguously characterized. A single mirror neuron is described as strictly
congruent if it is activated by observation of actions very similar to those for which it is
active during execution; it is broadly congruent if it can be activated by observation of a
broader class of actions. Newman-Norlund et al. (2007), using fMRI to assess the role of the
human mirror neuron system (MNS), found that the BOLD signal in the right inferior frontal
gyrus and bilateral inferior parietal lobes was greater during preparation of complementary
than during imitative actions. They speculate that this is because strictly congruent mirror
neurons responded to the observed action in a context-independent manner, whereas the
planning of complementary actions required the additional participation of broadly congruent
mirror neurons to link the observed action to a different, but related, motor response. In a
joint action paradigm, Sebanz et al. (2003) have found that the actions of the other participant
are represented and influence the representation of one’s own action even when an imitative
response is not required. These studies are consistent with the emerging view (Brass and
Heyes, 2005; Schütz-Bosbach et al., 2006) that action observation does not inevitably lead to
facilitation of matching actions. Rather, the claim is that mirror neurons process associations
between observed and executed movements, and that both imitative and nonimitative
associations may derive from this function.
Such findings establish the view that, in observing the action of others, mirror neurons
may code not only the observed action but others as well. We would add that, in general, the
nature of the observed action may be ambiguous so that influences from, e.g., inferotemporal
cortex and prefrontal cortex may be required to converge upon the representation of one
action rather than another (Oztop et al., 2005). What we add to this discussion is the
hypothesis that during self-action, mirror neurons may code not only the intended action but
also any apparently performed actions. More specifically, mirror neurons may be activated
during self-action not only by efference copy of the command for the intended action but also
by observation of one’s action (cues may be proprioceptive as well as visual) – and will thus
activate mirror neurons for actions which appear similar to the action as currently executed.
This includes the case where the unsuccessful execution of an intended action yields a
performance similar to that of another action in the animal’s repertoire.
The plausibility of this hypothesis is enhanced by computational considerations. In our
modeling of the mirror system for grasping as an adaptive system (Bonaiuto et al., 2007;
Oztop and Arbib, 2002),
• we postulate that a population of canonical neurons will encode an action already in
the animal’s repertoire, that these will activate a set of pre-mirror neurons (i.e.,
neurons in the area F5c of macaque brain that have not yet been tuned to act as
mirror neurons) which also receive highly processed visual data on how the hand
moves relative to an object (the so-called hand state), and
• we then demonstrate how, through learning, the synapses whereby the hand state
trajectory (tracking features of the hand relative to affordances of the object) affects
these pre-mirror neurons become tuned so that the neurons will become mirror
neurons for the given action. They will thus respond to an appropriate hand-state
trajectory whether it is based on the animal’s own movement or its observation of
another animal’s movement.
As a result, during self action, mirror neurons may be activated both by an efference copy
of the intended action (represented in the model as canonical neuron activity, absent during
observation of others) and by observation of the hand-state trajectory (which may activate
neurons encoding one or more actions).
In MNS2 we have modeled data on audiovisual mirror neurons (Kohler et al., 2002)
which can respond to the sight or sound of an action which is associated with a distinctive
sound (e.g., peanut breaking; paper tearing). Significantly, we modeled a case unaddressed
by the experimenters in which the visual and auditory inputs were discordant –
demonstrating activation of mirror neurons both for the heard action and the seen action. It is
a further property of our model, not developed in earlier publications but central here, that
since mirror neurons can be activated both by the neural representation of an intended action
and the observation of an executed action, cases can arise where the visual similarity of the
performed action to an unintended action results in the activation of a mirror neuron
representation for an apparent action simultaneously with that for the intended action.
At times in trying to solve a novel task (or a familiar task under novel conditions) we
may succeed by using a random variation on an action A – and then benefit from that success
by recognizing that the variant is more like some other action B than like A itself. We then
succeed immediately on replacing A by B in our usual strategy. This suggests that success
may reinforce not only successful intended actions but also any action the mirror system
recognizes during the course of that execution. Furthermore, the fact that action A was
intended but not recognized as being successfully completed can be used to decrease the
estimate of successfully performing A in the current circumstances, facilitating the
exploration of alternate actions. Our claim is that the mirror system, by recognizing this
apparent action – and also by recognizing if the intended action was unsuccessful – can
greatly speed the learning of a new motor program. We also predict that no such rapid
reorganization will take place in cases where the mirror system can find no action in the
animal’s repertoire that “explains” the accidental success of an intended action.
To demonstrate this claim, we show its efficacy in explaining data on rapid
reorganization of food taking in a cat after spinal lesions which impaired grasping with the
forepaw. Alstermark et al. (1981) experimentally lesioned the spinal cord of the cat in order
to determine the role of propriospinal neurons in forelimb movements. A piece of food was
placed in a horizontal tube facing the cat (Figure 4-1). In order to eat the food, the cat had to
reach its forelimb into the tube, grasp the food with its paw, and bring the food to its mouth
(Figure 4-1 A-E). Lesions in spinal segment C5 of the cortico- and rubrospinal tracts
interfered with the cat’s ability to grasp the food, but not to reach for it. However, for us the
significant observation is that these experiments also illustrate interesting aspects of the cat’s
motor planning and learning capabilities.
Figure 4-1 The experimental setup used in Alstermark’s experiments. A horizontal tube
containing food is facing the cat and the cat must reach into the tube with its paw to extract
the food. (A-E): A cat able to grasp the food with its paw. (F-J): A cat unable to grasp the
food with its paw eventually learns to rake it from the tube and grasps it with its mouth
(reproduced from Alstermark et al., 1981 with permission of the author).
After the grasp-impairing lesion, the cat could still reach inside the tube, but would
repeatedly attempt to grasp the food and fail. These repeated failed grasp attempts would
eventually succeed in displacing the food from the tube by an accidental raking movement,
and the cat would then grasp the food from the ground with its jaws and eat it. After only a
few trials thereafter, rather than attempting to grasp the food the cat would simply rake the
food out of the tube, a more efficient process than random displacement by failed grasps
(Figure 4-1 F-J). In this case, the cat rapidly modified its motor program when a previously
successful plan became impaired because of changes in its abilities.
We refer to the example of Figure 4-1 as Alstermark’s cat. Its importance for the present
account is that it introduces the general issue of how an animal, when a habitual course of
action fails, may, if suitable means are available, undergo motor reorganization to attain a
new strategy on a faster time scale than classical models of motor learning would yield. We
argue that this fast learning involves a new role for the activation of mirror neurons. In the
present example, a failed grasp that dislodges the food from the tube then looks like
successful raking movement. We also show the utility of monitoring the success or failure of
the intended action. More generally then, we posit that the cat (and perhaps other species in
addition to macaques and humans) has a primitive mirror system for recognition of at least
some of its own actions.
In order to test our hypothesis, we have modeled the integration of a mirror system
capable of recognizing the success of apparent actions and the failure of intended actions into
a system called augmented competitive queuing (ACQ) for opportunistic scheduling which
combines reinforcement learning, action affordances, and competitive queuing. Here we hide
many of the details of ACQ so that we can focus on the interaction between the mirror
system and ACQ in rapid motor reorganization.
4.1 Methods
4.1.1 System Overview
This model is implemented as a set of interacting functional units called schemas (Arbib,
1981). This allows some components to be represented as (possibly state-dependent)
mappings from input to output variables, while those that are the focus of this study are
implemented as neural networks.
A simplified version of the ACQ model is shown in Figure 4-2. The key notion is that the
system will execute the most desirable action which is currently executable.
Figure 4-2 A simplified version of the ACQ system.
The Actor selects the currently executable action that is most desirable. Desirability is the
expected reinforcement for executing an action in the current internal state. Estimates of
desirability are updated by the Adaptive Critic, which employs temporal difference learning.
A crucial innovation here is that the Adaptive Critic assesses not only the current action but
also those apparent actions reported by the Mirror System in making its assessments. The
executability of a particular action is negatively or positively reinforced depending on a
comparison between an efference copy of the selected action and the action recognized by
the mirror system.

1) The external world is modeled as a set of environmental variables (in
the present example, position of the food, position of the paw). The executability signal
activates the schemas encoding those actions that are currently executable, i.e., for which the
environment provides suitable affordances. The desirability signal specifies for each motor
schema the reinforcement that is expected to follow its execution (perhaps after follow-up
actions), based on the current internal state of the organism. The Actor then simply uses a
noisy Winner-Take-All (WTA) mechanism to select for execution the most desirable of the
currently executable motor schemas.
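The Actor's selection rule can be sketched as follows, assuming desirability and executability are available as scalar signals per action. Action names and the noise level are illustrative assumptions.

```python
import numpy as np

def select_action(desirability, executability, noise_sd=0.05, rng=None):
    """Noisy winner-take-all over the executable actions.

    desirability: dict action -> expected reinforcement in the current
    internal state; executability: dict action -> [0, 1] affordance
    signal. Actions with executability near zero effectively drop out
    because their priority collapses to noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = list(desirability)
    priority = np.array([desirability[a] * executability[a] for a in actions])
    priority += rng.normal(0.0, noise_sd, len(actions))
    return actions[int(np.argmax(priority))]
```

So a highly desirable but currently non-executable action (e.g. Eat with no food in the jaws) loses to a less desirable but executable one, which is exactly the opportunistic-scheduling behavior described in the text.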
2) In the general version of the model, there can be many sources of primary
reinforcement, and the organism can be in diverse internal states. However, in the simplified
model used here to demonstrate the efficacy of mirror neurons that recognize apparent
actions, the only reinforcer is food, and the only internal state is “hungry”.
3) Lower-level motor control structures are not modeled here. Instead, execution of motor
schemas is modeled by updating the representation of the appropriate environmental
variables. For example, execution of the Reach-Food motor schema is simulated by
modifying the value of the variable representing the position of the paw to that directly above
the food.
4) In our detailed models of the mirror system, the complete trajectory of the effector
relative to the target is used to provide a time series of activation of mirror neurons which
might relate to the initial part of the trajectory (Bonaiuto et al., 2007; Oztop and Arbib,
2002). In general such models should additionally utilize population codes for action
representation, but the general mechanism of utilizing action recognition for reinforcement
would remain the same. Therefore we use a simple feedforward neural network which
processes external state information to activate the mirror neuron for an action if the end-
state stands in the appropriate relation to the start-state. The key innovation is this: During
self-action, if the final state stands in the appropriate relation to the initial state for any
action, then the mirror neurons for that action will be activated even if it was not the intended
action, and its desirability will be updated, as described below. Just as importantly, if the
final state does not stand in the appropriate relation to the initial state for the intended action,
then this attempted execution will be branded as unsuccessful and the desirability of the
intended action will not be updated on this occasion.
5) Desirability is learned: The Mirror System informs the Adaptive Critic which actions
are eligible for temporal difference learning to update estimates of expected reinforcement
(desirability) – namely the intended action if it is successful, as well as any apparent actions.
(If the intended action is unsuccessful, its desirability is not changed since this instance
provides no evidence of whether or not its successful execution contributes to reaching a
desired outcome.) Using the general approach of temporal difference learning, the Adaptive
Critic learns to transform primary reinforcement (how much food you get now if you execute
this action) into expected reinforcement (how much food, on a discounted schedule, you are
likely to get from now on if you execute this action). In the Alstermark example, reaching for
food is desirable because it makes grasping the food possible which makes putting the food
in the mouth possible, leading to eating which is the only action that receives primary
reinforcement – but, because of discounting, reaching for food is less desirable than grasping
the food, and so on.
6) Executability is learned: In our model, when the Mirror System signals that an
intended or apparent action was executed successfully, the action’s executability is increased.
Conversely, if the intended action was unsuccessful, its executability is decreased.
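Points 5) and 6) can be sketched together as a single learning step, assuming a TD(0)-style update gated by mirror-system recognition. Learning rates, the discount factor, and the state-value handling are illustrative assumptions, not the published implementation.

```python
def acq_learning_step(desirability, executability, intended, recognized,
                      reward, v_prev, v_curr, gamma=0.9, lr=0.1):
    """One ACQ learning step (a sketch with assumed parameters).

    `recognized` holds the actions the mirror system reports as
    successfully performed (the intended action, if successful, plus
    any apparent actions). Each recognized action's desirability moves
    by the temporal-difference error r + gamma*V(x_t) - V(x_{t-1});
    executability rises for recognized actions and falls for an
    intended action that was not recognized (a failed attempt).
    """
    td_error = reward + gamma * v_curr - v_prev
    for a in recognized:
        desirability[a] += lr * td_error
        executability[a] += lr * (1.0 - executability[a])
    if intended not in recognized:
        # failure: executability drops, desirability is left untouched
        executability[intended] -= lr * executability[intended]
```

Note how an apparent action (e.g. Rake recognized after a failed Grasp-Paw) is credited with the positive TD error while the failed intended action only loses executability, which is the mechanism behind the rapid reorganization demonstrated below.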
4.1.2 Simulation Protocol for Alstermark’s Cat
Having introduced the general framework for ACQ, we now present simulation results
specialized to the case of Alstermark’s Cat. Here, the external environmental variables
(where the external space is two-dimensional, with both horizontal and vertical dimensions
bounded by 0 and V_max) are:
f(t): position of the center of the food at time t
p(t): paw position at time t
m(t): mouth position at time t
b(t): position of the center of the tube opening at time t
and the internal environment variable:
h(t): level of hunger at time t
The “time-step” in the model corresponds to the execution of a single action. The
execution of motor schemas is modeled by the adjustment of the appropriate environmental
variables – e.g. after execution of the grasp with mouth action the position of the food is the
same as that of the mouth, f(T) = m(T) (see Motor schemas, below). As noted earlier, the
interoceptive signal in the present model is held constant, so that the desirability signal for
each action is always computed relative to the state of being hungry.
4.1.2.a Defining the Schemas
Motor schemas
There are 9 “relevant actions” in the model: Eat, Grasp-Jaws, Bring to Mouth, Grasp-
Paw, Reach-Food, Reach-Tube, Rake, Lower Neck, and Raise Neck. However, in
simulations we add a number of “irrelevant actions” so that the search space for finding
useful actions following the lesioning of the Grasp-Paw schema is so large that the cues
provided by recognition of apparent (though unintended) actions can be shown to play a
significant role in reducing the search space.
Each of the named schemas is defined by its preconditions and effects, as shown in Table
4-1. If the preconditions are met and the action is “executed”, the effects are enforced, but
we will also model how a lesion may yield unsuccessful execution of the Grasp-Paw schema.
Table 4-1 Set of relevant actions with preconditions and effects.
Action          Preconditions                         Effects
Eat             Food in jaws                          Hunger reduced; positive reinforcement
Grasp-Jaws      Food close to jaws                    Mouth moves to food
Bring to Mouth  Food grasped by paw but not           Bring paw close to mouth with food
                close to mouth                        still grasped by paw
Grasp-Paw       Paw close to food                     Paw grasps food
Reach-Food      Food in tube and paw aligned with     Paw is moved close to food
                or within tube, or food out of
                tube but not close to paw
Reach-Tube      Paw not near tube                     Move paw near end of tube
Rake            Paw at a position both beyond and     Bring paw closer, with food coming
                higher than the food                  with the paw
Lower Neck      Neck above lowest position            Bring neck to lowest position
Raise Neck      Neck below highest position           Bring neck to highest position
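Two of the Table 4-1 entries could be encoded as precondition/effect pairs roughly as follows. The state variables, tolerance, and helper names are hypothetical illustrations, not the model's actual encoding.

```python
# Hypothetical encoding of two Table 4-1 entries as precondition/effect
# pairs over a state dict of environmental variables.

def close(a, b, tol=1.0):
    """True if two 2-D positions are within tol on both axes (assumed metric)."""
    return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

SCHEMAS = {
    "Grasp-Paw": {
        "precondition": lambda s: close(s["paw"], s["food"]),
        "effect": lambda s: s.update(food_in_paw=True),
    },
    "Bring to Mouth": {
        "precondition": lambda s: s["food_in_paw"]
                                  and not close(s["food"], s["mouth"]),
        # food moves with the paw, ending near the mouth
        "effect": lambda s: s.update(paw=s["mouth"], food=s["mouth"]),
    },
}

def execute(name, state):
    """Apply a schema's effect only if its precondition holds."""
    schema = SCHEMAS[name]
    if not schema["precondition"](state):
        return False          # attempted but unsuccessful
    schema["effect"](state)
    return True
```

The returned success flag is what a lesion can override: simulated lesioning of Grasp-Paw amounts to the effect failing to fire even when the precondition holds.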
Mirror System Module
The action recognition schemas of the Mirror System module (Figure 4-2) signal the
perception of the cat’s own movements using presynaptic perceptual input and a working
memory trace of the same perceptual inputs from the previous time step (i.e., before
execution of the current action). The current value of each external environmental variable as
well as the change in each variable from the last discrete time step is input into the
feedforward network. The network was previously trained to classify actions and the outputs
code the currently recognized action(s). Each neuron in the output layer of the module
encodes a different action, and its normalized firing rate is interpreted as the level of
confidence that the observed action is the one it encodes. The output layer of the network
also receives an efference copy of the output of the Actor module, which primes the neuron
encoding the intended action. Only actions recognized by the mirror system (whether
intended or apparent) will be reinforced by the Adaptive Critic as described below.
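A toy forward pass for such a classifier might look as follows, assuming the weights have already been trained; the priming term stands in for the efference copy of the intended action. All layer shapes, the activation functions, and the priming strength are assumptions.

```python
import numpy as np

def mirror_output(state, prev_state, W1, b1, W2, b2,
                  intended_idx=None, priming=0.2):
    """Toy forward pass of the mirror-system classifier (weights assumed
    trained elsewhere; shapes and priming strength are illustrative).

    Input: current state concatenated with its change since the previous
    time step. Output: one unit per action, read as the confidence that
    the observed movement was that action; an efference copy of the
    intended action primes its unit.
    """
    x = np.concatenate([state, state - prev_state])
    h = np.tanh(W1 @ x + b1)                      # hidden layer
    out = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))    # per-action confidences
    if intended_idx is not None:
        out[intended_idx] = min(1.0, out[intended_idx] + priming)
    return out / out.sum()                        # normalized confidences
```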
Note, however, that in the present study, the only failures that occur are those we
specifically program into the system, as in the case of simulated lesioning of the grasp
schema, and errors in action classification by the neural network. In the simulation
experiments described below, irrelevant actions that have no environmental effects are used
to test the efficacy of the mirror system in reinforcement learning. Due to noise in the WTA
process of the actor, these actions can be selected for execution. In a more realistic model
that included a dynamic model of the cat’s body and probabilities of disturbances and errors
in execution, the range of possible mismatches of apparent and intended action would
increase, as would the possibility that the executed action would appear somewhat similar to
a different action.
4.1.2.b Learning
Learning proceeds as described above for the general ACQ model in the paragraphs
“Desirability is learned” and “Executability is learned”. The Adaptive Critic employs
temporal difference learning to update estimates of expected reinforcement (desirability) if
the currently attempted action was successful. Mirror system recognition of an action as
successful is used to update the executability of the attempted action using reinforcement
learning.
4.2 Results
4.2.1 Motor Program Reorganization in a Novel Environment
We demonstrate how our model supports the rapid reorganization of the cat’s getting
food from the tube following a lesion that affects its grasp schema. First, we show how ACQ
encodes the “motor programs” for reaching for and grasping food and bringing it to the
mouth to eat, and for grasping food on the ground with its jaws and eating it. The flow chart
of Figure 4-3A describes the model’s behavior but this flow chart is not explicitly
encoded in the neural network. We now show how it emerges through the competition
between motor schemas differentially activated by their learned executability and
desirability:
Figure 4-3 A) The original motor program for eating a piece of food initially in a horizontal
tube. B) The motor program that describes the behavior that is learned after the Grasp-Paw
motor schema is lesioned.
Remember that at each time step, the WTA of the Actor module will select for execution
the most desirable of those actions which are executable given the current external state. The
desirability of an action is encoded by the hunger signal (internal state) as scaled by the
synaptic weight driving the unit that codes the action in the Actor’s input layer. We have
seen that the effect of discounting in temporal difference learning is that desirability
(discounted expected reinforcement) will be positive in the hunger state for all actions that
lead to eating food, but that for a given action, the greater the number of actions that must
follow before eating occurs, the lower its desirability. We thus get
D(eat) > D(Grasp-Jaws) > D(Bring to Mouth) > D(Grasp-Paw) > D(Reach Food) >
D(Reach Tube) > 0.
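The discounting argument above, and the noisy winner-take-all selection that uses it, can be sketched as follows. The discount factor, noise level, and executability values here are illustrative assumptions.

```python
import random

# Illustrative sketch: desirability falls off with the number of actions that
# must still follow before eating, via the discount factor gamma; the Actor's
# noisy WTA then picks the most desirable currently executable action. The
# gamma value, noise level, and executability values are assumptions.
gamma = 0.8
chain = ["eat", "grasp_jaws", "bring_to_mouth", "grasp_paw",
         "reach_food", "reach_tube"]
desirability = {a: gamma ** i for i, a in enumerate(chain)}

def noisy_wta(desirability, executability, noise=0.01, rng=random.Random(0)):
    scores = {a: desirability[a] * executability.get(a, 0.0)
                 + rng.uniform(0.0, noise)
              for a in desirability}
    return max(scores, key=scores.get)

# Food in the tube: only reaching into the tube is executable,
# so the WTA selects Reach Tube despite its low desirability.
selected = noisy_wta(desirability, {"reach_tube": 1.0})
```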
Combining these desirabilities with the executability for the current external state means
that the animal faced with food in the tube and acting according to ACQ will behave in the
way described by Figure 4-3A. Similarly, a cat that sees food lying on the ground will behave
according to the appropriate executability conditions and the following desirabilities:
D(eat) > D(Grasp-Jaws) > D(Lower neck) > 0.
We thus start our simulations with the weights of all connections between the internal
state and the Actor’s input layer (which code Desirability) set to 0.0 except for those
connection weights between the hunger neuron of the internal state and the motor schema
neurons in the Actor’s input layer that yield desirability values that satisfy the strict
inequalities listed above (Figure 4-4A).
Figure 4-4 A) The mean desirability connection weights for each action after training. The
error bars show the standard deviation. B) The mean desirability of each motor schema after
lesioning the Grasp-Paw motor schema and retraining the network.
4.2.2 Motor Program Reorganization After a Lesion
We simulated a lesion to the Grasp-Paw motor schema by having the lesioned schema
change the food position f(t) by a small random amount with a mean displacement towards
the animal, and setting the paw position p(t) to a value slightly above the old value of f(t).
This corresponds to the animal bringing its paw into contact with the food and retracting the
paw, but failing to maintain a stable grasp. Our simulations showed that the system was in
each case able to rapidly reorganize its behavior to compensate for the lesion.
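A minimal sketch of such a lesioned schema, assuming a one-dimensional position coordinate and illustrative displacement parameters:

```python
import random

# Sketch of the simulated Grasp-Paw lesion on a one-dimensional position axis
# (the coordinate system and displacement parameters are illustrative): the
# food is displaced by a small random amount biased towards the animal, and
# the paw ends up just above the food's old position rather than holding it.
def lesioned_grasp_paw(food_x, paw_x, rng=random.Random(0)):
    displacement = rng.gauss(-0.5, 0.5)   # negative x = towards the animal
    new_food_x = food_x + displacement
    new_paw_x = food_x + 0.1              # paw retracts just above old food position
    return new_food_x, new_paw_x

# Most attempts displace the food towards the animal, so the failed grasp
# usually looks like a successful Rake to the mirror system.
moved_toward_animal = sum(
    lesioned_grasp_paw(5.0, 5.0, random.Random(seed))[0] < 5.0
    for seed in range(100)
)
```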
We then ran this lesioned schema in 100 instances of a model that was already proficient
on both the horizontal tube task and the food on the ground task. In the first trial after the
lesion, the simulated cat reaches into the tube and reaches for the food as it did prelesion, and
then attempts to grasp the food with its paw. Since we modified the Grasp-Paw schema to
simulate the spinal lesion, the grasp is unsuccessful. However, when by chance the food is
displaced from the tube the Mirror System recognizes the performance as a Rake action. The
model repeatedly attempts to execute the Grasp-Paw action until the food is displaced from
the tube and is close enough to perform the Lower-Neck, Grasp-Jaws, and Eat actions. After
a few trials the model no longer attempts the Grasp-Paw action and switches to performing
the Rake action before the Lower-Neck, Grasp-Jaws, and Eat actions. This strategy is much
faster since the Rake action reliably displaces the food by a large amount in the direction of
the animal, while the lesioned Grasp-Paw schema displaces the food by a random direction
and magnitude.
The reorganization of the learned motor program after lesioning the Grasp-Paw schema
involved adjustment of the desirability of several motor schemas (Figure 4-4B). The Rake
schema achieved a higher desirability value than the Reach-Food, and the desirability of the
Drop-Neck schema became higher than that of the Reach-Tube motor schema. Interestingly,
the Grasp-Paw motor schema desirability remained relatively unchanged, while that of the
Reach-Food motor schema decreased. As a result, the Drop-Neck and Rake actions are then
executed instead of the Grasp-Paw action. This occurs because after lesioning the Grasp-Paw
motor schema, its execution causes the food to be randomly displaced towards the animal
75% of the time. This causes the perception of that failed grasp to look like a successful rake
75% of the time (whereas a successful grasp does not). When this occurs the executability of
the Grasp-Paw schema is negatively reinforced due to the mismatch between the intended
action (Grasp-Paw) and apparent action (Rake) which indicates that the Grasp-Paw action
was unsuccessful. If the Drop-Neck action is then performed, the desirability of the Rake
action will be positively reinforced due to the Mirror System recognition of it as the
apparently executed action and the relatively high desirability of the Drop-Neck action.
Despite the relatively unchanged desirability of the Grasp-Paw action, the network
nonetheless switches strategies after repeated failed grasp attempts to yield action selection
describable (but not controllable) by the flowchart of Figure 4-3B. This is due to the decrease in executability of the Grasp-Paw action, driven by its representation in the efference copy but not in the Mirror System output, which indicates that the action was unsuccessful. The changed executability connection weights encode the knowledge that the Grasp-Paw action is no
longer possible after the lesion even when the paw and the food are very close together. The
action is no longer attempted in these circumstances once its executability is lowered enough.
The decrease in executability of the Grasp-Paw action is crucial in the reorganization of the
motor program as it encourages exploration of alternative actions.
4.2.3 Testing the Efficacy of the Mirror System
In order to explore the benefits of the new roles posited for the mirror system in
reorganization, we compared the performance of each network (i) with a mirror system
evaluating lack of success of intended actions and recognizing apparent actions so these too
could enter into learning of desirability, and (ii) when only the successful intended action
was reinforced. Specifically, we tested how this effect varied as a function of the number of
irrelevant actions available to the model. Since the Actor uses a noisy WTA process to select
an action (recall item 1 of the System Overview), these irrelevant actions can be selected for
execution if no other highly desirable actions are executable.
Figure 4-5 Mean number of trials until the first successful trial after lesion of the Grasp-Paw
motor schema for each number of irrelevant actions tested (0-100). Solid: The model with
reinforcement based on successful intended and apparent actions (mirror system). Dashed:
The alternate model version with reinforcement based solely on successful intended actions
(no mirror system). The error bars denote the standard error.
Without the mirror system for apparent actions the model was typically successful in
acquiring the food in early trials because it takes relatively few unsuccessful grasps to
displace the food from the tube (Figure 4-5). But this is quite different from learning a new
strategy for rapid displacement of the food. Since the desirability of the apparent raking
action was not reinforced and the executability of the unsuccessful grasping action decreased,
irrelevant actions were subsequently attempted. With the mirror system the desirability of the
raking action increased as the executability of the grasping action decreased and the system
smoothly transitioned to the new behavior. Without the mirror system the model can
eventually reorganize its behavior by selecting the Rake action by chance; however, as the
number of possible actions increases, the probability that it will be randomly selected from
among the irrelevant actions decreases.
Figure 4-6 Mean number of trials until recovery (the first 4 out of 5 intentionally successful
trials) after lesion of the Grasp-Paw motor schema for each number of irrelevant actions
tested (0-100). Solid: The model with reinforcement based on successful intended and
apparent actions (mirror system). Dashed: The alternate model version with reinforcement
based solely on successful intended actions (no mirror system). The error bars denote the
standard error.
We defined recovery time as the number of trials until the model was intentionally successful
(performing the Rake action rather than taking advantage of the effects of the lesioned Grasp-
Paw schema) in 4 out of the 5 previous trials. The recovery times for the model instances
with and without the mirror system were analyzed according to the number of irrelevant
actions. Spearman’s rho, a nonparametric measure of correlation, was used to assess the
relationship between the number of irrelevant actions and recovery time. This correlation was
not significant for the group with the mirror system (ρ = 0.012, p = 0.591), but was for the group without the mirror system (ρ = 0.32, p < 0.01). All tested combinations of learning rates,
numbers of trials, and trial lengths yielded similar results. This indicates that as the number
of irrelevant actions increased, the mean recovery time increased without the mirror system
(Figure 4-6). In contrast, the use of mirror system output in determining which action to
reinforce keeps the recovery time relatively constant even with 100 irrelevant actions. Thus,
the inclusion of the mirror system for reinforcement of apparent actions significantly
improves the speed of recovery from injury in the presence of a large pool of candidate
actions.
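The recovery-time analysis can be sketched with synthetic data. The values below are illustrative, not the simulation results, and the simple rank computation assumes no tied values.

```python
# Sketch of the recovery-time analysis with synthetic data (the values below
# are illustrative, not the simulation results). Spearman's rho is the Pearson
# correlation of the rank-transformed variables; this simple ranking assumes
# no tied values.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

irrelevant_actions = [0, 10, 20, 30, 40, 50]
recovery_no_mirror = [5, 8, 11, 15, 22, 28]   # grows with the action pool
recovery_mirror = [6, 4, 7, 5, 3, 8]          # fluctuates without a trend
```

A monotonically increasing recovery time yields ρ = 1, while the flat profile with the mirror system yields a rho near zero, mirroring the pattern of significance reported above.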
4.3 Discussion
We used computational modeling to demonstrate the adaptive value in motor
reorganization of mirror neurons having the property that during self-action, they code not
only for the intended action but also for actions which appear similar to the intended action
as it was actually executed. Specifically, we used the example of Alstermark’s cat to
demonstrate how a general approach to scheduling behavior, the Augmented Competitive
Queuing model, may support an empirically observed example of rapid motor reorganization
when enhanced by a Mirror System performing the hitherto unremarked “What Did I Just
Do?” function – recognizing that the action I just executed looks more or less like some
action already in my repertoire and recognizing if the intended action was unsuccessful. This
allows temporal difference learning to increase the desirability of the apparent action so that
it rapidly becomes part of a new solution to the task. As the repertoire of candidate actions grows, the advantage that reinforcement of apparent actions confers on the speed of reorganization becomes more pronounced.
4.3.1 Predictions
Future neurophysiological experiments with mirror neurons could test the claim that they
respond to apparent actions even when these conflict with intended actions. This could be
investigated using an experimental setup similar to that used by Iriki et al. (2001) in which a
device called a Chromakeyer is used to alter what the monkey sees of its hands. A video
monitor may display an actual view of how the hands are moving, add superimposed images
or display something different. The proposed experiment has three conditions governing the
relationship between the action performed by the animal and that displayed on the monitor:
congruent, incongruent, and apparent only. The congruent condition would simply be a
display of the monkey’s hands without modification on the video monitor while the monkey
performs some object-directed grasp or manipulation. In the incongruent condition the
Chromakeyer would be used to present a video of hands performing a different object-
directed action than that being currently performed by the animal. The apparent only
condition would use the Chromakeyer to present video of hands performing some object-
directed action while the monkey is at rest. The congruent and apparent-only conditions correspond to the natural scenarios of self- and other-observation, respectively, and
should result in activation of mirror neurons related to the observed action. We hypothesize
that in the incongruent condition, while mirror neurons selective for the intended action will
show some priming, those selective for the apparent action will be the most activated. It is
this property that allows the model to take advantage of apparent actions in motor program
reorganization.
Chapter 5 - Synthetic Brain Imaging
A continuing challenge for systems and cognitive neuroscience is to integrate data from
animal neurophysiology and human brain imaging. While neurophysiological studies provide
detailed information on the properties of a sample of neurons in a single region, brain
imaging data reflects global brain activity resulting from neural population activation.
Although these two sources of information are often used in developing integrated
conceptual models of cognitive processes, more refined analysis requires an explicit account
of the coupling between these levels of analysis. One method that has begun to shed light on
this coupling is synthetic brain imaging. This technique uses computational models of the
brain regions in question based on neurophysiological data to generate simulated
neuroimaging signals such as regional cerebral blood flow (rCBF) and blood oxygen level-
dependent (BOLD) responses. These can then be compared with experimental neuroimaging
data in order to determine to what extent the model accounts for human global vascular
responses as well as the animal data that grounded it.
We have reviewed previous synthetic brain imaging approaches and developed a
technique that incorporates their best features to more accurately model the neurovascular
coupling. The balloon model of cortical vasculature is used to simulate the hemodynamic
response. We then apply the technique to winner-take-all circuits as an example of a
representative microcircuit. We show that in the absence of constraining neurophysiological
data these models can be parameterized such that the results of synthetic brain imaging offer
novel interpretations of imaging experiments, and when such data are available the model
can offer a causally complete view of the underlying processes that can make contact with
data from multiple experimental techniques.
5.1 Neural Basis of the BOLD Signal
Recent studies of the relationship between neural and vascular activity have validated the
use of integrated synaptic activity, including inhibitory activity, in generating the
hemodynamic response. However, the latest consensus (although still debated) is that
synaptic activity generates blood flow-inducing signals and increased metabolism in parallel,
as opposed to metabolic activity itself leading to increased blood flow. With the proliferation
of neuroimaging studies in the 1990’s, the relationship between neural activity and the
hemodynamic response has been a topic of intense research. Some studies have found an
approximately linear relationship between spiking activity and the fMRI signal (Heeger and
Ress, 2002). Arthurs & Boniface (2002) review papers that show a predominantly linear
correlation between neural activity and BOLD response, but this has been shown to only
describe a narrow range of the response, with nonlinearities evident at lower and higher
ranges (Devor et al., 2003; Hewson-Stoate et al., 2005; Lauritzen and Gold, 2003; Sheth et
al., 2004). Goense & Logothetis (2008) simultaneously recorded the BOLD signal, multi-unit
activity (MUA), and local field potential (LFP) in V1 of awake macaque monkeys. MUA is a
measure of pyramidal neuron spiking and the LFP mainly reflects synaptic processing. They
found that the LFP was a better predictor of the BOLD signal than MUA. This was not just a
difference in magnitude – the two signals were often dissociated, with the LFP correctly
predicting a BOLD response in the absence of significant spiking activity. Such data guide
our choice in using integrated synaptic activity to generate the blood flow inducing signal, as
previous synthetic brain imaging approaches have done.
Research on neurometabolic coupling has focused on the energy requirements of
excitation and inhibition, noting that both increase glucose consumption and that the main
metabolic activity is in presynaptic terminals and postsynaptic areas (Jueptner and Weiller,
1995). A major focus in the metabolic requirements of excitation has been the so-called
astrocyte lactate shuttle. Astrocytes take up excess glutamate and provide glutamine and
lactate to neurons in return, which requires the use of Na+/K+-ATPase, necessitating glucose
uptake (Magistretti and Pellerin, 1999). While the metabolic requirements of glutamatergic signaling have been worked out, the metabolic requirements of other modulatory and inhibitory neurotransmitters are less well known. Waldvogel et al. (2000) used event-related
fMRI to evaluate the BOLD response to inhibition or excitation induced using TMS. Unlike
excitation, inhibition resulted in no significant change in the BOLD signal, leading the
authors to suggest that inhibition is less metabolically demanding than excitation. Indeed,
less energy is required by inhibitory neurons to pump Cl− ions back across their gradient because it is not as steep as that of Na+ and K+ ions (Attwell and Iadecola, 2002). However,
Lauritzen & Gold (2003) point out that although it is not known how metabolically
demanding GABAergic inhibition is, the excitation of inhibitory interneurons is energy
intensive. Our model therefore includes all synaptic activity in inhibitory neurons.
Numerous signals have been proposed to mediate neurovascular coupling such as
metabolites (lactate, K+, H+, adenosine), neurotransmitters (vasoactive intestinal peptide,
acetylcholine, noradrenaline), and nitric oxide (NO; Magistretti and Pellerin, 1999). Attwell & Iadecola (2002) suggest that blood flow is controlled locally by glutamate via glutamate-induced Ca2+ influx, which activates production of NO, adenosine, and other metabolites. The first direct evidence of neurovascular coupling was found by Cauli et al. (2004), who showed that GABAergic interneurons express vascular-modulating proteins and that the firing of a single interneuron can induce dilation or constriction of neighboring microvessels.
5.2 Methods
5.2.1 A New Model of Synthetic Brain Imaging
In order to facilitate both region- and voxel-based analysis of simulated PET and fMRI
responses, simulated neurons in each region are grouped into virtual voxels. Each virtual
voxel contains an instantiation of the extended balloon model with Gaussian crosstalk
between adjacent voxels (as in Babajani et al., 2005). In the simulations described below we
do not spatially locate voxels within regions and therefore do not exploit this crosstalk
feature; however, this is possible with more detailed neural models (see Discussion).
In order to simulate the hemodynamic response, we extend Riera et al.’s (2006) use of
Friston et al.’s (2000) approach in grouping these various neurovascular coupling signals into
a generic flow-inducing signal. On the basis of their simulation experiments, they suggested
that the hemodynamic response more accurately reflects transmembrane currents, which
include synaptic currents, electrophysiological currents, and leakage currents. We thus include
not only total synaptic activity, but also other transmembrane currents in generating the
neurovascular coupling signal:
y(t) = \sum_m \left[ g_{AMPA}\, s_m^{AMPA}(t) + g_{NMDA}\, s_m^{NMDA}(t) + g_{GABAa}\, s_m^{GABAa}(t) + g_{GABAb}\, s_m^{GABAb}(t) + q_m(t) \right]
where the sum is over all neurons, m, in the virtual voxel, and q(t) includes all nonsynaptic
transmembrane currents. Riera et al. (2006) introduced a baseline neurovascular coupling signal, y_0, in generating the blood flow-inducing signal (see Synthetic Brain Imaging Approaches, Neurovascular Coupling Mechanism, above). However, one problem with using this formulation in large-scale models with multiple regions is that of normalization according to the number of activated neurons. The number of neurons activated in a region affects the baseline neurovascular coupling signal, which would invalidate comparisons between regions and with experimental data. Furthermore, the blood flow-inducing signal is used as input to the balloon model, whose variables are in normalized units. Therefore, we extend Riera et al.'s formulation to normalize the neurovascular coupling signal according to its baseline level:

\frac{di}{dt} = \varepsilon\, \frac{y(t) - y_0}{y_0} - \frac{i}{\tau_i} - \frac{f_{in} - 1}{\tau_f}
We use Zheng et al.'s extension of the balloon model without modification to generate rCBF
and BOLD signals from this blood-flow inducing signal. This system allows synthetic PET
and fMRI signals to be generated simultaneously from each voxel by sampling the rCBF or
BOLD signals, respectively, according to the repetition time (TR) of the scan.
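A minimal Euler-integration sketch of this normalized flow-inducing signal is given below. It assumes Friston-style inflow dynamics (df_in/dt = i); the time constants and gain are illustrative choices, not the fitted values.

```python
# Minimal Euler sketch of the normalized flow-inducing signal that drives the
# balloon model. The inflow dynamics (df_in/dt = i) follow the Friston-style
# formulation; the time constants and gain are illustrative, not fitted values.
def simulate_flow(y, y0=1.0, eps=0.5, tau_i=1.25, tau_f=2.5, dt=0.01):
    i, f_in = 0.0, 1.0
    trace = []
    for y_t in y:
        di = eps * (y_t - y0) / y0 - i / tau_i - (f_in - 1.0) / tau_f
        i += dt * di
        f_in += dt * i        # inflow driven by the flow-inducing signal
        trace.append(f_in)
    return trace

# A 1 s burst of doubled synaptic activity, then 4 s back at baseline:
flow = simulate_flow([2.0] * 100 + [1.0] * 400)
```

Because the neurovascular coupling signal is divided by its baseline y_0, the same code applies regardless of how many neurons a region contains.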
In its full form this synthetic brain imaging approach requires a neural model with
conductance-based synapses and the localization of at least the networks within brain regions.
model could also be used with simpler neural models by substituting a different
neurovascular coupling signal such as those described above (see Synthetic Brain Imaging
Approaches, Neurovascular Coupling Signal). This synthetic brain imaging model can be
used with many choices of neural and network models, but in the Methods section we will
present a specific choice of neuron and network consistent with these criteria.
To demonstrate the power of the method, we present results from two simple models
using competitive winner-take-all (WTA) circuits. WTA or race models
have frequently been used to account for psychophysical data in decision-making tasks.
Given multiple inputs, these models converge on the strongest one and inhibit the rest
through feedback excitation and surround inhibition. This makes them well suited for
decision-making tasks where evidence for multiple alternatives is integrated and one must be
selected. The first model simulates the recognition of gestures for imitation, using one WTA
network to recognize an observed gesture given its similarity to known gestures and another
WTA network to select a gesture to perform given the recognized gesture. We show that in
the absence of data to constrain the neural model, it can be parameterized such that it
performs the task but yields predictions of the rCBF signal that differ from those generated
by a conceptual model. The second model simulates a saccade decision task in which WTA
networks select the net direction of a random dot motion display. In this case data from
behavioral, neural recording, and microstimulation experiments are available to constrain the
model, and we show that in this case it reproduces the results of published fMRI
experiments.
5.2.2 Neural Model
The neural model used is the Izhikevich (2004) spiking model. This simple model has
few free parameters and can reproduce a variety of spiking patterns seen in vitro in many
different classes of neurons. The basic formulation of this model is described here, but see (Izhikevich, 2004) for descriptions of each parameter and demonstrations of various spiking patterns that the model can exhibit. The model is based on two differential equations describing the evolution of the membrane potential, v:

C_m \frac{dv}{dt} = k\,(v(t) - v_r)(v(t) - v_t) - u(t) - I_{syn}(t)

and a membrane recovery variable, u:

\frac{du}{dt} = a\,\big(b\,(v(t) - v_r) - u(t)\big)

where C_m is the membrane capacitance, v_r is the resting potential, and v_t is the instantaneous threshold potential. Once the membrane potential reaches v_{peak}, a spike is generated and the membrane potential is reset:

\text{if } v(t) \ge v_{peak}, \text{ then } v(t) \leftarrow c, \quad u(t) \leftarrow u(t) + d

Note that neither v_{peak} nor v_t is a threshold for spike generation, since this neural model generates the upstroke of the action potential, which greatly reduces numerical errors in spike timing (Izhikevich, 2007). The parameters a, b, c, d, and k control the pattern of spiking that the neuron exhibits. The parameter a is the recovery time constant, b determines whether the neuron acts as an integrator or resonator, c is the voltage reset variable, and d determines how currents activated during a spike affect the post-spike membrane potential. The value I_{syn} represents the sum of all synaptic currents.
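A minimal Euler simulation of this model is sketched below, using regular-spiking cortical parameter values from Izhikevich (2007); a constant injected current I stands in for the synaptic term, and the step size and current magnitude are illustrative.

```python
# Minimal Euler simulation of the Izhikevich model with regular-spiking
# cortical parameter values (from Izhikevich, 2007). A constant injected
# current I stands in for the synaptic term; step size and I are illustrative.
def izhikevich(I=100.0, T=1000.0, dt=1.0,
               C=100.0, k=0.7, vr=-60.0, vt=-40.0,
               a=0.03, b=-2.0, c=-50.0, d=100.0, vpeak=35.0):
    v, u, spikes = vr, 0.0, 0
    for _ in range(int(T / dt)):
        v += dt * (k * (v - vr) * (v - vt) - u + I) / C
        u += dt * a * (b * (v - vr) - u)
        if v >= vpeak:          # spike: reset v, bump the recovery variable
            v, u = c, u + d
            spikes += 1
    return spikes

n_spikes = izhikevich()        # spike count over 1 s of simulated time
```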
In order to simulate in vivo conditions, simple synapses were added to each modeled
neuron. NMDA-, AMPA-, GABA-A-, and GABA-B-mediated postsynaptic currents were
modeled. The following formulation is based on that of Deco et al., (2004) and Dayan &
Abbott (2001). The total synaptic current, I_{syn}, is then:

I_{syn}(t) = I_{AMPA}(t) + I_{NMDA}(t) + I_{GABAA}(t) + I_{GABAB}(t)
where each synaptic current is given by its maximal conductance multiplied by the difference
between the membrane potential and its reversal potential, and the fractions of open channels
in each synapse from presynaptic neuron, j:
I_{AMPA}(t) = g_{AMPA}\,(v(t) - E_{AMPA}) \sum_j s_j^{AMPA}(t)

I_{NMDA}(t) = g_{NMDA}\,(v(t) - E_{NMDA})\, g_v(t) \sum_j s_j^{NMDA}(t)

I_{GABAA}(t) = g_{GABAA}\,(v(t) - E_{GABAA}) \sum_j s_j^{GABAA}(t)

I_{GABAB}(t) = g_{GABAB}\,(v(t) - E_{GABAB}) \sum_j s_j^{GABAB}(t)
Synaptic currents are constrained according to the maximum conductances and time constants of each channel taken from the experimental literature. The extra term, g_v, in the NMDA current equation is based on the equation from Jahr & Stevens (1990), fit to experimental data on the voltage dependence of the NMDA channel, which is controlled by the extracellular Mg2+ concentration, C_{Mg^{2+}}:

g_v(t) = \left( 1 + \frac{C_{Mg^{2+}}}{3.57\,\mathrm{mM}}\, e^{-v(t)/16.13\,\mathrm{mV}} \right)^{-1}
The fractions of open channels for each presynaptic neuron j with synaptic connection weight w_j are given by:

\frac{ds_j^{AMPA}}{dt} = -\frac{s_j^{AMPA}(t)}{\tau_{AMPA}} + w_j \sum_k \delta(t - t_j^k)

\frac{ds_j^{NMDA}}{dt} = -\frac{s_j^{NMDA}(t)}{\tau_{NMDA}^{decay}} + \alpha\, x_j(t)\,\big(1 - s_j^{NMDA}(t)\big)

\frac{dx_j}{dt} = -\frac{x_j(t)}{\tau_{NMDA}^{rise}} + w_j \sum_k \delta(t - t_j^k)

\frac{ds_j^{GABAA}}{dt} = -\frac{s_j^{GABAA}(t)}{\tau_{GABAA}} + w_j \sum_k \delta(t - t_j^k)

\frac{ds_j^{GABAB}}{dt} = -\frac{s_j^{GABAB}(t)}{\tau_{GABAB}} + w_j \sum_k \delta(t - t_j^k)

where the sums over k run over the spikes of presynaptic neuron j at times t_j^k. The rise times of the AMPA and GABA currents are neglected because they are typically less than 1 ms.
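These gating dynamics can be sketched in discrete time as follows. The Euler updates use illustrative time constants, and a presynaptic spike adds the weight w to the gating variable (or to the NMDA intermediate x), replacing the delta-function term.

```python
import math

# Discrete-time sketch of the channel gating (Euler updates; the time
# constants in ms are illustrative). A presynaptic spike adds the connection
# weight w to the gating variable (or to the NMDA intermediate x), replacing
# the delta-function term; AMPA/GABA rise times are neglected, as in the text.
def step_gates(s_ampa, x, s_nmda, spike, w=1.0, dt=1.0,
               tau_ampa=2.0, tau_rise=2.0, tau_decay=100.0, alpha=0.5):
    s_ampa += dt * (-s_ampa / tau_ampa) + (w if spike else 0.0)
    x += dt * (-x / tau_rise) + (w if spike else 0.0)
    s_nmda += dt * (-s_nmda / tau_decay + alpha * x * (1.0 - s_nmda))
    return s_ampa, x, s_nmda

def mg_block(v, c_mg=1.0):
    """Jahr & Stevens (1990) voltage dependence of the NMDA conductance."""
    return 1.0 / (1.0 + (c_mg / 3.57) * math.exp(-v / 16.13))

s_a, x, s_n = step_gates(0.0, 0.0, 0.0, spike=True)   # one presynaptic spike
for _ in range(10):                                   # then 10 ms of decay
    s_a, x, s_n = step_gates(s_a, x, s_n, spike=False)
```

After a single spike, the fast AMPA gate decays within a few milliseconds while the NMDA gate, driven through the intermediate variable, rises and persists, reflecting the separate rise and decay time constants above.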
The Izhikevich neural model with the given synapse model fulfills the requirement of
providing the synthetic brain imaging model with a measure of synaptic conductance for
each neuron. Although the model does not include biophysically meaningful variables for
nonsynaptic transmembrane currents, it does contain a membrane recovery variable, u, which
represents slow currents that modulate spike generation. We therefore set q(t) in the
neurovascular coupling signal equation equal to u(t). The model is more efficient than
compartmental models and thus better suited to large-scale simulations (Izhikevich, 2004),
but has more parameters that can be estimated from experimental data (when available) than
simpler models such as leaky integrators.
5.2.3 Winner-Take-All Circuit
A basic winner-take-all (WTA) network is built using a population of pyramidal cells and
inhibitory interneurons. In all simulations reported here, each voxel contained an entire WTA
network including both populations and was localized to a brain region, but not specifically
within it. The parameters of each Izhikevich neuron are set to random values in a distribution
centered on values that result in the most common spiking patterns of these types of neuron.
Pyramidal cells excite nearby pyramidal cells through AMPA and NMDA synapses and
inhibit distant pyramidal cells via inhibitory interneuron GABAergic projections (Figure
5-1). This center-surround connectivity implements the WTA dynamics. Given multiple
inputs, network activity converges on a population code centered on the highest intensity
input.
Figure 5-1 Left: The membrane potential (mV) of the pyramidal neurons and inhibitory
interneurons in a one-dimensional winner-take-all network for 5s after two conflicting inputs
are applied. While excitatory and inhibitory neurons are shown in separate layers to lay bare
the mathematical structure of the model, the two types of neuron co-occur in each voxel.
Right: The corresponding firing rate (Hz) of the pyramidal neurons of this network.
The excitatory and inhibitory weight matrices are determined according to formulas similar to those used in continuous WTA formulations (Amari, 1977) such as dynamic neural fields (DNFs; Erlhagen and Schoner, 2002).
The excitatory weight kernel, WE, is a Gaussian function with height w_{excite} and width σ_e of size N, with the n-th element given by:

WE_n = w_{excite}\, e^{-(n - N/2)^2 / (2\sigma_e^2)}

The inhibitory weight kernel, WI, is an inverted Gaussian, with the n-th element given by:

WI_n = w_{inhibit} - w_{inhibit}\, e^{-(n - N/2)^2 / (2\sigma_i^2)}

where w_{inhibit} is the maximum inhibitory connection weight, N is the size of the kernel, and σ_i is the width of the Gaussian. Each pyramidal neuron n projects to the AMPA and NMDA synapses of the corresponding interneuron n with a connection strength equal to w_{ie}.
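The two kernels can be sketched as follows; the kernel size, heights, and widths are illustrative choices, not the fitted model parameters.

```python
import math

# Sketch of the center-surround weight kernels; the kernel size, heights,
# and widths are illustrative choices, not the fitted model parameters.
def excitatory_kernel(N, w_excite=1.0, sigma_e=2.0):
    return [w_excite * math.exp(-((n - N / 2) ** 2) / (2 * sigma_e ** 2))
            for n in range(N)]

def inhibitory_kernel(N, w_inhibit=1.0, sigma_i=2.0):
    return [w_inhibit - w_inhibit * math.exp(-((n - N / 2) ** 2) / (2 * sigma_i ** 2))
            for n in range(N)]

WE = excitatory_kernel(20)   # excitation peaks at the kernel's centre
WI = inhibitory_kernel(20)   # inhibition vanishes there, grows with distance
```

Excitation is strongest at the kernel's center and inhibition weakest, producing the center-surround connectivity that implements the WTA dynamics.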
5.3 Results
We applied the synthetic brain imaging model to simple models of praxis and random dot
motion discrimination. The praxis model simulation demonstrates that without data to
constrain the neural model, it can be parameterized such that it performs the task but yields
predictions of the rCBF signal that support a radically different interpretation of published
PET data. The random dot motion discrimination model was parameterized using a genetic
algorithm to fit its activity to behavioral and neural recording data, and was then validated
against data from microstimulation experiments. Synthetic fMRI on the model reproduces the
results of human fMRI experiments using the same task, demonstrating the ability of this
synthetic brain imaging technique to generate accurate predictions when the neural model is
suitably constrained by experimental data.
5.3.1 Synthetic PET on a Model of Praxis
Apraxia is a disorder of skilled movement that results in dissociations between
meaningful and meaningless gesture imitation. A classic conceptual model of apraxia
separates the praxis system into an indirect route for meaningful gesture imitation, composed
of an input praxicon for recognizing meaningful gestures and an output praxicon for
generating them, and a direct route for meaningless gesture imitation (Rothi et al., 1991). In a
PET study, Peigneux et al. (2004) failed to find evidence for a distinct input and output
praxicon when comparing activation during familiar vs. novel gesture imitation. They used
familiar and novel gesture naming and imitation conditions to attempt to isolate the input and
output praxicons and their roles in the direct and indirect routes. While they did find
neuroanatomical evidence for this distinction, they found that the neural substrates of the
input praxicon were more activated during novel gesture imitation. This is at odds with their
predictions from Rothi et al.’s conceptual model, which suggested that the input praxicon
should be activated to a greater extent during familiar gesture imitation. They therefore
suggested that the model should combine the two praxicons.
We simulated the input praxicon as a WTA network projecting to an efferent population
of pyramidal neurons, the output praxicon. The input array to the WTA network had an
element for each known gesture whose input represented the confidence level that it matched
the observed gesture. WTA convergence to activity focused around one unit signaled
successful recognition of the corresponding gesture. Low contrast input simulated
presentation of a novel gesture - the network did not recognize the gesture (Figure 5-2, left).
The high contrast condition simulated presentation of a familiar gesture (Figure 5-2, right)
and the resultant activation of the output praxicon to “imitate” the observed movement.
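The qualitative contrast effect can be caricatured with a simplified rate-based WTA (the actual model uses spiking Izhikevich neurons; all names and parameter values below are illustrative):

```python
import numpy as np

def wta(inputs, steps=500, dt=0.01, tau=0.05, w_exc=1.2, w_inh=0.9):
    """Rate-based winner-take-all: self-excitation plus pooled inhibition."""
    r = np.zeros_like(inputs, dtype=float)
    for _ in range(steps):
        drive = inputs + w_exc * r - w_inh * r.sum()
        r += (dt / tau) * (-r + np.maximum(drive, 0.0))
    return r

rng = np.random.default_rng(0)
base = 0.1 + rng.uniform(0.0, 0.2, 10)    # confidence levels for 10 known gestures
low = wta(base)                           # low contrast: novel gesture, weak winner
high_in = base.copy(); high_in[3] += 1.0  # high contrast: gesture 3 clearly matches
high = wta(high_in)
```

With high-contrast input the network converges on a strongly active unit (the recognized gesture) that can drive the output praxicon; with low-contrast input only weak residual activity remains.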
Figure 5-2 The firing rates (Hz) of the pyramidal cells in the input praxicon (top row) and
output praxicon (bottom row) networks after application of low intensity / low contrast (left
column) and high intensity / high contrast (right column) inputs. The hemodynamic response
of each network is shown in the middle column during the familiar (blue) and novel (red)
conditions.
While the input praxicon had the highest BOLD response in the familiar conditions, the
output praxicon had the highest BOLD response in the novel conditions (Figure 5-2, middle).
Therefore, Peigneux et al.'s results do not necessarily mean that there is no separation of
input and output praxicons since this model predicts a higher BOLD response in regions
corresponding to the input praxicon during familiar gesture imitation, but a higher response
in output praxicon regions during novel gesture imitation.
The neurophysiological data needed to constrain the model are not available. However,
these simulations show the danger in simple verbal interpretation of brain imaging data, even
when based on conceptual models. Since rCBF is more directly correlated with synaptic
activity than spiking output it also represents afferent and inhibitory input, as well as
intraregional processing. While there is currently not enough data to support or refute
hemodynamic predictions generated from this computational model or Rothi et al.’s
conceptual model, synthetic brain imaging on computational models at least allows the
formulation of very specific hypotheses concerning the relationship between the task and
hemodynamic response. When such data do become available the computational model can
be further validated or refined.
5.3.2 Synthetic fMRI on a Model of Motion Direction Discrimination
A commonly used perceptual decision-making paradigm is the left-right direction-of-
motion discrimination task (Morgan and Ward, 1980; Newsome et al., 1989). This involves
deciding the net movement direction of a set of randomly moving dots and indicating the
decision by making a saccade in that direction. Stimulus coherence is varied on each trial by
changing the proportion of dots moving together in one of two possible directions (known in
advance). A series of electrophysiological experiments have characterized a network of
regions in the monkey brain thought to be involved in this task including the visual area MT,
lateral intraparietal area LIP, and frontal eye field FEF. It is thought that neurons in MT
detect the low-level motion information, LIP neurons integrate input from MT over their
receptive fields, and FEF selects a saccade direction based on input from LIP.
The network model includes two WTA circuits arranged in series which represent regions
LIP and FEF. LIP is anatomically positioned midway through the sensory-motor chain, with
inputs from MT and MST and outputs to the FEF and superior colliculus (SC, Andersen et
al., 1990; Andersen et al., 1992; Blatt et al., 1990; Lewis and Van Essen, 2000). The
pyramidal cells of the LIP network therefore receive input from MT and project to the
pyramidal cell layer of FEF. Each region has the same number of neurons, and each neuron
has a preferred direction in angular space in the interval [-π, π]. In the area MT this preferred
direction corresponds to the direction of motion of stimuli in the visual field, and in area FEF
and LIP the preferred direction encodes the direction of a saccade.
Area MT is topographically mapped so that 180 degrees of motion direction are
represented in 400-500μm of cortex (Albright, 1984), making it reasonable to assume that all
direction preferences are equally represented within each fMRI voxel in MT (Albright, 1984;
Maunsell and Van Essen, 1983). We therefore include all neurons in the model MT network
in one virtual voxel. The LIP and FEF networks are treated the same way, resulting in three
virtual voxels that correspond to areas MT, LIP, and FEF.
The LIP and FEF regions were modeled as WTA networks, while the area MT was
modeled by a set of Poisson spike generators firing at frequencies given by a model of MT
(Rees et al., 2000). Since the possible response directions are known in advance, the LIP
pyramidal neurons additionally received spiking input from a set of Poisson spike generators
representing top-down influence on the direction of a performed saccade. The output of the
model was a direction in which to perform a saccade, which was determined by decoding the
population firing rate of FEF using the center-of-mass technique (Wu et al., 2002). Since in
this task the saccades are all of the same distance we do not model distance preference in
FEF cells. In a more complete model FEF might be modeled as a two-dimensional WTA
network with both distance and directional preferences.
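The center-of-mass readout can be illustrated with a rate-weighted population-vector average over preferred directions, one common variant of the Wu et al. (2002) technique (the bump of rates below is synthetic):

```python
import numpy as np

def decode_direction(rates, preferred):
    """Decode a saccade direction as the rate-weighted circular mean
    of the neurons' preferred directions."""
    x = np.sum(rates * np.cos(preferred))
    y = np.sum(rates * np.sin(preferred))
    return np.arctan2(y, x)

prefs = np.linspace(-np.pi, np.pi, 100, endpoint=False)
rates = 100.0 * np.exp(-0.5 * ((prefs - np.pi / 2) / 0.3) ** 2)  # bump at +pi/2
theta = decode_direction(rates, prefs)  # ~ +pi/2, one of the two response directions
```

The circular mean (via arctan2 of the summed sine and cosine components) avoids the wraparound problems a naive weighted average of angles would have near ±π.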
Parameters of the model that could not be directly constrained by experimental data were
set using a genetic algorithm that determined the model’s fitness by comparing its activity
and output to published neurophysiological and behavioral data. The model was then
validated by comparing the results of microstimulation and synthetic fMRI simulations to
those of corresponding experiments.
5.3.2.a Neurophysiological Activity
The model was tested on six coherence levels (0, 3.2, 6.4, 12.8, 25.6, 51.2) for 100 trials
at each level. On each trial, the direction of coherently moving dots was randomly set to –π/2
or π/2. The firing rate of each pyramidal neuron was estimated from its spike times using the
same method as Palmer, Huk & Shadlen (2005). Due to the fitness function of the genetic
algorithm used to set model parameters and the intrinsic dynamics of the WTA network, the
pyramidal neurons in LIP converge on a population code centered on the chosen saccade
direction (Figure 5-3). Since the average firing rate of FEF cells is 100Hz before saccade
initiation (Hanes and Schall, 1996), the response time (RT) was defined as the time taken for
the maximum firing rate of FEF to reach 100Hz. As the stimulus coherence increased, FEF
reached this threshold sooner, and the RT decreased.
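This RT criterion amounts to a threshold-crossing test on the peak FEF firing rate; a minimal sketch, with hypothetical ramping rates standing in for the model's FEF output:

```python
import numpy as np

def response_time(rates, times, threshold=100.0):
    """RT = first time at which the maximum firing rate reaches threshold (Hz)."""
    peak = rates.max(axis=0)                  # max over neurons at each time step
    crossed = np.nonzero(peak >= threshold)[0]
    return times[crossed[0]] if crossed.size else None

times = np.linspace(0.0, 1.0, 1001)
ramp = lambda slope: np.outer(np.ones(5), 200.0 * slope * times)  # toy FEF rates
rt_low_coh = response_time(ramp(0.6), times)
rt_high_coh = response_time(ramp(1.2), times)
```

A steeper ramp to threshold (standing in for higher stimulus coherence) yields a shorter RT, matching the trend described above.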
Figure 5-3 The firing rate of the pyramidal populations of (rows, top to bottom): MT, LIP, and
FEF during the (columns, left to right): 3.2%, 12.8%, and 51.2% stimulus coherence level
conditions when the net direction of dot motion is to the right. The solid white lines denote
stimulus onset and the dashed white lines denote the time of response.
5.3.2.b Behavioral Measures
The model output was deemed correct when the generated saccade was in the same
direction as the net motion of the dot kinetogram. Palmer, Huk & Shadlen (2005) showed
how the accuracy and response times of human subjects performing the same task could be
fit to psychometric functions predicted by a proportional-rate diffusion model. A diffusion
model is a continuous version of a random walk decision model in which a noisy signal is
sampled and evidence for one decision versus another is accumulated over time (Ratcliff and
Rouder, 1998). Note that this is a high-level description of the behavior of linked WTA
networks with noisy input. The psychometric function for accuracy gives the proportion of
correct responses for a given stimulus strength x:
P_C(x) = 1 / (1 + exp(−2A′kx))
where the free parameters A’ and k are the normalized bound and sensitivity, respectively.
The chronometric function for mean response time is:

RT(x) = (A′ / (kx)) · tanh(A′kx) + t_R

where the free parameter t_R is the mean residual time.
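Both functions are straightforward to evaluate; a sketch using the fitted values reported below (note that the units of x, fraction versus percent coherence, must match those used in the fit; here x is taken as a fraction):

```python
import numpy as np

def proportion_correct(x, A, k):
    """Psychometric function: P_C(x) = 1 / (1 + exp(-2*A*k*x))."""
    return 1.0 / (1.0 + np.exp(-2.0 * A * k * x))

def mean_rt(x, A, k, t_R):
    """Chronometric function: RT(x) = (A / (k*x)) * tanh(A*k*x) + t_R."""
    return (A / (k * x)) * np.tanh(A * k * x) + t_R

A, k, t_R = 0.6, 18.0, 0.29  # fitted values from the text
pc_weak, pc_strong = proportion_correct(0.032, A, k), proportion_correct(0.512, A, k)
rt_weak, rt_strong = mean_rt(0.032, A, k, t_R), mean_rt(0.512, A, k, t_R)
```

Accuracy rises and mean RT falls with increasing motion strength, the qualitative pattern fitted in Figure 5-4.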
Figure 5-4 The mean response time (left) and accuracy (right) as a function of motion strength. The error bars denote standard error. The solid curves show the fitted psychometric functions (A′=.6, k=18, t_R=.29, ln(L)=3.2).
The response time and behavioral accuracy of the model at each coherence level over 100 trials were fitted to the psychometric and chronometric functions predicted by the proportional-rate diffusion model using the same maximum log-likelihood technique used by Palmer, Huk & Shadlen (2005) (Figure 5-4). The parameter values of the fit were A′=.6, k=18, t_R=.290s (ln(L)=3.2), which are within the range of human subjects (Palmer et al., 2005). Note that this feature of the model was built-in, since the accuracy of the fit to these functions was a part of the fitness function used by the genetic algorithm to set model parameters.
5.3.2.c Microstimulation Simulations
To validate the model, we replicated the stimulation protocols of several microstimulation
experiments that also used the direction-of-motion discrimination task. These data were not included in the genetic algorithm used to set model parameters, and therefore these simulations serve as verification of the model's behavior. Stimulation of direction-selective MT neurons causes monkeys to bias their decisions in favor of the preferred direction of the stimulated neurons (Salzman et al., 1990; Salzman et al., 1992) and to shorten RT when the preferred direction is chosen (Ditterich et al., 2003).
in LIP evokes similar effects, but to a much lesser extent than MT stimulation (Ditterich et
al., 2003; Hanks et al., 2006). FEF microstimulation evokes a short-latency saccade whose
amplitude and direction are determined by the stimulation site (Gold and Shadlen, 2000,
2003). Stimulation of the FEF network in the model will necessarily evoke a saccade by
model design; therefore we validated the model by performing MT and LIP microstimulation
simulations.
Figure 5-5 The response time (top) and accuracy behavioral measures during the control
(solid), MT stimulation (dashed) and LIP stimulation (dotted) simulations for each stimulus
strength tested. The curves show the fitted chronometric and psychometric functions. The
error bars denote standard error.
Just as in the experimental data, microstimulation of the model MT network biased the
decision process toward faster responses to the preferred direction of the stimulated neurons,
resulting in shifts in the fitted chronometric and psychometric functions (Figure 5-5). LIP
stimulation evoked similar, but smaller effects. The differences in residual times for leftward
versus rightward saccades were due to asymmetries in the model network. These were caused
by the randomization of some network parameters. Note that this directional bias in response
times is also evident in the experimental data (Hanks et al., 2006).
5.3.2.d Synthetic fMRI
Using fMRI, Rees, Friston & Koch (2000) found that during the random motion
discrimination task, the BOLD signal in area MT was linearly correlated with motion
strength. This seems surprising since areas LIP and FEF are known to be involved in the task
as well. We ran the model on the same task and compared the maximum fMRI signal in each
region at different levels of motion coherence. The model replicated these results, showing a
significant positive correlation between MT activation and stimulus strength (R=.9664), but
not so for LIP and FEF (Figure 5-6). Although LIP and FEF were activated by the task, in
these regions intraregional processing due to the WTA dynamic contributed more to the
hemodynamic response than afferent input. Since a response had to be selected regardless of
the coherence level, these regions had similar BOLD responses across all levels of stimulus
strength. Since in this model MT was not a WTA network, its hemodynamic response was
dominated by synaptic activity due to afferent input.
Figure 5-6 The percent change in fMRI response amplitude across multiple coherence levels
and linear functions fit to the data for MT (solid line), LIP (dashed line), and FEF (dotted
line). The error bars denote standard error.
A crucial issue in these simulations, and indeed in all synthetic brain imaging simulations, is how much of the detail of the network and neural model is relevant to the findings. We have demonstrated that the results of human fMRI experiments can be reproduced using a fairly detailed neural model with parameters set from the experimental literature and a simple network model with parameters set by fitting the model activity to behavioral and neurophysiological data. Further simulations are required to determine if the same results can be achieved with variants of the model.
5.4 Discussion
In the first set of simulations we showed that in the absence of constraining
neurophysiological data and a computational model, ad-hoc verbal analyses of the results of
brain imaging studies are premature at best. We demonstrated that a simple model
performing the same task used in an fMRI study of imitation can be parameterized to yield
BOLD responses drastically different from those predicted by the experimenters. Since this
fMRI study involved anatomical localization of components of a conceptual model based on
functional activation, incorrect assumptions concerning the hemodynamic signals generated
by different components during the task will result in a misleading interpretation of the data.
The second set of simulations demonstrated that when the appropriate data are available, a model can be validated against multiple data sources: neurophysiological, behavioral, and, via synthetic brain imaging, neuroimaging data.
The model thus goes beyond previous synthetic brain imaging approaches that use firing
rate-based leaky integrator models (Arbib et al., 1995) and leaky integrate-and-fire neurons
(Deco et al., 2004). The model more accurately reflects the contributions made by excitation
and inhibition to the BOLD signal (within the assumption that it is due to synaptic activity)
by modeling synaptic currents using parameters from experimental literature. At a more fine-
grained level, the Izhikevich neural model allows much higher precision concerning spike
timing than leaky integrate-and-fire models since it actually generates the action potential up-
shoot rather than artificially resetting the membrane potential once it reaches an arbitrary
threshold (Izhikevich, 2007).
In most cases the data needed to constrain such a detailed model are not available.
However, by including as many biophysically meaningful variables as possible, the model
can be parameterized within a plausible range of values and easily extended when the necessary
data become available. We assigned synapse and neural model parameter values according to
published experimental data for typical cortical pyramidal cells and interneurons. Non-
biophysically meaningful neural parameters such as those controlling the behavior of the
Izhikevich model are set using values that reproduce spiking patterns seen in pyramidal cells
and interneurons (see Izhikevich, 2004). Network parameters, such as the connection
weights between layers, cannot be estimated from existing data. In order to set these
parameters, we used a genetic algorithm utilizing the model’s fit to neurophysiological and
behavioral data as the fitness function. This strategy should prove useful when such data are
available. However, for models such as the imitation model, this data is not currently
available. In these cases, several competing models should be developed and simulations run
to determine experimental manipulations that could disambiguate them.
Chapter 6 - Future Work
6.1 Inferring Hidden Actions
The monkey mirror system is capable of inferring the final result of a grasp given the
initial sight of an object and a preshaped hand directed towards it even when the object and
hand are subsequently obscured. Given the data of Umilta et al. (2001), it is not clear whether
or not a working memory representation of the hand is used to extrapolate the grasp
trajectory or if the initial hand state trajectory coupled with object location working memory
is sufficient to correctly activate F5 mirror neurons. Further experiments are needed to assess
the duration of object working memory and to tease apart the differential contributions of
hand and object working memory to F5 mirror neuron activation. First, a series of
experiments replicating the hidden grasp condition of Umilta et al. (2001), but varying the
time between when the object is hidden and when the reach to grasp is initiated is needed to
assess the duration of object working memory. Once this is determined, the involvement of
working memory for the observed hand state (if it exists) could be determined by gradually
receding the point in the grasp at which the experimenter's hand disappears behind the screen
until the hand is behind the screen for the entire grasp duration (but within the bounds of object
working memory duration). This would vary the time that the hand state information must be
maintained in hand working memory. If the mirror neuron activity decays as this time is
increased, this could be evidence that a transient working memory activation is supplying
hand state information to area F5 when the hand is not visible. To test whether or not the
process we posited in the MNS2 model to update the working memory representation of the
hand position (dynamic remapping) is actually employed by the primate mirror system, a
fake hidden grasp where the hand overshoots the object (as in our simulation experiments,
see Figure 2-16) could be presented to the monkey. The MNS2 model predicts a transient
level of high mirror neuron activation as the hand nears the object (behind the screen) that
dramatically declines as it passes the object. If the grasp-related mirror neurons still respond
to this example of a hidden overshot-grasp, this could be an indication that hand trajectory
extrapolation, or dynamic remapping, is not used in hidden grasp recognition.
6.2 Inferring Intentions
It is not clear whether or not the mirror system infers the actual outcome of the action, or
the actor's intent in executing it. It has been proposed that mirror neurons are a part of, or a
precursor to a simulation theory of mind-reading (Gallese and Goldman, 1998). Kuroshima et
al. (2002) showed that capuchin monkeys can learn to infer whether or not a human knows
the location of an object. We propose an experiment to test the involvement of the monkey
mirror neuron system in inferring mental states and beliefs. Experimenter A would show an
object to the monkey and then place it behind a screen. The monkey would then observe
experimenter B remove the object from behind the screen after experimenter A leaves the
room. At this point the monkey should know that the object is not behind the screen, but that
experimenter A believes that it is. Now if experimenter A returns and reaches behind the
screen, the monkey's grasp-related mirror neurons should discharge if they infer intention,
because experimenter A believes the object is behind the screen and intends to grasp it. If
however, the mirror neurons code the predicted result of the executed action, they should be
silent because the monkey knows that the object is not behind the screen and that therefore
the observed reach-to-grasp will not make contact with it. The results of this experiment
could yield insight into the possible involvement of mirror neurons in primate theory-of-mind.
6.3 Integration of MNS2 and ILGA
Integration of ILGA with models of the primate mirror system like MNS2 could provide
the system with a feedback signal for skilled grasping. The neurons in the AIP module of
ILGA correspond to visual dominant neurons in monkey area AIP and project to the F5
module but do not receive reciprocal connections from F5. In reality, area AIP also contains
neurons classified as visuo-motor and motor dominant and receives feedback from F5
(Sakata et al., 1995). Motor dominant cells respond during grasping in the light and the dark
while visual dominant cells respond only during object fixation and grasping in the light and
visuomotor cells respond during object fixation and grasping in the dark but fire most
strongly during grasping in the light. Sakata et al. (1995) offer a conceptual model of
feedback-based grasping in which F5 canonical neurons provide AIP motor dominant
neurons with a copy of the grasp motor command, which then pass this signal to AIP visuo-
motor neurons which combine this information with information from AIP visual dominant
neurons and project back to F5. In this way if the ongoing grasp does not match the encoded
affordance the grasp plan in F5 is modified or aborted. AIP visual dominant neurons are
classified into object type neurons that fire during object fixation, and non-object type
neurons that fire during grasping in the light but not object fixation and may respond to the
sight of the hand during the grasp. Non-object type neurons are seldom mentioned in
discussions of AIP, but make up half of visual dominant neurons in the region (Sakata et al.,
1995). Interestingly, their existence fits with the hypothesis outlined by Oztop & Arbib
(2002) that F5 mirror neurons evolved to provide visual feedback of the shape of the hand
relative to the object’s affordances. We suggest that non-object type AIP neurons obtain their
properties by projections from F5 mirror neurons, and that these projections are used for
visual feedback-based control of grasping. It has been shown that reversible inactivation of
F5 mirror neurons by muscimol injection in the cortical convexity of the arcuate sulcus
results in slower, clumsy grasps (Fogassi et al., 2001), consistent with the idea of F5 mirror
neurons playing a role in providing visual feedback. Our conceptual model predicts that
muscimol injection in the cortical convexity of F5 will abolish the response of non-object
type AIP neurons to the sight of the grasp.
In addition to reversible inactivation of mirror neurons, Fogassi et al. (2001) tested the
effects of muscimol injection in the bank of the arcuate sulcus where mostly F5 canonical
neurons are located. In this case, the hand preshape was impaired but monkeys were still able
to grasp the object by contacting it and then making appropriate corrective movements using
tactile feedback. This seems similar to the process modeled by Grupen & Coelho (2000) in
which haptic feedback is used to reposition contact forces. The fact that corrective
movements can still be made after F5 inactivation suggests that they are not based on F5
activity, and may be implemented by the direct projection included in the FARS model from
the primary somatosensory area S1 to the primary motor cortex.
6.4 Skilled Grasping
While ILGA accounts for the development of visually dominant AIP neurons, it does not
include motor and visuomotor neurons. It is thought that these neurons complete a feedback
loop between AIP and area F5 that is used to perform feedback-based control of grasping and
other manual actions. Indeed, transient inactivation of human area AIP using TMS causes
deficits in error correction during online control of grasping (Tunik et al., 2005). In order for
the model to include feedback control to guide the fingertips to the object’s surface, two
extensions are required. The first is a region to represent patches of an object’s surface to
serve as targets to bring the fingertips to, and the second is an inverse kinematics model of
the hand and wrist to bring the desired virtual fingers to these targets.
Since the orientation of the surface patches matters in programming a grasp, a candidate
region for the object surface patch representation would be cIPS since surface-orientation-
selective cells have been found there (Sakata et al., 1997). To make the target surface
representation invariant to object location, it may be represented in an object-centered
reference frame. Although such an organization has not been reported in cIPS, this may be
due to a lack of experiments using eye tracking to control for gaze position with respect to
the center of the object. A potential problem with this idea is that cIPS neurons have only
been found to respond to visible surfaces, whereas fingertips may potentially contact a
surface on the opposite side of an object during a grasp. The location of potential target
surfaces on the opposite side of a visible object must therefore be inferred from the 2 ½
dimensional sketch of the object provided by cIPS.
ILGA uses an inverse kinematics model of the arm to convert target wrist locations into
target joint angles for the shoulder and elbow. In order to perform more precise and dextrous
grasp and manipulation tasks, the extended model must include an inverse kinematics model
of the arm and hand that can convert target locations for any combination of virtual fingers
into target joint angles for the shoulder, elbow, wrist, thumb, and finger joints. ILGA uses the
pseudo-inverse of the Jacobian matrix for computing inverse reach kinematics, requiring a
3×4 matrix (4 controlled degrees of freedom, 3-dimensional wrist position). If the extended
model used the same method, this would require multiple Jacobian matrices, one for each
combination of virtual fingers and mapping to real fingers. Assuming that only two fingers
will contact the object, each Jacobian would therefore be a 6×22 matrix (22 controlled DOFs,
and two 3-dimensional virtual finger positions). Since multiple combinations of virtual
fingers, mappings to real fingers, and contact points are possible, this method may not be
tractable.
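The pseudo-inverse method referenced here iterates dq = J⁺(x_target − x_current); a self-contained sketch on a toy planar three-joint arm (all geometry and parameter values are illustrative, not the ILGA arm model):

```python
import numpy as np

LENGTHS = np.array([0.3, 0.3, 0.2])  # illustrative link lengths (m)

def fk(q):
    """Planar forward kinematics: joint angles -> end-effector (x, y)."""
    angles = np.cumsum(q)
    return np.array([np.sum(LENGTHS * np.cos(angles)),
                     np.sum(LENGTHS * np.sin(angles))])

def jacobian(q, eps=1e-6):
    """Numerical 2 x 3 task Jacobian by central differences."""
    J = np.zeros((2, q.size))
    for i in range(q.size):
        d = np.zeros(q.size); d[i] = eps
        J[:, i] = (fk(q + d) - fk(q - d)) / (2 * eps)
    return J

q = np.array([0.2, 0.3, 0.1])
target = np.array([0.4, 0.4])
for _ in range(200):  # damped pseudo-inverse iterations: dq = pinv(J) @ error
    q = q + 0.2 * (np.linalg.pinv(jacobian(q)) @ (target - fk(q)))
```

The tractability concern raised above is visible even in this toy: each combination of virtual fingers and mapping to real fingers would need its own, much larger (6×22) Jacobian.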
A model like ILGA could provide the scaffolding for such a model by generating a range
of stable grasps that can be used to learn the prediction of unseen target surfaces. Given
a stable grasp, the haptic feedback from finger and hand contacts with the object can be used
to calculate an object-centered representation of the location of these contact points using the
location of the object (represented in V6a and LIP) and the posture of the arm and hand.
These contact point locations could then be used as a training signal for a network that
predicts the location of target surfaces given the representation of visible surfaces in cIPS.
Computing the positions of hand-object contact points given the arm/hand posture requires a
forward kinematics model of the entire arm and hand including shoulder, elbow, wrist,
thumb, and finger joints.
The extended model therefore requires a set of inverse/forward model pairs for the entire
arm/hand. The models could be learned using multiple model-based reinforcement learning
(MMRL; Doya et al., 2002) during the ILGA training period. MMRL is a method of using
reinforcement learning and predictor error to learn multiple inverse/forward model pairs, and
has been formulated in both discrete- and continuous-time and state cases. These models
could be learned offline while ILGA training progresses, until the prediction errors of the
forward models become small enough to allow their associated inverse models to control the
arm and hand.
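The core of MMRL is a soft "responsibility" signal computed from each forward model's prediction error, used to gate the corresponding inverse models; a minimal sketch of that gating (symbols and values illustrative):

```python
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    """Softmax responsibilities: modules whose forward model predicts
    better (smaller error) receive larger weight."""
    z = -np.asarray(pred_errors, dtype=float) ** 2 / (2.0 * sigma ** 2)
    w = np.exp(z - z.max())  # subtract max for numerical stability
    return w / w.sum()

def blended_command(commands, pred_errors):
    """Blend each inverse model's motor command by its module's responsibility."""
    lam = responsibilities(pred_errors)
    return lam @ np.asarray(commands, dtype=float)

# Two modules: module 0 predicts well (error 0.1), module 1 poorly (error 2.0)
u = blended_command([[1.0, 0.0], [0.0, 1.0]], [0.1, 2.0])
```

The blended command u is dominated by the well-predicting module; learning would likewise credit that module's inverse model more strongly.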
6.5 Context-Dependent Grasps
At least two studies show that infants 1-2 years old selectively modify their actions based
on future planned actions. McCarty, Clifton, & Collard (1999) demonstrated that 9 and 14
month old infants grasp a spoon with their preferred hand, regardless of its orientation and
whether the next action was to bring the spoon to the mouth or another location. At 19
months, infants had learned to coordinate hand selection with the action goal and the spoon’s
orientation in order to facilitate the smooth execution of the next action. Claxton et al. (2003)
measured the arm kinematics of 10-month-old infants reaching for a ball and then either
throwing it or fitting it down a tube. They found that the reach to the ball was faster if they
then intended to throw it. Both of these studies suggest that infants preplan segments of
compound actions at some level. This point brings to light a shortcoming in both the ILGM
and the ILGA models. Both of these models use some evaluation of the stability of the grasp
as the metric for reinforcement. A more realistic model would use success in a subsequent
action with the object such as throwing or placing as the grasp reinforcement criterion. This
would require some representation of the task or goal with the ability to bias selection of
grasp targets and affordance representations to perform grasps appropriate for the planned
action.
In the original FARS model, working memory and task-specific associations in prefrontal
cortex bias grasp execution by modulating the grasp selection in F5 such that those grasps
appropriate for the current task are selected. Using synthetic brain imaging, a method to
compare global model activity with PET or fMRI data, Arbib, Fagg & Grafton (2002)
showed that a projection from PFC to AIP rather than F5 better explained human PET data.
A projection from PFC to AIP had not been reported at that time, but this prediction was later
validated both anatomically (Borra et al., 2007) and neurophysiologically (Baumann et al.,
2009). This could be included in ILGA by the addition of a prefrontal cortex module
encoding the task context with projections to the AIP module that are also modifiable
through reinforcement. Thus task representations in prefrontal cortex would presumably
become associated with the affordances and actions that lead to reward. Such a simulation
could shed light on the interplay between cognitive and motor development by examining
the operation of the model with or without a pre-trained motor system or prefrontal
cortex. It could be that associative signals from an already trained prefrontal cortex could
interfere with the normal development of the parieto-premotor connection weights.
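As a sketch of how such task-dependent biasing of affordances could be acquired through reinforcement, consider the following toy network (illustrative Python, not the ILGA implementation; the unit counts, learning rate, and reward scheme are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

N_TASK, N_AIP = 2, 40          # hypothetical numbers of task contexts and AIP units
W = np.zeros((N_TASK, N_AIP))  # modifiable PFC -> AIP connection weights
eta = 0.1                      # learning rate

def aip_activation(affordance_input, task):
    """Bottom-up affordance signal biased by the task context held in PFC."""
    return affordance_input + W[task]

def reinforce(task, chosen_unit, reward):
    """Strengthen (or weaken) the PFC -> AIP weight onto the affordance
    unit whose grasp was rewarded (or led to failure of the next action)."""
    W[task, chosen_unit] += eta * reward

# toy environment: task 0 rewards grasps on units 0-19, task 1 on units 20-39
for trial in range(200):
    task = int(rng.integers(N_TASK))
    vis = rng.random(N_AIP)    # bottom-up affordance input
    chosen = int(np.argmax(aip_activation(vis, task)))
    reward = 1.0 if (chosen < 20) == (task == 0) else -0.1
    reinforce(task, chosen, reward)
```

After training, each task row of W favors the units encoding the affordances appropriate to that task, so the same visual input yields different grasp selections under different task contexts.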
For a complex object, the representations in AIP would encode the affordances for each
of the object primitives it is composed of. Figure 6-1 shows the AIP representations for
cylinder and rectangular prism primitives forming the handle and head of a hammer, and the
AIP representation for a hammer. The pattern of AIP activation for the hammer combines
those patterns elicited by the cylinder and rectangular prism. The prefrontal cortex module
could bias the AIP neurons selective for either the handle or head, resulting in selection of
grasps appropriate for using the hammer on a nail or putting it away.
Figure 6-1 AIP activation (top row) when the model is presented with a cylinder (left
column), rectangular prism (middle column), and a cylinder and rectangular prism combined
in a hammer (right column).
6.6 Integration of MNS2, ILGA, and ACQ
In the ACQ simulations we used a simplified version of the MNS2 model. While the
complete version recognizes actions on the basis of a trajectory of hand-object relations, the
simplified version recognizes actions based on a comparison of the environmental state
before and after the action is performed. This allows reinforcement learning to operate on a
discrete timescale. Continuous versions of reinforcement learning in general and temporal
difference learning in particular have been formulated (Doya, 2000). Future work could
utilize these methods with the full MNS2 model. Alternately, discrete-time reinforcement
learning of an action’s executability and desirability could be triggered once the firing rate of
a mirror neuron recognizing that action reaches some threshold.
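That threshold-triggered scheme could be sketched as follows (illustrative Python; the threshold, learning rate, and update rules are assumptions, not the actual ACQ/MNS2 equations):

```python
THRESHOLD = 0.7   # mirror neuron firing rate required to credit the action
ALPHA = 0.2       # learning rate for the discrete-time updates

executability = {"grasp": 0.5}   # estimated probability the action can be executed
desirability = {"grasp": 0.0}    # estimated reward from executing the action

def on_mirror_firing(action, rate, succeeded, reward):
    """Apply one discrete reinforcement step only when the mirror neuron
    recognizing the action fires above threshold."""
    if rate < THRESHOLD:
        return  # recognition too weak: no learning on this step
    target = 1.0 if succeeded else 0.0
    executability[action] += ALPHA * (target - executability[action])
    desirability[action] += ALPHA * (reward - desirability[action])

# confident recognition of a successful, rewarded grasp triggers learning
on_mirror_firing("grasp", rate=0.9, succeeded=True, reward=1.0)
```

Sub-threshold recognition leaves both estimates untouched, so learning remains event-driven and discrete even though the mirror neuron's firing rate evolves continuously.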
Recently, researchers in motor control have focused on the notion that for each task for
which the CNS controls the body, the brain contains coupled forward and inverse models. It
has been suggested how to exploit these developments in extending our work on the mirror
system to address a wider range of imitative behaviors (Oztop et al., 2006b), and this
approach will be built on in the modeling proposed here by linking the new version of the
mirror system model, MNS3, with the ACQ model of action selection. ACQ contributes by
including the mobilization of motor programs (which correspond to inverse/forward models)
on the basis of affordance extraction, action desirability, and the output of a simplified mirror
system. A feature not included in the current version of ACQ, but proposed for development
under this modeling goal is the idea that mirror system recognition of another agent’s actions
might increase the probability of an action’s selection by priming its desirability, yielding
effects that may be described as effector or response facilitation. The linkage of inverse and
forward models to mirror neurons was introduced at the conceptual level by Miall (2003);
the concern here is to achieve testable hypotheses by exploiting the greater detail afforded by
computational analysis of biologically grounded neural networks.
6.7 Extensions to Synthetic Brain Imaging
The four main directions for future work concerning the synthetic brain imaging
technique itself are spatial localization of virtual voxels with consideration of inter-voxel
interactions, estimation of model parameters from analysis of experimental data, inclusion of
the effects of modulatory neurotransmitters on global CBF, and extension to electromagnetic
brain imaging (electroencephalography, EEG; magnetoencephalography, MEG) signals.
The technique we currently use groups neurons from each region into virtual voxels but
does not spatially locate the voxel within the region. Our plan for future work involves using
stereotaxic brain atlases (such as the Talairach atlas) and available neurophysiological and
hodological data to assign each voxel a spatial location. The Talairach atlas will provide a list
of coordinates for each brain region. Many neurophysiological and tract tracing studies
describe the distribution of neurons with various functional and connectivity properties
within a region in terms of gradients (e.g., neurons in region x project to region y along a
dorsoventral gradient, with the strongest connections in the dorsalmost part). This information
will be used to probabilistically assign each neuron to a coordinate within a region based on
its response properties and connection strengths. Multiple virtual “subjects” will be generated
using this stochastic voxel localization method to simulate inter-subject anatomical and
functional variability. This will allow simulation of crosstalk between adjacent voxels (as
Babajani et al., 2005, do using spatial convolution with a Gaussian kernel, for example) and
generation of statistical parametric maps of significantly activated voxels for direct
comparison with those reported in experimental studies.
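A minimal version of this stochastic localization pipeline, with a gradient-biased placement of neurons and Gaussian crosstalk between voxels, might look as follows (illustrative Python; the gradient model, population sizes, and kernel width are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

N_NEURONS, N_VOXELS = 200, 10          # illustrative population and voxel counts
conn_strength = rng.random(N_NEURONS)  # each neuron's connection strength to region y

# Gradient assumption: neurons projecting more strongly sit more dorsally.
# Sample each neuron's dorsoventral coordinate in [0, 1), biased by strength.
coords = np.clip(rng.normal(loc=conn_strength, scale=0.15), 0.0, 0.999)
voxel_of = (coords * N_VOXELS).astype(int)   # probabilistic voxel assignment

# Pool each neuron's (synthetic) activity into its assigned voxel.
activity = rng.random(N_NEURONS)
voxels = np.bincount(voxel_of, weights=activity, minlength=N_VOXELS)

# Inter-voxel crosstalk: convolve with a normalized Gaussian kernel,
# as Babajani et al. (2005) do with spatial convolution.
kernel = np.exp(-0.5 * (np.arange(-2, 3) / 1.0) ** 2)
kernel /= kernel.sum()
smoothed = np.convolve(voxels, kernel, mode="same")
```

Re-drawing coords with a different seed generates a new virtual "subject" with the same gradient statistics but different voxel assignments, which is the basis for simulating inter-subject variability.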
As pointed out by Attwell & Iadecola (2002) and Poznanski & Riera (2006), other
neurotransmitters such as dopamine, serotonin, and noradrenaline might globally modulate
CBF. Krimer et al (1998) have shown that the axons of dopaminergic neurons innervate
cortical microvessels and that dopamine induces vasomotor responses in vitro. This
complicates the interpretation of brain imaging studies in patients with conditions such as
schizophrenia and Parkinson's disease. For example, serotonin and certain Parkinson's treatments
result in dissociation between CBF and neural metabolism (Cohen et al., 1996; Hirano et al.,
2008). If synthetic brain imaging can be extended to include the effects of the altered amines
on global CBF, it could become a powerful tool for the interpretation of clinical
neuroimaging studies.
More recent large scale models have been constructed that can simulate both fMRI and
electromagnetic brain imaging (EEG/MEG) signals. Babajani et al. (2005) developed a
model that can produce both MEG and fMRI signals. In this model postsynaptic currents
(PSCs) are used as the link between the two imaging methods. As in our model, fMRI signals
were generated using the strength of the PSC (irrespective of direction) as input to the
extended balloon model. To generate the MEG signal, PSCs in only pyramidal cells were
summed taking into account the kind (excitatory or inhibitory), direction, and strength. The
equivalent current dipole (ECD) was set equal to the vector sum of all pyramidal cell PSCs.
The same method used in actual MEG studies was used to transform the resulting magnetic
field into voxel activations (the lead field from the forward problem). Babajani and
Soltanian-Zadeh (2006) extended this model to include thalamo-cortical loops, which are
important for generating oscillations in the frequency bands measured by EEG. Their modeling
showed that the EEG signal depends only on the synaptic currents in pyramidal cells, which are
modulated by these loops. Inclusion of this technique in our current system would be
straightforward and would extend the range of experimental data that our models could
address.
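The essential asymmetry between the two pathways, a hemodynamic drive that uses PSC strength irrespective of sign versus an equivalent current dipole formed by the signed vector sum over pyramidal cells, can be sketched with synthetic PSCs (illustrative Python; all names and distributions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Signed postsynaptic currents (+ excitatory, - inhibitory) and, for the
# MEG pathway, a dendritic orientation per synapse. Purely synthetic data.
n_syn = 500
psc = rng.normal(size=n_syn)
is_pyramidal = rng.random(n_syn) < 0.8   # only pyramidal PSCs contribute to MEG
directions = rng.normal(size=(n_syn, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# fMRI pathway: input to the extended balloon model uses unsigned PSC strength.
fmri_input = float(np.abs(psc).sum())

# MEG pathway: equivalent current dipole (ECD) = vector sum of the signed
# pyramidal PSCs along their orientations; cancellation is possible here.
ecd = (psc[is_pyramidal][:, None] * directions[is_pyramidal]).sum(axis=0)
```

The sketch makes the point that excitatory and inhibitory currents with opposing orientations can cancel in the ECD while still contributing fully to the hemodynamic drive, so the two signals need not covary.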
6.8 Synthetic Brain Imaging in Analyzing Real Imaging Data
Buxton et al. (2004) discuss how models of electrovascular coupling and the balloon
model could be applied in the analysis of real fMRI data. This was accomplished by Riera et
al. (2004a, 2004b), who reformulated the differential equations used
in the extended balloon model into a state-space model that could be used to estimate its
parameters and hidden state-space variables from BOLD data. Riera et al. (2007) then
showed how parameters of such a generative model could be estimated from EEG and fMRI
data. These techniques could be used to generate balloon model instantiations tailored to
individual subjects in order to make specific predictions for their performance on other tasks.
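For concreteness, the extended balloon model whose differential equations those state-space formulations rearrange can be integrated directly. The sketch below uses commonly published parameter values (illustrative Python; this is not the balloon model instantiation used in this dissertation, and the parameters are assumptions drawn from the literature, e.g., Friston et al., 2000):

```python
import numpy as np

# State: s (flow-inducing signal), f (inflow), v (venous volume),
# q (deoxyhemoglobin). Parameter values are commonly published choices.
eps, tau_s, tau_f, tau_0, alpha, E0 = 0.5, 0.8, 0.4, 1.0, 0.32, 0.4
V0, k1, k2, k3 = 0.02, 7.0 * E0, 2.0, 2.0 * E0 - 0.2

def simulate_bold(u, dt=0.01):
    """Euler-integrate the balloon model driven by neural input u(t)."""
    s, f, v, q = 0.0, 1.0, 1.0, 1.0
    bold = []
    for ut in u:
        s += dt * (eps * ut - s / tau_s - (f - 1.0) / tau_f)
        f += dt * s
        E = 1.0 - (1.0 - E0) ** (1.0 / f)        # oxygen extraction fraction
        v += dt * (f - v ** (1.0 / alpha)) / tau_0
        q += dt * (f * E / E0 - v ** (1.0 / alpha) * q / v) / tau_0
        bold.append(V0 * (k1 * (1.0 - q) + k2 * (1.0 - q / v) + k3 * (1.0 - v)))
    return np.array(bold)

# one-second stimulus, then rest: produces a positive BOLD transient
u = np.zeros(3000)
u[100:200] = 1.0
y = simulate_bold(u)
```

Casting these update equations as a state-space model, with (s, f, v, q) as hidden states and the BOLD output as the observation, is precisely what allows subject-specific parameters to be estimated from measured data.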
References
Ahn, S., and Phillips, A. (1999). Dopaminergic correlates of sensory-specific satiety in the
medial prefrontal cortex and nucleus accumbens of the rat. Journal of Neuroscience 19, 29.
Albright, T. (1984). Direction and orientation selectivity of neurons in visual area MT of the
macaque. Journal of Neurophysiology 52, 1106.
Alexander, G.E., and Crutcher, M.D. (1990). Preparation for movement: neural
representations of intended direction in three motor areas of the monkey. J Neurophysiol 64,
133-150.
Almeida, R., and Stetter, M. (2002). Modeling the link between functional imaging and
neuronal activity: synaptic metabolic demand and spike rates. Neuroimage 17, 1065-1079.
Alstermark, B., Lundberg, A., Norrsell, U., and Sybirska, E. (1981). Integration in
descending motor pathways controlling the forelimb in the cat. 9. Differential behavioural
defects after spinal cord lesions interrupting defined pathways from higher centres to
motoneurones. Exp Brain Res 42, 299-318.
Amari, S. (1977). Dynamics of pattern formation in lateral-inhibition type neural fields. Biol
Cybern 27, 77-87.
Andersen, R., Bracewell, R., Barash, S., Gnadt, J., and Fogassi, L. (1990). Eye position
effects on visual, memory, and saccade-related activity in areas LIP and 7a of macaque.
Journal of Neuroscience 10, 1176.
Andersen, R., Brotchie, P., and Mazzoni, P. (1992). Evidence for the lateral intraparietal area
as the parietal eye field. Current Opinion in Neurobiology 2, 840-846.
Andersen, R.A., Essick, G.K., and Siegel, R.M. (1985). Encoding of spatial location by
posterior parietal neurons. Science 230, 456-458.
Arbib, M.A. (1981). Perceptual structures and distributed motor control. In Handbook of
physiology – the nervous system II, V.B. Brooks, ed. (American Physiological Society).
Arbib, M.A. (2005). From monkey-like action recognition to human language: an
evolutionary framework for neurolinguistics. Behav Brain Sci 28, 105-124; discussion 125-
167.
Arbib, M.A., Billard, A., Iacoboni, M., and Oztop, E. (2000). Synthetic brain imaging:
grasping, mirror neurons and imitation. Neural Networks 13, 975-997.
Arbib, M.A., Bischoff, A., Fagg, A.H., and Grafton, S.T. (1995). Synthetic PET: analyzing
large-scale properties of neural networks. Human Brain Mapping 2, 225-233.
Arbib, M.A., Bonaiuto, J.B., Jacobs, S., and Frey, S.H. (2009). Tool use and the distalization
of the end-effector. Psychol Res 73, 441-462.
Arbib, M.A., Fagg, A.H., and Grafton, S.T. (2002). Synthetic PET Imaging for Grasping:
From Primate Neurophysiology to Human Behavior. In Exploratory Analysis and Data
Modeling in Functional Neuroimaging, F.T. Soomer, and A. Wichert, eds. (MIT Press), pp.
231-250.
Arbib, M.A., Iberall, T., and Lyons, D. (1985). Coordinated control programs for movements of
the hand. Experimental Brain Research, 111-129.
Arikuni, T., Watanabe, K., and Kubota, K. (1988). Connections of area 8 with area 6 in the
brain of the macaque monkey. J Comp Neurol 277, 21-40.
Arthurs, O.J., and Boniface, S. (2002). How well do we understand the neural origins of the
fMRI BOLD signal? Trends Neurosci 25, 27-31.
Ashmead, D.H., McCarty, M.E., Lucas, L.S., and Belvedere, M.C. (1993). Visual guidance
in infants' reaching toward suddenly displaced targets. Child Dev 64, 1111-1127.
Attwell, D., and Iadecola, C. (2002). The neural basis of functional brain imaging signals.
Trends Neurosci 25, 621-625.
Aubert, A., and Costalat, R. (2002). A model of the coupling between brain electrical
activity, metabolism, and hemodynamics: application to the interpretation of functional
neuroimaging. Neuroimage 17, 1162-1181.
Aubert, A., Costalat, R., and Valabrègue, R. (2001). Modelling of the coupling between brain
electrical activity and metabolism. Acta biotheoretica 49, 301-326.
Aubert, A., Pellerin, L., Magistretti, P.J., and Costalat, R. (2007). A coherent neurobiological
framework for functional neuroimaging provided by a model integrating compartmentalized
energy metabolism. Proc Natl Acad Sci U S A 104, 4188-4193.
Babajani, A., Nekooei, M.H., and Soltanian-Zadeh, H. (2005). Integrated MEG and fMRI
model: synthesis and analysis. Brain Topogr 18, 101-113.
Babajani, A., and Soltanian-Zadeh, H. (2006). Integrated MEG/EEG and fMRI model based
on neural masses. IEEE Trans Biomed Eng 53, 1794-1801.
Babajani-Feremi, A., Soltanian-Zadeh, H., and Moran, J. (2008). Integrated MEG/fMRI
model validated using real auditory data. Brain topography 21, 61-74.
Bar-Gad, I., Morris, G., and Bergman, H. (2003). Information processing, dimensionality
reduction and reinforcement learning in the basal ganglia. Prog Neurobiol 71, 439-473.
Barto, A.G. (1995). Adaptive critics and the basal ganglia. In Models of Information
Processing in the Basal Ganglia, J.C. Houk, J.L. Davis, and D.G. Beiser, eds. (Cambridge, MA:
MIT Press), pp. 215-232.
Battaglia-Mayer, A., Caminiti, R., Lacquaniti, F., and Zago, M. (2003). Multiple levels of
representation of reaching in the parieto-frontal network. Cereb Cortex 13, 1009-1022.
Battaglini, P.P., Muzur, A., Galletti, C., Skrap, M., Brovelli, A., and Fattori, P. (2002).
Effects of lesions to area V6A in monkeys. Exp Brain Res 144, 419-422.
Baumann, M.A., Fluet, M.C., and Scherberger, H. (2009). Context-specific grasp movement
representation in the macaque anterior intraparietal area. J Neurosci 29, 6436-6448.
Bennett, K.M., and Castiello, U. (1994). Reach to grasp: changes with age. J Gerontol 49,
P1-7.
Berthier, N.E. (1996). Learning to reach: A mathematical model. Developmental Psychology
32, 811-823.
Bhat, A., Heathcock, J., and Galloway, J.C. (2005). Toy-oriented changes in hand and joint
kinematics during the emergence of purposeful reaching. In Infant Behavior and
Development.
Bhat, A.N., and Galloway, J.C. (2006). Toy-oriented changes during early arm movements:
hand kinematics. Infant Behav Dev 29, 358-372.
Biederman, I. (1987). Recognition-by-Components: A Theory of Human Image
Understanding. Psychol Rev 94, 115-147.
Binkofski, F., Dohle, C., Posse, S., Stephan, K.M., Hefter, H., Seitz, R.J., and Freund, H.J.
(1998). Human anterior intraparietal area subserves prehension: a combined lesion and
functional MRI activation study. Neurology 50, 1253-1259.
Blatt, G.J., Andersen, R.A., and Stoner, G.R. (1990). Visual receptive field organization and
cortico-cortical connections of the lateral intraparietal area (area LIP) in the macaque. J
Comp Neurol 299, 421-445.
Bonaiuto, J., Rosta, E., and Arbib, M. (2007). Extending the mirror neuron system model, I -
Audible actions and invisible grasps. Biological Cybernetics 96, 9-38.
Borra, E., Belmalih, A., Calzavara, R., Gerbella, M., Murata, A., Rozzi, S., and Luppino, G.
(2007). Cortical connections of the macaque anterior intraparietal (AIP) area. Cereb Cortex,
doi:10.1093/cercor/bhm146.
Botvinick, M., and Plaut, D.C. (2004). Doing without schema hierarchies: a recurrent
connectionist approach to normal and impaired routine sequential action. Psychol Rev 111,
395-429.
Brass, M., and Heyes, C. (2005). Imitation: is cognitive neuroscience solving the
correspondence problem? Trends in Cognitive Sciences 9, 489-495.
Breveglieri, R., Kutz, D.F., Fattori, P., Gamberini, M., and Galletti, C. (2002).
Somatosensory cells in the parieto-occipital area V6A of the macaque. Neuroreport 13, 2113-
2116.
Brog, J., Salyapongse, A., Deutch, A., and Zahm, D. (1993). The afferent innervation of the
core and shell of the accumbens part of the rat ventral striatum: immunohistochemical
detection of retrogradely transported Fluoro-Gold. J Comp Neurol 338, 255-273.
Buccino, G., Binkofski, F., Fink, G.R., Fadiga, L., Fogassi, L., Gallese, V., Seitz, R.J., Zilles,
K., Rizzolatti, G., and Freund, H.J. (2001). Action observation activates premotor and
parietal areas in a somatotopic manner: an fMRI study. Eur J Neurosci 13, 400-404.
Buccino, G., Vogt, S., Ritzl, A., Fink, G.R., Zilles, K., Freund, H.J., and Rizzolatti, G.
(2004). Neural circuits underlying imitation learning of hand actions: an event-related fMRI
study. Neuron 42, 323-334.
Bullier, J., Schall, J.D., and Morel, A. (1996). Functional streams in occipito-frontal
connections in the monkey. Behav Brain Res 76, 89-97.
Bullock, D., Grossberg, S., and Guenther, F.H. (1993). A Self-Organizing Neural Model of
Motor Equivalent Reaching and Tool Use by a Multijoint Arm. J Cogn Neursci 5, 408-435.
Buxton, R.B., and Frank, L.R. (1997). A model for the coupling between cerebral blood flow
and oxygen metabolism during neural stimulation. J Cereb Blood Flow Metab 17, 64-72.
Buxton, R.B., Uludag, K., Dubowitz, D.J., and Liu, T.T. (2004). Modeling the hemodynamic
response to brain activation. Neuroimage 23 Suppl 1, S220-233.
Buxton, R.B., Wong, E.C., and Frank, L.R. (1998). Dynamics of blood flow and oxygenation
changes during brain activation: the balloon model. Magn Reson Med 39, 855-864.
Caminiti, R., Ferraina, S., and Johnson, P.B. (1996). The sources of visual information to the
primate frontal lobe: a novel role for the superior parietal lobule. Cereb Cortex 6, 319-328.
Caminiti, R., Genovesio, A., Marconi, B., Mayer, A.B., Onorati, P., Ferraina, S., Mitsuda, T.,
Giannetti, S., Squatrito, S., Maioli, M.G., and Molinari, M. (1999). Early coding of reaching:
frontal and parietal association connections of parieto-occipital cortex. Eur J Neurosci 11,
3339-3345.
Caminiti, R., Johnson, P.B., Galli, C., Ferraina, S., and Burnod, Y. (1991). Making arm
movements within different parts of space: the premotor and motor cortical representation of
a coordinate system for reaching to visual targets. J Neurosci 11, 1182-1197.
Cattaneo, L., Voss, M., Brochier, T., Prabhu, G., Wolpert, D.M., and Lemon, R.N. (2005). A
cortico-cortical mechanism mediating object-driven grasp in humans. Proc Natl Acad Sci U
S A 102, 898-903.
Cauli, B., Tong, X., Rancillac, A., Serluca, N., Lambolez, B., Rossier, J., and Hamel, E.
(2004). Cortical GABA interneurons in neurovascular coupling: relays for subcortical
vasoactive pathways. Journal of Neuroscience 24, 8940.
Cavada, C., and Goldman-Rakic, P.S. (1989). Posterior parietal cortex in rhesus monkey: II.
Evidence for segregated corticocortical networks linking sensory and limbic areas with the
frontal lobe. J Comp Neurol 287, 422-445.
Chadderdon, G.L., and Sporns, O. (2006). A large-scale neurocomputational model of task-
oriented behavior selection and working memory in prefrontal cortex. J Cogn Neurosci 18,
242-257.
Chaminade, T., Meltzoff, A.N., and Decety, J. (2005). An fMRI study of imitation: action
representation and body schema. Neuropsychologia 43, 115-127.
Chan, S.S., and Moran, D.W. (2006). Computational model of a primate arm: from hand
position to joint angles, joint torques and muscle forces. J Neural Eng 3, 327-337.
Chao, L.L., and Martin, A. (2000). Representation of manipulable man-made objects in the
dorsal stream. Neuroimage 12, 478-484.
Chiavarino, C., Apperly, I.A., and Humphreys, G.W. (2007). Exploring the functional and
anatomical bases of mirror-image and anatomical imitation: the role of the frontal lobes.
Neuropsychologia 45, 784-795.
Churchland, M.M., Santhanam, G., and Shenoy, K.V. (2006). Preparatory activity in
premotor and motor cortex reflects the speed of the upcoming reach. J Neurophysiol 96,
3130-3146.
Cisek, P., and Kalaska, J.F. (2002). Simultaneous encoding of multiple potential reach
directions in dorsal premotor cortex. J Neurophysiol 87, 1149-1154.
Clark, J., Clark, A., Bartle, A., and Winn, P. (1991). The regulation of feeding and drinking
in rats with lesions of the lateral hypothalamus made by N-methyl-D-aspartate. Neuroscience
45, 631-640.
Claxton, L.J., Keen, R., and McCarty, M.E. (2003). Evidence of motor planning in infant
reaching behavior. Psychol Sci 14, 354-356.
Cohen, Z., Bonvento, G., Lacombe, P., and Hamel, E. (1996). Serotonin in the regulation of
brain microcirculation. Progress in neurobiology 50, 335.
Colby, C.L., and Duhamel, J.R. (1996). Spatial representations for action in parietal cortex.
Brain Res Cogn Brain Res 5, 105-115.
Colby, C.L., Duhamel, J.R., and Goldberg, M.E. (1993). Ventral intraparietal area of the
macaque: anatomic location and visual response properties. J Neurophysiol 69, 902-914.
Colby, C.L., Gattass, R., Olson, C.R., and Gross, C.G. (1988). Topographical organization of
cortical afferents to extrastriate visual area PO in the macaque: a dual tracer study. J Comp
Neurol 269, 392-413.
Colby, C.L., and Goldberg, M.E. (1999). Space and attention in parietal cortex. Annu Rev
Neurosci 22, 319-349.
Cooper, R., and Shallice, T. (2000). Contention scheduling and the control of routine
activities. Cognitive Neuropsychology 17, 297-338.
Corchs, S., and Deco, G. (2002). Large-scale neural model for visual attention: integration of
experimental single-cell and fMRI data. Cereb Cortex 12, 339-348.
Corchs, S., and Deco, G. (2004). Feature-based attention in human visual cortex: simulation
of fMRI data. Neuroimage 21, 36-45.
Courtney, S.M., Petit, L., Maisog, J.M., Ungerleider, L.G., and Haxby, J.V. (1998). An area
specialized for spatial working memory in human frontal cortex. Science 279, 1347-1351.
Cragg, S., Baufreton, J., Xue, Y., Bolam, J., and Bevan, M. (2004). Synaptic release of
dopamine in the subthalamic nucleus. Eur J Neurosci 20, 1788-1802.
Crammond, D.J., and Kalaska, J.F. (2000). Prior information in motor and premotor cortex:
activity during the delay period and effect on pre-movement activity. J Neurophysiol 84,
986-1005.
Crutcher, M.D., and Alexander, G.E. (1990). Movement-related neuronal activity selectively
coding either direction or muscle pattern in three motor areas of the monkey. J Neurophysiol
64, 151-163.
Culham, J.C., and Valyear, K.F. (2006). Human parietal cortex in action. Current Opinion in
Neurobiology 16, 205-212.
D'Esposito, M., Aguirre, G.K., Zarahn, E., Ballard, D., Shin, R.K., and Lease, J. (1998).
Functional MRI studies of spatial and nonspatial working memory. Brain Res Cogn Brain
Res 7, 1-13.
Dayan, P., and Abbott, L.F. (2001). Theoretical Neuroscience: Computational and
Mathematical Modeling of Neural Systems (MIT Press).
Deacon, T.W. (1992). Cortical connections of the inferior arcuate sulcus cortex in the
macaque brain. Brain Res 573, 8-26.
Decety, J., Chaminade, T., Grezes, J., and Meltzoff, A.N. (2002). A PET exploration of the
neural mechanisms involved in reciprocal imitation. Neuroimage 15, 265-272.
Decety, J., Grezes, J., Costes, N., Perani, D., Jeannerod, M., Procyk, E., Grassi, F., and Fazio,
F. (1997). Brain activity during observation of actions - Influence of action content and
subject's strategy. Brain 120, 1763-1777.
Deco, G., Rolls, E.T., and Horwitz, B. (2004). "What" and "where" in visual working
memory: a computational neurodynamical perspective for integrating FMRI and single-
neuron data. J Cogn Neurosci 16, 683-701.
Devor, A., Dunn, A., Andermann, M., Ulbert, I., Boas, D., and Dale, A. (2003). Coupling of
total hemoglobin concentration, oxygenation, and neural activity in rat somatosensory cortex.
Neuron 39, 353-359.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., and Rizzolatti, G. (1992).
Understanding motor events: a neurophysiological study. Exp Brain Res 91, 176-180.
Dinstein, I., Hasson, U., Rubin, N., and Heeger, D.J. (2007). Brain areas selective for both
observed and executed movements. Journal of Neurophysiology 98, 1415-1427.
Ditterich, J., Mazurek, M., and Shadlen, M. (2003). Microstimulation of visual cortex affects
the speed of perceptual decisions. Nature Neuroscience 6, 891-898.
Dominey, P.F., and Arbib, M.A. (1992). A cortico-subcortical model for generation of
spatially accurate sequential saccades. Cereb Cortex 2, 153-175.
Downing, P.E., Jiang, Y.H., Shuman, M., and Kanwisher, N. (2001). A cortical area selective
for visual processing of the human body. Science 293, 2470-2473.
Downing, P.E., Peelen, M.V., Wiggett, A.J., and Tew, B.D. (2006). The role of the
extrastriate body area in action perception. Social Neuroscience 1, 52-62.
Doya, K. (2000). Reinforcement learning in continuous time and space. Neural Computation
12, 219-245.
Doya, K., Samejima, K., Katagiri, K., and Kawato, M. (2002). Multiple model-based
reinforcement learning. Neural Comput 14, 1347-1369.
Dum, R.P., and Strick, P.L. (1991). The origin of corticospinal projections from the premotor
areas in the frontal lobe. J Neurosci 11, 667-689.
Dum, R.P., and Strick, P.L. (2005). Frontal lobe inputs to the digit representations of the
motor areas on the lateral surface of the hemisphere. J Neurosci 25, 1375-1386.
Durbin, R., and Mitchison, G. (1990). A dimension reduction framework for understanding
cortical maps. Nature 343, 644-647.
Durstewitz, D., Seamans, J.K., and Sejnowski, T.J. (2000). Neurocomputational models of
working memory. Nat Neurosci 3 Suppl, 1184-1191.
Erlhagen, W., and Schoner, G. (2002). Dynamic field theory of movement preparation.
Psychol Rev 109, 545-572.
Fadel, J., and Deutch, A. (2002). Anatomical substrates of orexin-dopamine interactions:
lateral hypothalamic projections to the ventral tegmental area. Neuroscience 111, 379-387.
Fagg, A., and Arbib, M. (1998). Modeling parietal-premotor interactions in primate control
of grasping. Neural Netw 11, 1277-1303.
Faillenot, I., Sakata, H., Costes, N., Decety, J., and Jeannerod, M. (1997). Visual working
memory for shape and 3D-orientation: A PET study. Neuroreport 8, 859-862.
Faldella, E., Fringuelli, B., and Zanichelli, F. (1993). A Hybrid System for Knowledge-Based
Synthesis of Robot Grasps. In IEEE/RSJ International Conference on Intelligent Robots and
Systems.
Fattori, P., Kutz, D.F., Breveglieri, R., Marzocchi, N., and Galletti, C. (2005). Spatial tuning
of reaching activity in the medial parieto-occipital cortex (area V6A) of macaque monkey.
Eur J Neurosci 22, 956-972.
Felleman, D.J., and Van Essen, D.C. (1991). Distributed hierarchical processing in the
primate cerebral cortex. Cereb Cortex 1, 1-47.
Ferraina, S., Pare, M., and Wurtz, R.H. (2002). Comparison of cortico-cortical and cortico-
collicular signals for the generation of saccadic eye movements. J Neurophysiol 87, 845-858.
Fiorillo, C.D., Tobler, P.N., and Schultz, W. (2003). Discrete coding of reward probability
and uncertainty by dopamine neurons. Science 299, 1898-1902.
Fogassi, L., and Ferrari, P.F. (2004). Mirror neurons, gestures and language evolution.
Interaction Studies 5, 345-364.
Fogassi, L., Ferrari, P.F., Gesierich, B., Rozzi, S., Chersi, F., and Rizzolatti, G. (2005).
Parietal lobe: from action organization to intention understanding. Science 308, 662-667.
Fogassi, L., Gallese, V., Buccino, G., Craighero, L., Fadiga, L., and Rizzolatti, G. (2001).
Cortical mechanism for the visual guidance of hand grasping movements in the monkey - A
reversible inactivation study. Brain 124, 571-586.
Fogassi, L., Gallese, V., Fadiga, L., and Rizzolatti, G. (1998). Neurons responding to the
sight of goal directed hand/arm actions in the parietal area PF (7b) of the macaque monkey.
In Soc Neurosci, p. 257.
Fogassi, L., Raos, V., Franchi, G., Gallese, V., Luppino, G., and Matelli, M. (1999). Visual
responses in the dorsal premotor area F2 of the macaque monkey. Exp Brain Res 128, 194-
199.
Freund, H.J. (1990). Premotor area and preparation of movement. Rev Neurol (Paris) 146,
543-547.
Friedman, H.R., and Goldman-Rakic, P.S. (1994). Coactivation of prefrontal cortex and
inferior parietal cortex in working memory tasks revealed by 2DG functional mapping in the
rhesus monkey. J Neurosci 14, 2775-2788.
Friston, K.J., Mechelli, A., Turner, R., and Price, C.J. (2000). Nonlinear responses in fMRI:
the Balloon model, Volterra kernels, and other hemodynamics. Neuroimage 12, 466-477.
Fu, Q.G., Suarez, J.I., and Ebner, T.J. (1993). Neuronal specification of direction and
distance during reaching movements in the superior precentral premotor area and primary
motor cortex of monkeys. J Neurophysiol 70, 2097-2116.
Fu, W.T., and Anderson, J.R. (2006). From recurrent choice to skill learning: a
reinforcement-learning model. J Exp Psychol Gen 135, 184-206.
Fujii, N., Mushiake, H., and Tanji, J. (1996). Rostrocaudal differentiation of dorsal premotor
cortex with physiological criteria. In Soc Neurosci Abstr, p. 796.791.
Fujii, N., Mushiake, H., and Tanji, J. (2002). Distribution of eye- and arm-movement-related
neuronal activity in the SEF and in the SMA and Pre-SMA of monkeys. J Neurophysiol 87,
2158-2166.
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the
premotor cortex. Brain 119 ( Pt 2), 593-609.
Gallese, V., and Goldman, A. (1998). Mirror neurons and the simulation theory of mind-
reading. Trends in Cognitive Sciences 2, 493-501.
Gallese, V., Murata, A., Kaseda, M., Niki, N., and Sakata, H. (1994). Deficit of hand
preshaping after muscimol injection in monkey parietal cortex. Neuroreport 5, 1525-1529.
Galletti, C., Battaglini, P.P., and Fattori, P. (1993). Parietal neurons encoding spatial
locations in craniotopic coordinates. Exp Brain Res 96, 221-229.
Galletti, C., Fattori, P., Kutz, D.F., and Battaglini, P.P. (1997). Arm movement-related
neurons in the visual area V6A of the macaque superior parietal lobule. Eur J Neurosci 9,
410-413.
Galletti, C., Fattori, P., Kutz, D.F., and Gamberini, M. (1999). Brain location and visual
topography of cortical area V6A in the macaque monkey. Eur J Neurosci 11, 575-582.
Galletti, C., Kutz, D.F., Gamberini, M., Breveglieri, R., and Fattori, P. (2003). Role of the
medial parieto-occipital cortex in the control of reaching and grasping movements. Exp Brain
Res 153, 158-170.
Gardner, E.P., Babu, K.S., Reitzen, S.D., Ghosh, S., Brown, A.S., Chen, J., Hall, A.L.,
Herzlinger, M.D., Kohlenstein, J.B., and Ro, J.Y. (2007). Neurophysiology of prehension. I.
Posterior parietal cortex and object-oriented hand behaviors. J Neurophysiol 97, 387-406.
Genovesio, A., and Ferraina, S. (2004). Integration of retinal disparity and fixation-distance
related signals toward an egocentric coding of distance in the posterior parietal cortex of
primates. J Neurophysiol 91, 2670-2684.
Gibson, J.J. (1966). The Senses Considered as Perceptual Systems (Boston: Houghton-
Mifflin).
Glover, G.H. (1999). Deconvolution of impulse response in event-related BOLD fMRI.
Neuroimage 9, 416-429.
Gnadt, J.W., and Andersen, R.A. (1988). Memory related motor planning activity in posterior
parietal cortex of macaque. Experimental Brain Research 70, 216-220.
Gnadt, J.W., and Beyer, J. (1998). Eye movements in depth: What does the monkey's parietal
cortex tell the superior colliculus? Neuroreport 9, 233-238.
Gnadt, J.W., and Mays, L.E. (1995). Neurons in monkey parietal area LIP are tuned for eye-
movement parameters in three-dimensional space. J Neurophysiol 73, 280-297.
Godschalk, M., Mitz, A.R., van Duin, B., and van der Burg, H. (1995). Somatotopy of
monkey premotor cortex examined with microstimulation. Neurosci Res 23, 269-279.
Goense, J.B., and Logothetis, N.K. (2008). Neurophysiology of the BOLD fMRI signal in
awake monkeys. Curr Biol 18, 631-640.
Gold, J., and Shadlen, M. (2000). Representation of a perceptual decision in developing
oculomotor commands. Nature 404, 390-394.
Gold, J., and Shadlen, M. (2003). The influence of behavioral context on the representation
of a perceptual decision in developing oculomotor commands. Journal of Neuroscience 23,
632.
Gorce, P., and Rezzoug, N. (2004). A method to learn hand grasping posture from noisy
sensing information. Robotica 22, 309-318.
Gordon, J., Ghilardi, M.F., and Ghez, C. (1994). Accuracy of planar reaching movements. I.
Independence of direction and extent variability. Exp Brain Res 99, 97-111.
Grafton, S.T., Arbib, M.A., Fadiga, L., and Rizzolatti, G. (1996). Localization of grasp
representations in humans by positron emission tomography. 2. Observation compared with
imagination. Exp Brain Res 112, 103-111.
Grupen, R., and Coelho, J.J. (2000). Structure and growth: A model of development for
grasping with robot hands. In IEEE/RSJ International Conference on Intelligent Robots and
Systems, pp. 1987-1992.
Gurney, K., Prescott, T., and Redgrave, P. (2001). A computational model of action selection
in the basal ganglia. I. A new functional anatomy. Biol Cybern 84, 401-410.
Hamada, I., and DeLong, M.R. (1992). Excitotoxic acid lesions of the primate subthalamic
nucleus result in reduced pallidal neuronal activity during active holding. J Neurophysiol 68,
1859-1866.
Hanes, D., and Schall, J. (1996). Neural control of voluntary movement initiation. Science
274, 427.
Hanks, T., Ditterich, J., and Shadlen, M. (2006). Microstimulation of macaque area LIP
affects decision-making in a motion discrimination task. Nature Neuroscience 9, 682-689.
Hassani, O., François, C., Yelnik, J., and Féger, J. (1997). Evidence for a dopaminergic
innervation of the subthalamic nucleus in the rat. Brain Res 749, 88-94.
He, S.Q., Dum, R.P., and Strick, P.L. (1993). Topographic organization of corticospinal
projections from the frontal lobe: motor areas on the lateral surface of the hemisphere. J
Neurosci 13, 952-980.
Heeger, D., and Ress, D. (2002). What does fMRI tell us about neuronal activity? Nature
Reviews Neuroscience 3, 142-151.
Hermsdorfer, J., Goldenberg, G., Wachsmuth, C., Conrad, B., Ceballos-Baumann, A.O.,
Bartenstein, P., Schwaiger, M., and Boecker, H. (2001). Cortical correlates of gesture
processing: Clues to the cerebral mechanisms underlying apraxia during the imitation of
meaningless gestures. Neuroimage 14, 149-161.
Hertz, J., Krogh, A., and Palmer, R.G. (1991). Introduction to the theory of neural
computation (Reading: Addison Wesley).
Hewson-Stoate, N., Jones, M., Martindale, J., Berwick, J., and Mayhew, J. (2005). Further
nonlinearities in neurovascular coupling in rodent barrel cortex. Neuroimage 24, 565-574.
Hirano, S., Asanuma, K., Ma, Y., Tang, C., Feigin, A., Dhawan, V., Carbon, M., and
Eidelberg, D. (2008). Dissociation of metabolic and neurovascular responses to levodopa in
the treatment of Parkinson's disease. Journal of Neuroscience 28, 4201.
Horwitz, B., and Tagamets, M.A. (1999). Predicting human functional maps with neural net
modeling. Hum Brain Mapp 8, 137-142.
Horwitz, B., Warner, B., Fitzer, J., Tagamets, M.A., Husain, F.T., and Long, T.W. (2005).
Investigating the neural basis for functional and effective connectivity. Application to fMRI.
Philosophical Transactions of the Royal Society B-Biological Sciences 360, 1093-1108.
Husain, F.T., Nandipati, G., Braun, A.R., Cohen, L.G., Tagamets, M.A., and Horwitz, B.
(2002). Simulating transcranial magnetic stimulation during PET with a large-scale neural
network model of the prefrontal cortex and the visual system. Neuroimage 15, 58-73.
Husain, F.T., Tagamets, M.A., Fromm, S.J., Braun, A.R., and Horwitz, B. (2004). Relating
neuronal dynamics for auditory object processing to neuroimaging activity: a computational
modeling and an fMRI study. Neuroimage 21, 1701-1720.
Iacoboni, M., Woods, R.P., Brass, M., Bekkering, H., Mazziotta, J.C., and Rizzolatti, G.
(1999). Cortical mechanisms of human imitation. Science 286, 2526-2528.
Inase, M., Sakai, S.T., and Tanji, J. (1996). Overlapping corticostriatal projections from the
supplementary motor area and the primary motor cortex in the macaque monkey: an
anterograde double labeling study. J Comp Neurol 373, 283-296.
Iriki, A., Tanaka, M., Obayashi, S., and Iwamura, Y. (2001). Self-images in the video
monitor coded by monkey intraparietal neurons. Neurosci Res 40, 163-173.
Izhikevich, E. (2007). Dynamical systems in neuroscience: The geometry of excitability and
bursting (Cambridge, MA: MIT Press).
Izhikevich, E.M. (2004). Which model to use for cortical spiking neurons? IEEE Trans
Neural Netw 15, 1063-1070.
Izhikevich, E.M., and Edelman, G.M. (2008). Large-scale model of mammalian
thalamocortical systems. Proc Natl Acad Sci U S A 105, 3593-3598.
Jaeger, D., Gilman, S., and Aldridge, J.W. (1993). Primate basal ganglia activity in a precued
reaching task: preparation for movement. Exp Brain Res 95, 51-64.
Jahr, C.E., and Stevens, C.F. (1990). A quantitative description of NMDA receptor-channel
kinetic behavior. J Neurosci 10, 1830-1837.
Jeannerod, M., Arbib, M.A., Rizzolatti, G., and Sakata, H. (1995). Grasping objects: the
cortical mechanisms of visuomotor transformation. Trends Neurosci 18, 314-320.
Joel, D., and Weiner, I. (2000). The connections of the dopaminergic system with the
striatum in rats and primates: an analysis with respect to the functional and compartmental
organization of the striatum. Neuroscience 96, 451-474.
Johnson, P.B., Ferraina, S., Bianchi, L., and Caminiti, R. (1996). Cortical networks for visual
reaching: physiological and anatomical organization of frontal and parietal lobe arm regions.
Cereb Cortex 6, 102-119.
Johnson, P.B., Ferraina, S., and Caminiti, R. (1993). Cortical networks for visual reaching.
Exp Brain Res 97, 361-365.
Jones, M.J. (1992). Using Recurrent Networks for Dimensionality Reduction (Cambridge:
MIT Press).
Jordan, M.I. (1986). Attractor dynamics and parallelism in a connectionist sequential
machine. In 8th conference of the cognitive science society, pp. 531-546.
Jueptner, M., and Weiller, C. (1995). Review: does measurement of regional cerebral blood
flow reflect synaptic activity? Implications for PET and fMRI. Neuroimage 2, 148-156.
Kakei, S., Hoffman, D.S., and Strick, P.L. (2001). Direction of action is represented in the
ventral premotor cortex. Nat Neurosci 4, 1020-1025.
Kakei, S., Hoffman, D.S., and Strick, P.L. (2003). Sensorimotor transformations in cortical
motor areas. Neurosci Res 46, 1-10.
Kalaska, J.F. (2009). From intention to action: motor cortex and the control of reaching
movements. Adv Exp Med Biol 629, 139-178.
Kamon, I., Flash, T., and Edelman, S. (1996). Learning to grasp using visual information. In
IEEE International Conference on Robotics and Automation (Minneapolis, MN, Citeseer),
pp. 2470-2476.
Kelley, A. (2004). Ventral striatal control of appetitive motivation: role in ingestive behavior
and reward-related learning. Neuroscience & Biobehavioral Reviews 27, 765-776.
Kelley, A., Baldo, B., and Pratt, W. (2005). A proposed hypothalamic-thalamic-striatal axis
for the integration of energy balance, arousal, and food reward. Journal of Comparative
Neurology 493, 72-85.
Koehler, R.C., Gebremedhin, D., and Harder, D.R. (2006). Role of astrocytes in
cerebrovascular regulation. J Appl Physiol 100, 307-317.
Kohler, E., Keysers, C., Umilta, M.A., Fogassi, L., Gallese, V., and Rizzolatti, G. (2002).
Hearing sounds, understanding actions: action representation in mirror neurons. Science 297,
846-848.
Koski, L., Wohlschlager, A., Bekkering, H., Woods, R.P., Dubeau, M.C., Mazziotta, J.C.,
and Iacoboni, M. (2002). Modulation of motor and premotor activity during imitation of
target-directed actions. Cereb Cortex 12, 847-855.
Krimer, L., Muly, E., Williams, G., and Goldman-Rakic, P. (1998). Dopaminergic regulation
of cerebral cortical microcirculation. Nature Neuroscience 1, 286-289.
Kropotov, J., and Etlinger, S. (1999). Selection of actions in the basal ganglia-
thalamocortical circuits: review and model. Int J Psychophysiol 31, 197-217.
Kuhtz-Buschbeck, J.P., Stolze, H., Johnk, K., Boczek-Funcke, A., and Illert, M. (1998).
Development of prehension movements in children: a kinematic study. Exp Brain Res 122,
424-432.
Kuperstein, M. (1988). Neural model of adaptive hand-eye coordination for single postures.
Science 239, 1308-1311.
Kurata, K. (1989). Distribution of neurons with set- and movement-related activity before
hand and foot movements in the premotor cortex of rhesus monkeys. Exp Brain Res 77, 245-
256.
Kurata, K. (1993). Premotor cortex of monkeys: set- and movement-related activity
reflecting amplitude and direction of wrist movements. J Neurophysiol 69, 187-200.
Kurata, K. (1994). Information processing for motor control in primate premotor cortex.
Behav Brain Res 61, 135-142.
Kurata, K., and Hoshi, E. (2002). Movement-related neuronal activity reflecting the
transformation of coordinates in the ventral premotor cortex of monkeys. J Neurophysiol 88,
3118-3132.
Kuroshima, H., Fujita, K., Fuyuki, A., and Masuda, T. (2002). Understanding of the
relationship between seeing and knowing by tufted capuchin monkeys (Cebus apella). Anim
Cogn 5, 41-48.
Kusunoki, M., Tanaka, Y., Ohtsuka, H., Ishiyama, K., and Sakata, H. (1993). Selectivity of
the parietal visual neurons in the axis orientation of objects in space. Society for
Neuroscience Abstracts 19, 770.
Lasky, R.E. (1977). The effect of visual feedback of the hand on the reaching and retrieval
behavior of young infants. Child Dev 48, 112-117.
Lauritzen, M., and Gold, L. (2003). Brain function and neurophysiological correlates of
signals used in functional neuroimaging. J Neurosci 23, 3972-3980.
Lee, L., Friston, K., and Horwitz, B. (2006). Large-scale neural models and dynamic causal
modelling. Neuroimage 30, 1243-1254.
Lehky, S.R., and Sereno, A.B. (2007). Comparison of shape encoding in primate dorsal and
ventral visual pathways. J Neurophysiol 97, 307-319.
Lewis, J.W., and Van Essen, D.C. (2000). Corticocortical connections of visual,
sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. J
Comp Neurol 428, 112-137.
Lockman, J.J., Ashmead, D.H., and Bushnell, E.W. (1984). The development of anticipatory
hand orientation during infancy. J Exp Child Psychol 37, 176-186.
Lorrain, D., Riolo, J., Matuszewich, L., and Hull, E. (1999). Lateral hypothalamic serotonin
inhibits nucleus accumbens dopamine: implications for sexual satiety. Journal of
Neuroscience 19, 7648.
Luppino, G., Calzavara, R., Rozzi, S., and Matelli, M. (2001). Projections from the superior
temporal sulcus to the agranular frontal cortex in the macaque. Eur J Neurosci 14, 1035-
1040.
Luppino, G., Hamed, S.B., Gamberini, M., Matelli, M., and Galletti, C. (2005). Occipital
(V6) and parietal (V6A) areas in the anterior wall of the parieto-occipital sulcus of the
macaque: a cytoarchitectonic study. Eur J Neurosci 21, 3056-3076.
Luppino, G., Murata, A., Govoni, P., and Matelli, M. (1999). Largely segregated
parietofrontal connections linking rostral intraparietal cortex (areas AIP and VIP) and the
ventral premotor cortex (areas F5 and F4). Exp Brain Res 128, 181-187.
Luppino, G., Rozzi, S., Calzavara, R., and Matelli, M. (2003). Prefrontal and agranular
cingulate projections to the dorsal premotor areas F2 and F7 in the macaque monkey. Eur J
Neurosci 17, 559-578.
Lyon, R. (1982). A computational model of filtering, detection, and compression in the
cochlea. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pp.
1282-1285.
Lyons, D. (1985). A simple set of grasps for a dextrous hand. In IEEE International
Conference on Robotics and Automation, pp. 588-593.
MacNeilage, P.F., and Davis, B.L. (2005). The frame/content theory of evolution of speech:
A comparison with a gestural-origins alternative. Interaction Studies 6, 173-199.
Magistretti, P.J., and Pellerin, L. (1999). Cellular mechanisms of brain energy metabolism
and their relevance to functional brain imaging. Philos Trans R Soc Lond B Biol Sci 354,
1155-1163.
Maistros, G., and Hayes, G. (2004). Towards an imitation system for learning robots. In
Methods and Applications of Artificial Intelligence, Proceedings (Berlin: Springer-Verlag
Berlin), pp. 246-255.
Marconi, B., Genovesio, A., Battaglia-Mayer, A., Ferraina, S., Squatrito, S., Molinari, M.,
Lacquaniti, F., and Caminiti, R. (2001). Eye-hand coordination during reaching. I.
Anatomical relationships between parietal and frontal cortex. Cereb Cortex 11, 513-527.
Matelli, M., Govoni, P., Galletti, C., Kutz, D.F., and Luppino, G. (1998). Superior area 6
afferents from the superior parietal lobule in the macaque monkey. J Comp Neurol 402, 327-
352.
Matelli, M., and Luppino, G. (1996). Thalamic input to mesial and superior area 6 in the
macaque monkey. J Comp Neurol 372, 59-87.
Matelli, M., Luppino, G., and Rizzolatti, G. (1985). Patterns of cytochrome oxidase activity
in the frontal agranular cortex of the macaque monkey. Behav Brain Res 18, 125-136.
Maunsell, J., and Van Essen, D. (1983). Functional properties of neurons in middle temporal
visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and
orientation. Journal of Neurophysiology 49, 1127.
Mazziotta, J., Toga, A., Evans, A., Fox, P., Lancaster, J., Zilles, K., Woods, R., Paus, T.,
Simpson, G., Pike, B., et al. (2001). A probabilistic atlas and reference system for the human
brain: International Consortium for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol
Sci 356, 1293-1322.
McCarthy, G., Blamire, A.M., Puce, A., Nobre, A.C., Bloch, G., Hyder, F., Goldman-Rakic,
P., and Shulman, R.G. (1994). Functional magnetic resonance imaging of human prefrontal
cortex activation during a spatial working memory task. Proc Natl Acad Sci U S A 91, 8690-
8694.
McCarty, M.E., Clifton, R.K., and Collard, R.R. (1999). Problem solving in infancy: the
emergence of an action plan. Dev Psychol 35, 1091-1101.
McGurk, H., and MacDonald, J. (1976). Hearing lips and seeing voices. Nature 264, 746-
748.
McIntyre, J., Stratta, F., and Lacquaniti, F. (1998). Short-term memory for reaching to visual
targets: psychophysical evidence for body-centered reference frames. J Neurosci 18, 8423-
8435.
Mechelli, A., Price, C., and Friston, K. (2001). Nonlinear coupling between evoked rCBF
and BOLD signals: a simulation study of hemodynamic responses. NeuroImage 14, 862-872.
Messier, J., and Kalaska, J.F. (2000). Covariation of primate dorsal premotor cell activity
with direction and amplitude during a memorized-delay reaching task. J Neurophysiol 84,
152-165.
Metea, M.R., and Newman, E.A. (2006). Glial cells dilate and constrict blood vessels: a
mechanism of neurovascular coupling. J Neurosci 26, 2862-2870.
Metta, G., Sandini, G., Natale, L., Craighero, L., and Fadiga, L. (2006). Understanding
mirror neurons - A bio-robotic approach. Interaction Studies 7, 197-232.
Miall, R. (2003). Connecting mirror neurons and forward models. Neuroreport 14, 2135.
Mink, J., and Thach, W. (1991). Basal ganglia motor control. II. Late pallidal timing relative
to movement onset and inconsistent coding of movement parameters. J Neurophysiol 65,
301-329.
Mitchell, S.J., Richardson, R.T., Baker, F.H., and DeLong, M.R. (1987). The primate globus
pallidus: neuronal activity related to direction of movement. Exp Brain Res 68, 491-505.
Molina-Vilaplana, J., Feliu-Batlle, J., and Lopez-Coronado, J. (2007). A modular neural
network architecture for step-wise learning of grasping tasks. Neural Networks 20, 631-645.
Mon-Williams, M., and McIntosh, R.D. (2000). A test between two hypotheses and a
possible third way for the control of prehension. Exp Brain Res 134, 268-273.
Morel, A., Garraghty, P.E., and Kaas, J.H. (1993). Tonotopic organization, architectonic
fields, and connections of auditory cortex in macaque monkeys. J Comp Neurol 335, 437-
459.
Morgan, M., and Ward, R. (1980). Conditions for motion flow in dynamic visual noise.
Vision Research 20, 431-435.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., and Bergman, H. (2006). Midbrain dopamine
neurons encode decisions for future action. Nat Neurosci 9, 1057-1063.
Morrongiello, B.A., and Rocca, P.T. (1989). Visual feedback and anticipatory hand
orientation during infants' reaching. Percept Mot Skills 69, 787-802.
Murata, A., Fadiga, L., Fogassi, L., Gallese, V., Raos, V., and Rizzolatti, G. (1997). Object
representation in the ventral premotor cortex (area F5) of the monkey. Journal of
Neurophysiology 78, 2226-2230.
Murata, A., Gallese, V., Luppino, G., Kaseda, M., and Sakata, H. (2000). Selectivity for the
shape, size, and orientation of objects for grasping in neurons of monkey parietal area AIP. J
Neurophysiol 83, 2580-2601.
Nakamura, H., Kuroda, T., Wakita, M., Kusunoki, M., Kato, A., Mikami, A., Sakata, H., and
Itoh, K. (2001). From three-dimensional space vision to prehensile hand movements: the
lateral intraparietal area links the area V3A and the anterior intraparietal area in macaques. J
Neurosci 21, 8174-8187.
Neal, J.W., Pearson, R.C., and Powell, T.P. (1990). The ipsilateral corticocortical
connections of area 7 with the frontal lobe in the monkey. Brain Res 509, 31-40.
Newell, K.M., McDonald, P.V., and Baillargeon, R. (1993). Body scale and infant grip
configurations. Dev Psychobiol 26, 195-205.
Newman-Norlund, R.D., Noordzij, M.L., Meulenbroek, R.G., and Bekkering, H. (2007).
Exploring the brain basis of joint action: co-ordination of actions, goals and intentions. Soc
Neurosci 2, 48-65.
Newsome, W., Britten, K., and Movshon, J. (1989). Neuronal correlates of a perceptual
decision. Nature 341, 52-54.
Ochiai, T., Mushiake, H., and Tanji, J. (2005). Involvement of the ventral premotor cortex in
controlling image motion of the hand during performance of a target-capturing task. Cereb
Cortex 15, 929-937.
Olson, C.R., and Gettner, S.N. (1995). Object-centered direction selectivity in the macaque
supplementary eye field. Science 269, 985-988.
Oztop, E., and Arbib, M.A. (2002). Schema design and implementation of the grasp-related
mirror neuron system. Biological Cybernetics 87, 116-140.
Oztop, E., Bradley, N.S., and Arbib, M.A. (2004). Infant grasp learning: a computational
model. Experimental Brain Research 158, 480-503.
Oztop, E., Imamizu, H., Cheng, G., and Kawato, M. (2006a). A computational model of
anterior intraparietal (AIP) neurons. Neurocomputing 69, 1354-1361.
Oztop, E., Kawato, M., and Arbib, M. (2006b). Mirror neurons and imitation: a
computationally guided review. Neural Netw 19, 254-271.
Oztop, E., Wolpert, D., and Kawato, M. (2005). Mental state inference using visual control
parameters. Brain Res Cogn Brain Res 22, 129-151.
Palmer, J., Huk, A., and Shadlen, M. (2005). The effect of stimulus strength on the speed and
accuracy of a perceptual decision. Journal of Vision 5, 376-404.
Parthasarathy, H.B., Schall, J.D., and Graybiel, A.M. (1992). Distributed but convergent
ordering of corticostriatal projections: analysis of the frontal eye field and the supplementary
eye field in the macaque monkey. J Neurosci 12, 4468-4488.
Peigneux, P., Van der Linden, M., Garraux, G., Laureys, S., Degueldre, C., Aerts, J., Del
Fiore, G., Moonen, G., Luxen, A., and Salmon, E. (2004). Imaging a cognitive model of
apraxia: The neural substrate of gesture-specific cognitive processes. Human Brain Mapping
21, 119-142.
Pesaran, B., Nelson, M.J., and Andersen, R.A. (2006). Dorsal premotor neurons encode the
relative position of the hand, eye, and goal during reach planning. Neuron 51, 125-134.
Poznanski, R.R., and Riera, J.J. (2006). fMRI models of dendritic and astrocytic networks. J
Integr Neurosci 5, 273-326.
Rand, M.K., Shimansky, Y., Stelmach, G.E., Bracha, V., and Bloedel, J.R. (2000). Effects of
accuracy constraints on reach-to-grasp movements in cerebellar patients. Exp Brain Res 135,
179-188.
Raos, V., Umilta, M.A., Gallese, V., and Fogassi, L. (2004). Functional properties of
grasping-related neurons in the dorsal premotor area F2 of the macaque monkey. J
Neurophysiol 92, 1990-2002.
Raos, V., Umilta, M.A., Murata, A., Fogassi, L., and Gallese, V. (2006). Functional
properties of grasping-related neurons in the ventral premotor area F5 of the macaque
monkey. J Neurophysiol 95, 709-729.
Ratcliff, R., and Rouder, J. (1998). Modeling response times for two-choice decisions.
Psychological Science 9, 347.
Rauschecker, J.P., Tian, B., and Hauser, M. (1995). Processing of complex sounds in the
macaque nonprimary auditory cortex. Science 268, 111-114.
Rees, G., Friston, K., and Koch, C. (2000). A direct quantitative relationship between the
functional properties of human and macaque V5. Nature Neuroscience 3, 716-723.
Rezzoug, N., and Gorce, P. (2003). A biocybernetic method to learn hand grasping posture.
Kybernetes 32, 478-490.
Riehle, A., and Requin, J. (1989). Monkey primary motor and premotor cortex: single-cell
activity related to prior information about direction and extent of an intended movement. J
Neurophysiol 61, 534-549.
Riera, J., Aubert, E., Iwata, K., Kawashima, R., Wan, X., and Ozaki, T. (2005). Fusing EEG
and fMRI based on a bottom-up model: inferring activation and effective connectivity in
neural masses. Philosophical Transactions of the Royal Society B: Biological Sciences 360,
1025.
Riera, J., Bosch, J., Yamashita, O., Kawashima, R., Sadato, N., Okada, T., and Ozaki, T.
(2004a). fMRI activation maps based on the NN-ARx model. Neuroimage 23, 680-697.
Riera, J.J., Jimenez, J.C., Wan, X., Kawashima, R., and Ozaki, T. (2007). Nonlinear local
electrovascular coupling. II: From data to neuronal masses. Hum Brain Mapp 28, 335-354.
Riera, J.J., Wan, X., Jimenez, J.C., and Kawashima, R. (2006). Nonlinear local
electrovascular coupling. I: A theoretical model. Hum Brain Mapp 27, 896-914.
Riera, J.J., Watanabe, J., Kazuki, I., Naoki, M., Aubert, E., Ozaki, T., and Kawashima, R.
(2004b). A state-space model of the hemodynamic approach: nonlinear filtering of BOLD
signals. Neuroimage 21, 547-567.
Rizzolatti, G., and Arbib, M.A. (1998). Language within our grasp. Trends Neurosci 21, 188-
194.
Rizzolatti, G., Camarda, R., Fogassi, L., Gentilucci, M., Luppino, G., and Matelli, M. (1988).
Functional organization of inferior area 6 in the macaque monkey. II. Area F5 and the control
of distal movements. Exp Brain Res 71, 491-507.
Rizzolatti, G., Fadiga, L., Gallese, V., and Fogassi, L. (1996a). Premotor cortex and the
recognition of motor actions. Brain Res Cogn Brain Res 3, 131-141.
Rizzolatti, G., Fadiga, L., Matelli, M., Bettinardi, V., Paulesu, E., Perani, D., and Fazio, F.
(1996b). Localization of grasp representations in humans by PET: 1. Observation versus
execution. Exp Brain Res 111, 246-252.
Rizzolatti, G., Gentilucci, M., Fogassi, L., Luppino, G., Matelli, M., and Ponzoni-Maggi, S.
(1987). Neurons related to goal-directed motor acts in inferior area 6 of the macaque
monkey. Exp Brain Res 67, 220-224.
Rizzolatti, G., Luppino, G., and Matelli, M. (1998). The organization of the cortical motor
system: new concepts. Electroencephalogr Clin Neurophysiol 106, 283-296.
Rizzolatti, G., and Matelli, M. (2003). Two different streams form the dorsal visual system:
anatomy and functions. Exp Brain Res 153, 146-157.
Romanski, L.M., Bates, J.F., and Goldman-Rakic, P.S. (1999). Auditory belt and parabelt
projections to the prefrontal cortex in the rhesus monkey. J Comp Neurol 403, 141-157.
Rothi, L.J.G., Ochipa, C., and Heilman, K.M. (1991). A cognitive neuropsychological model
of limb praxis. Cognitive Neuropsychology 8, 443-458.
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986). Learning representations by back-
propagating errors. Nature 323, 533-536.
Rumiati, R.I., Weiss, P.H., Tessari, A., Assmus, A., Zilles, K., Herzog, H., and Fink, G.R.
(2005). Common and differential neural mechanisms supporting imitation of meaningful and
meaningless actions. J Cogn Neurosci 17, 1420-1431.
Sakata, H., and Taira, M. (1994). Parietal control of hand action. Curr Opin Neurobiol 4,
847-856.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., and Tanaka, Y. (1997). The TINS Lecture.
The parietal association cortex in depth perception and visual control of hand action. Trends
Neurosci 20, 350-357.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., Tanaka, Y., and Tsutsui, K. (1998). Neural
coding of 3D features of objects for hand action in the parietal cortex of the monkey. Philos
Trans R Soc Lond B Biol Sci 353, 1363-1373.
Sakata, H., Taira, M., Kusunoki, M., Murata, A., Tsutsui, K., Tanaka, Y., Shein, W.N., and
Miyashita, Y. (1999). Neural representation of three-dimensional features of manipulation
objects with stereopsis. Exp Brain Res 128, 160-169.
Sakata, H., Taira, M., Murata, A., and Mine, S. (1995). Neural mechanisms of visual
guidance of hand action in the parietal cortex of the monkey. Cereb Cortex 5, 429-438.
Sakata, H., Tsutsui, K., and Taira, M. (2005). Toward an understanding of the neural
processing for 3D shape perception. Neuropsychologia 43, 151-161.
Salzman, C., Britten, K., and Newsome, W. (1990). Cortical microstimulation influences
perceptual judgements of motion direction. Nature 346, 174-177.
Salzman, C., Murasugi, C., Britten, K., and Newsome, W. (1992). Microstimulation in visual
area MT: effects on direction discrimination performance. Journal of Neuroscience 12, 2331.
Saper, C., and Loewy, A. (1982). Projections of the pedunculopontine tegmental nucleus in
the rat: evidence for additional extrapyramidal circuitry. Brain Research 252, 367-372.
Saxe, R., Jamal, N., and Powell, L. (2006). My body or yours? The effect of visual
perspective on cortical body representations. Cerebral Cortex 16, 178-182.
Schaal, S., and Schweighofer, N. (2005). Computational motor control in humans and robots.
Curr Opin Neurobiol 15, 675-682.
Schall, J.D., Morel, A., King, D.J., and Bullier, J. (1995). Topography of visual cortex
connections with frontal eye field in macaque: convergence and segregation of processing
streams. J Neurosci 15, 4464-4487.
Schettino, L.F., Adamovich, S.V., and Poizner, H. (2003). Effects of object shape and visual
feedback on hand configuration during grasping. Exp Brain Res 151, 158-166.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. J Neurophysiol 80, 1-27.
Schultz, W., Apicella, P., and Ljungberg, T. (1993). Responses of monkey dopamine neurons
to reward and conditioned stimuli during successive steps of learning a delayed response
task. Journal of Neuroscience 13, 900-913.
Schütz-Bosbach, S., Mancini, B., Aglioti, S., and Haggard, P. (2006). Self and other in the
human motor system. Current Biology 16, 1830-1834.
Sebanz, N., Knoblich, G., and Prinz, W. (2003). Representing others' actions: just like one's
own? Cognition 88, B11-B21.
Seltzer, B., and Pandya, D.N. (1986). Posterior parietal projections to the intraparietal sulcus
of the rhesus monkey. Exp Brain Res 62, 459-469.
Seltzer, B., and Pandya, D.N. (1989). Frontal lobe connections of the superior temporal
sulcus in the rhesus monkey. J Comp Neurol 281, 97-113.
Sereno, M.E., Trinath, T., Augath, M., and Logothetis, N.K. (2002). Three-dimensional
shape representation in monkey cortex. Neuron 33, 635-652.
Shadlen, M., and Newsome, W. (1996). Motion perception: seeing and deciding. Proceedings
of the National Academy of Sciences of the United States of America 93, 628.
Shadlen, M., and Newsome, W. (2001). Neural basis of a perceptual decision in the parietal
cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology 86, 1916.
Sheth, S.A., Nemoto, M., Guiou, M., Walker, M., Pouratian, N., and Toga, A.W. (2004).
Linear and nonlinear relationships between neuronal activity, oxygen metabolism, and
hemodynamic responses. Neuron 42, 347-355.
Shikata, E., Hamzei, F., Glauche, V., Knab, R., Dettmers, C., Weiller, C., and Buchel, C.
(2001). Surface orientation discrimination activates caudal and anterior intraparietal sulcus in
humans: an event-related fMRI study. J Neurophysiol 85, 1309-1314.
Shikata, E., Hamzei, F., Glauche, V., Koch, M., Weiller, C., Binkofski, F., and Buchel, C.
(2003). Functional properties and interaction of the anterior and posterior intraparietal areas
in humans. Eur J Neurosci 17, 1105-1110.
Shikata, E., Tanaka, Y., Nakamura, H., Taira, M., and Sakata, H. (1996). Selectivity of the
parietal visual neurones in 3D orientation of surface of stereoscopic stimuli. Neuroreport 7,
2389-2394.
Shimazu, H., Maier, M.A., Cerri, G., Kirkwood, P.A., and Lemon, R.N. (2004). Macaque
ventral premotor cortex exerts powerful facilitation of motor cortex outputs to upper limb
motoneurons. J Neurosci 24, 1200-1211.
Shipp, S., Blanton, M., and Zeki, S. (1998). A visuo-somatomotor pathway through superior
parietal cortex in the macaque monkey: cortical connections of areas V6 and V6A. Eur J
Neurosci 10, 3171-3193.
Shipp, S., and Zeki, S. (1995). Segregation and convergence of specialised pathways in
macaque monkey visual cortex. J Anat 187 (Pt 3), 547-562.
Shmuelof, L., and Zohary, E. (2005). A mirror representation of external actions in the
anterior parietal cortex. Reviews in the Neurosciences 16, S59-S60.
Shmuelof, L., and Zohary, E. (2006). A mirror representation of others' actions in the human
anterior parietal cortex. Journal of Neuroscience 26, 9736-9742.
Singh, S.P., and Sutton, R.S. (1996). Reinforcement learning with replacing eligibility traces.
Machine Learning 22, 123-158.
Slaney, M. (1998). Auditory toolbox, version 2. (Interval Research Corporation).
Smeets, J.B., and Brenner, E. (2001). Independent movements of the digits in grasping. Exp
Brain Res 139, 92-100.
Smith, Y., and Bolam, J. (1990). The output neurones and the dopaminergic neurones of the
substantia nigra receive a GABA-containing input from the globus pallidus in the rat. J Comp
Neurol 296, 47-64.
Snyder, L.H., Batista, A.P., and Andersen, R.A. (1997). Coding of intention in the posterior
parietal cortex. Nature 386, 167-170.
Snyder, L.H., Batista, A.P., and Andersen, R.A. (2000). Intention-related activity in the
posterior parietal cortex: a review. Vision Res 40, 1433-1441.
Soechting, J.F., and Flanders, M. (1989). Sensorimotor representations for pointing to targets
in three-dimensional space. J Neurophysiol 62, 582-594.
Sotero, R., and Trujillo-Barreto, N. (2008). Biophysical model for integrating neuronal
activity, EEG, fMRI and metabolism. Neuroimage 39, 290-309.
Sotero, R.C., Trujillo-Barreto, N.J., Jimenez, J.C., Carbonell, F., and Rodriguez-Rojas, R.
(2009). Identification and comparison of stochastic metabolic/hemodynamic models
(sMHM) for the generation of the BOLD signal. J Comput Neurosci 26, 251-269.
Stratford, T., and Kelley, A. (1999). Evidence of a functional relationship between the
nucleus accumbens shell and lateral hypothalamus subserving the control of feeding
behavior. Journal of Neuroscience 19, 11040.
Suri, R.E., and Schultz, W. (2001). Temporal difference model reproduces anticipatory
neural activity. Neural Comput 13, 841-862.
Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction (Cambridge,
MA: MIT Press).
Tagamets, M., and Horwitz, B. (2001). Interpreting PET and fMRI measures of functional
neural activity: the effects of synaptic inhibition on cortical activation in human imaging
studies. Brain research bulletin 54, 267-273.
Tagamets, M.A., and Horwitz, B. (1998). Integrating electrophysiological and anatomical
experimental data to create a large-scale model that simulates a delayed match-to-sample
human brain imaging study. Cereb Cortex 8, 310-320.
Tagamets, M.A., and Horwitz, B. (2000). A model of working memory: bridging the gap
between electrophysiology and human brain imaging. Neural Networks 13, 941-952.
Taira, M., Mine, S., Georgopoulos, A.P., Murata, A., and Sakata, H. (1990). Parietal cortex
neurons of the monkey related to the visual guidance of hand movement. Exp Brain Res 83,
29-36.
Tanaka, S., Inui, T., Iwaki, S., Konishi, J., and Nakai, T. (2001). Neural substrates involved
in imitating finger configurations: an fMRI study. Neuroreport 12, 1171-1174.
Tanne, J., Boussaoud, D., Boyer-Zeller, N., and Rouiller, E.M. (1995). Direct visual
pathways for reaching movements in the macaque monkey. Neuroreport 7, 267-272.
Tanne-Gariepy, J., Rouiller, E.M., and Boussaoud, D. (2002). Parietal inputs to dorsal versus
ventral premotor areas in the macaque monkey: evidence for largely segregated visuomotor
pathways. Exp Brain Res 145, 91-103.
Todorov, E. (2000). Direct cortical control of muscle activation in voluntary arm movements:
a model. Nat Neurosci 3, 391-398.
Treisman, A. (1996). The binding problem. Curr Opin Neurobiol 6, 171-178.
Tsao, D.Y., Vanduffel, W., Sasaki, Y., Fize, D., Knutsen, T.A., Mandeville, J.B., Wald, L.L.,
Dale, A.M., Rosen, B.R., Van Essen, D.C., et al. (2003). Stereopsis activates V3A and caudal
intraparietal areas in macaques and humans. Neuron 39, 555-568.
Tsutsui, K., Jiang, M., Yara, K., Sakata, H., and Taira, M. (2001). Integration of perspective
and disparity cues in surface-orientation-selective neurons of area CIP. J Neurophysiol 86,
2856-2867.
Tsutsui, K., Taira, M., and Sakata, H. (2005). Neural mechanisms of three-dimensional
vision. Neurosci Res 51, 221-229.
Tunik, E., Frey, S.H., and Grafton, S.T. (2005). Virtual lesions of the anterior intraparietal
area disrupt goal-dependent on-line adjustments of grasp. Nat Neurosci 8, 505-511.
Ueki, A., Uno, M., Anderson, M., and Yoshida, M. (1977). Monosynaptic inhibition of
thalamic neurons produced by stimulation of the substantia nigra. Experientia 33, 1480-1482.
Umilta, M.A., Brochier, T., Spinks, R.L., and Lemon, R.N. (2007). Simultaneous recording
of macaque premotor and primary motor cortex neuronal populations reveals different
functional contributions to visuomotor grasp. J Neurophysiol 98, 488-501.
Umilta, M.A., Escola, L., Intskirveli, I., Grammont, F., Rochat, M., Caruana, F., Jezzini, A.,
Gallese, V., and Rizzolatti, G. (2008). When pliers become fingers in the monkey motor
system. Proc Natl Acad Sci U S A 105, 2209-2213.
Umilta, M.A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., and Rizzolatti, G.
(2001). I know what you are doing. a neurophysiological study. Neuron 31, 155-165.
Ungerleider, L., and Mishkin, M. (1982). Two cortical visual systems. In Analysis of Visual
Behavior (Cambridge, MA: MIT Press), pp. 549-586.
Uno, M., and Yoshida, M. (1975). Monosynaptic inhibition of thalamic neurons produced by
stimulation of the pallidal nucleus in cats. Brain Res 99, 377-380.
Vogt, S., Buccino, G., Wohlschläger, A.M., Canessa, N., Eickhoff, S., Maier, K., Shah, J.,
Zilles, K., Freund, H.J., Rizzolatti, G., and Fink, G.R. (2005). The mirror neuron system and
area 46 in the imitation of novel and practised hand actions: an event-related fMRI study. In
23rd European Workshop on Cognitive Neuropsychology (Bressanone).
von Hofsten, C. (1982). Eye-hand coordination in the newborn. Developmental Psychology
18, 450-461.
von Hofsten, C., and Ronnqvist, L. (1988). Preparation for grasping an object: a
developmental study. J Exp Psychol Hum Percept Perform 14, 610-621.
von Hofsten, C., and Spelke, E.S. (1985). Object perception and object-directed reaching in
infancy. J Exp Psychol Gen 114, 198-212.
Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic
assumptions of formal learning theory. Nature 412, 43-48.
Waldvogel, D., Van Gelderen, P., Muellbacher, W., Ziemann, U., Immisch, I., and Hallett,
M. (2000). The relative metabolic demand of inhibition and excitation. Nature 406, 995-998.
Wang, Y., Shima, K., Isoda, M., Sawamura, H., and Tanji, J. (2002). Spatial distribution and
density of prefrontal cortical cells projecting to three sectors of the premotor cortex.
Neuroreport 13, 1341-1344.
Weinrich, M., and Wise, S.P. (1982). The premotor cortex of the monkey. J Neurosci 2,
1329-1345.
Weitzenfeld, A., Arbib, M.A., and Alexander, A. (2002). The Neural Simulation Language:
A Framework for Brain Modeling (MIT Press).
Werbos, P.J. (1990). Backpropagation through time: what it does and how to do it.
Proceedings of the IEEE 78, 1550-1560.
Wilson, C., Nomikos, G., Collu, M., and Fibiger, H. (1995). Dopaminergic correlates of
motivated behavior: importance of drive. Journal of Neuroscience 15, 5169.
Winder, R., Cortes, C., Reggia, J., and Tagamets, M. (2007). Functional connectivity in
fMRI: A modeling approach for estimation and for relating to local circuits. Neuroimage 34,
1093-1107.
Winges, S.A., Santello, M. (2005). From Single Motor Unit Activity to Multiple Grip Forces:
Mini-review of Multi-digit Grasping. Integrative and Comparative Biology 45, 679-682.
Winn, P. (1995). The lateral hypothalamus and motivated behavior: An old syndrome
reassessed and a new perspective gained. Current Directions in Psychological Science, 182-
187.
Wise, S.P., Boussaoud, D., Johnson, P.B., and Caminiti, R. (1997). Premotor and parietal
cortex: corticocortical connectivity and combinatorial computations. Annu Rev Neurosci 20,
25-42.
Witherington, D.C. (2005). The Development of Prospective Grasping Control Between 5
and 7 Months: A Longitudinal Study. Infancy 7, 143-161.
Wu, S., Amari, S., and Nakahara, H. (2002). Population coding and decoding in a neural
field: a computational study. Neural Comput 14, 999-1026.
Xiao, J., Padoa-Schioppa, C., and Bizzi, E. (2006). Neuronal correlates of movement
dynamics in the dorsal and ventral premotor area in the monkey. Exp Brain Res 168, 106-
119.
Yin, H. (2008). On multidimensional scaling and the embedding of self-organising maps.
Neural Netw 21, 160-169.
Zatsiorsky, V.M., and Latash, M.L. (2004). Prehension synergies. Exerc Sport Sci Rev 32,
75-80.
Zheng, Y., Johnston, D., Berwick, J., Chen, D., Billings, S., and Mayhew, J. (2005). A three-
compartment model of the hemodynamic response and oxygen delivery to brain. Neuroimage
28, 925-939.
Zheng, Y., Martindale, J., Johnston, D., Jones, M., Berwick, J., and Mayhew, J. (2002). A
model of the hemodynamic response and oxygen delivery to brain. Neuroimage 16, 617-637.
Zucker, R.S., and Regehr, W.G. (2002). Short-term synaptic plasticity. Annual Review of
Physiology 64, 355-405.
Appendix
Related Synthetic Brain Imaging Models
Table A-1 A meta-analysis of synthetic brain imaging studies in terms of the mechanisms
included: neural model (LI=leaky integrator, SU=sigmoidal units, NaLI=sodium concentration
leaky integrator, GF=gamma function, MFA=mean field approximation, DI=decaying impulse,
LIF=leaky integrate-and-fire, PSP=postsynaptic potential, CM=compartmental model,
NM=neural mass model, IZ=Izhikevich), synaptic model (CBK=conductance-based kinetic
model), neurovascular coupling signal (WI=sum of absolute value of connection weight times
input, Na/K=ATP consumption by Na/K pump, FA=field activity, SI=sum of absolute value of
synaptic currents, SC=sum of synaptic conductances, NP=number of PSPs,
TCC=transmembrane capacitive currents, NAS=number of active synapses), rCBF generation,
BOLD signal generation, temporal smoothing, oxygen metabolism (O2), glucose metabolism,
adjacent voxel crosstalk, neural noise, scanner noise, network connection variability, scan
repetition time (TR), and scanner field strength (B0). The present study is analyzed in the
last row.
Each entry lists the study's neural model, synaptic model (where one is specified), and
neurovascular coupling signal, followed by one "x" per additional mechanism included (the
full column set being CBF, BOLD, temporal smoothing, O2, glucose metabolism, voxel
crosstalk, neural noise, scanner noise, connection variability, TR, and B0).

Arbib et al (1995): LI, WI, x
Tagamets & Horwitz (1998): SU, WI, x x
Horwitz & Tagamets (1999): SU, WI, x x x x
Friston et al (2000): x x x x x
Arbib et al (2000): LI, WI, x x
Tagamets & Horwitz (2000): SU, WI, x x
Aubert et al (2001): NaLI, Na/K, x x
Tagamets & Horwitz (2001): SU, WI, x x
Mechelli et al (2001): GF, x x x x x
Arbib et al (2002): LI, WI, x
Husain et al (2002): SU, WI, x
Corchs & Deco (2002): MFA, FA, x x
Almeida & Stetter (2002): MFA, WI, x
Aubert & Costalat (2002): NaLI, Na/K, x x x x x x
Buxton et al (2004): DI, x x x x
Corchs & Deco (2004): MFA, FA, x x x
Husain et al (2004): SU, WI, x x x x
Riera et al (2004b): x x x x x x x
Deco et al (2004): LIF, CBK, SI, x x x
Babajani et al (2005): PSP, NP, x x x x x x x x x
Horwitz et al (2005): SU, WI, x x x x x
Riera et al (2005): x x x x x x x
Chadderdon & Sporns (2006): SU, WI, x x x
Lee et al (2006): SU, WI, x x x x
Riera et al (2006): CM, TCC, x x x x x x
Babajani & Soltanian-Zadeh (2006): NM, NP, x x x x x x
Winder et al (2007): SU, WI, x x x
Sotero & Trujillo-Barreto (2008): NM, NAS, x x x x x x
Izhikevich & Edelman (2008): IZ, CBK, SSC, x
Babajani-Feremi et al (2008): PSP, NP, x x x x x x x x x
Sotero et al (2009): NM, NAS, x x x x x x x x
Current study: IZ, CBK, SSC, x x x x x x
MNS2 - Recurrent Network Setup
We used Jordan-type recurrent networks (Jordan, 1986) for both the audio and main
networks. The main recurrent network contains 7 external input units, 5 recurrent input units,
15 hidden units, 3 external output units, and 5 recurrent output units. The audio recurrent
network contains 71 external input units, 5 recurrent input units, 20 hidden units, 3 external
output units, and 5 recurrent output units. In both networks each layer is fully connected with
the layer above it, and the recurrent output units are fully connected with the recurrent input
units (see Figure 2-2). The external output layer of the audio recurrent network is fully
connected to the external output layer of the main recurrent network. Each network is
separately trained using backpropagation through time (BPTT, Werbos, 1990), an extension
of the backpropagation learning algorithm (Rumelhart et al., 1986) for use with recurrent
networks. The audio recurrent network is trained first, then the connection weights between
the external output units of the two networks are modified using Hebbian association,
concurrent with the main network's BPTT training.
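The Hebbian association between the two external output layers can be sketched as a simple outer-product update. This is a minimal illustration only: the outer-product form and the learning-rate value are assumptions, since the text specifies only that the weights between the external output layers are modified by Hebbian association.

```python
import numpy as np

def hebbian_update(W4, main_out, audio_out, lr=0.01):
    # Strengthen each audio->main weight in proportion to the coactivation
    # of its presynaptic (audio) and postsynaptic (main) output units.
    # lr is an illustrative learning rate, not a value from the text.
    return W4 + lr * np.outer(main_out, audio_out)

W4 = np.zeros((3, 3))
audio_out = np.array([1.0, 0.0, 0.0])  # active audio output unit
main_out = np.array([1.0, 0.0, 0.0])   # concurrently active motor-program output
W4 = hebbian_update(W4, main_out, audio_out)
```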
The following formulation refers to the main recurrent network (the operation of the
audio recurrent network is very similar, minus the Hebbian association). In this formulation,
MR(t) (mirror response) represents the mirror neuron system output at time t and MP(t)
(motor program) denotes the target of the network at time t. X(t) denotes the hand state input
vector applied to the network at time t, and A(t) denotes the external output vector of the
audio recurrent network at time t. I(t) represents the activity of the recurrent input units at
time t and O(t) represents the recurrent output unit activity. The squashing function we used
was

$$g(x) = \frac{1}{1 + e^{-x}}$$

bounding each unit's activity between 0.0 and 1.0. η₂ is the backpropagation through time
learning rate, initialized to 1.0. W1 is the 8×15 matrix of real numbers representing the
hidden-to-output (external and recurrent) layer weights, W2 is the 15×12 matrix of real
numbers representing the input (external hand state and recurrent) to hidden layer weights,
W3 is the 5×5 matrix of real numbers representing the recurrent output-to-recurrent input
layer weights, and W4 is the 3×3 matrix of real numbers representing the audio network
external output layer to main network external output layer weights. The following
formulation is adapted from (Hertz et al., 1991).
$$MR_i(t) = g\!\left(\sum_{j=1}^{15} W1_{ij}\, g\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t) + \sum_{o=1}^{5} W2_{jo} I_o(t)\right) + \sum_{m=1}^{3} W4_{im} A_m(t)\right)$$

$$O_n(t) = g\!\left(\sum_{j=1}^{15} W1_{nj}\, g\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t) + \sum_{o=1}^{5} W2_{jo} I_o(t)\right)\right)$$

$$I_o(t+1) = g\!\left(\sum_{n=1}^{5} W3_{on} O_n(t)\right)$$
The network is run in forward mode for L time steps, and the activation of each unit in
the network is saved at each step. We chose L equal to the entire length of the
sequence. At the end of the sequence, the error is propagated backwards through the
network and the weights are simultaneously updated using the average weight change over
all time steps (Werbos, 1990).
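As a concreteness check, the forward pass defined by the three equations above can be sketched in NumPy. The weights here are random placeholders, and the split of W1 into three external-output rows and five recurrent-output rows is an assumption consistent with the layer sizes given in the text, not published code.

```python
import numpy as np

def g(x):
    # Logistic squashing function, bounding each unit's activity in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 15))   # hidden -> output (3 external + 5 recurrent rows)
W2 = rng.standard_normal((15, 12))  # input (7 external + 5 recurrent) -> hidden
W3 = rng.standard_normal((5, 5))    # recurrent output -> recurrent input
W4 = rng.standard_normal((3, 3))    # audio external output -> main external output

def forward(X, I, A):
    """One time step: mirror response MR(t), recurrent output O(t), and I(t+1)."""
    h = g(W2 @ np.concatenate([X, I]))  # hidden-layer activity
    MR = g(W1[:3] @ h + W4 @ A)         # external output, with audio contribution
    O = g(W1[3:] @ h)                   # recurrent output
    I_next = g(W3 @ O)                  # recurrent input for the next time step
    return MR, O, I_next

X = rng.random(7)   # hand-state input at time t
I = np.zeros(5)     # recurrent input, initialized to zero
A = rng.random(3)   # audio network external output at time t
MR, O, I_next = forward(X, I, A)
```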
Learning weights from hidden to external output layer
$$\delta W1_{ij}(t) = g'\!\left(\sum_{j=1}^{15} W1_{ij}\, g\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t) + \sum_{o=1}^{5} W2_{jo} I_o(t)\right)\right)\bigl(MP_i(t) - MR_i(t)\bigr)$$

$$W1_{ij} = W1_{ij} + \eta_2\, \frac{1}{L} \sum_{t=1}^{L} \delta W1_{ij}(t)$$
Learning weights from hidden to recurrent output layer
$$\delta W1_{nj}(t) = g'\!\left(\sum_{j=1}^{15} W1_{nj}\, g\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t) + \sum_{o=1}^{5} W2_{jo} I_o(t)\right)\right) g\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t) + \sum_{o=1}^{5} W2_{jo} I_o(t)\right)$$

$$W1_{nj} = W1_{nj} + \eta_2\, \frac{1}{L} \sum_{t=1}^{L} \delta W1_{nj}(t)$$
Learning weights from visual input to hidden layer
$$\delta W2_{jk}(t) = g'\!\left(\sum_{k=1}^{7} W2_{jk} X_k(t)\right) X_k(t)$$

$$W2_{jk} = W2_{jk} + \eta_2\, \frac{1}{L} \sum_{t=1}^{L} \delta W2_{jk}(t)$$
Learning weights from recurrent input to hidden layer
$$\delta W2_{jo}(t) = g'\!\left(\sum_{o=1}^{5} W2_{jo} I_o(t)\right) I_o(t)$$

$$W2_{jo} = W2_{jo} + \eta_2\, \frac{1}{L} \sum_{t=1}^{L} \delta W2_{jo}(t)$$
Learning weights from recurrent output to recurrent input units
$$\delta W3_{on}(t) = g'\!\left(\sum_{n=1}^{5} W3_{on} O_n(t-1)\right) O_n(t-1)$$

$$W3_{on} = W3_{on} + \eta_2\, \frac{1}{L} \sum_{t=1}^{L} \delta W3_{on}(t)$$
MNS2 - Network weights after training
Table A-2 External input layer → Hidden layer weights
(columns: a(t), o1(t), o2(t), v(t), d(t), o3(t), o4(t))

Hidden 0:    6.90   -5.16   -3.68   -1.15   -0.20    0.02   -0.06
Hidden 1:    1.55    0.74   -3.27   -1.82   -9.40    1.52   -0.12
Hidden 2:   -1.81    2.09   -1.92   -0.42    3.33    1.70    0.68
Hidden 3:   -0.48    3.41    2.64    1.55    7.32    0.04   -0.76
Hidden 4:    0.74   -0.78   -2.60   -1.82    4.88    0.03   -1.13
Hidden 5:    0.55    0.40   -2.60   -1.23    1.96    1.57    0.66
Hidden 6:    2.95   10.18   -8.46   -5.61    7.04    1.29   -1.46
Hidden 7:    1.85   -2.47    1.10    2.07    3.18    0.48   -5.26
Hidden 8:   -7.55   -8.24    8.41    2.03    8.54    1.05    2.30
Hidden 9:    2.33   -4.26   -3.39   -9.53   -3.98   -4.54    3.63
Hidden 10:   0.72    7.79    2.76    0.38    8.83   -1.66    2.93
Hidden 11:  -4.74  -11.88    4.16   -2.71    6.37    0.11    3.04
Hidden 12:  -5.13    4.60    5.03    7.37   -7.35    6.29   -1.91
Hidden 13:  -1.40   -2.70    0.45   -2.53    1.24    5.28   -3.70
Hidden 14:  -3.79   -1.38   -0.55   -4.66    5.13   -2.41   -3.21
Table A-3 Recurrent input layer → Hidden layer weights
(columns: Recurrent In 0 through Recurrent In 4)

Hidden 0:    0.22    0.16   -0.02   -0.22   -3.08
Hidden 1:   -7.94   -8.31   -8.06   -8.45    5.56
Hidden 2:   -2.34   -1.72   -2.42   -2.42    0.72
Hidden 3:   -5.37   -6.19   -5.45   -5.52   -5.91
Hidden 4:    4.31    3.89    4.22    4.06    2.97
Hidden 5:    3.73    3.69    3.71    3.79    3.53
Hidden 6:    3.64    2.91    3.57    3.59   -6.11
Hidden 7:    6.04    4.33    5.65    5.25   -4.29
Hidden 8:    4.16    3.08    3.92    3.80   -4.12
Hidden 9:   -2.16    1.02   -2.01   -1.82    2.73
Hidden 10:   3.64   -0.49    3.00    2.55  -10.18
Hidden 11:   2.06    1.82    1.83    1.53   -1.75
Hidden 12:  -4.68   -3.26   -4.61   -4.41    1.11
Hidden 13:   3.18    3.45    3.10    3.14    1.64
Hidden 14:   3.29   -0.51    3.19    3.21   -4.62
Table A-4 Hidden layer → Output layer weights
(columns: Precision, Side, Power, Recurrent Out 0 through Recurrent Out 4)

Hidden 0:   -6.84   -3.62   -3.01    0.37    0.04   -0.22    0.22    0.81
Hidden 1:  -11.92   -6.77   -4.55    0.90   -0.54   -7.27   -0.54    0.59
Hidden 2:   -2.05   -2.74    0.59    0.25   -1.86    3.45   -1.35    0.91
Hidden 3:   -0.45    1.02    2.39    3.04    1.63    4.28    3.63    1.38
Hidden 4:    3.00    2.57    1.58   -5.95   -6.62   -0.71   -5.57   -4.20
Hidden 5:    0.76   -1.53   -1.12   -3.90   -4.83   -0.37   -4.23   -2.75
Hidden 6:    6.44  -10.76    1.31    0.03   -0.42    0.96   -1.86    1.75
Hidden 7:  -10.85   -1.94    0.79    0.55    0.25    0.06   -0.42    3.28
Hidden 8:   -0.59   11.24   -3.25    1.06    0.46    0.92   -1.90    2.26
Hidden 9:  -10.42   -2.49   -0.35   -2.76   -0.40   -0.70    0.44   -0.60
Hidden 10:  -2.69   -1.22  -10.11    3.89   -1.45    4.58    0.06    4.83
Hidden 11: -11.06   -3.77   -4.55    1.29    0.37    2.95   -0.58    0.96
Hidden 12:   1.96   -0.83  -12.38    0.46   -1.47   -1.33    0.45    0.32
Hidden 13:  -1.72    0.55    7.35   -3.59   -0.86   -0.98   -0.46   -1.37
Hidden 14:   2.38    0.47   -0.21   -1.71   -2.72    7.61   -1.15   -2.16
Table A-5 Recurrent output layer → Recurrent input layer weights
(columns: Recurrent In 0 through Recurrent In 4)

Recurrent Out 0:  -0.76   -1.05   -0.79   -0.89    1.48
Recurrent Out 1:   1.05    1.36    1.02    1.28   -0.44
Recurrent Out 2:  -3.33    0.88   -2.80   -2.24   12.23
Recurrent Out 3:  -1.57   -1.61   -1.55   -1.57   -4.43
Recurrent Out 4:  -1.71   -4.41   -2.17   -2.86    0.85
Table A-6 Audio network external output layer → Main network external output layer weights
(columns: Precision, Side, Power)

Audio 0:   0.14   0.19   9.76
Audio 1:   9.76   0.10   0.05
Audio 2:   0.14   0.19   0.05
ILGA - Simulation Environment
In order to embody models in simulated environments the Neural Simulation Language
(NSL) has been extended to include 3d graphics functionality and a physics engine. Utilities
were developed for creating a simulated 3d environment and embedding bodies in this
environment with limbs connected by hinge, universal, or ball joints. Together, the 3d
graphics functionality, physics engine, and 3d simulation utilities allow NSL models to control
bodies in a simulated 3d world and to receive virtual sensory input from the environment.
The simulated 3d world uses Java3d to maintain a scene graph - a data structure
commonly used in computer games to represent the spatial configuration of objects in a
scene. Geometric transformations and compound objects are efficiently handled by
associating transformation matrices with graph nodes. These matrices can be transformed in
order to move an object and all of its child objects (i.e. moving the elbow moves the arm and
the hand).
The Open Dynamics Engine (ODE) is used for the physics simulation. ODE is an open
source library for simulation of rigid body physics. It contains several joint types and
performs collision detection (and contact force application) with friction. When the engine is
initialized, NSL maintains a coupling between it and the Java3d representation. At each time
step, the physics engine is polled for the position and orientation of each object, which is
used to update that object's position and orientation in the Java3d scene graph. Forces can be
applied to objects in the scene and torque to objects connected with joints.
ILGA - Arm/Hand Model
The 22 DOF arm/hand model was developed using limb proportions based on those for a
7.5 kg monkey (Chan and Moran, 2006). The arm has a ball joint at the shoulder with 3
DOFs, a 1 DOF hinge joint at the elbow, and a 3 DOF ball joint at the wrist (Figure A-1).
The fingers each have three joints with 1 DOF for the metacarpophalangeal, proximal
interphalangeal, and distal interphalangeal joints, while the thumb has one 2 DOF joint at its
base (simplifying the carpometacarpal joint) and a 1 DOF metacarpophalangeal joint.
Figure A-1 The simulated arm/hand, with 3 DOFs at the shoulder, 3 DOFs at the wrist, and
2 DOFs at the base of the thumb. Unless specified, each joint has 1 DOF. The simulated
arm/hand has a total of 22 DOFs.
The inverse kinematics module uses the pseudo-inverse of the arm's Jacobian matrix to
generate target joint angles given a desired wrist position and the current wrist position.
The Jacobian is the 3×4 matrix relating the four arm joint velocities to the wrist velocity;
its entries are sums of products of the upper arm length l₁, the forearm length l₂, and sines
and cosines of the joint angles, abbreviated sᵢ = sin(θᵢ) and cᵢ = cos(θᵢ). The angle θ₁ is
the angle of the shoulder in the x-axis, θ₂ is the shoulder angle in the y-axis, θ₃ is the
shoulder angle in the z-axis, and θ₄ is the elbow angle.
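One step of this pseudo-inverse scheme can be sketched numerically. The Jacobian entries below are arbitrary illustrative numbers, not the arm's actual Jacobian.

```python
import numpy as np

def ik_step(J, wrist_pos, target_pos, joint_angles, step=1.0):
    """Map the wrist position error through the Jacobian pseudo-inverse
    to a least-squares update of the four arm joint angles."""
    error = target_pos - wrist_pos
    d_theta = np.linalg.pinv(J) @ error
    return joint_angles + step * d_theta

# Illustrative 3x4 Jacobian (3 Cartesian dimensions, 4 arm joints)
J = np.array([[0.3, 0.1, 0.0, 0.2],
              [0.0, 0.3, 0.1, 0.1],
              [0.1, 0.0, 0.3, 0.0]])
theta = ik_step(J, wrist_pos=np.zeros(3),
                target_pos=np.array([0.05, 0.0, 0.0]),
                joint_angles=np.zeros(4))
```

Because the pseudo-inverse gives the least-squares joint update, applying the resulting angle change through the Jacobian recovers the requested wrist displacement when the Jacobian has full row rank.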
Proportional-derivative (PD) controllers are used to control each degree of freedom
(DOF) of the arm and hand. Each PD controller applies torque τ at time t to its controlled
joint with angle θ in order to reach a desired value θ̂:

$$\tau(t) = p\left(\hat{\theta} - \theta(t)\right) - d\,\dot{\theta}(t)$$

where p is the gain and d is a damping parameter.
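The PD law can be illustrated on a single joint with unit inertia; the gains, time step, and simulation length below are illustrative values, not those of Table A-7.

```python
def pd_torque(theta, theta_dot, theta_target, p, d):
    # Proportional term drives the joint toward its target angle;
    # the damping term opposes the joint's angular velocity.
    return p * (theta_target - theta) - d * theta_dot

# Euler simulation of one joint (unit inertia) settling on its target angle
theta, theta_dot, target, dt = 0.0, 0.0, 1.0, 0.001
for _ in range(20000):
    tau = pd_torque(theta, theta_dot, target, p=100.0, d=20.0)
    theta_dot += tau * dt   # angular acceleration = torque / inertia (inertia = 1)
    theta += theta_dot * dt
```

With these gains the joint is critically damped and settles on the target without overshoot.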
Table A-7 Parameters of the arm/hand model

Shoulder:
  θ_x  Angle limits in x-axis: [-π/2, π/2]
  θ_y  Angle limits in y-axis: [-π/4, π/4]
  θ_z  Angle limits in z-axis: [-π/2, π/2]
  p_x  Gain parameter for PD controller in x-axis: 100000
  d_x  Damping parameter for PD controller in x-axis: 1000000
  p_y  Gain parameter for PD controller in y-axis: 30000
  d_y  Damping parameter for PD controller in y-axis: 1000000
  p_z  Gain parameter for PD controller in z-axis: 30000

Elbow:
  θ_x  Angle limits: [-π/4, π/4]
  p_x  Gain parameter for PD controller in x-axis: 30000
  d_x  Damping parameter for PD controller in x-axis: 1000000

Wrist:
  θ_x  Angle limits in x-axis: [-π/6, π/4]
  θ_y  Angle limits in y-axis: [-π/6, π/2]
  θ_z  Angle limits in z-axis: [-π/12, π/18]
  p_x  Gain parameter for PD controller in x-axis: 7000
  d_x  Damping parameter for PD controller in x-axis: 1000000
  p_y  Gain parameter for PD controller in y-axis: 8000
  d_y  Damping parameter for PD controller in y-axis: 1000000
  p_z  Gain parameter for PD controller in z-axis: 5000
  d_z  Damping parameter for PD controller in z-axis: 800000

Thumb – carpometacarpal joint:
  θ_x  Angle limits in x-axis: [0, π/2]
  θ_z  Angle limits in z-axis: [-π/6, 0]
  p_x  Gain parameter for PD controller in x-axis: 5000
  d_x  Damping parameter for PD controller in x-axis: 320000
  p_z  Gain parameter for PD controller in z-axis: 5000
  d_z  Damping parameter for PD controller in z-axis: 320000

Thumb – metacarpophalangeal joint:
  θ_x  Angle limits in x-axis: [0, π/2]
  p_x  Gain parameter for PD controller in x-axis: 5000
  d_x  Damping parameter for PD controller in x-axis: 240000

For each remaining finger joint, the angle limits θ_x are [0, π/2] and the PD gain p_x is
5000; only the PD damping parameter d_x differs:

  Index finger – metacarpophalangeal joint: d_x = 350000
  Index finger – proximal interphalangeal joint: d_x = 320000
  Index finger – distal interphalangeal joint: d_x = 240000
  Middle finger – metacarpophalangeal joint: d_x = 350000
  Middle finger – proximal interphalangeal joint: d_x = 320000
  Middle finger – distal interphalangeal joint: d_x = 240000
  Ring finger – metacarpophalangeal joint: d_x = 350000
  Ring finger – proximal interphalangeal joint: d_x = 320000
  Ring finger – distal interphalangeal joint: d_x = 240000
  Pinky finger – metacarpophalangeal joint: d_x = 350000
  Pinky finger – proximal interphalangeal joint: d_x = 320000
  Pinky finger – distal interphalangeal joint: d_x = 240000
ILGA - Implementation
Table A-8 Parameters of the ILGA model

  N           Size of each neural population: 100
  σ_LIP       LIP population code width: 0.05
  ε_LIP       LIP noise: randomly distributed between -0.1 and 0.1
  σ_V6A       V6A population code width: 0.05
  ε_V6A       V6A noise: randomly distributed between -0.1 and 0.1
  σ_cIPS      cIPS population code width: 0.05
  ε_cIPS      cIPS noise: randomly distributed between -0.1 and 0.1
  ε_AIP       AIP noise: randomly distributed between -0.1 and 0.1
  ψ_AIP       Maximum initial weight for connections to AIP: 1.0
  r_0         Initial neighborhood radius in AIP: N/2
  λ           Rate of decrease of learning rate and neighborhood size in AIP: 15000
  α_0         Initial learning rate in AIP: 0.25
  ε_F2        F2 noise: randomly distributed between -0.1 and 0.1
  GP          Level of tonic inhibitory input to execution-related populations: 1.0
  ψ_PM        Maximum initial weight for connections to premotor populations: 0.2
  ε_F7        F7 noise: randomly distributed between -0.1 and 0.1
  α_F7        F7 learning rate: 0.001
  ε_F5        F5 noise: randomly distributed between -0.1 and 0.1
  α_F5        F5 learning rate: 0.001
  ε_WR        WR noise: randomly distributed between -0.1 and 0.1
  α_WR        WR learning rate: 0.001
  DA_fail     Negative reinforcement: -0.001
  DA_success  Positive reinforcement: 0.01
  κ           Grasp enclose phase distance threshold: 0.35
ILGA - Dynamic Neural Field Implementation
Each DNF contained a one-, two-, or three-dimensional network of leaky integrator
neurons with sigmoidal transfer functions. Each neuron consisted of a membrane potential
variable, u, and a firing rate variable, f. The membrane potential was computed by integrating
the weighted input over time with input from other neurons in the DNF:

$$\tau \frac{d\mathbf{u}}{dt} = -\mathbf{u} + \mathbf{IN} + h + \mathbf{f} \circledast \mathbf{W}_{DNF} + \boldsymbol{\varepsilon}_{DNF}$$

where u and f are the population membrane potentials and firing rates, h is the baseline
activation, IN is the weighted input, ⊛ is the convolution operator, W_DNF is the WTA weight
kernel, and ε_DNF is a noise term. The firing rate, f, is then a sigmoid function of the membrane
potential:

$$\mathbf{f}(\mathbf{u}) = \frac{1}{1 + e^{-\beta(\mathbf{u} - u_0)}}$$

where β and u₀ are parameters of the sigmoid.
The weight kernel was set as a Gaussian that was negative except for a center peak,
implementing the WTA functionality. The kernel was set to be twice as large as the
population to ensure global competition. For one-dimensional DNFs, the weight kernel was
given by:

$$\mathbf{W}_{DNF}(i) = w_{excite}\, e^{-\frac{(i - N/2)^2}{2\sigma_{DNF}^2}} - w_{inhibit}$$

where w_excite is the height of the peak of the Gaussian, w_inhibit is the level of inhibition, N is the
size of the population, and σ_DNF is the width of the Gaussian. Similarly, the two-dimensional
weight kernel was defined as:

$$\mathbf{W}_{DNF}(i, j) = w_{excite}\, e^{-\frac{(i - N/2)^2 + (j - N/2)^2}{2\sigma_{DNF}^2}} - w_{inhibit}$$

and the three-dimensional kernel as:

$$\mathbf{W}_{DNF}(i, j, k) = w_{excite}\, e^{-\frac{(i - N/2)^2 + (j - N/2)^2 + (k - N/2)^2}{2\sigma_{DNF}^2}} - w_{inhibit}$$

Every DNF in each modeled region used the same parameters (Table A-9), which were
determined empirically.
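One-dimensional DNF dynamics can be sketched with Euler integration, using the 1-D parameters of Table A-9. The noise term is omitted so the example is deterministic, and the localized input bump is illustrative.

```python
import numpy as np

N = 100
tau, h, beta, u0 = 75.0, -3.0, 1.5, 0.0       # 1-D parameters (Table A-9)
w_excite, w_inhibit, sigma = 2.0, 1.0, 2.0

# Kernel twice the population size: an excitatory Gaussian peak on a
# uniform inhibitory background, giving global WTA competition.
idx = np.arange(2 * N)
W = w_excite * np.exp(-((idx - N) ** 2) / (2 * sigma ** 2)) - w_inhibit

def f(u):
    # Sigmoidal firing rate of the membrane potential
    return 1.0 / (1.0 + np.exp(-beta * (u - u0)))

def dnf_step(u, inp, dt=1.0):
    rate = f(u)
    lateral = np.convolve(rate, W, mode="full")[N:2 * N]  # centered length-N slice
    return u + dt * (-u + inp + h + lateral) / tau        # noise term omitted

u = np.full(N, h, dtype=float)
inp = np.zeros(N)
inp[30] = 10.0  # a single localized input
for _ in range(1000):
    u = dnf_step(u, inp)
```

After settling, the population forms a single activity peak at the stimulated location while the rest of the field is suppressed.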
Table A-9 DNF parameters

One-dimensional DNFs:
  τ          time constant: 75 ms
  h          baseline activation: -3
  ε_DNF      noise vector: randomly distributed [0, 3]
  β          sigmoid slope: 1.5
  u_0        sigmoid threshold: 0
  w_excite   self-excitation: 2
  w_inhibit  surround inhibition: 1
  σ_DNF      weight kernel Gaussian width: 2

Two-dimensional DNFs:
  τ          time constant: 150 ms
  h          baseline activation: -1
  ε_DNF      noise vector: randomly distributed [0, 1]
  β          sigmoid slope: 4
  u_0        sigmoid threshold: 0
  w_excite   self-excitation: 1.4
  w_inhibit  surround inhibition: 1.25
  σ_DNF      weight kernel Gaussian width: 3

Three-dimensional DNFs:
  τ          time constant: 150 ms
  h          baseline activation: -1
  ε_DNF      noise vector: randomly distributed [0, 1]
  β          sigmoid slope: 4
  u_0        sigmoid threshold: 0
  w_excite   self-excitation: 0.9
  w_inhibit  surround inhibition: 1.25
  σ_DNF      weight kernel Gaussian width: 2
ILGA - Population Decoding
The Primary Motor module decoded the activities of the premotor DNFs to obtain the
values of various parameters used to plan the reach and grasp movement. It was assumed that
each neuron in every DNF had a preferred stimulus value for each dimension (x̂, ŷ, ẑ). In
general the preferred values of each unit could be set arbitrarily, but we set them in a regular
fashion such that the population defines a grid in stimulus space (one-, two-, or three-
dimensional depending on the dimensionality of the DNF). The parameter values were
decoded from the activity of each DNF using the center-of-mass technique (Wu et al., 2002).
Since noise can greatly bias this form of decoding in small populations (unreported
simulations), we only include the activities of neurons that pass a threshold, ξ (set to 0.01 in
these simulations). For a one-dimensional DNF, the encoded value, x, was estimated as:

$$x = \frac{\sum_i \mathbf{f}(i)\, \hat{x}(i)}{\sum_i \mathbf{f}(i)}$$

where the sums are over all neurons i with an activation greater than or equal to the
threshold, ξ. Similarly, the encoded x, y value was estimated from a two-dimensional DNF
as:

$$x = \frac{\sum_{i,j} \mathbf{f}(i,j)\, \hat{x}(i)}{\sum_{i,j} \mathbf{f}(i,j)}, \qquad y = \frac{\sum_{i,j} \mathbf{f}(i,j)\, \hat{y}(j)}{\sum_{i,j} \mathbf{f}(i,j)}$$

and the x, y, z values encoded by a three-dimensional DNF as:

$$x = \frac{\sum_{i,j,k} \mathbf{f}(i,j,k)\, \hat{x}(i)}{\sum_{i,j,k} \mathbf{f}(i,j,k)}, \qquad y = \frac{\sum_{i,j,k} \mathbf{f}(i,j,k)\, \hat{y}(j)}{\sum_{i,j,k} \mathbf{f}(i,j,k)}, \qquad z = \frac{\sum_{i,j,k} \mathbf{f}(i,j,k)\, \hat{z}(k)}{\sum_{i,j,k} \mathbf{f}(i,j,k)}$$
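The thresholded center-of-mass decoder can be sketched for the one-dimensional case; the grid, bump width, and background activity level below are illustrative choices, not model values.

```python
import numpy as np

def decode_1d(rates, preferred, threshold=0.01):
    """Center-of-mass estimate over units whose activity reaches the threshold."""
    mask = rates >= threshold
    return np.sum(rates[mask] * preferred[mask]) / np.sum(rates[mask])

# Preferred values on a regular grid; a Gaussian activity bump centered at 0.3
preferred = np.linspace(-1.0, 1.0, 101)
rates = np.exp(-(preferred - 0.3) ** 2 / (2 * 0.1 ** 2))
rates = rates + 0.005  # weak background, excluded by the threshold far from the bump
x_hat = decode_1d(rates, preferred)
```

The threshold removes the weakly active background units that would otherwise bias the center-of-mass estimate toward the middle of the grid.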
ILGA - Grasp Control
Given a grasp type and the maximal aperture, the grasp motor controller generated a
series of target joint angles for the finger and thumb PD controllers during the preshape and
enclose phases of the grasp. The preshape phase was triggered by the selection of a reach
target, which set the initial value of the preshape DMP to 0 and the target to 1, resulting in a
normalized time signal. This time signal was used to linearly interpolate between the current
finger joint angles and a set of angles for the selected virtual finger configuration that define
the preshape posture (Table A-10). The enclose phase was triggered by object contact, or
when the distance to the reach target reached a threshold, κ. Similar to the preshape phase, a
DMP generated a normalized time signal that was used to linearly interpolate between the
current finger joint angles and a set of angles defining the enclosure posture for the selected
grasp (Table A-10). The finger joints were rotated until they reached these angles or until the
finger segment contacted the object.
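The normalized-time linear interpolation can be sketched as follows, using the precision-preshape metacarpophalangeal target angle (π/2)(1 − maxAperture) from Table A-10; the ten-step sampling of the time signal is illustrative rather than the DMP's actual output.

```python
import math

def preshape_target_mcp(max_aperture):
    # Precision-preshape metacarpophalangeal target: (pi/2) * (1 - maxAperture)
    return (math.pi / 2) * (1.0 - max_aperture)

def interpolate_joint(s, start_angle, target_angle):
    # Linear interpolation driven by the DMP's normalized time signal s in [0, 1]
    return start_angle + s * (target_angle - start_angle)

target = preshape_target_mcp(0.5)
# Illustrative samples of the normalized time signal from 0 to 1
angles = [interpolate_joint(s / 10.0, 0.0, target) for s in range(11)]
```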
Table A-10 Target angles for the finger and thumb joints for the preshape and enclose
phases of side, precision, tripod, and power grasps.

Side grasp, preshape:
  index, middle, ring, and pinky fingers – metacarpophalangeal, proximal interphalangeal,
    and distal interphalangeal joints: π/2
  thumb – carpometacarpal joint: [0, 0]
  thumb – metacarpophalangeal joint: (π/18)(1 - maxAperture)

Side grasp, enclose:
  thumb – carpometacarpal joint: [2π/5, -2π/9]
  thumb – metacarpophalangeal joint: π/2

Precision grasp, preshape:
  index finger – metacarpophalangeal, proximal interphalangeal, and distal interphalangeal
    joints: (π/2)(1 - maxAperture)
  thumb – carpometacarpal joint: [π/2, 0]
  thumb – metacarpophalangeal joint: 0

Precision grasp, enclose:
  index finger – metacarpophalangeal, proximal interphalangeal, and distal interphalangeal
    joints: π/4
  thumb – carpometacarpal joint: [π/6, -π/9]
  thumb – metacarpophalangeal joint: π/7

Tripod grasp, preshape:
  index and middle fingers – metacarpophalangeal, proximal interphalangeal, and distal
    interphalangeal joints: (π/2)(1 - maxAperture)
  thumb – carpometacarpal joint: [π/2, 0]
  thumb – metacarpophalangeal joint: 0

Tripod grasp, enclose:
  index and middle fingers – metacarpophalangeal, proximal interphalangeal, and distal
    interphalangeal joints: π/4
  thumb – carpometacarpal joint: [π/6, -π/9]
  thumb – metacarpophalangeal joint: π/7

Power grasp, preshape:
  index, middle, ring, and pinky fingers – metacarpophalangeal, proximal interphalangeal,
    and distal interphalangeal joints: (π/2)(1 - maxAperture)
  thumb – carpometacarpal joint: [π/2, 0]
  thumb – metacarpophalangeal joint: 0

Power grasp, enclose:
  index, middle, ring, and pinky fingers – metacarpophalangeal, proximal interphalangeal,
    and distal interphalangeal joints: π/4
  thumb – carpometacarpal joint: [2π/5, -2π/9]
  thumb – metacarpophalangeal joint: π/2
ACQ - Simulation Details
To simulate Alstermark's setup, the variables representing the world took the following initial values (where visual space is 2-dimensional and bounded by 0 and V_max in each dimension):

h(0) = 100,  p(0) = (0, 0),  m(0) = (0, 30),  b(0) = (5, 30),  f(0) = (30, 30)
We arbitrarily chose V_max = 35; however, only the relative distances are important for these simulations. These variables are transformed into population codes encoding:
PF :distance between the paw and food
MF: distance between the mouth and food
BF: distance between the food and tube opening
PB: distance between the paw and tube opening
For each population code P, the activity of each element P_{x,y} at time t is given by:

P_{x,y}(t) = (1 / (σ_p √(2π))) e^(−(Δx(t) − x)² / (2σ_p²)) · (1 / (σ_p √(2π))) e^(−(Δy(t) − y)² / (2σ_p²))

for −V_max/2 < x < V_max/2 and −V_max/2 < y < V_max/2, where Δx(t) and Δy(t) represent the current value of the distance encoded by that population.
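As an illustration, the population code above can be sketched in Python. This is a minimal sketch, not the original simulation code; the grid construction, the σ_p value, and the function name are assumptions made here for clarity.

```python
import numpy as np

def population_code(dx, dy, v_max=35, sigma=2.0):
    """2-D Gaussian population code over a grid from -v_max/2 to v_max/2.

    dx, dy: the distance currently encoded (e.g. paw-to-food).
    Returns an array whose peak sits at the element nearest (dx, dy).
    """
    half = v_max // 2
    xs = np.arange(-half, half + 1)
    ys = np.arange(-half, half + 1)
    gx = np.exp(-(dx - xs) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    gy = np.exp(-(dy - ys) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return np.outer(gx, gy)  # separable product of the two Gaussians

p = population_code(5.0, -3.0)
```

Because the two Gaussians are separable, the full (35 × 35) code is just the outer product of two 1-D Gaussians, and its total activity is approximately 1 whenever the peak lies well inside the grid.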
The connection weights between these four populations and the Parallel Planning Layer of the Actor (W_PF, W_MF, W_BF, and W_PB) encode each action's executability in the current state of the world. For each action a, other than irrelevant actions, the executability e(a) at time t is given by:

e(a, t) = Σ_{x,y} [ PF_{x,y}(t) W_PF(x, y, a) + MF_{x,y}(t) W_MF(x, y, a) + BF_{x,y}(t) W_BF(x, y, a) + PB_{x,y}(t) W_PB(x, y, a) ] + ε_e

where ε_e is the executability noise. The executability signal is then thresholded at 0 and 1. The executability of each dummy action is always set to 0.99.
The connection weights between the Internal State schema and the Parallel Planning Layer of the Actor (W_IS) encode each action's desirability given the internal state of the organism (in these simulations the only internal state variable is hunger, but these equations can be extended to N-dimensional internal states). For each action a, the desirability d(a) at time t is the noise-corrupted product of hunger h and these weights:

d(a, t) = h(t) W_IS(a) + ε_d

where ε_d is the desirability noise.

The neurons in the Parallel Planning Layer combine executability and desirability to compute priority. For each action a, the priority pr(a) at time t is given by:

pr(a, t) = e(a, t) d(a, t)
In the full version of ACQ this signal is input into a winner-take-all (WTA) Competitive
Choice Layer for action selection. For computational efficiency in evaluating the role of the
Mirror System in motor program reorganization, we simply select the action with the highest
priority for execution. If two or more actions have the same maximal priority, the model is
run until one of them wins due to executability and desirability noise.
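The executability, desirability, and priority computations above can be sketched as follows. This is a simplified illustration under assumed shapes and names (dictionaries keyed by population name, a single hunger variable), not the original implementation; noise terms default to zero here.

```python
import numpy as np

def priority(pops, weights, w_is, hunger, eps_e=0.0, eps_d=0.0):
    """Priority = executability * desirability for each action.

    pops:    dict name -> (X, Y) population-code array
    weights: dict name -> (X, Y, A) executability weight matrix
    w_is:    (A,) desirability weights applied to the hunger state
    """
    n_actions = w_is.shape[0]
    e = np.zeros(n_actions)
    for name, pop in pops.items():
        # sum over population elements, weighted per action
        e += np.einsum('xy,xya->a', pop, weights[name])
    e = np.clip(e + eps_e, 0.0, 1.0)   # threshold executability at 0 and 1
    d = hunger * w_is + eps_d          # desirability from the internal state
    return e * d

pops = {'PF': np.full((3, 3), 1 / 9)}
weights = {'PF': np.ones((3, 3, 2))}
w_is = np.array([0.2, 0.5])
pr = priority(pops, weights, w_is, hunger=100.0)
best = int(np.argmax(pr))  # highest-priority action selected for execution
```

With nonzero ε_e and ε_d, repeated calls break ties between equally ranked actions, as described in the text.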
Given a selected action for execution, its effects are enforced if its preconditions are met:
Table A-11 Set of relevant actions with preconditions and effects.

Eat
  Preconditions: food in jaws: |m(t) − f(t)| < 1
  Effects: hunger reduced, positive reinforcement: h(t+1) ← 0, r_d(t+1) ← 1

Grasp-Jaws
  Preconditions: food close to jaws: 0.1 < |m(t) − f(t)| ≤ 5
  Effects: mouth moves to food: f(t+1) ← m(t)

Bring to Mouth
  Preconditions: food grasped by paw but not close to mouth: |p(t) − f(t)| = 0 ∧ |m(t) − f(t)| > 5
  Effects: paw brought close to mouth with food still grasped by paw: p(t+1) ← m(t) + (5, 0), f(t+1) ← p(t+1)

Grasp-Paw
  Preconditions: paw close to food: 0 < |p(t) − f(t)| ≤ 5
  Effects: paw grasps food: f(t+1) ← p(t)

Reach-Food
  Preconditions: food in tube and paw aligned with or within tube, or food out of tube but not close to paw: 5 < |p(t) − f(t)| < 100 ∧ (f_y(t) = 0 ∨ (p_x(t) > b_x(t) ∧ p_y(t) ≥ b_y(t)))
  Effects: paw moved close to food: p(t+1) ← f(t) + (0, 5)

Reach-Tube
  Preconditions: paw not near tube: p_x(t) < b_x(t) ∨ p_y(t) < b_y(t)
  Effects: paw moved near end of tube: p(t+1) ← b(t) + (3, 5); if f(t) = p(t), then f(t+1) ← p(t+1)

Rake
  Preconditions: paw at a position both beyond and higher than the food: 0 < |p_x(t) − f_x(t)| ≤ 5 ∧ p_y(t) ≥ f_y(t) ∧ p_x(t) > f_x(t) ∧ f_x(t) > 1
  Effects: paw brought closer, with the food coming with it: f(t+1) ← (1, 0), p(t+1) ← (2, 3)

Lower Neck
  Preconditions: neck above lowest position: m_y(t) > 3
  Effects: neck brought to lowest position: m_y(t+1) ← 3; if f(t) = m(t), then f(t+1) ← m(t+1)

Raise Neck
  Preconditions: neck below highest position: m_y(t) < V_max
  Effects: neck brought to highest position: m_y(t+1) ← V_max; if f(t) = m(t), then f(t+1) ← m(t+1)
If the grasp motor schema is lesioned, its effects are changed so that it moves the food by a random amount, with a mean displacement towards the animal:

p(t+1) ← f(t) + (0, 5)
f_x(t+1) ← f_x(t) + rand(−15, 5)
if f_y(t+1) = b_y(t) ∧ f_x(t+1) < b_x(t), then f_y(t+1) ← 0

where rand(x, y) returns a normally distributed random number between x and y.
The input to the Mirror System schema is a pattern of perceptual inputs and changes in
those inputs. The changes in perceptual inputs are derived by comparing their current values
with a working memory trace of their values in the previous time step. This is because in the
current implementation, the action recognition schema does not examine environmental
variables throughout the course of an action, but how they change from action to action.
Since the input to the action recognition schema does not change during the course of an action but only at its completion, it was implemented with a feedforward neural network. The inputs to the network are:
h(t), h(t) − h(t − 1),
m_y(t), m_y(t) − m_y(t − 1),
|m(t) − f(t)|, |m(t) − f(t)| − |m(t − 1) − f(t − 1)|,
|p(t) − f(t)|, |p(t) − f(t)| − |p(t − 1) − f(t − 1)|,
|p(t) − b(t)|, |p(t) − b(t)| − |p(t − 1) − b(t − 1)|,
pt(t) = 0 if |p(t) − f(t)| > 0 and 1 otherwise, and
mt(t) = 0 if |m(t) − f(t)| > 0 and 1 otherwise.
All inputs are normalized to be in the range [0, 1]. The network had 12 input units, 20 hidden units, and 9 output units (one for each action). The hidden and output layers used a log-sigmoidal activation function and the network was trained using Levenberg-Marquardt backpropagation with a dynamic learning rate. Sample runs of the model were used to generate training data with the motor output, x, serving as the training signal. The network was trained for 5000 epochs, or until the performance gradient fell below 1.0×10⁻¹⁰.
The output of the Mirror System was used to generate the reinforcement signals for the executability and desirability weights. Each executability weight matrix (W_PF, W_MF, W_BF, and W_PB) was modified with a positive or negative reward signal depending on the success or failure of the intended action. A comparison between the motor efference copy x and the Mirror System output x̂ is used to determine whether or not an action was successful. For each action a, the executability reinforcement r_e(a) at time t is given by:
r_e(a, t) = { 1 if x̂(a, t) > 0;  −1 if x(a, t) > 0 ∧ x̂(a, t) < 0.25;  0 otherwise }
x x
This means that the executability reinforcement will be positive if the action was recognized
by the Mirror System as successfully performed (whether or not it was intended), negative if
an action was attempted but not recognized by the Mirror System (indicating that it was
unsuccessful), and zero if the action was not attempted and not recognized. This
reinforcement is then used to update each executability weight matrix (W
PF
, W
MF
, W
BF
, and
W
PB
) as follows:
ΔW_PF(a, t) = α r_e(a, t) PF(t − 1)
ΔW_MF(a, t) = α r_e(a, t) MF(t − 1)
ΔW_BF(a, t) = α r_e(a, t) BF(t − 1)
ΔW_PB(a, t) = α r_e(a, t) PB(t − 1)

where α is the learning rate.
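The reinforcement rule and weight update can be sketched as follows. This is an illustrative sketch with assumed array shapes and names, not the original code; where the two cases of r_e overlap, the "recognized" case is given precedence here.

```python
import numpy as np

def executability_reinforcement(x, x_hat, thresh=0.25):
    """+1 for actions the Mirror System recognized as performed (x_hat > 0);
    -1 for actions attempted (x > 0) but not recognized (x_hat < thresh);
    0 otherwise."""
    return np.where(x_hat > 0, 1.0,
                    np.where((x > 0) & (x_hat < thresh), -1.0, 0.0))

def update_executability(w, r_e, pop_prev, alpha=0.1):
    """dW(x, y, a) = alpha * r_e(a) * P_{x,y}(t-1), for one weight matrix."""
    return w + alpha * pop_prev[:, :, None] * r_e[None, None, :]

x = np.array([1.0, 1.0, 0.0])      # efference copy: actions 0 and 1 attempted
x_hat = np.array([0.5, 0.0, 0.0])  # Mirror System recognized only action 0
r_e = executability_reinforcement(x, x_hat)
w_pf = update_executability(np.zeros((2, 2, 3)), r_e, np.full((2, 2), 0.5))
```

The same update would be applied to each of W_PF, W_MF, W_BF, and W_PB using the corresponding population code from the previous time step.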
The output of an Adaptive Critic is used to update the weights, W_IS, encoding action desirability. The input to the critic is the desirability, given by the connection weights, of the action recognized by the Mirror System. This represents the current prediction of that action's desirability, d̂(t). The error between this prediction and the discounted desirability estimate of the next action, d̂(t + 1), plus primary reinforcement, r_d(t), is the temporal difference error, or effective desirability reinforcement r̂_d(t):
r̂_d(t) = r_d(t) + γ d̂(t + 1) − d̂(t)
The effective reinforcement is used to update the weights encoding the desirability of any actions recognized by the Mirror System:

ΔW_IS(t) = α h(t − 1) r̂_d(t) x(t − 1)
The weight update is scaled by the level of hunger when the action was executed (i.e., the desirability of the Eating action is increased more when it satisfies hunger than when the animal eats while already full).
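The temporal-difference update above can be sketched in a few lines. This is an illustrative sketch under assumed names (scalar hunger, list-valued weights and efference copy), not the original implementation.

```python
def td_desirability_update(w_is, r_d, d_hat_next, d_hat,
                           hunger_prev, x_prev, alpha=0.1, gamma=0.9):
    """Effective desirability reinforcement and the resulting W_IS update.

    r_hat = r_d(t) + gamma * d_hat(t+1) - d_hat(t)
    dW_IS = alpha * h(t-1) * r_hat * x(t-1)
    """
    r_hat = r_d + gamma * d_hat_next - d_hat  # temporal-difference error
    new_w = [w + alpha * hunger_prev * r_hat * xi
             for w, xi in zip(w_is, x_prev)]
    return new_w, r_hat

w_is, r_hat = td_desirability_update([0.0, 0.0], r_d=1.0, d_hat_next=0.5,
                                     d_hat=0.2, hunger_prev=1.0,
                                     x_prev=[1.0, 0.0])
```

Only the weight of the action actually executed (nonzero entry of x) moves, and the step size scales with the hunger level at execution time, as described in the text.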
Synthetic Brain Imaging - Imitation Model Implementation
Table A-12 Imitation model summary
Populations: three: Visual Input (VI), Input Praxicon (IP, two layers), Output Praxicon (OP, two layers)
Topology: grid of uniformly distributed values representing different gestures
Connectivity: one-to-one topological connections between regions; center-surround excitation/inhibition within IP and OP
Neuron model: Izhikevich spiking neural model
Channel models: -
Synapse model: conductance-based kinetic model (AMPA, GABA_A, GABA_B), additional instantaneous sigmoidal voltage dependence (NMDA)
Plasticity: -
Input: all populations: spontaneous Poisson spike trains. VI: evidence for each gesture (in a normalized range).
Measurements: membrane potential, hemodynamic response
Table A-13 Imitation model populations
Name   Elements            Size
VI     modulated Poisson   CN
IP_I   Iz neuron           N
IP_E   Iz neuron           N
OP_I   Iz neuron           N
OP_E   Iz neuron           N
Table A-14 Imitation model connectivity (name: source → target; target synapses; pattern)
VI-IP: VI → IP_E; AMPA, NMDA; non-overlapping C→1, weight w_V, no delay
IP-OP: IP_E → OP_E; AMPA, NMDA; topological, 1→1, weight w_O, no delay
IP-EE: IP_E → IP_E; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_IE e^(−(i−j)²/(2σ_WE²)), no delay
IP-EI: IP_E → IP_I; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_IEI e^(−(i−j)²/(2σ_WE²)), no delay
IP-IE: IP_I → IP_E; GABA_A, GABA_B; divergent weights, inverted Gaussian kernel w(i, j) = w_II − w_II e^(−(i−j)²/(2σ_WI²)), no delay
OP-EE: OP_E → OP_E; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_OE e^(−(i−j)²/(2σ_WE²)), no delay
OP-EI: OP_E → OP_I; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_OEI e^(−(i−j)²/(2σ_WE²)), no delay
OP-IE: OP_I → OP_E; GABA_A, GABA_B; divergent weights, inverted Gaussian kernel w(i, j) = w_OI − w_OI e^(−(i−j)²/(2σ_WI²)), no delay
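The divergent Gaussian and inverted-Gaussian kernels in Table A-14 can be generated with a small helper. This is a sketch, not the original code; the function name and the 1-D index interpretation are assumptions.

```python
import numpy as np

def gaussian_kernel_weights(n, w, sigma, inverted=False):
    """Divergent weight matrix over a 1-D grid of n neurons.

    Excitatory kernel:            w * exp(-(i-j)^2 / (2*sigma^2))
    Inverted (inhibitory) kernel: w - w * exp(-(i-j)^2 / (2*sigma^2))
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    g = np.exp(-(i - j) ** 2 / (2.0 * sigma ** 2))
    return w - w * g if inverted else w * g

# e.g. the IP excitatory and inhibitory kernels of Table A-18
exc = gaussian_kernel_weights(100, w=50.0, sigma=3.0)
inh = gaussian_kernel_weights(100, w=25.0, sigma=6.0, inverted=True)
```

The inverted kernel is near zero on the diagonal and saturates at w for distant pairs, giving the center-surround excitation/inhibition described in the connectivity tables.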
Table A-15 Imitation model neural implementation
Name: Iz neuron
Type: Izhikevich spiking neuron, conductance-based synapses
Subthreshold dynamics:
dv/dt = (1/C_m) [ k (v(t) − v_r)(v(t) − v_t) − u(t) − I_syn(t) + I_ext(t) ]
du/dt = a (b (v(t) − v_r) − u(t))
I_syn = Σ_j g_j (v(t) − E_j) s_j(t)
Spiking: if v(t) ≥ v_peak:
1. emit spike with time-stamp t, z(t) ← 1
2. v(t) ← c
3. u(t) ← u(t) + d
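A forward-Euler step of the neuron in Table A-15 can be sketched as below. This is an illustration, not the simulation code; I_syn is omitted and the parameter values are illustrative regular-spiking settings, not the randomized per-cell values of Table A-26.

```python
def izhikevich_step(v, u, i_ext, dt=0.1, cm=100.0, k=0.7, vr=-60.0,
                    vt=-40.0, a=0.03, b=-2.0, c=-50.0, d=100.0, v_peak=35.0):
    """One forward-Euler step (dt in ms) of the Izhikevich model."""
    dv = (k * (v - vr) * (v - vt) - u + i_ext) / cm
    du = a * (b * (v - vr) - u)
    v, u = v + dt * dv, u + dt * du
    if v >= v_peak:              # spike: reset v, bump the recovery variable
        return c, u + d, True
    return v, u, False

v, u, n_spikes = -60.0, 0.0, 0
for _ in range(10000):           # 1 s of simulated time at dt = 0.1 ms
    v, u, spiked = izhikevich_step(v, u, i_ext=700.0)
    n_spikes += spiked
```

With a sustained suprathreshold current the neuron fires repetitively; the membrane potential never exceeds v_peak in the returned trace because the reset is applied within the step.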
Table A-16 Imitation model synapse implementation
AMPA, GABA_A, GABA_B:
  τ ds/dt = −s(t) + Σ_j w(i, j) z(j, t)
NMDA:
  τ_NMDA^decay ds/dt = −s(t) + α x(t)(1 − s(t))
  τ_NMDA^rise dx/dt = −x(t) + Σ_j w(i, j) z(j, t)
  g̃_NMDA = g_NMDA / (1 + ([Mg²⁺]/3.57 mM) e^(−v(t)/16.13 mV))
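The instantaneous voltage dependence of the NMDA conductance can be written as a one-line function. This sketch uses the Table A-26 values for g_NMDA and [Mg²⁺]; the function name is an assumption.

```python
import math

def nmda_conductance(v, g_nmda=1.64, c_mg=1.5):
    """Voltage-dependent Mg2+ block of the NMDA conductance (Table A-16 form);
    v in mV, c_mg in mM, conductance in nS."""
    return g_nmda / (1.0 + (c_mg / 3.57) * math.exp(-v / 16.13))
```

The conductance rises monotonically with depolarization: near rest the Mg²⁺ block suppresses it, while at depolarized potentials it approaches g_NMDA.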
Table A-17 Imitation model input
VI input: normalized values representing the evidence for each gesture. These modulate the frequency at which each Poisson generator spikes.
Synaptic noise: Poisson input to each existing AMPA, NMDA, GABA_A, GABA_B synapse at frequency I_noise.
Table A-18 Imitation model parameters (parameter: description; value; justification)
C: number of Poisson generators per neuron; 50; set empirically
N: number of neurons in each population; 100; large enough to capture the behavior of the system, but small enough to allow efficient simulation
w_V: connection weight between VI and IP; 100; set empirically
w_O: connection weight between IP and OP; 50; set empirically
w_IE: excitatory kernel weight in IP; 50; set empirically
σ_WE: width of excitatory kernel; 3; set empirically
w_IEI: connection weight from pyramidal cells to interneurons in IP; 100; set empirically
w_II: inhibitory kernel weight in IP; 25; set empirically
σ_WI: width of inhibitory kernel; 6; set empirically
w_OE: excitatory kernel weight in OP; 50; set empirically
w_OEI: connection weight from pyramidal cells to interneurons in OP; 100; set empirically
w_OI: inhibitory kernel weight in OP; 25; set empirically
I_noise: frequency of noise input; 5 Hz; set empirically
Synthetic Brain Imaging - RDMDD Model Implementation
Table A-19 RDMDD model summary
Populations: six: MT, LIP (two layers), FEF (two layers), PFC
Topology: grid of angles in visual-space coordinates
Connectivity: one-to-one topological connections between regions; center-surround excitation/inhibition within LIP and FEF
Neuron model: Izhikevich spiking neural model
Channel models: -
Synapse model: conductance-based kinetic model (AMPA, GABA_A, GABA_B), additional instantaneous sigmoidal voltage dependence (NMDA)
Plasticity: -
Input: all populations: biphasic current input (microstimulation simulations), spontaneous Poisson spike trains. MT: direction of motion of each dot in the visual field, stimulus coherence.
Measurements: membrane potential, response time, accuracy, hemodynamic response
Table A-20 RDMDD model populations
Name    Elements            Size
MT      modulated Poisson   N
LIP_I   Iz neuron           N
LIP_E   Iz neuron           N
FEF_I   Iz neuron           N
FEF_E   Iz neuron           N
PFC     Poisson generator   CN
Table A-21 RDMDD model connectivity (name: source → target; target synapses; pattern)
MT-LIP: MT → LIP_E; AMPA, NMDA; topological, 1→1, weight w_M, no delay
LIP-FEF: LIP_E → FEF_E; AMPA, NMDA; topological, 1→1, weight w_L, no delay
PFC-LIP: PFC → LIP_E; AMPA, NMDA; non-overlapping C→1, weight w_P, no delay
LIP-EE: LIP_E → LIP_E; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_LE e^(−(i−j)²/(2σ_WE²)), no delay
LIP-EI: LIP_E → LIP_I; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_LEI e^(−(i−j)²/(2σ_WE²)), no delay
LIP-IE: LIP_I → LIP_E; GABA_A, GABA_B; divergent weights, inverted Gaussian kernel w(i, j) = w_LI − w_LI e^(−(i−j)²/(2σ_WI²)), no delay
FEF-EE: FEF_E → FEF_E; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_FE e^(−(i−j)²/(2σ_WE²)), no delay
FEF-EI: FEF_E → FEF_I; AMPA, NMDA; divergent weights, Gaussian kernel w(i, j) = w_FEI e^(−(i−j)²/(2σ_WE²)), no delay
FEF-IE: FEF_I → FEF_E; GABA_A, GABA_B; divergent weights, inverted Gaussian kernel w(i, j) = w_FI − w_FI e^(−(i−j)²/(2σ_WI²)), no delay
Table A-22 RDMDD model neural implementation
Name: Iz neuron
Type: Izhikevich spiking neuron, conductance-based synapses
Subthreshold dynamics:
dv/dt = (1/C_m) [ k (v(t) − v_r)(v(t) − v_t) − u(t) − I_syn(t) + I_ext(t) ]
du/dt = a (b (v(t) − v_r) − u(t))
I_syn = Σ_j g_j (v(t) − E_j) s_j(t)
Spiking: if v(t) ≥ v_peak:
1. emit spike with time-stamp t, z(t) ← 1
2. v(t) ← c
3. u(t) ← u(t) + d
Table A-23 RDMDD model synapse implementation
AMPA, GABA_A, GABA_B:
  τ ds/dt = −s(t) + Σ_j w(i, j) z(j, t)
NMDA:
  τ_NMDA^decay ds/dt = −s(t) + α x(t)(1 − s(t))
  τ_NMDA^rise dx/dt = −x(t) + Σ_j w(i, j) z(j, t)
  g̃_NMDA = g_NMDA / (1 + ([Mg²⁺]/3.57 mM) e^(−v(t)/16.13 mV))
Table A-24 RDMDD model input
Stimulation input: biphasic current applied to pyramidal neurons (10 µA biphasic current pulses of 300 µA at a rate of 200 Hz) with a Gaussian spread.
MT input: stimulus coherence; direction of motion of each dot in [−π, π], generated every 10 ms from a uniform distribution, with a certain percentage (defined by the stimulus coherence) moving either to the left (−π/2) or right (π/2) depending on the trial.
Synaptic noise: Poisson input at 5 Hz to each existing AMPA, NMDA, GABA_A, GABA_B synapse.
Table A-25 RDMDD model parameters (parameter: description; value; justification)
C: number of Poisson generators per neuron; 17; determined by genetic algorithm
N: number of neurons in each population; 100; large enough to capture the behavior of the system, but small enough to allow efficient simulation
w_M: connection weight between MT and LIP; 40.094; determined by genetic algorithm
w_L: connection weight between LIP and FEF; 92.063; determined by genetic algorithm
w_P: connection weight between PFC and LIP; 90.2812; determined by genetic algorithm
w_LE: excitatory kernel weight in LIP; 1.002; determined by genetic algorithm
σ_WE: width of excitatory kernel; 5; set empirically
w_LEI: connection weight from pyramidal cells to interneurons in LIP; 21.451; determined by genetic algorithm
w_LI: inhibitory kernel weight in LIP; 0.684; determined by genetic algorithm
σ_WI: width of inhibitory kernel; 10; set empirically
w_FE: excitatory kernel weight in FEF; 0.378; determined by genetic algorithm
w_FEI: connection weight from pyramidal cells to interneurons in FEF; 13.137; determined by genetic algorithm
w_FI: inhibitory kernel weight in FEF; 0.548; determined by genetic algorithm
I_noise: frequency of noise input; 5 Hz; set empirically
Synthetic Brain Imaging – Neural Model
Table A-26 Neural model parameters
The parameters C_m, v_r, v_t, v_peak, a, b, c, d, and k were chosen so that pyramidal cells behave as regular spiking or chattering cells and inhibitory interneurons as low-threshold spiking or fast spiking cells (Izhikevich, 2003).
C_m (membrane capacitance): pyramidal cells 100 pF; inhibitory interneurons randomly distributed between 50 and 100 pF
v_r (resting potential): pyramidal cells −60 mV; interneurons randomly distributed between −60 and −56 mV
v_t (instantaneous threshold potential): pyramidal cells −40 mV; interneurons randomly distributed between −42 and −40 mV
v_peak (peak membrane potential during a spike): pyramidal cells randomly distributed between 25 and 30 mV; interneurons randomly distributed between 25 and 40 mV
g_AMPA (maximal AMPA conductance): 2.08 nS
g_NMDA (maximal NMDA conductance): 1.64 nS
g_GABAA (maximal GABA_A conductance): 0.49 nS
g_GABAB (maximal GABA_B conductance): 0.32 nS
E_AMPA (AMPA current reversal potential): 0 mV
E_NMDA (NMDA current reversal potential): 0 mV
E_GABAA (GABA_A current reversal potential): −70 mV
E_GABAB (GABA_B current reversal potential): −90 mV
[Mg²⁺] (extracellular magnesium concentration): 1.5 mM
τ_NMDA^rise (NMDA synapse rise time): 2 ms (Hestrin et al., 1990; Spruston et al., 1995)
τ_NMDA^decay (NMDA synapse decay time): 100 ms (Hestrin et al., 1990; Spruston et al., 1995)
τ_AMPA (AMPA synapse decay time): 2 ms (Hestrin et al., 1990; Spruston et al., 1995)
τ_GABAA (GABA_A synapse decay time): 8 ms (Salin and Prince, 1996)
τ_GABAB (GABA_B synapse decay time): 260.9 ms (Mott et al., 1999)
α_NMDA: 0.5 ms⁻¹
a (Izhikevich neural parameter): pyramidal cells 0.3; interneurons randomly distributed between 0.3 and 2
b (Izhikevich neural parameter): pyramidal cells −2; interneurons randomly distributed between 1 and 8
c (Izhikevich neural parameter): pyramidal cells randomly distributed between −50 and −40; interneurons randomly distributed between −53 and −40
d (Izhikevich neural parameter): pyramidal cells randomly distributed between 100 and 150; interneurons randomly distributed between 20 and 150
k (Izhikevich neural parameter): pyramidal cells randomly distributed between 0.7 and 1.5; interneurons randomly distributed between 1 and 1.5
Synthetic Brain Imaging - Firing Rate Estimation
A linear filter with a causal kernel is used to estimate the firing rate of each neuron
(Dayan et al., 2001) in order to decode the value represented by the population. The kernel is
an alpha function:
ω(τ) = [α_ω² τ e^(−α_ω τ)]₊
The spike train history is convolved with this function to estimate the firing rate:

r̂(t) = ∫_{−∞}^{∞} dτ ω(τ) Σ_{i=1}^{n} δ(t − τ − t_i)

where n is the total number of spikes fired and t_i is the time of each spike for i = 1, …, n. The temporal resolution of the estimate depends on the value 1/α_ω, which was 100 ms in these simulations.
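Because the kernel is causal, the estimate reduces to a sum of kernel evaluations at the elapsed time since each past spike. The following is a minimal sketch of this estimator (function name and units are assumptions; times in ms).

```python
import math

def estimate_rate(spike_times, t, alpha=1.0 / 100.0):
    """Causal alpha-kernel firing-rate estimate:
    r(t) = sum_i w(t - t_i), with w(tau) = alpha^2 * tau * exp(-alpha * tau)
    for tau > 0 and zero otherwise. alpha = 1/100 ms^-1 gives the 100 ms
    temporal resolution used in these simulations."""
    r = 0.0
    for t_i in spike_times:
        tau = t - t_i
        if tau > 0:  # causal: spikes in the future contribute nothing
            r += alpha ** 2 * tau * math.exp(-alpha * tau)
    return r
```

Each spike's contribution rises from zero, peaks one time constant (1/α_ω) after the spike, and then decays, so the estimate lags the spike train slightly, as any causal filter must.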
Synthetic Brain Imaging - Genetic Algorithm
A genetic algorithm was used to set network parameters in the random dot motion discrimination model. The model parameters were encoded as binary strings of length 14, which were concatenated into a genotype of length 98. The population was initialized with 20 random genotypes and evolved for 1,000 generations. The fitness of each genotype in the population was evaluated on every iteration. The genotypes were ordered according to their fitness, and the top 30% were selected for the mating pool. To produce the next generation, random genotypes were selected from the mating pool with probability proportional to their fitness and recombined. Genotype recombination was performed by selecting a random locus in the genotype and generating two offspring by splitting each parent genotype at this locus and recombining the two halves. Each gene was mutated with a small probability that was inversely proportional to the variance in the population genotypes. This ensured that the mutation rate increased when the population became too homogeneous. The mutation probability was computed by:
p_mutate = χ Σ_y (σ_max − σ_y)
where the parameter χ was set to 0.1, the sum is over each gene y, σ_max is the maximum standard deviation of a gene in the population, and σ_y is the standard deviation of gene y in the population.
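The mutation-probability formula can be sketched directly from this definition. This is an illustrative sketch, not the original code; it reads σ_max as the largest per-gene standard deviation in the current population, which is one possible reading of the text.

```python
import statistics

def mutation_probability(population, chi=0.1):
    """p_mutate = chi * sum over genes y of (sigma_max - sigma_y).

    population: list of genotypes, each an equal-length sequence of gene
    values. The probability grows as individual genes lose diversity
    relative to the most diverse gene.
    """
    genes = list(zip(*population))  # transpose: one tuple of values per gene
    sigmas = [statistics.pstdev(g) for g in genes]
    sigma_max = max(sigmas)
    return chi * sum(sigma_max - s for s in sigmas)

pop = [(0, 0, 0), (1, 0, 0), (0, 0, 0), (1, 0, 0)]  # only gene 0 varies
p_mut = mutation_probability(pop)
```

When every gene is equally diverse the sum vanishes and the mutation probability is zero; it rises as some genes become homogeneous while others remain variable.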
We used an elitist genetic algorithm: a genotype with high fitness can persist in the population, and after reproduction the offspring's fitness is evaluated and only the 20 fittest genotypes remain in the population. To evaluate a genotype's fitness, we
decoded it and initialized a network with its parameters. The network was run on the random
dot motion discrimination task with various levels of stimulus coherence (0, 0.032, 0.064,
0.128, 0.256, and 0.512) for 50 trials at each level. The fitness was computed by comparing
the behavior of the model with neurophysiological data from LIP and FEF and
psychophysical data from human performance of the task. Separate fitness functions were
computed for LIP, FEF, and behavioral performance and averaged together to get a global
fitness value.
During the same task, neurons in LIP encoding the selected saccade direction reach a peak firing rate of 70 Hz at saccade initiation (Gold and Shadlen, 2000). In other response-time tasks, neurons in FEF achieve a threshold level of activity of 100 Hz in anticipation of a saccade (Hanes and Schall, 1996; Ratcliff and Rouder, 1998). For each trial T, the optimal firing rate for each LIP neuron i was computed as a Gaussian with height 70, centered on the maximally firing LIP neuron, x:
O_LIP(i, T) = 70 e^(−(x − i)² / (2σ_T²))
The width σ_T was set to π/12. The fitness function was evaluated for each neuron during each trial at time rt, where rt is the time at which the firing rate of any neuron in LIP reaches its maximum during that trial. The fitness of each neuron on trial T was given by a Gaussian over the difference between its firing rate at time rt and its optimal firing rate:
f_LIP(i, T) = e^(−(r̂(i, rt, T) − O_LIP(i, T))² / (2σ_f²))

where σ_f was set to 50. LIP fitness, F_LIP, was then computed as the mean of f_LIP over all neurons and trials.
Similarly, the optimal firing rate of each FEF neuron i was a Gaussian with height 100. Since the firing of LIP neurons reflects the subject's decision (Shadlen and Newsome, 1996, 2001) and FEF microstimulation evokes saccades (Gold and Shadlen, 2000, 2003), we wanted LIP and FEF to converge on the same value. We therefore also center the Gaussian calculating the optimal FEF firing rate, O_FEF, on the maximally firing LIP neuron, x:
O_FEF(i, T) = 100 e^(−(x − i)² / (2σ_T²))
The fitness function for FEF was evaluated at the same time, rt, as the LIP fitness function and was defined as:

f_FEF(i, T) = e^(−(r̂(i, rt, T) − O_FEF(i, T))² / (2σ_f²))

FEF fitness, F_FEF, was then the mean of f_FEF over all neurons and trials.
The behavioral performance fitness function, F_beh, was computed by fitting the accuracy and response times of the model over all tested stimulus coherence levels to the psychometric and chronometric functions used by Palmer, Huk & Shadlen (2005) for analyzing human behavioral data. The psychometric function for accuracy gives the proportion of correct responses for a given stimulus strength x:
P_C(x) = 1 / (1 + e^(−2A′kx))
where the free parameters A’ and k are the normalized bound and sensitivity, respectively.
The chronometric function for mean response time is:
RT(x) = (A′ / (kx)) tanh(A′kx) + t_R
where the free parameter t_R is the mean residual time. To fit the data to these functions, we used the same maximum likelihood estimation method used by Palmer, Huk & Shadlen (2005), with the log likelihood of the fit providing the value for F_beh. The global fitness of a genotype was then given by:

F = (1/3)(F_LIP + F_FEF + F_beh)
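The psychometric and chronometric functions can be written out directly (a sketch of the functional forms only, not the maximum-likelihood fitting procedure; the x = 0 limit of the chronometric function is handled explicitly).

```python
import math

def psychometric(x, a_prime, k):
    """Proportion correct: P_C(x) = 1 / (1 + exp(-2 A' k x))."""
    return 1.0 / (1.0 + math.exp(-2.0 * a_prime * k * x))

def chronometric(x, a_prime, k, t_r):
    """Mean response time: RT(x) = (A' / (k x)) tanh(A' k x) + t_r.
    As x -> 0, tanh(A'kx) ~ A'kx, so the limit A'**2 + t_r is used."""
    if x == 0:
        return a_prime ** 2 + t_r
    return (a_prime / (k * x)) * math.tanh(a_prime * k * x) + t_r
```

With sensible parameters, accuracy rises from chance (0.5) toward 1 with stimulus strength while mean response time falls, which is the qualitative pattern the genetic algorithm fits the model's behavior against.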
Abstract (if available)
Abstract
Both premotor and parietal cortex of the macaque brain contain mirror neurons, each of which fires vigorously both when the monkey executes a certain limited set of actions and when the monkey observes another individual perform a similar action. Turning to the human, we must rely on brain imaging rather than single-neuron recording. The goals of this thesis are to (a) develop biologically plausible models of the mirror system and its interactions with other brain regions in grasp observation and execution, (b) suggest a new role for the mirror system in self-observation and feedback-based learning, and (c) present an extension of synthetic brain imaging that allows computational models to address both monkey and human data.
Asset Metadata
Creator
Bonaiuto, James J.
(author)
Core Title
Modeling the mirror system in action observation and execution
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Neuroscience
Publication Date
08/05/2010
Defense Date
04/28/2010
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
action recognition,computational modeling,mirror system,motor control,OAI-PMH Harvest,reaching and grasping,reinforcement learning,synthetic brain imaging
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Arbib, Michael A. (
committee chair
), Bradley, Nina (
committee member
), Delgado, Roberto A. (
committee member
), Itti, Laurent (
committee member
)
Creator Email
bonaiuto@caltech.edu,bonaiuto@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m3288
Unique identifier
UC1285665
Identifier
etd-Bonaiuto-3517 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-375965 (legacy record id),usctheses-m3288 (legacy record id)
Legacy Identifier
etd-Bonaiuto-3517.pdf
Dmrecord
375965
Document Type
Dissertation
Rights
Bonaiuto, James J.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu