Learning lists and gestural signs: dyadic brain models of non-human primates
Brad Gasser
Faculty of the USC Graduate School
University of Southern California
Submitted for the degree of Doctor of Philosophy (Neuroscience)
August 2016
Acknowledgements
I would like to first thank my advisor, Dr. Michael Arbib, for providing the
broad and rigorous intellectual environment that made this research possible. There exist
few research labs where discussions – let alone research topics – are as varied and
challenging as in the one cultivated by Dr. Arbib. He also provided the opportunity to meet
and collaborate with numerous researchers from around the world, through conference
attendance and workshops arranged by our USC group, and for that I am thankful.
Ultimately, the technical and analytic skills I have developed through the Ph.D. process
find their root in his thought, and I am wiser for that.
The lab in the Hedco basement, room 10A, has seen many colleagues pass
through during my time: Dr. Jimmy Bonaiuto, Dr. Jinyong Lee, Varsha Ganesh, Madhuri
Harway, Arthur Simmons, Robert Schuler and those of the Arbib lab even before myself
that set a strong research tradition I have hoped to be able to continue. My labmate for
nearly six years, Victor Barres, deserves particular mention, as equal parts mentor (in all
things technical especially) and as a truly great friend. I cannot overstate the value of
his friendship and his tireless efforts to assist me when challenges arose – both in and
out of lab. Many thanks for helping see me through, and apologies for missing your own
defense!
The friendships I have made at USC have been numerous and without those
connections I would not be able to look back at graduate school with the same sense of
fondness I imagine I now always will. From the weekend roadtrips to the desert (how
many, I have long lost count), to the rock-climbing throughout southern California, I
grew as a person as much outside of lab as within. To Simren, Kingson, Jillian, Leigh,
Caty, Vilay, Shane, Mona, Madeline, Natalie, Anna, Raina, Lei, Amar, Radhika, Hanke,
Tor, Panthea, Arvind, Dave, Amit, Jenny and many others, you helped make my time at
USC among the most memorable of my life. Shout-out to my great friend Meaghan in
Tucson, by way of the University of Maryland where I spent the summer of 2008, whose
love of the outdoors and photography I shamelessly stole as my own!
To all those friends I left behind in Ottoville and throughout Ohio, I may have
spoken with you less than I had wanted, but the times we had together served as an infinite
source of stories to share with my friends in California who couldn’t imagine life in rural
Ohio. To Chase, Rob, Josh, Dave, Eric, Ike, Jarrod, Brett…I have never laughed as much
as hanging with y’all! To Wurst, my main man, you set an example of how to excel
without losing any of that spontaneity. And during my time at Bowling Green State
University, I was able to grow a love of science and philosophy without having to move
far from home.
To my family, Jan, Sonny, Ryan and Andy, sister-in-law Jennifer and nieces
Lydia, Charlotte and Evelyn, I’ll keep it short: I love you all, and your phone calls, texts
and photo albums in the mail meant more to me than I ever cared to admit.
Table of Contents

Acknowledgements
1: Introduction
2: Modeling sequence learning, representation, and production
   2.1 THE SIMULTANEOUS CHAINING PARADIGM
   2.2 MODEL DESIGN
   2.3 METHODS AND SIMULATION RESULTS
   2.4 DISCUSSION
3: Modeling observational learning
   3.1 THE SIMULTANEOUS CHAINING PARADIGM
   3.2 MODEL DESIGN
   3.3 METHODS AND SIMULATION RESULTS
   3.4 DISCUSSION
4: Dyadic brain modeling of ape gestural learning
   4.1 SOCIAL LEARNING INFLUENCES ON GESTURAL USAGE IN APES
   4.2 MODEL DESIGN
   4.3 METHODS AND SIMULATION RESULTS
   4.4 DISCUSSION
5: Conclusion: a computational approach to neuro-primatology
6: Supplementary material for Chapters 2 and 3
   6.1 MODEL DESIGN
   6.2 METHODS AND SIMULATION RESULTS
   6.3 DISCUSSION
References
List of Figures

2.1: Competitive decision processes in CQ and ACQ models
2.2: The Simultaneous Chaining Paradigm (SCP)
2.3: Results from derived list manipulations
2.4: High-level model schematic
2.5: Event-level evolution of decision-related representations
2.6: Competition network
2.7: Parameter values
2.8: Shaping structure and list manipulations
2.9: Model behavioral results
2.10: Temporal memory layer retrieval of biases during list manipulations
2.11: Stabilization in temporal memory layer across training
2.12: Visuo-spatial features-to-temporal memory layer weight changes over simulation
2.13: Sample activation profiles during a correct trial
2.14: Priority map activation over course of trial
2.15: Time-course of activations in behavioral priority map
2.16: Rank order sensitivity in area 5 during sequencing of manual actions
2.17: Affordance Competition Hypothesis
3.1: Modeling a ‘spectrum’ of social learning
3.2: The Simultaneous Chaining Paradigm (SCP)
3.3: Observational learning effects from the SCP
3.4: High-level model schematic
3.5: Dyadic brain simulation method for observational learning condition
3.6: Model behavioral results
3.7: List-naïve versus task-naïve models during observation
4.1: Dyadic brain modeling of apes
4.2: Model of gestural learning, representation, production, and comprehension
4.3: Parametric variation leads to varied gestural learning
4.4: Dyadic interaction across time
4.5: RNN and integrator activation in mother model
6.1: Chunk buffer and learning to plan sequential actions
6.2: Chunk hierarchy in the service of a 7-item-long list
6.3: Supplementary model extension for action recognition system
6.4: Short-term buffer for individual and observational learning
6.5: Timing results between ‘default model’ and the ‘chunk-based’ model
6.6: Simulated neural responses between the ‘default model’ and the ‘chunk-based’ model
6.7: Planning of multi-movement sequences
6.8: Affordance Competition Hypothesis
Chapter 1: Introduction
Neuro-computational models have been important in characterizing the
distributed mechanisms involved in a variety of tasks, from motor control of reaching
(Bullock & Grossberg, 1988; Fagg & Arbib, 1998), grasping (Bonaiuto & Arbib, 2015)
and saccadic behavior (Dominey & Arbib, 1992; Silver, Grossberg, Bullock, Histed, &
Miller, 2012), to cognitive control (O'Reilly & Frank, 2006; Rougier, Noelle, Braver,
Cohen, & O'Reilly, 2005) and action recognition (Bonaiuto, Rosta, & Arbib, 2007;
Chersi, Ferrari, & Fogassi, 2011; Oztop & Arbib, 2002). However, the hallmark of
primate behavior and brain evolution – social behavior and cognition (Byrne & Whiten,
1988) – has largely been ignored by computational modelers. We will consider both
structured experimental tasks involving macaques, and ethological behavioral data
involving apes, that will inform our research efforts dedicated to computational analysis
and simulation of social cognitive mechanisms in non-human primates. Our focus on
non-human primates will allow us to address the important comparative neuro-
primatological data that will be needed to fully understand and contextualize human
cognitive and linguistic skill, by placing the key data and questions in a comparative,
evolutionary, and computational framework. In particular, we wish to characterize
distributed systems in the macaque brain that can be used to explain the learning and
representation of complex lists of items, and to understand the mechanisms coordinating
learning from observation, and then to use this monkey-level ‘scaffolding’ to understand
social learning, interaction, and communication in apes, specifically by addressing
learning processes involved in the acquisition, production, and recognition of manual
gestural signs. By simulating, simultaneously, neural-network-implemented agents
engaged in interaction – dyadic brain modeling – we wish to elucidate those neural
mechanisms, likely conserved in humans, that help coordinate primate social life.
We have two main efforts to describe here: the modeling, via computer
simulation, of (1) brain mechanisms of serial learning in monkeys and their transfer
through observation of others’ performances, and (2) brain mechanisms of social learning
and interaction that support the development of gestural repertoires in apes. The first
main effort is focused on analyzing previously published neurophysiological (Averbeck,
Chafee, Crowe, & Georgopoulos, 2002; Balan & Gottlieb, 2009; Berdyyeva & Olson,
2010; Cisek & Kalaska, 2002) and behavioral (Swartz, Chen, & Terrace, 1991; Terrace,
Son, & Brannon, 2003) data in monkeys to construct novel computational models to
explain these data, challenge existing interpretations, and generate testable hypotheses.
The models simulate neural systems that process visual information from touch-screen
monitors (as based on the literature), generate abstractions from these inputs and generate
internal temporal signals to manage behavioral organization, and drive motor outputs that
correspond to the monkey’s selections. We show how monkeys may learn sequences of
items – say, selecting the pictures ‘A-B-C-D’ – through trial-and-error mechanisms that
manage multiple, concurrent learning processes, including learning of temporal order and reward-
predictive value.
The model can yield behavioral patterns – learning rates, error distributions, etc. –
that qualitatively match those in the literature, while simulated neural responses predict
response profiles for a variety of cortical and sub-cortical areas in macaques. Further data
from the literature suggest how learning a list may be aided by observing another monkey
generate that list on a touch-screen monitor. These data are intriguing since the capability
to learn socially, while not unique to primates, is an important hallmark of primate
cognition. A crucial innovation in our work involves going beyond simulating the
mechanisms coordinating an individual’s actions or decisions, by having multiple
simulated agents able to learn from (and later we show, to also interact with) others’
behavior. Specifically, visual inputs and motor outputs of the teacher monkey are
simulated, with the view of the teacher passed to the observer, who must process the
selection and, crucially, the feedback (was the selection right or wrong?) in such a way
that in a subsequent session, the prior observation may facilitate the observer’s
performance. However, our model suggests that prior experience – having some level of
‘task expertise’ – is required in order to observe these observational learning effects, and
that ‘task naïve’ monkeys would not be facilitated by observation in this way. Thus our
model can agree with experimental data, and make predictions for future research. This
also makes explicit that not only must an observer recognize the other’s performance, but
it must also process the feedback that results (e.g., reward or punishment) and any relevant
contextual information (e.g., order of selections).
In our second main effort, we go further by simulating: (i) ape brains, (ii)
interaction and gestural communication, and (iii) dynamic exchange of information,
back-and-forth and not just one-way, between modeled apes. This dyadic brain modeling
can show how the ‘mutual shaping of behavior’ between apes may lead to novel gestural
forms that serve as communicative signals between individuals. (Briefly, if an infant ape
reaches and pulls on his mother, and over time the mother learns to anticipate this – and if
in parallel the infant ape learns his mother’s anticipation – something like a ‘beckoning’
gesture, seen as a truncated form of ‘reaching’, may be derived and mutually understood.)
This process has long been a suggestion from primatologists (Rossano, 2014; Tomasello,
Gust, & Frost, 1989), though there exists strong evidence that at least many gestures are
genetically inherited (Hobaiter & Byrne, 2011a) – with our effort being the first study to
chart the underlying neural mechanisms potentially involved. These mechanisms are
based on the analyses and the models of the first project, but ‘extended’ to address further
datasets from research on gestural and vocal communication (Cartmill & Byrne, 2007;
Hobaiter & Byrne, 2011a; Seyfarth & Cheney, 1986), to comparative behavioral (Dean,
Kendal, Schapiro, Thierry, & Laland, 2012; Horner & Whiten, 2005), neuroanatomical
(Hecht et al., 2013), and neuro-functional research of primates (Hecht et al., 2013). We
also address competing hypotheses concerning the acquisition of gestures in apes, with
the main alternative hypothesis contending that a large ‘gestural space’ is inherited
genetically, with learning processes serving only to ‘prune’ this repertoire and yield the
variation in gestural repertoires seen in the wild. We show how we can explain multiple
learning pathways leading to varied gestural repertoires in a single unified and
computationally-specific model.
In order for research programs to successfully navigate highly interdisciplinary
questions, tools for data management and sharing must be developed and used more
widely (Gasser, Cartmill, & Arbib, 2014; Tomasello & Call, 2011). There exist
dedicated databases for brain modeling (e.g., the Brain Operation DataBase (BODB) and
ModelDB), neuroanatomy (e.g., NeuroHomology DataBase (NHDB) and Allen Brain
Atlas) and neuroimaging (e.g., Brede and BrainMap), for example, but little in the way of
informatics resources dedicated to primate behavioral studies – either in the laboratory or
in the wild (though see the Gesture and Behavior DataBase (GBDB) developed at USC).
It is thus necessary for additional efforts to develop these resources in order to grow the
burgeoning field of comparative neuro-primatology.
Chapter 2: Modeling sequence learning, representation, and production
Sequential and/or hierarchical learning and behavior, and social learning and
behavior, are key aspects of primate cognition. We here present a neuro-computational
model of primate sequential learning, and then build on this model in the following
chapter to describe additions to the machinery that can explain aspects of primate social
learning as well. We contrast our model with previous efforts to characterize aspects of
sequential behavior – including learning, representation, and production – and models
(and theories) of social learning skill in primates. In doing so, we introduce a case of
‘dyadic brain modeling’ (Arbib, Ganesh, & Gasser, 2014; Gasser, Cartmill, & Arbib,
2014), in which we simulate two model primate brains in a social condition, and discuss
the implications for neuro-computational research in social behavior.
Our effort is described in two chapters, and organized as follows: in Chapter 2,
here, we discuss our general model for sequence learning, production and representation
and test it against a thorough behavioral dataset from the Simultaneous Chaining
Paradigm (SCP), specifically to show the variety of learning processes engaged during a
complex, trial-and-error-based list-learning task. In Chapter 3, we build on this model
and detail model extensions that characterize action recognition and vicarious reward
processing to facilitate learning from others’ behavior. We then compare the modeling
methodology used to accomplish this uni-directional flow of social information with a
more robust effort towards full social interaction and ‘dyadic brain modeling’ (Arbib,
Ganesh, & Gasser, 2014), which is the focus of Chapter 4.
Modeling of sequential behavior has a rich history. Competitive Queuing (CQ)
models represent temporal knowledge through the activation patterns a sequence node
induces across a layer of planning neurons, with each planning neuron corresponding to
an individual action (see, for example: Bullock & Rhodes, 2003). The relative activations
of these neurons can then be ‘read out’ in order of greatest activity by a third layer of
competition nodes. These neurons implement a Winner-Take-All (WTA) dynamic – a
neural implementation of a maximum selector algorithm – with the added feature that
inhibition-of-return from this layer, which inhibits selected items from the planning layer
of neurons, prevents repetition of already-selected actions (see Figure 2.1, left). Such models (or
similar variants) have been applied in modeling verbal short-term memory (Hartley &
Houghton, 1996), behavioral planning (Cisek, 2006; Ferreira, Erlhagen, & Bicho, 2011),
motor control (Bullock & Grossberg, 1988), and serial recall (Page & Norris, 1998), and
have been integrated with explicit ‘rank ordering’ modules to model more complex
sequential performances thought to be dependent on leveraging ordinal knowledge – as in
tasks that require ‘ABA’-style sequences wherein items repeat within a single sequence
(Silver et al., 2012) (and see also (Davelaar, 2007)).
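To make the CQ readout concrete, here is a minimal discrete-time sketch (our own caricature, not any published implementation; the leaky-integrator WTA competition is reduced to an argmax, and inhibition-of-return to zeroing the winner):

```python
import numpy as np

def cq_readout(plan_activation):
    """Read out a CQ planning gradient in order of decreasing activation.

    plan_activation: 1-D array; entry i is the planning-layer activity that
    the active sequence node induces for action i. The WTA competition is
    caricatured by argmax; inhibition-of-return is modeled by zeroing the
    winner's planning activity after each selection.
    """
    plan = np.array(plan_activation, dtype=float)
    order = []
    while np.any(plan > 0):
        winner = int(np.argmax(plan))  # Winner-Take-All selection
        order.append(winner)
        plan[winner] = 0.0             # inhibition-of-return
    return order

# A gradient over four actions encoding the sequence 2 -> 0 -> 3 -> 1:
print(cq_readout([0.7, 0.3, 0.9, 0.5]))  # [2, 0, 3, 1]
```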
Figure 2.1. Competitive decision processes in CQ and ACQ models. (Left) The sequence
layer has one neuron representing each well-learned action sequence. Activating such a
neuron drives an activity gradient in the next layer. The relative activation of each
neuron in this planning layer encodes the serial order in the sequence, with the most
active neuron selected first through a Winner-Take-All process implemented by a third
layer of neurons that directly compete for expression. Selection of a neuron in this layer
both generates a motor act, and immediately drives inhibition of the corresponding
neuron in the second layer via backwards inhibitory projections. This “inhibition of
return” then allows the second-most active neuron to be selected, and so on until
sequence completion. (Right) A partial view of the ‘augmented’ CQ (ACQ) schema for
action sequencing presented in (Bonaiuto & Arbib, 2010). Parallel representations for
potential actions (priority) are maintained in a buffer and read out via a layer of neurons
implementing a WTA, as in the classic CQ models. The novel additions as presented here
are (i) the identification of the parallel active neurons as a representation of action
‘priority’ – the combination of dopaminergically-learned reward expectations, termed
‘desirability’, and estimates of the likely success of an action given skill and the current
affordances of the environment, termed ‘executability’ – and (ii) the controlled, feedback-
heavy operation of the network when well-learned action sequences are not available:
instead, priority can be re-computed after each action selection due to changes in
distribution of affordances (reflected in executability). Mirror neurons (not shown) assess
self-action to play a crucial role in learning executability and desirability. (See text for
further details.)
However, others have criticized these localist CQ models, preferring distributed,
connectionist models trained through back-propagation algorithms and exhibiting
recurrent activations. These include models of short-term memory for serial order
(Botvinick & Plaut, 2006a), and routine, sequential behavior (Botvinick & Plaut, 2006b;
Botvinick & Plaut, 2004) – the latter being an explicit alternative to hierarchical, localist,
and schema-based models which share certain features with the above CQ architectures
(Cooper & Shallice, 2006; Cooper & Shallice, 2000) (and see Dehaene & Changeux,
1997). Lastly, and distinctively, (Chersi et al., 2011) presented a model that ‘chains’
pools of neurons coding for distinct actions through functional connectivity to generate
sequential behaviors, with different pools coding for the same action when ‘chained’
within different sequences.
However, many of the above models are more concerned with the production
and/or representation of sequences, and less so with the learning processes required for
competent behavior to be achieved – either requiring hard-coded connections, or else
utilizing learning algorithms (e.g., back-propagation) that are implausible during online
trial-and-error learning. The Augmented Competitive Queuing (ACQ) model (Bonaiuto
& Arbib, 2010) combines reinforcement learning mechanisms and action recognition
modules to support opportunistic scheduling of behavior. For each action in the
repertoire, ACQ associates desirability signals (reward-predictive, value-
based) learned through temporal-difference reinforcement learning (Sutton & Barto,
1998), and executability signals (likelihood of successful execution) related to the
affordances for that action and which are learned through cumulative (context-based)
success or failure. The desirability and executability of an action within a context are then
combined to represent an action’s behavioral priority, and so behavior can more quickly
converge on stable sequences of actions to achieve some goal (Figure 2.1 right).
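Schematically, the two learned signals combine into a single decision variable per action; a toy sketch of this combination (our notation, with hypothetical values; we render the combination multiplicatively):

```python
import numpy as np

# Hypothetical learned signals for three actions in the current context.
desirability = np.array([0.2, 0.8, 0.5])   # TD-learned reward prediction
executability = np.array([0.9, 0.1, 0.7])  # learned success likelihood given affordances

priority = desirability * executability    # behavioral priority per action
chosen = int(np.argmax(priority))          # WTA readout over priorities
print(priority, chosen)                    # [0.18 0.08 0.35] 2: a less desirable but
                                           # highly executable action wins the competition
```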
(Dominey, Arbib, & Joseph, 1995) also modeled reinforcement learning
mechanisms and prefrontal ‘context memory’ in sequential saccade-generation tasks,
reproducing behavioral and neurophysiological responses from (Barone & Joseph, 1989).
The model used recurrent loops, based on prefrontal cortical organization in relation to
basal ganglia, to generate deterministic state transitions between context states. (The
temporal context states of their model are similar to the ‘abstract states’ discussed by
(Lashley, 1951).) In this deterministic and recurrent loop, learning associates successive
prefrontal context states with successive actions as read out through the basal ganglia.
Importantly, the task cued the order of saccades through iterative presentation of the
items on the monitor, and so sufficient temporal information was always provided.
In summary, some efforts seek to characterize the performance of distributed
neural systems during sequential behaviors (Chersi et al., 2011; Cooper & Shallice, 2006;
Silver et al., 2012), while others seek to describe the learning and acquisition of such
behaviors (Bonaiuto & Arbib, 2010; Dominey et al., 1995). On the other hand, some
models abstract from the details of decision-making, and model processes at the event-
level, or the stringing of decisions or actions together in discrete-time (Bonaiuto & Arbib,
2010; Botvinick et al., 2009), while others simulate time-varying neural signaling (e.g.,
spiking neurons, or leaky-integrators) engaging in competition for expression, or
trajectory-level modeling (Chersi et al., 2011; Dominey et al., 1995; Silver et al., 2012).
Lastly, some of the reviewed models pre-specify in their architecture cognitive structures
for organizing behavior (e.g., ‘rank neurons’) (Botvinick & Watanabe, 2007; Silver et al.,
2012), while others use distributed or otherwise more general mechanisms for organizing
behavior temporally (Dominey et al., 1995).
The model we present here borrows notions from several of the above models,
including disambiguating, and then combining, separable learnable signals to inform
decision making, and applying internal context states which can learn and retrieve
associations with unique items, as in (Dominey et al., 1995). However, it departs in some
respects with the latter by learning to stabilize the application of the internal context
signals by generalizing across different lists. Lastly, the competition for expression
among action plans occurs, not in an ill-defined amodal structure, but in spatially-
structured maps for action correlated with relevant environmental properties (Cisek,
2007; Zelinsky & Bisley, 2015).
We now turn to an analysis of the main behavioral dataset we will use to explore
issues in sequential learning thus far neglected in the field.
2.1 THE SIMULTANEOUS CHAINING PARADIGM
The Simultaneous Chaining Paradigm (SCP) (Terrace, 2005) violates the stimulus-
response (S-R) mappings implicit in many experiments probing serial behavior in
animals (see Figure 2.2). Terrace and colleagues probed monkeys’ skill at list-learning
and the extent to which they (i) can learn the serial structure of individual lists, (ii) can
gradually become more competent at the task, (iii) are sensitive to the serial structure of
learned lists, and (iv) are sensitive to observation of others’ actions (which we take up in
Chapter 3). Items (photographs) intended to be learned as a list are presented
simultaneously on a touch-screen monitor for the duration of a trial, and a pre-specified
order of selecting the items must be discovered through trial-and-error (e.g., it is not
given by their order of presentation; compare with, for example, (Barone & Joseph,
1989), the basis for the model of (Dominey et al., 1995).) The locations of the items on
the touch-screen monitor are randomized across trials, and thus do not contribute to
learning the ordering, making the task quite difficult. During training, the animal is
incrementally shaped on the list, first mastering smaller sub-sequences of the list, before
more items are presented on the screen, until the full list is learned – giving implicit order
information, at least at this stage of learning. Once performance criteria are met for the
current iteration – typically, consecutive blocks of trials below a certain error rate, which
may vary between experiments – items are appended to the end of the list, such that lists
are incrementally presented – and learned – as follows:
List 1, Increment 1: A
List 1, Increment 2: A-B
List 1, Increment 3: A-B-C
List 1, Increment 4: A-B-C-D
Still, the data show full, 4-item-long lists can be learned in the absence of this
form of incremental shaping by monkeys that have already learned a number of such
incrementally-shaped lists (Swartz, Chen, & Terrace, 2000), in which case all 4 items of a
new list are shown simultaneously (essentially, “List X, Increment 4” as denoted above;
other data suggest naïve monkeys can actually learn 3-item-long lists – “List X,
Increment 3” – from the beginning without any prior incremental shaping (Terrace, Son,
& Brannon, 2003), but it has not been demonstrated, so far as we are aware, that naïve
animals can learn 4-item-long lists without at least some incremental shaping.)
Following selection of each correct item – other than the last – the only feedback the
animal receives is a briefly highlighted outline of the item selected, while incorrect
selections are immediately followed by a ‘time-out’ (Swartz et al., 1991) during which an
overhead light goes out, eventually followed by a new arrangement of the current set of
items (i.e., a new trial). Food delivery (reward) follows successful completion of the
current full increment. Repeat selections of an item are tolerated, but are not ‘correct’.
The configuration on the monitor does not change during a single trial, and so differential
feedback offers little to facilitate learning.
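To make the trial logic concrete, the feedback rules can be sketched as a simple environment loop (our own schematic reconstruction, not the original experimental software; `select_item` stands in for the monkey’s decision process):

```python
import random

def run_scp_trial(target_order, select_item):
    """One SCP trial under the feedback rules described above.

    target_order: the hidden list, e.g. ['A', 'B', 'C', 'D'].
    select_item:  policy mapping (items, positions) -> a chosen item.
    Returns True (food reward) only when the full increment is completed.
    """
    items = list(target_order)
    positions = random.sample(range(9), len(items))  # re-randomized each trial
    correct_so_far = []
    while len(correct_so_far) < len(items):
        choice = select_item(items, positions)
        if choice == target_order[len(correct_so_far)]:
            correct_so_far.append(choice)  # brief highlight; display unchanged
        elif choice in correct_so_far:
            continue                       # repeats tolerated, but not 'correct'
        else:
            return False                   # any other error: time-out ends trial
    return True                            # reward follows the full increment

# A uniform random policy, for illustration; it completes 4-item lists rarely.
print(run_scp_trial(list("ABCD"), lambda items, pos: random.choice(items)))
```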
Figure 2.2. The Simultaneous Chaining Paradigm (SCP). Subjects must learn to press
items on the screen in a pre-specified order. Each item can appear in nine possible
positions on the screen, but these positions vary randomly across trials, thus removing
sensorimotor associations that might facilitate learning. During training, any error
immediately halts the trial, while successful selections are followed by a brief signal
confirming that the selection has been registered. In the above, each column is a new
trial for the same underlying sequence of patterns, with the first row showing the
appearance of the touch-screen monitor on a specific trial, while the bottom row shows
the proper sequence of motor responses the monkey must make. The lettering is included
for the reader, not for the monkey. For the current model, we only consider lists up to 4
items long. [Adapted from (Terrace, 2005).]
Having introduced the experimental paradigm the monkeys operated within, we
will now briefly report the major behavioral findings as culled from the literature.
Behavioral Data
1. Monkeys learn the SCP through explicit shaping
Monkeys do indeed master the SCP task following the incremental shaping
procedure just described, as 4-long lists are eventually learned and performed at high
rates of success. In the original study of (Swartz et al., 1991), four lists of four items were
learned, with 3-item, and especially 2-item increments often learned in the fewest
possible number of trials to meet the criterion of two consecutive blocks of 60 trials at
>75% accuracy. The four-item increments – that is, the whole list – often required many
more total trials to reach criterion, upwards of 1,000, though substantial
variation did exist.
To test the effect of the ‘size’ of the incremental shaping, experimentally-naïve
monkeys – subjects with no prior exposure to the SCP task – were tasked to learn 3-item-
long increments. That is, whereas above the task began with a single item, only to append
items until the full list was presented, here the new animals begin with increments of
three items immediately. It was shown that, even starting with three items initially,
monkeys were still able to eventually reach a criterion of single session accuracy of
>65% (Terrace et al., 2003), though it required substantially more training than the
baseline established in previous experiments. (See (Krueger & Dayan, 2009) for a
discussion of ‘shaping’.)
2. Monkeys learn the SCP without explicit shaping
Following enough training, it was found that shaping periods could be completely
foregone and new lists learned with all four list items presented simultaneously from the
start (Swartz et al., 2000). The subjects – trained extensively through the incremental
shaping method above – responded poorly at first, to the point of zero correct trials in
their first session, though eventually criterion levels (>75% accuracy) were achieved. The
total number of trials to reach criterion varied between lists and between subjects, but it
was evident this task was substantially more difficult than lists first confronted in the
incremental shaping method, where responding only to a single item, and then two and
three, was required.
Interestingly, monkeys are able to eventually learn lists up to seven items long
(Terrace et al., 2003), with no studies that we know of testing macaques beyond this
limit. The monkeys had extensive task experience prior to testing (on the order of having
mastered ~20 separate lists), and required substantial time – upwards of 30 sessions of 60
trials per session – to achieve criterion levels (>65% accuracy in a session), but the lists
were eventually mastered too. Until the Discussion section, we will only focus on 4-
item-long lists.
3. Monkeys can develop serial expertise
A further result established by the SCP is that monkeys, over time, accumulate
‘expertise’ for the task. In both the incremental shaping method, and in the
‘simultaneous’ list presentation method – for example, new 4-item-long lists presented in
absence of any incremental shaping – it was shown that accuracy improved over time,
suggesting list learning strategies were being extracted by the subjects. For example,
recalling the data from (1) above, successive increments of new lists (i.e., 2-item
increments, and 3-item increments, since a 1-item increment was trivially learned)
required, on average, fewer total trials to reach criterion levels (Swartz et al., 1991). For
those subjects from (1) that began the SCP with 3-item-long increments, each new list
tended to require fewer total trials to reach criterion, on average (Swartz et al., 2000). In
fact, for the 7-item-long lists, there was a strong trend towards increased competency,
with substantially fewer total blocks of trials to reach criterion accuracy for the fourth 7-
item-long list encountered, as compared to the first 7-item-long list encountered (Terrace
et al., 2003).
4. Monkeys are sensitive to manipulations of serial structure
Following the demonstration that monkeys can learn SCP lists, various
manipulations of the lists’ serial structure were performed in order to probe the
mechanisms the monkeys employed in order to solve and execute the lists. For example,
one possible learning strategy could involve learning associations between the items – so-
called ‘chaining’ models, as reviewed above. However, the following data from list
manipulations dispute this class of models, since item-item associations were not
available to support performance.
Conserved lists
Conserved lists are lists composed entirely of items already encountered. Each
conserved list contains an item from a different list previously performed at criterion
levels – thus, a 4-item-long conserved list derives elements from four different lists – and,
crucially, each item is derived from a unique serial position, which is conserved in these
lists (that is, if item X was ‘third’ in its original list, then it is also ‘third’ under this list
manipulation). That way, learned associations between the items in these lists and their
putative serial position are maintained, which was hypothesized to facilitate reproducing
these lists (Chen, Swartz, & Terrace, 1997).
The data show that the monkeys perform these lists at levels comparable to lists
they have already been extensively trained on (Chen et al., 1997) – achieving
performance upwards of 97% correct (29/30) over the first 30 trials – suggesting little need
for trial-and-error learning or otherwise adapting to the novel combination of familiar
items (Figure 2.3). In fact, for the most part these lists require the minimum total trials
necessary to reach criterion performance rates.
Scrambled lists
Like conserved lists, scrambled lists are composed of items encountered in
previous lists but the previous serial positions are scrambled (Chen et al., 1997). (Chen
and colleagues refer to scrambled and conserved lists as “changed” and “maintained”,
respectively.) These lists take at least as long to reach criterion levels as lists composed of
elements with no previous associations (e.g., novel lists) (Chen et al., 1997) (and see
Figure 2.3), suggesting some amount of ‘un-learning’ the previous (now-invalid)
associations is required. Additionally, the previous associations were predictive of the
errors the monkeys would make: items previously learned to be first in a list were
selected as such, even following many successive errors, suggesting the monkeys
required substantial time to un-learn these associations.
Wild card lists
Wild card lists are formed by taking a learned list and eliminating an item and
replacing it with a novel ‘wild card’ item the monkey has no experience with. The
monkey must discover that the novel item replaces the missing item (D’Amato &
Colombo, 1989). (Note that the design of this task – e.g., lists being upwards of 5 items
long – and the training undergone by the monkeys, is not the same as the SCP, though the
results hold.) In the wild card lists, the monkeys perform at high levels almost
immediately by selecting, in the list position vacated by the missing item, the wild card
item. This result, too, questions chaining models.
List Manipulation Summary
The data from the above list manipulations clearly show effects of previous
learned associations of serial order (and see (Orlov, Yakovlev, Amit, Hochstein, &
Zohary, 2002; Orlov, Yakovlev, Hochstein, & Zohary, 2000) for similar results not
reviewed here), and dispute claims for ‘chaining’ models as an exclusive explanation of
list representation and production. Instead, whether they are valid for particular
circumstances of list production – well-learned lists or sequences – such models fail to
explain the above data for conserved lists as any item-to-item associations disappear,
while performance rates are not affected. Indeed, all of the models discussed thus far fail
to explain the learning required in such a task, as either no online trial-and-error learning
mechanisms are modeled (Botvinick & Watanabe, 2007; Silver et al., 2012), or the
learning is incapable of handling simultaneous lists per se (Bonaiuto & Arbib, 2010;
Botvinick et al., 2009; Dominey et al., 1995).
Figure 2.3. Results from derived list manipulations. The figure shows the results for two
monkeys (dark and hashed bars, respectively) for three different manipulations, each
with two instances of lists. On the left are novel ‘control’ lists, showing the trials-to-
criterion for each list, for each monkey. The ‘maintained’ lists conserve the learned
serial structure, while the ‘changed’ lists scramble these associations. The maintained
lists are performed at criterion levels in the fewest possible trials for three of the four
lists, demonstrating impressive transfer effects. The control lists are learned moderately
quickly, while the changed lists, though variable, seem to be the most difficult lists to
learn and reproduce. [Adapted from (Chen et al., 1997).]
5. Monkeys are facilitated on new lists following observation of a teacher monkey’s
performance
We take up these data in Chapter 3.
Neurophysiological Data
The results from the SCP are purely behavioral, but neurophysiological
investigations of decision-making and sequential behavior in macaques have revealed
characteristic patterns of responses that provide some sketches of the mechanisms likely
to participate in representing decision variables for the monkeys. We give greater
attention to these data in the Discussion, but we note several important findings here.
Firstly, several nuclei within at least parietal and frontal regions have shown
response profiles correlated with ‘rank’, ‘ordinality’ and/or ‘numeric’ variables (Nieder
& Dehaene, 2009). It has been shown that multiple cortical regions in macaques can
represent serial order for both particular sequences of actions, and for sequences of items
(towards which actions are targeted) (Berdyyeva & Olson, 2009, 2010). Sequences of
lever manipulations – for example, twisting a lever – which do not result in much
environmental feedback, are managed by internal representations for the state of
execution, and disruptions of these circuits (here, it was area 5 of parietal cortex) by
muscimol impair not the motor coordination, but the temporal organization (Sawamura,
Shima, & Tanji, 2002, 2010). This class of responses can be controlled against possible
reward-modulated effects (Berdyyeva & Olson, 2009), which are also observed to be
present across wide portions of cortex and subcortical structures (Ding & Hikosaka,
2006; Pan et al., 2014; Wallis & Kennerley, 2011). Together, this suggests decision-
related signals can be separable, while further investigation has shown these signals can
also combine to affect decision-making (Watanabe & Sakagami, 2007), as also suggested
above by the ACQ model: reward-related desirability and affordance/context-based
executability combine to give the behavioral priority of each action.
Additionally, neurons within fronto-parietal networks – lateral intra-parietal (LIP)
region in particular – have shown response characteristics across a wide variety of task
conditions and demands that have led some to regard these networks as implementing
‘priority maps’ crucial for integrating and structuring information for real-world
interaction with one’s immediate environment (Bisley & Goldberg, 2010; Gottlieb,
2007). Though mostly investigated through saccadic eye tasks, decision-making tasks requiring a
reach response also suggest that some decision-related signals (in relevant reaching-control
areas, like dorsal premotor cortex) maintain a topography correlated with sensorimotor
demands (Cisek & Kalaska, 2005). This suggests more generally a perspective that
“…perceptual, cognitive, and motor systems may not reflect the natural categories of
neural computations that underlie sensory-guided behavior” (Cisek & Kalaska, 2010).
Finally, it is well-established that competitive interactions between neurons
representing potential decisions or motor plans underlie much of decision-making.
Networks representing, in parallel, potential motor plans have been described (Cisek &
Kalaska, 2002), and it can be observed that when a cue indicates which plan is
appropriate for that context, the correlated neural population then wins out. Parallel
planning of sequential motor acts has also been described in prefrontal cortex, with the
relative activation of a plan at a given time predictive of its order in the sequence
(Averbeck et al., 2002). Neurons responsive exclusively to a single, well-learned
sequence of movements (and even sequence sub-sets) have also been isolated in various
parts of frontal cortex (Tanji & Shima, 1994). Such future planning and competitive
interactions largely comport with the CQ models discussed above. However, the issue
remains how such plans are acquired and how the sequences are learned and
consolidated. These and other issues pertaining to relevant neurophysiological data, and
their relations to behavioral data and modeling claims, are further fleshed out in the
Discussion.
Summary of the Empirical Data
Monkeys were shown capable of learning lists of items without possible S-R
associations. List manipulations suggest the monkeys are sensitive to the serial structure
of the list, and do not rely (at least exclusively) on mere ‘chaining’ of items. The
monkeys also appear to learn the structure of the task, in the sense of becoming
increasingly competent list learners over time. Lastly, neural recording data suggest a
suite of mechanisms is available to the monkeys during list representation and
production. However, it is up to computational modeling to offer precise hypotheses on
how such machinery may relate to each other, and how learning may rely on one or
another at different times, from initial acquisition to well-rehearsed performance.
2.2 MODEL DESIGN
We provide a brief sketch of our model here, demonstrating the two key
mechanisms for solving this complex sequence learning task before expanding our scope
and presenting the model in full detail. This way, we can immediately contrast our model
against a subset of the models briefly reviewed above.
The first issue any model confronting the data presented above must solve is how
to acquire the lists – first through incremental shaping and then without that crutch. The
two key mechanisms in our model are a learning and decision-making network that
represents temporal context and item value together, along with a mechanism for
generating and maintaining precisely these internal context states that correlate with the
temporal progression in execution of the learned sequence. We base the primary learning
mechanisms of our model on the two complementary and learnable signals from ACQ:
reward-related signals (desirability), and context-related signals (executability), which
combine to give a behavioral priority signal. ACQ derives its value-based representation
on the likelihood of achieving some general goal state (such as eating food as delivered
by the task protocol) while the executability of an action is based on whether or not it can
be carried out in the current context, which may change when actions are executed.
However, an action in SCP does not change the state of the world (i.e., the set of patterns
available for touching on the screen) in such a way that the resultant state change can be
uniquely associated with a subsequent action. Thus, internal ‘updating’ of the state of
execution of a sequence must be maintained.
(Lashley, 1951) argued long ago for ‘abstract state’ representations in explaining
sequential and/or hierarchical behavior. Here we see the relevance of the recurrent
memory loops in the model of (Dominey et al., 1995). Their sequence learning model for
generating saccadic eye movements relied on a novel interpretation of cortico-striatal
loops. These loops, it was claimed, maintained internal states with two key properties: (i)
each state could be associated with a particular action plan, and (ii) state transitions were
deterministic. (These systems came to be regarded as the first implementation of what
are now known as ‘reservoirs’, or reservoir computing (Jaeger, Maass, & Principe,
2007).) We similarly suggest in our model that networks can maintain internal states
with deterministic transitions that facilitate the learning of items in a sequence. However,
we depart in one respect in that, while both models come to associate temporal context
states with a list item, we extend the learning to include generalizing from the variations
in visuo-spatial input, such that most/all list items come to be learned in relation to a
particular sequence of temporal context states. In this way our model can readily explain,
e.g., the high rate of performance for conserved lists. This is discussed in more detail
below.
Having given a high-level perspective of the key elements of our model, we will
now detail the model precisely.
Figure 2.4. High-level model schematic. Visual input in the form of a 3x3 array
provides input to the model. Immediately, a dorsal and ventral path process the input,
computing a binary representation of target locations preserving spatial relations
between items (dorsal), and visually discriminating the items based on their unique
feature-vector representation (ventral). Integrated visuo-spatial information is
represented in a visual working memory layer. This layer informs downstream structures
of the content and visuo-spatial features of the monitor. The temporal memory layer
maintains the internal state of execution for the current list and projects, complementary
to the value-based signal provided by the current motivational state, a temporal context
signal that informs the behavioral priority map, which is composed of multiple layers not
shown here. The major representations used in the model are visible, as are the learning
pathways (dotted lines) that contribute to model performance. Learning itself is managed
by an outcome processing module that manages the feedback of the environment and
interacts with relevant learning pathways (circled cross-sections).
Visual processing
The input to the model is given by a 3x3 array, with the visual patterns
represented as sparsely-coded feature-vectors, ‘n’-long, with ‘i’ total active units (set
equal to 1) for each vector. (See Appendix for parameter values, layer sizes, noise levels,
etc. for the reported simulation results.) The visual patterns are randomly arranged on the
monitor, provided that the middle location is unoccupied. For example, the visual pattern
for the character ‘A’ in Figure 2.4 is given by:
[010100]
Replacing the feature-vectors with characters for simplicity, we have for example:
D · ·
· · B
A C ·
We do not explicitly model visual search processes to identify items and their
respective positions. Rather, we assume visual search processes and instead model dorsal
and ventral pathways that collectively build up a visual working memory representation
that binds both item information (ventral) with spatial information (dorsal).
The upper (dorsal) visual pathway of Figure 2.4 processes the image to establish a
binary representation of all occupied target locations (i.e., those locations containing a
visual pattern). In our example, this pathway gives the dorsal representation “d” as:
1 0 0
0 0 1
1 1 0
Note that since the stimulus configuration does not change following selections,
“d(t)” does not vary in time (within each trial).
A second (ventral) pathway decodes each visual feature vector and activates a unit
in the object recognition layer “o”, for each pattern. This mapping is fixed and arbitrary.
This active unit is the model’s internal representation for that visual pattern, and its
learned associations over time are what contribute to list acquisition, as we see below.
Again, the activations in “o” do not vary in time (for each trial), and so for 4 visual
patterns on the screen, 4 units are active at “o(t)”.
Finally, an integrated representation of the monitor combines both dorsal and
ventral representations. This visual working memory “y_v(t)” (again, it follows that
“y_v(t)” does not vary in time) preserves the spatial distribution of the items and the rich
distributed feature vectors, algorithmically combining both representations: the first
dimension (rows) corresponds to all grid locations (itself a 3x3 array, but re-arranged as a
1-D array here), and the second dimension (columns) to the length “n” of the feature
vectors. For the 3x3 monitor and feature vectors “n=16”, this array becomes 9x16. It is
this layer, maintaining all visually-relevant information for the performance of the task,
that projects downstream via modifiable weights “W_f” to the temporal memory layer to
establish a sequence of temporal context states that facilitate associations over time
without S-R relations present.
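As a sketch, the three visual representations for the example trial above can be assembled as follows (toy feature vectors; the item-to-unit mapping in “o” is fixed and arbitrary, as in the text):

```python
import numpy as np

n = 16                                     # feature-vector length
rng = np.random.default_rng(0)
# Hypothetical sparse feature vectors (two active units each) for four items.
features = {item: (rng.permutation(n) < 2).astype(float) for item in "ABCD"}
# Grid locations matching the example layout above (3x3 grid, row-major).
locations = {"D": 0, "B": 5, "A": 6, "C": 7}

d = np.zeros(9)                            # dorsal: binary occupied-location map
o = np.zeros(len(features))                # ventral: one unit per known pattern
y_v = np.zeros((9, n))                     # visual working memory, 9 x 16
for idx, (item, vec) in enumerate(features.items()):
    loc = locations[item]
    d[loc] = 1.0
    o[idx] = 1.0
    y_v[loc, :] = vec                      # bind 'where' (row) to 'what' (columns)

print(d.reshape(3, 3))                     # reproduces the dorsal matrix above
```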
Temporal memory layer
By now, the model has established a visual working memory representation of
what items are visible, and where the items are located. Importantly, up to here, none of
the representations built up in the model vary in time, and so the model must generate a
stable, internal temporal signal from these representations. We model a temporal memory
system “y_m(t)” as a layer of neurons that are first activated by the visual working
memory representation, from which a k-WTA process maintains only the “k” most active
units (set equal to 1). The resultant state can then learn biases that influence item
selections. Then, following successive item selections from the output of the model,
efferent feedback “e(t)” can update this layer due to its recurrent connectivity: each unit
in the layer is connected with excitatory weights to all other units, with inhibitory self-
connectivity. This gives the property that any state of activation “y_m(t)”, once excited
again through “e(t)”, will deterministically be followed by “y_m(t+1)” in such a way that
“y_m(t+1) ≠ y_m(t)”. For example, the weights for a 3-unit layer would be:

W_r = [ -1  1  1
         1 -1  1
         1  1 -1 ]

Together, the activation of the temporal memory layer is given by:

y_m(t+1) = g_m(t) × (y_v(t) × W_f) + e(t) × (y_m(t) × W_r)    (1)

where the efferent copy “e(t)” is equal to 0 unless immediately following a selection, in
which case “e(t)” is equal to 1, and where “g_m(t)” is equal to “1” if “t=1”, else it is equal
to “0” – meaning it gates the influence of the bottom-up activation from “y_v(t)”. Thus,
“y_m(t+1)” is stable until efferent copy updates the layer via other-excitatory / self-
inhibitory projections. Following each update of this layer, the k-WTA process maintains
only those “k” neurons.

However, because the composition of visual working memory can change every
trial as a function of random spatial arrangement on the monitor, and introduction of new
visual patterns, the mapping from visual working memory content “y_v(t)” to the
temporal memory system “y_m(t)” is widely variable unless managed through learning –
that is, every trial will tend to result in a different initial state for the temporal memory
layer due to varied inputs. Following trial termination – either as a result of list
completion or failure, that is, positive or negative feedback – learning processes
strengthen or weaken associations between visual working memory content and the
initial activation state of the temporal memory layer:

ΔW^f_{r,s} = α_f × r_p(t)    (2)

where “α_f” is a learning rate constant, “r_p(t)” is the value of the primary feedback (“1”
if rewarded, “-0.1” if punished, and “0” otherwise), and “ΔW^f_{r,s}” is the change in weight
values in “W_f”, the weight matrix mapping visual features to the temporal memory layer
(“r” and “s” are the active nodes, i.e., those equal to ‘1’, in “y_v” and “y_m”, respectively).
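Concretely, Equations (1) and (2) can be sketched as follows (toy layer sizes, learning rate, and input sparsity are our own choices):

```python
import numpy as np

def k_wta(x, k):
    """k-Winners-Take-All: the k most active units are set to 1, the rest to 0."""
    y = np.zeros_like(x)
    y[np.argsort(x)[-k:]] = 1.0
    return y

m, k = 8, 2                                 # layer size, number of active units
W_r = np.ones((m, m)) - 2.0 * np.eye(m)     # +1 cross-excitation, -1 self-inhibition
rng = np.random.default_rng(1)
W_f = 0.1 * rng.random((9 * 16, m))         # visual features -> temporal memory
y_v = (rng.random(9 * 16) < 0.1).astype(float)  # flattened visual working memory

# Equation (1) at t = 1 (g_m = 1, e = 0): bottom-up initialization of the layer.
y_m = k_wta(y_v @ W_f, k)
initial_state = y_m.copy()

# Equation (1) after a selection (g_m = 0, e = 1): a deterministic transition
# to a new state, since active units inhibit themselves and excite the rest.
y_m = k_wta(y_m @ W_r, k)

# Equation (2) at trial end: adjust the feature-to-initial-state mapping for
# the active (r, s) pairs; r_p is +1 on reward, -0.1 on punishment.
alpha_f, r_p = 0.05, 1.0
W_f += alpha_f * r_p * np.outer(y_v, initial_state)
```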
Thus, over many trials with varying spatial arrangement, and over many different
lists with novel visual patterns, both contributing to novel visual working memory
content, the model ‘generalizes’ the task demands in part by establishing a consistent
mapping to the temporal memory layer, which facilitates list acquisition over time. This
also gives the property that the model is robust to the list manipulations: because the
temporal memory layer evolves deterministically, successive states allow retrieval of past
associations, even if the content of the monitor – and so as well, the visual working
memory – is novel. We can see, then, how the added learning process on the ‘input side’
of the temporal memory layer – lacking in the model of Dominey et al. – allows us to
explain how items not previously presented together are nonetheless easily executed
when ordinal positions are conserved. In the model of Dominey et al., each list generates
a different trajectory in ‘context state-space’, meaning conserved lists would not carry-
over past associations (nor would scrambled lists greatly interfere with learning and
execution). Instead, the model seeks to generalize to a shared trajectory in ‘context state-
space’.
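As a minimal sketch of equations (1) and (2), with the layer sizes, the flattening of the 9x16 working memory, the small tie-breaking jitter on "W_r", and all names being our own illustrative choices:

    import numpy as np

    rng = np.random.default_rng(1)
    n_m, k = 20, 1                          # temporal memory units; k-WTA with k = 1 (as in the text)
    W_f = rng.random((9 * 16, n_m))         # visuo-spatial features -> temporal memory
    W_r = 1.0 - 2.0 * np.eye(n_m)           # +1 to all other units, -1 self-connectivity
    W_r += 0.01 * rng.random((n_m, n_m))    # small jitter (ours) so successive states differ

    def kwta(x, k):
        # Keep only the k most active units, as a binary state.
        y = np.zeros_like(x)
        y[np.argsort(x)[-k:]] = 1.0
        return y

    def initial_state(y_v):
        # Equation (1) at t = 1: g_m(t) gates in the bottom-up activation, then k-WTA.
        return kwta(y_v.flatten() @ W_f, k)

    def next_state(y_m):
        # Equation (1) after a selection (e(t) = 1): deterministic recurrent transition.
        return kwta(y_m @ W_r, k)

    def learn_W_f(y_v, y_m0, r_p, alpha_f=0.05):
        # Equation (2): adjust the weights between the features that obtained and the
        # units of the initial state; r_p = 1 rewarded, -0.1 punished, 0 otherwise.
        W_f[np.outer(y_v.flatten() > 0, y_m0 > 0)] += alpha_f * r_p

Because "W_r" is fixed, the same initial state always yields the same state sequence, which is exactly the property the conserved-list manipulation exploits below.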
Figure 2.5. Event-level evolution of decision-related representations. Visual
processing establishes the content of the visual working memory layer, which at trial
initiation activates the temporal memory layer and so generates temporal context state 1.
The temporal memory layer projects this activation to the behavioral priority map,
concurrently with the visual working memory providing the visuo-spatial structure.
Additionally, the motivational state of the animal informs the behavioral priority map of
the reward-predictive value of each visible item. Thus, the behavioral priority map –
rendered here as a single layer, but see Figure 2.6 for more details – integrates visuo-
spatial, cognitive and motivational information. Following selection, corollary discharge
and recurrent activities induce a new pattern of activation in the temporal memory layer,
now discriminable as context state 2 (and deterministically computed from state 1, as
recurrent connectivity is fixed), and provides new input to the behavioral priority map.
The dashed lines in the figure denote data streams that change in time, while solid lines
indicate data that does not vary over the course of a single trial. Thus, it can be seen here
that the initial activation of the temporal memory layer by the visual working memory
layer selects a trajectory in state space that unfolds predictably, allowing reliable
retrieval of temporal context information that, along with reward-predictive
considerations, influence downstream decision processes.
Behavioral priority
In order to enact decisions, the model must retrieve learned associations about
each item: temporal context associations and reward-predictive or ‘value-based’
associations. Value-based associations “v” are encoded in a modifiable weight matrix
that maps the internal motivational state “g” to each item, as follows:
(3)   v(t) = g × W_v

where "g" is an array of units representing different motivational states, but with a single, fixed node equal to "1" (in the future, this can expand to simulate motivational depletion, or reward-driven behavior where rewards themselves vary), and "W_v" is the weight matrix encoding value. The value weight matrix is updated according to the difference in expected and actual reward-prediction, based upon temporal-difference reinforcement learning (Sutton & Barto, 1998):
(4)   δ(t) = v(t) - v(t-x) + r_p(t)

where "δ(t)" is the reward-prediction error, "v(t)" gives the estimated value of the selection made at time "t", "v(t-x)" gives the estimated value of the previous selection (made a variable "x" number of time-steps prior), and "r_p(t)" gives the value of the feedback (described above). The estimated value "v" of each item is updated on the basis of the output of "δ", so that the 'difference' in expected and actual reward-predictive value updates "v" appropriately. "W_v", in turn, is updated according to the delta term above:

(5)   ΔW_v = α_v × δ

where "α_v" is the value learning rate. Thus, the goal state of the model retrieves value associations for each visible item, as depicted in Figures 2.4 and 2.5.
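A minimal sketch of equations (3)-(5), under the same illustrative caveats; following standard temporal-difference learning, we credit the update to the value of the earlier selection, a detail the text leaves implicit:

    import numpy as np

    n_goals, n_items = 1, 8
    g = np.zeros(n_goals); g[0] = 1.0      # single, fixed motivational node
    W_v = np.zeros((n_goals, n_items))     # value weight matrix
    alpha_v = 0.05

    def values():
        return g @ W_v                      # equation (3): v(t) = g × W_v

    def td_update(item_t, item_prev, r_p):
        # Equation (4): δ(t) = v(t) - v(t-x) + r_p(t).
        v = values()
        delta = v[item_t] - v[item_prev] + r_p
        # Equation (5): ΔW_v = α_v × δ, here credited to the previous selection.
        W_v[0, item_prev] += alpha_v * delta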
The value or reward-predictive power of each visible item influences decision-making in a manner complementary to the learned temporal context signals. Associations must be established between the evolving temporal context states and the visible items, and these associations must bias selections independently of value-based biases. The temporal context signal is given by:
(6)   c(t) = y_m(t) × W_c

where "c(t)" is the context signal at time "t", "y_m(t)" is the temporal memory state at time "t" (see above), and "W_c" is the modifiable weight matrix which encodes the temporal context associations. The modifiable weight matrix is adapted according to associative learning:

(7)   ΔW_c = α_c × r_e(t)

where "α_c" is the context learning rate, and "r_e(t)" is the effective reinforcement (different than the primary reinforcement above), which is equal to "1" unless the selection is incorrect, in which case it is equal to "-1". (This means state+item associations are implicitly positively reinforced unless followed directly by negative feedback.) Weights here are re-normalized by decreasing all the weights of non-selections by a factor of "ΔW_c". The learning here, then, is 'competitive', in that
learning of one state+item association, per selection, comes at the expense of the
associations between the current temporal context state and all other items.
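A sketch of equations (6) and (7), including our reading of the re-normalization (every non-selected item loses exactly the weight "ΔW_c" that the selected item gains):

    import numpy as np

    n_m, n_items = 20, 8
    W_c = np.zeros((n_m, n_items))           # temporal context associations
    alpha_c = 0.05

    def context_signal(y_m):
        return y_m @ W_c                      # equation (6): c(t) = y_m(t) × W_c

    def learn_W_c(y_m, item, r_e):
        # Equation (7): strengthen the active-state -> selected-item association;
        # r_e = 1 unless the selection was incorrect, in which case r_e = -1.
        active = y_m > 0
        W_c[active, item] += alpha_c * r_e
        # Re-normalization: every non-selected item loses the same amount ('competitive').
        others = np.arange(n_items) != item
        W_c[np.ix_(active, others)] -= alpha_c * r_e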
Finally, value-based and context-based signals are combined into a behavioral
priority signal which ultimately determines item selection. Behavioral priority is
computed as:
(8)   p(t) = v(t) × c(t) + σ_b(t)

where the behavioral priority is given by "p(t)" and where "σ_b(t)" is Gaussian noise.
Again, this is reminiscent of the ACQ model discussed prior, wherein desirability and
executability integrate to give priority. However, we situate this signal within
topographically-maintained visuo-spatial maps for acting on the environment, in
particular the touch-screen monitor. The behavioral priority map is a network of 3x3
layers of neurons, topographically correlated with the monitor and the visual working
memory layer. The behavioral priority signal is dynamically mapped onto the input layer
of this network, and through competitive interactions between these layers of neurons an
action plan is isolated on the output end of the network. Accumulator neurons driven
above threshold spike in the output layer of this network, projecting topographic-specific
activation onto the basal ganglia gating system. The gating system is driven to disinhibit
the relevant action plan and a motor act is finally outputted by the model. Thus, the
behavioral priority signals encode reward-relevant information, temporal context
information, but also the location of each item on the monitor, such that selecting across
the behavioral priority signals immediately gives motor parameters for acting (see Figure
2.6). We detail these layers next.
Figure 2.6. Competition network. The planning layer neurons activate to the degree to which they are driven by input (the behavioral priority signal). The distribution of activity seen here – darker = more active – encodes the relative order of selections. The competition layer receives input from the planning layer, an inhibitory layer, and recurrent activation from itself (one-to-one recurrence). The inhibitory layer sums the activations in the competition layer, with each neuron then projecting one-to-one back to the competition layer. The result, in the competition layer, is that the neuron with the largest input tends to dominate, while the other neurons are driven to lower activations. Finally, each neuron in the competition layer drives a neuron in the accumulator layer, which must pass a threshold before any firing (the threshold decays with time, encouraging selections even when the inputs are not clearly separated in activation).
Competition network
We model a competitive neural network composed of leaky-integrator style
membrane potentials and sigmoidal firing rates to arrive at an item selection. Each layer
of neurons has an activation state given by their membrane potentials, and each state is
translated to a firing rate – the quantity that is outputted by that layer and passed to other
layers. The generic equation for a leaky-integrator is given by:

(9)   y(t) = a × y(t-1) + b × x(t) + σ(t)

where "a" and "b" are constants that sum to one, "x(t)" is the sum of input to that neuron, "y(t)" is the membrane potential, and "σ(t)" is Gaussian noise. The generic equation for the nonlinear thresholding sigmoid we use is given by:

(10)   f(t) = 1 / (1 + e^{-(h + y(t))})

where "f(t)" is the firing rate, "h" is a constant, and "y(t)" is the membrane potential from above.

The competitive network works by taking the behavioral priority input "p(t)" and generating activations in the planning layer. (Note that the neurons here do not compete; see Figure 2.6.) The competition layer receives one-to-one projections from this layer, as well as input from an inhibitory layer, whose activation is proportional to the total activity in the competition layer. Finally, the network drives an accumulator neuron above threshold: each competition neuron, the most active of which tends to dominate over all others, projects to an accumulator neuron which must pass a threshold
for selection. Once driven above threshold, a spike is output and a cascade of activity
through the basal ganglia gating system will disinhibit the relevant motor act. As the
motor action is outputted (in a topographically consistent fashion), efferent feedback
drives a new temporal context state, a new priority signal drives a new pattern of activity
in the competition network, and so on.
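These two generic building blocks can be sketched as follows; the constants "a", "b", and the noise scale are placeholders, and we read the negative "h" values in Figure 2.7 as placing a soft threshold at "-h":

    import numpy as np

    rng = np.random.default_rng(3)

    def leaky_step(y_prev, x, a=0.75, b=0.25, sigma=0.01):
        # Equation (9): "a" and "b" sum to one; sigma scales the Gaussian noise.
        return a * y_prev + b * x + sigma * rng.standard_normal(np.shape(y_prev))

    def firing_rate(y, h):
        # Equation (10): thresholding sigmoid; e.g., h = -4 keeps the rate near
        # zero until the membrane potential approaches 4.
        return 1.0 / (1.0 + np.exp(-(h + y)))

    y = np.zeros(9)                          # one neuron per monitor tile
    for _ in range(20):                      # drive the layer with a constant input
        y = leaky_step(y, x=np.linspace(0.0, 1.0, 9))
    rates = firing_rate(y, h=-0.6)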
The details of each layer are the following (and see Appendix for parameter
values, etc.). The planning layer’s membrane potential is given by:
(11)   y_p(t) = a_p × y_p(t-1) + b_p × (c_p × p(t)) + σ_p

The firing rate for the planning layer is given by:

(12)   f_p(t) = 1 / (1 + e^{-(h_p + y_p(t))})

The membrane potential of the competition layer is given by:

(13)   y_c(t) = a_c × y_c(t-1) + b_c × x_c(t) + σ_c

where "x_c(t)" is the total input to the layer, which itself is given by:

(14)   x_c(t) = f_c(t) - f_i(t) + 4 × f_p(t) - 1

where "f_i(t)" is the firing rate of the inhibitory layer, described below. The firing rate for the competition layer neurons is given by:

(15)   f_c(t) = 1 / (1 + e^{-(h_c + y_c(t))})

The membrane potential for the inhibitory layer is given by:

(16)   y_i(t) = a_i × y_i(t-1) + b_i × x_i(t) + σ_i

where "x_i(t)" is the sum of activities in the competition layer:

(17)   x_i(t) = Σ f_c(t)

The firing rate in this layer is directly proportional to the membrane potential:

(18)   f_i(t) = y_i(t)

The membrane potential for the accumulator layer is the following:

(19)   y_a(t) = a_a × y_a(t-1) + b_i × f_c(t)

The firing rate for the accumulator neurons is given by:
(20)   f_a(t) = 0 for all y_a(t) < z(t);   f_a(t) = 1 for all y_a(t) ≥ z(t)

where "z(t)" is a time-varying threshold. "z(t)" begins equal to "1" at the beginning of every trial and is reset to "1" following every selection. Every time-step that a selection is not reached, "z(t)" decays – lowering the threshold for action such that low inputs (or inputs closely equal to each other) to the network still manage to reach threshold. The equation for the time-varying threshold is given by:

(21)   z(t) = d / (d + α_z × (T_t + z(t-1)))

where "d" and "α_z" are constants and "T_t" is the time since the last selection. Thus, as the time between selections increases, the threshold for action decreases. (For example, see (Thura, Beauregard-Racine, Fradet, & Cisek, 2012) for similar considerations.) The passing of threshold results in the efferent signal that updates the temporal memory layer. The corresponding unit in the inhibitory layer is briefly driven to strong activation, as a neurophysiological 'inhibition-of-return' (Klein, 2000). Additionally, the values of the corresponding unit in the competition layer and the accumulator layer are driven back to "0".
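As a minimal sketch of the threshold dynamics in equations (20) and (21), using "d = 60" and "α_z = 2" from the parameter table (Figure 2.7); the stalled accumulator potential is hypothetical and the bracketing of equation (21) reflects our reading:

    def threshold(T_t, z_prev, d=60.0, alpha_z=2.0):
        # Equation (21): the threshold decays as time since the last selection grows.
        return d / (d + alpha_z * (T_t + z_prev))

    z = 1.0                                   # reset to 1 after every selection
    y_a = 0.4                                 # a hypothetical stalled accumulator potential
    for T_t in range(1, 60):                  # time-steps since the last selection
        z = threshold(T_t, z)
        if y_a >= z:                          # equation (20): binary spike at threshold
            z = 1.0                           # fires near T = 45 with these values
            break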
Basal ganglia gating
Motor output in our model involves disinhibiting a particular unit (corresponding
to the selection of that tile on the touch screen monitor) through a basal ganglia-based
gating mechanism. The passing of threshold by an accumulator neuron causes a brief
spike of activity in the topographic map, which downstream releases the tonic inhibition
which gates action selection. The gating system involves a direct and indirect pathway
which together work to project tonic inhibition across motor output cells, unless an
upstream activation, in the form of an accumulator neuron spike, releases a particular
motor act from inhibition. Just as the competitive neural network, the units in the basal
ganglia are arranged in a 3x3 grid corresponding to the touch screen monitor. The direct
pathway’s membrane potential is given by:
(22)   y_dir(t) = f_a(t)

The indirect pathway is given by:
(23)   y_ind(t) = 0

The globus pallidus external layer (GPE) is given by:

(24)   y_gpe(t) = 1 + y_ind(t)

The subthalamic nucleus (STN) is given by:

(25)   y_stn(t) = (-1) × y_gpe(t)

The substantia nigra pars reticulata layer (SNr) is given by:

(26)   y_snr(t) = (-1) × y_dir(t) + (-1) × y_stn(t)

And finally, the thalamus layer neurons are given by:

(27)   y_tha(t) = -1 + y_snr(t)

Upon the SNr activity for a unit reaching "0", the corresponding motor act is disinhibited. It is only at this point that the efferent copy is sent back to update the temporal memory layer, a motor act is outputted, and the selection is evaluated and feedback computed. (For all layers in the basal ganglia-based gating system, firing rates are equal to membrane potentials as described here.) See Figure 2.7 for the parameter values used within.
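Taken together, the gating cascade of equations (22)-(27) reduces, element-wise over the 3x3 grid of action channels, to a few lines; the NumPy rendering and names below are our own sketch:

    import numpy as np

    def gate(f_a):
        # Equations (22)-(27), applied element-wise over the 3x3 action channels.
        y_dir = f_a                     # (22) direct pathway follows the accumulator spike
        y_ind = np.zeros_like(f_a)      # (23) indirect pathway is silent in this task
        y_gpe = 1.0 + y_ind             # (24)
        y_stn = -y_gpe                  # (25)
        y_snr = -y_dir - y_stn          # (26) tonic inhibition (= 1) except where a channel spikes
        y_tha = -1.0 + y_snr            # (27)
        return y_snr, y_tha

    f_a = np.zeros((3, 3)); f_a[1, 2] = 1.0   # an accumulator spike for one tile
    y_snr, y_tha = gate(f_a)                   # SNr falls to 0 only at that tile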
Parameter | Description | Value
n | Feature vector length | 16
i | Features that obtain | 3
k | k-WTA | 4
α_f | Learning rate within W_f | 0.05
α_v | Learning rate within W_v | 0.05
α_c | Learning rate within W_c | 0.05
a_p | Membrane potential constant in y_p(t) | 4
a_c | Membrane potential constant in y_c(t) | 4
a_i | Membrane potential constant in y_i(t) | 2
h_p | Sigmoidal constant in y_p(t) | -0.6
h_c | Sigmoidal constant in y_c(t) | -4
d | Time-varying threshold constant | 60
α_z | Time-varying threshold constant | 2
Figure 2.7. Parameter values. Parameter values for reported simulations.
2.3 METHODS AND SIMULATION RESULTS
Here, we detail the SCP protocol as implemented in our model, with a few
modifications to suit simulation needs (see Figure 2.8). For example, in the experimental
SCP results as reported by Terrace and colleagues, ‘repeat’ selections were considered
‘not wrong’: selections of a single item back-to-back were not followed by negative
reinforcement, but were in no way ‘correct’ (e.g., required/rewarded). (Note, however,
that repeats were not very common regardless.) Instead, we do consider these repeats
‘wrong’, but only for ease of simulation. (Repeat errors contributed only ~ 5% of total
errors for our model.) Additionally, due to random processes in the model
implementation – random weight initializations, random feature-vector encoding, random
arrangement of items, etc. – we ran 10 total simulations of the model to arrive at
simulation results we could test for statistical significance. We compare our results
qualitatively to the behavioral data reported in the literature, contrasting overall model
performance, error rates and error distributions to the literature, as well as contrasting
model neural response profiles against known response types as recorded in macaques
during structurally-similar tasks. We begin with analyzing gross behavioral measures.
Figure 2.8. Shaping structure and list manipulations. (Left) Our primary
implementation of the Simultaneous Chaining Paradigm begins by presenting a single
item (“1-long” list) and requiring selection until some performance criterion is met.
New items are appended (“2-long”, “3-long”), following successful performances at the
previous increment, until the final form of the list is displayed (“4-long”). Many lists are
learned in this way (e.g., lists B, C and D undergo this incremental shaping as well).
(Right) Following mastery of four, 4-long lists, incremental shaping is abandoned and a
new list (E) is learned. Then, two derived lists are created, composed of familiar items:
one which conserves absolute ordinal position (first row), and one which violates these
relations (second row). (And note that for both lists, no items have been presented
together prior.) Finally, a wild card list is made, with a single novel item in place of a
familiar item presented in the context of a well-learned list.
1) Learning SCP lists through explicit shaping
We first demonstrate that our model can acquire SCP lists through incremental
shaping. As in the literature, we begin with a single item placed randomly on the monitor.
Each successful trial of a single increment of a list was rewarded, and following criterion completion of blocks of 50 trials for each increment – consecutive blocks of 50 trials above 80% performance, though we have considered alternative criteria (e.g., the instantaneous error rate discussed below) which nonetheless do not greatly change the overall behavioral pattern – the increment was updated by appending a new item to the existing
increment, until the full list was presented. In this way, the model must demonstrate high
performance levels at each increment of a list. In total, we trained four lists of four items
each this way.
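The shaping protocol itself can be summarized in a short sketch, where "model.run_trial" is a hypothetical interface returning True for a fully correct trial, and where we read the criterion as two consecutive 50-trial blocks at or above 80% performance, one possible reading of the text:

    def shape_list(model, full_list, block=50, criterion=0.8, max_blocks=500):
        # Train each increment (prefix) of the list to criterion before appending
        # the next item, mirroring the incremental shaping described above.
        for increment in range(1, len(full_list) + 1):
            prefix, streak, blocks = full_list[:increment], 0, 0
            while streak < 2 and blocks < max_blocks:
                correct = sum(model.run_trial(prefix) for _ in range(block))
                streak = streak + 1 if correct / block >= criterion else 0
                blocks += 1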
Results
The model was able to learn all four lists in this way. At each increment, the
model learned to append the new item to the end of the existing sequence. As was noted
above in the summary of the data from the literature, naïve monkeys – monkeys with no
experience on the task prior – were able to learn lists starting, not with a single item, but
with two, or even three items initially (Terrace et al., 2003). (Still, the protocol contained
the ‘implicit’ shaping in that wrong selections terminated the trial immediately.) To test
this, we re-initialized and re-ran the model with increments starting at, for example, ‘A-
B' and 'A-B-C' initially. The condition beginning with 'A-B-C' was significantly more difficult to learn than beginning with a single item as above, or with an 'A-B' beginning, but each condition can be learned eventually (though the range of parameter values, weight initializations, etc. is narrower than when beginning with a single item). We do not further discuss these different starting conditions, and for the rest
of the document only consider initial shaping as beginning with a single item.
Figure 2.9. Model behavioral results. Performance measures across list types are
shown. For all lists and for all performance measures, the 4-long increment only is
shown. The incrementally-shaped lists (Lists 1-4) are easier to learn, on average, than
the simultaneous (baseline) list (List 5). Compared to this baseline, conserved lists (List
6) are much easier, requiring little learning, if any, to master. Conversely, the scrambled
lists (List 7) are just as difficult as the baseline, but more difficult than conserved lists.
2) Learning SCP lists without explicit shaping
We next demonstrate that our model can learn a full list without the explicit
shaping procedure. Following the learning in (1), a new list of four items was presented
simultaneously to the model.
Results
The model eventually reached criterion levels, but on average needed more blocks to reach criterion levels as compared to lists from (1). Comparing the 4th-increment performance for all lists in (1) to the performance here (recalling that the lists here are '4th-increment lists' but without the crutch of incremental shaping), we can clearly see that the shaping procedures greatly influence model performance and learning rates. Figure 2.9 shows model performance rates across a range of measures. As compared to performance rates (on the 4th increment) for all incrementally-learned lists, the simultaneous lists are significantly more difficult to learn (p<0.05). This is easily demonstrated since, in the case of learning the 4th increment for lists from (1), the ordering 'A-B-C' is more-or-less fully learned at this point and the further learning essentially involves one extra update of the temporal memory layer and establishing appropriate associations from this state to the new item 'D', whereas for the simultaneous list here, all of the sequence 'A-B-C-D' must be learned, greatly increasing the search space for the model (or animal).
We note here that the learning required to reach criterion levels of performance
for even just this small part of the task is non-trivial. Many of the models we reviewed
above are incapable of learning in this way, from CQ-style models (Silver et al., 2012)
that, while comprehensive in their account of the circuitry subserving sequential
saccades, do not give any account of the learning required to reach high performance
levels, to schema-hierarchical models that similarly build-in the required structure to
competently perform the modeled tasks (Cooper & Shallice, 2006; Cooper & Shallice,
2000), to backpropagation-trained models that learn offline and have little to say
regarding the significant trial-and-error learning necessarily involved (Botvinick & Plaut,
2006a, 2006b). The ACQ model that inspires much of the architecture and learning in
this model similarly fails even at this stage, since its learning is driven solely by the environmental feedback that results from instrumental interaction. That is,
it lacks the capacity to maintain internal temporal context signals. Finally, the model
from Dominey et al. from which we borrow the notion of recurrent temporal memory,
similarly fails, strictly speaking, in that its performance is predicated on being cued by
the temporal presentation of the items. However, alternative models, such as the chaining
models of (Chersi et al., 2011), are still viable in their explanatory power (though note
that their architecture still does not model learning). They would contend that, as
opposed to the context+item associations we establish, their learning through item+item
associations can give the same, or similar, results up through here. However, as we will
show below, these models fail when confronted with the list manipulations.
3) Sensitivity to list manipulations
Conserved and scrambled lists
To test the model on both conserved and scrambled lists, we return to the first
four lists from (1): conserved lists consisted of items that preserved their former serial
relations, while scrambled lists consisted of items that defied their former serial relations.
We ensured that no item was in both the conserved and scrambled lists.
Results
The model was able to reproduce the pattern of behavioral results we reviewed
above (see Figure 2.9). In particular, it performed conserved lists at criterion levels
immediately, and performed scrambled lists at very low rates. As a control, we used the
list from (2) as a baseline here, and compared conserved and scrambled lists against it.
Conserved lists were performed at much higher rates than the baseline list from (2)
(p<0.01). In fact, on average, the total trials required for the conserved list was the
minimum necessary to reach criterion. Conversely, scrambled lists were performed much
worse. Across the first 50 trials, the scrambled lists were performed more poorly than
baseline (p<0.01), as also indicated by the measure of trials until first correct
performance (p<0.05). On average, however, simultaneous lists required more total
blocks to reach criterion (p<0.01), owing probably to the fact that these items are novel
and the weight initializations established weak initial biases. Finally, the comparison
between conserved and scrambled lists was as predicted, with conserved lists obviously
being easier than the scrambled lists (p<0.01).
These results follow from the structure of the temporal memory layer and how its
trajectory in state-space is essentially selected upon presentation of visual patterns on the
monitor. There exists no privileged starting position – no privileged initial activation
state – since variation in monitor configuration projects activity to the temporal memory
layer in highly variable ways. Instead, however, the network learns to map the visuo-
spatial features to the temporal memory layer – this mapping is, necessarily, arbitrary – it
need only do so consistently in order to retrieve the appropriate biases. For conserved
lists, the network activates the temporal memory layer in the same initial state despite the
features appearing together for the first time. As the temporal memory layer iterates
through its state-space, it retrieves the biases previously established, for each state and for
each item, and the resultant priority signals drive correct selections (see Figure 2.10). For
scrambled lists, the same process occurs, though the biases that are retrieved interfere
with correct performance and must be re-learned.
Figure 2.10. Temporal memory layer retrieval of biases during list manipulations.
During the conserved list manipulation, the model must execute a sequence of item
selections despite the novel composition of the monitor: the visible items have never
appeared together prior. The network activates the temporal memory layer into its initial
state, from which both the state evolution immediately follows, and the pre-established
biases are retrieved, according to whether or not the item is currently visible. Thus, as
the temporal memory layer settles into state "S_1", state "S_2" and so on, the appropriate
biases are retrieved as indicated by the bold letters. Faded gray letters represent
previously-established biases, but since these items are not visible they do not interfere
with proper execution of the list items. During scrambled lists (not shown), the retrieved
biases interfere, at first, with discovery of the correct sequence.
Wild card lists
Wild card lists consisted of the learned list from (2), with the 3rd item replaced by a novel item (e.g., 'ABXD', where 'X' is a novel item not previously encountered by the model).
Results
The wild card list is produced at a high rate. Compared against the baseline list from (2), wild card lists were performed at significantly higher rates (p<0.01). The model readily achieves this since the introduction of the novel wild card item does not disrupt the temporal memory layer's state evolution – in the same way that the list manipulations do not – and since each 'familiar' item is strongly associated with a particular context state and not with the 'open' 3rd list position. Thus, the model
essentially selects the novel item by default, though no inference process takes place. We
note here, however, that it is possible that such ‘exclusive inference’ processes may
underlie at least some cases where monkeys make categorical judgments (Call, 2006; Pan
et al., 2014).
Interim summary
The gross behavioral results for the list manipulations follow the pattern described
in the literature, wherein lists retaining serial order information are performed at high
rates, and lists defying such serial relations are performed poorly, at least at first. The
model can accomplish this by transitioning through temporal context states in predictable
patterns. Despite the possible violation of item-to-item associations – which then shows
how chaining-based models like Chersi’s fail – context+item associations are retrievable.
The evolution of the activation pattern in the temporal memory layer can still retrieve
appropriate biases despite the novelty of the lists (see Figure 2.10). For scrambled lists,
this also explains why the model is slow to learn, as the associations are still retrieved,
but are incorrect.
4) Error rates and distributions
Terrace and colleagues showed that over the course of learning many lists, the
error patterns changed from the monkeys, reflecting more optimal choices over time and
apparently making better use of so-called informative errors: those errors that do not give
redundant information, like selecting A-B-A or A-B-C-B, but rather errors like A-C and
A-B-D, from which novel information can be gleaned (and is necessary to master the
task). We detail analyses of the errors made by our model below.
For one, and not surprisingly, the number of total mean errors made during 4-long
increments is greater than for 2- or 3-long increments. Additionally, the number of total
errors made during 4-long increments of simultaneous lists (those from (2) above) are
greater than 4-long increments of incrementally-shaped lists (those from (1) above)
which is also not surprising because, as we stated above, the learning involved for
simultaneous lists is much greater than for incrementally-shaped lists because for the
latter, the subset increment A-B-C is essentially well-learned before item D appears. By
other measures, too, the 4-long simultaneous lists are more difficult than 4-long
incrementally-shaped lists, including total number of trials to criterion, total number of
blocks to criterion and mean error totals (p<0.01).
More precisely, we can examine, for each error made, at what position in the list the error was made (1st, 2nd, 3rd, or 4th position), what item was (erroneously) selected, and what type of error that constitutes: repeat (e.g., A-A), forward (e.g., A-C), or backward error (e.g., A-B-A). Note, however, that for each list position, the types of errors possible
vary: it is impossible to make a forward error at position 4, for example. Comparing the
ratio of forward errors made, versus total errors, forward errors comprise a greater
proportion for simultaneous lists, as compared to incrementally-shaped lists. Or instead,
for incrementally shaped lists, there is a much greater tendency to make backward errors,
selecting items from the previous increment (‘A-B-C’) instead of the new item (‘D’).
However, when counting only the errors made after the first correct trial, this difference disappears: once the model finds the correct solution, backwards errors comprise only ~10% of total errors (recall that backwards errors are never informative). In this way we can see that the model perseverates for a list when trained by the incremental shaping protocol, but when the shaping is removed the pattern of selections changes.
Looking at the errors made at positions 3 and 4 (still, for just the 4-long increment),
across the incremental and simultaneous lists, we see that making a forward error to item
‘D’ is most common, followed by item ‘C’. Both of these errors reflect reward-predictive
influences, since it is this item that is most closely (temporally) associated with reward.
Additionally, making errors at position 1 was most common, for both list types.
5) Simulated neurophysiological responses and learning-related weight changes
The model is composed of layers of neurons of various types that process the
visual information provided by the monitor and retrieve reward-related and context-
related information to inform decisions about the order of selections that lead to goal
satisfaction. Neurophysiological studies have characterized functional response profiles
of neurons involved in many of these information-processing steps and allow us to assess
how the information-processing units of our model correlate with these data. It is
important to point out here that the behavioral data against which we are testing our
model (the results of the SCP) did not have concurrent neurophysiological data, and so
the neurophysiological data we compare our simulated responses against are pulled from
a multitude of different, though to an extent structurally-similar, experimental paradigms.
Additionally, we emphasize multiple, concurrent learning processes that contribute to
task performance. Learning-related changes are reflected in the re-organization of values
in the weight matrices that map activation levels in one layer of neurons to neurons in
another layer.
Temporal memory layer stabilization
The temporal memory layer in our model is not an a priori rank-order module,
and so must learn to generalize across instances of trials, and lists, in such a way that any
past associations can be retrieved in future trials. This requires the model to map the
visuo-spatial features from the monitor to a subset of neurons in the temporal memory
layer – which effectively selects a trajectory in state-space. At first, the mapping is
inconsistent (Figure 2.11, top left). Over exposure to many trials wherein the spatial
configuration varies, and over many increments and lists wherein the visual features vary,
this mapping generalizes to select an arbitrary temporal memory state as the initial state
of the network. By the fourth and final list, the model consistently selects the same initial
state for the temporal memory layer, resulting in predictable state evolution of the layer
such that strong associations can be established, and existing associations can be
retrieved (Figure 2.11). This is also reflected in observing the distribution of weight
strengths in the matrix that maps the visuo-spatial features of the monitor to the neurons
of the temporal memory layer. By the completion of the protocol, the weights cleanly
map those features that obtain to the appropriate set of neurons (Figure 2.12). It is worth
pointing out that should spatial configuration of items on the monitor have informational
value, this would be reflected in the organization of the weights here, but since the spatial
configuration is random and unimportant there is no organization according to tile
position. Finally, this also gives insight into the notion of generalization, though only in
a limited sense. In the Discussion, we show that more is involved in this concept, and
thus our model is not yet capable of fully explaining the increase in task expertise
observed in the dataset.
Figure 2.11. Stabilization in temporal memory layer across training. We can observe
that the model learns to generalize the mapping from visuo-spatial features to the
temporal memory layer across training. For the first set of 50 trials (top left; trials are
rows, from bottom – first trial – to top – last trial of block – while the columns are
temporal memory layer neurons: red = active, blue = quiescent) the initial activation in
the temporal memory layer is sporadic, as indicated by a pseudo-random distribution of
activation over time. This suggests that as the visual pattern is randomly displayed on
the monitor, it interferes with a consistent mapping to the temporal memory layer. For
the first set of 50 trials for lists 2 and 3 (top right, bottom left) the mapping appears much
more consistent, though there is some variation, owing at this point to the novel
combinations of visual features (i.e., a new list means a new set of visual patterns). By
the first set of 50 trials for the final, 4th list, the network appears to have successfully
generalized the mapping, as the initial activation pattern in the temporal memory layer is
fully consistent. This is important for, e.g., conserved lists to be able to retrieve the
appropriate biases.
Figure 2.12. Visuo-spatial features-to-temporal memory layer weight changes over
simulation. (Top) Random weight initialization (warm colors indicate strong connection,
cool colors indicate weak connection) for the weight matrix mapping the visuo-spatial
features (x-axis) to the neurons in the temporal memory layer (y-axis). There is no initial
structure in this mapping. (Bottom) Generalized mapping from visuo-spatial features to
specific temporal memory layer neurons. There are four neurons in the temporal
memory layer that are exclusively mapped from the visuo-spatial features layer. Some
visuo-spatial features are more ‘salient’ in that their presence is strongly associated with
those neurons in the temporal memory layer (deep reds), while other features have
weaker associations (light greens and blues). This results either from some features being more common in the simulated data set than others, or from some lists (hence, sets of features) requiring more training than others, resulting in stronger weights. If spatial
configuration contributed to the task, we would observe spatially-selective mapping, but
that is absent as indicated by the wide distribution across grid locations (x axis).
Temporal maintenance
Figure 2.13. Sample activation profiles during a correct trial. The temporal memory
layer neurons encode the state of execution through binary activation levels. (Recall that
the representation here is distributed, but the ‘profile’ view here only renders a single
curve.) These states retrieve the temporal context biases that integrate with value-based
signals to drive planning layer neurons. The planning layer neurons encode the
behavioral priority for each visual pattern, while the competition layer neurons integrate
planning layer, inhibitory layer (not shown) and self-excitatory signals to compete for
expression. The instantaneous firing rates are shown for each. Finally, the accumulator
neurons are driven by corresponding competition layer neurons above threshold, and
initiate dis-inhibition in the basal ganglia gating system. The membrane potentials of
accumulator layer neurons are shown here.
The temporal memory layer in our model encodes the state of execution, and each
state can then be read to bias a particular selection. Neural populations encoding
temporal state information can be found in various nuclei in frontal and parietal regions,
which are reviewed in the Discussion. The temporal memory layer resembles the
recurrent memory loops in (Dominey et al., 1995), the architecture of which came to be
known as ‘reservoirs’, though we simplify our implementation of this layer by applying a
k-WTA process across the population. The activations of neurons in this layer, over
time, can be seen in Figure 2.13, which clearly shows how, following efferent feedback from
item selections, the layer updates and settles into a new state, as indicated by a new
pattern of activity across its surface (though note that the distributed code across the
surface is rendered as a single curve in Figure 2.13). The biases that are retrieved from
each state influence the behavioral priority signal represented in the planning layer
neurons, also shown. These neurons inform competition layer neurons, which themselves
drive a final layer of neurons above threshold – the accumulator neurons. The activations
of these neurons participating in competitive decision-making follow response profiles
characterized in the literature. We discuss some of these data in the Discussion, but here
quickly point out that these simulated responses largely comport with activations in CQ-
type models, further reinforcing the notion of our model as a bridge between trial-and-
error learning models and CQ-inspired performance models.
Predictions
Our model can generate predictions to be evaluated by further experimentation.
For example, the stabilization of the temporal memory layer, in which neurons become
more correlated with the state of execution of the sequence over time, can be tested. For
example, it has been shown for different categories and in different brain regions (e.g.,
prefrontal cortex and parietal cortex) that spontaneous representation of abstract
categories can be extracted by macaques when not demanded by a task (Roitman,
Brannon, & Platt, 2007; Shima, Isoda, Mushiake, & Tanji, 2007). It is an open question
whether, as tasks demand it, more neurons get recruited to participate in these encodings,
or possibly as well become more precisely tuned (though we do not model tuning here;
see (Botvinick & Watanabe, 2007)). More generally, the primary claim of the model is
that the strict temporal organization must be learned, and not just item+order associations
to pre-existing representations.
A central claim of our model is that decision-making does not occur, at least
exclusively, in amodal networks not structured in relation to relevant environmental
features, and instead decision-related variables (contextual, motivational, etc.) are
represented together with visuo-spatial and motoric parameters (Cisek, 2006). Above, in
Figure 2.13, we displayed the neural response profiles of a subset of the neurons
comprising our behavioral priority map. Alternatively, we can visualize these responses
within a structure topographically-correlated with the touch-screen monitor to show how
these responses additionally encode parameters for motor control – as opposed to, as in
the case of amodal decision networks, needing then to retrieve these motor parameters
from other networks. Figure 2.14 shows activations of the planning layer neurons at the
time of item selections, where warm colors correlate with stronger activation and cool
colors with weak activation. Figure 2.15 collapses the 2-D structure from Figure 2.14
and adds a time dimension. We predict that neurons in posterior parietal cortex and
dorsal premotor cortex – each involved in managing reaching movements (e.g., parietal
reach region and dorsal premotor cortex) – would encode reaching parameters together
with decision-related variables in behavioral maps.
Figure 2.14. Priority map activation over course of trial. The priority map in our model
provides a topographic structure, correlated to the visuo-spatial organization provided
by the monitor, to the competition network that ultimately makes item selections. Areas
of warm, bright colors indicate strong activation, and show not just what item selection
was made, but what, motorically, that selection involves in actual execution.
Figure 2.15. Time-course of activations in behavioral priority map. The competition
network is topographically correlated with the touch-screen monitor. We see here, by
rendering the 2-D surface of the planning layer neurons as a 1-D layer (y-axis), how the
time-course (x-axis) of activation varies spatially and temporally, and how the strength of
activation (warm colors = more activation, cool colors = less activation) correlates with
time of selection. Note: this visualization applies a Gaussian filter across the map for
ease of viewing.
2.4 DISCUSSION
We have demonstrated how our model can master a complex list learning task by
leveraging multiple, concurrent learning processes while other models appear insufficient
in explaining the range of behavioral results obtained. We additionally have shown how
decision-making networks based on competitive interactions between action plans may
be situated in topographically-structured maps to facilitate fluid interaction with one’s
immediate environment. Of course, our model is necessarily simplified in its
implementation, but nonetheless provides a framework for modeling more complex sequential behaviors and the learning that supports those behaviors when they are not buttressed by, for example, S-R associations and explicit environmental feedback following one's actions.
We detail future work to build upon this base model, and review the modeling and
experimental literature that may inform how to proceed.
Learning temporal structure
One of the primary results of the SCP work was the demonstration, behaviorally,
that the tested macaques were sensitive to the serial structure of the task, and did not – at
least exclusively – rely on mere chaining of list elements to learn and represent the lists.
Our model reproduced these patterns in the reported list manipulations, and in the
neurophysiological response patterns of the simulated neurons. Importantly, the model
does not hypothesize a dedicated ‘rank-ordering’ module, but instead comes to learn this
structure. Still, this does not preclude mechanisms dedicated to numeric and/or serial
structure in the primate brain.
Briefly, single-unit recording and muscimol inactivation studies have shown that
at least several nuclei in each of parietal and frontal lobes of macaque contain neurons
whose response properties are modulated by measures of ‘numerosity’, or “the empirical
property of cardinality of sets of objects or events”, or ‘rank’, or “the empirical property
of serial order” (Nieder & Dehaene, 2009). For example, neurons in Intra-Parietal Sulcus
(IPS) – possibly centered around Ventral Intra-Parietal area (VIP) – have been shown to
maximally respond to particular numerosities presented visually as dots, either
simultaneously presented, or presented sequentially over time (Nieder, Diester, &
Tudusciuc, 2006; Nieder & Miller, 2004). Interestingly, it was found that separable
populations – with minimal overlap – exclusively processed either the sequential or
simultaneous dot presentations, with a third, convergent population coding the extracted
cardinality irrespective of the presentation format. Additionally, structures in frontal
regions have shown response properties modulated by rank and numerosity, including
Lateral Prefrontal Cortex (LPFC) (Barone & Joseph, 1989; Mushiake, Saito, Sakamoto,
Itoyama, & Tanji, 2006; Nieder & Merten, 2007; Saga, Iba, Tanji, & Hoshi, 2011;
Tudusciuc & Nieder, 2009), and regions within the Supplementary Motor Complex
(SMC): Supplementary Motor Area (SMA), pre-SMA, and Supplementary Eye Field
(SEF) (Berdyyeva & Olson, 2009, 2010, 2011; Clower & Alexander, 1998; Isoda &
Tanji, 2003). Importantly, (Berdyyeva & Olson, 2010) found that in all recorded regions
(dorsal-lateral PFC, SEF, pre-SMA, SMA), neurons selective for particular ranks – across
both serial action tasks and serial object tasks, were isolated. That is, beyond managing
sequencing of motor actions, these neurons also managed sequencing of visual patterns.
In their case, the cued actions were saccades (and for a review of rank and numerosity
representations in monkey neurophysiology and human brain imaging, see (Nieder &
Dehaene, 2009)).
Most intriguingly for our purposes are the results of Sawamura & colleagues (see
Figure 2.16) who showed that neurons in area 5 of Posterior Parietal Cortex (PPC)
responded preferentially to the number of iterations the monkey performed of a particular
action, coding the serial position of each action within the on-going sequence (Sawamura
et al., 2002, 2010). In the task, monkeys were required to either (i) twist, or (ii) pull a
lever a total of five times in a sequence in order to receive a reward. This finding of
iteration tracking – regardless of the action, (i) or (ii), and without explicit corresponding
changes in the environment, as in the SCP – was further supported when muscimol
injection of the recorded region of area 5 resulted in significantly decreased performance
levels for the task, though motor skill remained, presumably due to an inability to
maintain a representation of the current iteration, as evidenced by error distributions
(Sawamura et al., 2010). Interestingly, (Lu & Ashe, 2005) demonstrate similar findings,
in single-unit responses and in reversible muscimol inactivation, in M1 (see also:
Carpenter, Georgopoulos, & Pellizzer, 1999), suggesting a network of distributed brain
regions, coordinating rank and numerosity computations and linking these to behavioral
output, operate across at least parietal and frontal lobes, and includes otherwise ‘primary’
motor regions.
Figure 2.16. Rank-order sensitivity in area 5 during sequencing of manual actions.
Different neurons from area 5 of macaque cortex, in columns A-E, display sensitivity to
serial position, rows 1-5, within a sequence of manual actions of the same type. For a
given neuron here, peak response is seen exclusively during the i-th movement of a
sequence. The set of neurons A-E here fully encode the state of execution of the sequence
of motor movements. As discussed in the text, following local muscimol injection,
performance levels for the task were significantly worse, though recovered as the
inactivation effects diminished. [Adapted from (Sawamura, Shima, & Tanji, 2010)].
Given the above neurophysiological data, it is reasonable to hypothesize that
similar mechanisms may coordinate the selection of list items in the SCP task. As we
suggested above, the monkey must maintain a robust, evolving representation of the
temporal state of its performance in the task. (As (Barone & Joseph, 1989) noted: “If the
environment does not provide cues regarding the state of the sequence, the animal has to
perform the task exclusively on the basis of cognitive (i.e. of memorized spatial and
temporal) information.”) Indeed, as we detailed above, several models have incorporated
rank representations explicitly into their machinery, overriding other cues that could, a
priori, otherwise be salient to the animal, such as feature-based or spatial-relation-based
cues.
In our model, stressing the trial-and-error learning necessary for the development
of task expertise, we take a different approach. Firstly, our temporal memory system is
not an a priori ordinal representation module. While it is a deterministic system and
provides sufficient cues for list learning, it contains neither a necessary 'initial state' that must be discovered through learning, nor any pre-established biases mapping states to
actions. Instead, both must be learned in time, and in a sense this pattern of learning –
and the activations that causally influence the decision processes – is arbitrary, since each
simulation can, and does, result in different activation patterns within the context layer
that nonetheless result in highly competent performance. Thus, the expert animal
presumably has access to these representations and has appropriate associative biases, but
we instead suppose a naïve animal may not spontaneously understand the task (though
see (Hauser, Carey, & Hauser, 2000; Hauser, MacNeilage, & Ware, 1996) for field
research with monkeys suggesting some numerical competency outside of the laboratory,
though of perhaps a limited and diminished degree.) In the single-unit recording studies
cited above, too, the monkeys undergo considerable training – upwards of many months
– prior to any neural recording sessions, and so have time for systems to adapt themselves
to task conditions. (Though note (Roitman et al., 2007) show LIP responds to numerosity
in visual receptive fields, without task demands; and see as well (Shima et al., 2007)
discussed below.) Thus, for our model, the monkey must slowly learn that leveraging
higher-order contextual representations contributes to reward, while other equally
probable sensory variables may not.
Reward-modulated and integrative responses
Our model contrasts temporal context signals with reward-modulated value
signals – each an essential component of most decision-making. For
instance, multiple brain regions, including LPFC (Kennerley & Wallis, 2009a, 2009b;
Pan, Sawa, Tsuda, Tsukada, & Sakagami, 2008; Watanabe & Sakagami, 2007),
OrbitoFrontal Cortex (OFC) (Wallis & Miller, 2003; Wallis, 2007), SMC (Campos,
Breznen, Bernheim, & Andersen, 2005), Premotor Cortex (PM) (Roesch & Olson, 2003),
Frontal Eye Fields (FEF) (Ding & Hikosaka, 2006; Roesch & Olson, 2003), Striatum
(Cromwell & Schultz, 2003; Ding & Hikosaka, 2006), and LIP (Bendiksby & Platt, 2006;
Gottlieb, 2007; Platt & Glimcher, 1999) are known to be modulated by expectations
related to explicit rewards. In fact, these modulations are affected by many aspects of
‘reward’ including size (Pan et al., 2008; Platt & Glimcher, 1999), probability
(Kennerley, Dahmubed, Lara, & Wallis, 2009; Platt & Glimcher, 1999), subjective
desirability (Dorris & Glimcher, 2004), and time-discounted value (Roesch & Olson,
2005), even to the point of disrupting optimal task performance (Peck, Jangraw, Suzuki,
Efem, & Gottlieb, 2009). These reward-modulated responses can be contrasted with
neurons representing cognitive variables like ‘rank’ and ‘numerosity’ – reviewed above –
that are not greatly modulated by explicit reward-based / value-based considerations, and
so are viewed as distinct (Berdyyeva & Olson, 2009, 2011). Thus, it appears that in the
service of complex, extended tasks, representations of reward expectation and
representations specific to the structure of the task (temporal, spatial, etc.) can be
maintained in separable populations of neurons, only to be eventually combined to affect
response selection, as demonstrated by Watanabe and colleagues (Watanabe & Sakagami,
2007; Watanabe, 2007).
This more-or-less fits the paradigm from ACQ that affordance-based signals
identify the executability of different actions and then combine with reward-based signals
that identify the desirability of those actions. The resultant integrated signal is the
behavioral priority from which decision-making is based. We merely extend the notion
of affordance-based executability to encompass not just externally-driven representations
but also internally-driven and maintained signals like temporal context, discussed above,
which then also combines with reward-related information to inform decision-making.
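To make this combination concrete, here is a minimal sketch (in Python; the names and numbers are invented toy values, not the model's actual parameters) of multiplying a context-weighted executability signal by a learned desirability signal to form a behavioral priority, followed by a soft competitive selection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Executability: how "doable" each of four candidate actions currently is.
# In ACQ this is affordance-driven; here we fold in an internally maintained
# temporal-context bias as well (all values invented for illustration).
affordance   = np.array([1.0, 1.0, 1.0, 0.0])  # action 4 has no target present
context_bias = np.array([0.1, 0.9, 0.2, 0.1])  # learned context+item association
executability = affordance * context_bias

# Desirability: learned reward expectation per action (e.g., via TD learning).
desirability = np.array([0.3, 0.5, 0.4, 0.8])

# Behavioral priority integrates the two signals; selection operates over it.
priority = executability * desirability

# A soft competitive choice, standing in for basal-ganglia-mediated selection.
p = np.exp(priority / 0.1)
p /= p.sum()
print(priority, "-> selected action", rng.choice(4, p=p))
```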
Behavioral Priority Maps
We hypothesized that spatially tuned maps, integrating signals from perceptual,
cognitive and motivational structures, act as behavioral priority maps that serve to enact
decision processes for goal-directed behavior. PPC, in particular LIP, has been suggested
to participate in this function (Bisley & Goldberg, 2010; Gottlieb, 2007) as it contains
multi-modal, spatially-tuned neurons (Andersen, Snyder, Bradley, & Xing, 1997;
Andersen, 1997) modulated by bottom-up saliency (Arcizet, Mirpour, & Bisley, 2011;
Colby, Duhamel, & Goldberg, 1996), intentional signals (Snyder, Batista, & Andersen,
1997), reward considerations (Bendiksby & Platt, 2006; Platt & Glimcher, 1999), quality
of decision variables (Rorie, Gao, McClelland, & Newsome, 2010; Shadlen & Newsome,
2001), attention (Bisley & Goldberg, 2003; Colby & Goldberg, 1999), saccadic planning
(Ipata, Gee, Goldberg, & Bisley, 2006; Snyder et al., 1997), categorical judgments
(Freedman & Assad, 2006), numerosity (Roitman et al., 2007), task states (Campos,
Breznen, & Andersen, 2010), top-down inhibition (Ipata, Gee, Gottlieb, Bisley, &
Goldberg, 2006) and even social gaze cues (Shepherd, Klein, Deaner, & Platt, 2009). We
hypothesize that posterior parietal
neurons – including LIP and Parietal Reach Region (PRR) – in concert with frontal
populations – dorsal Premotor (dPM), FEF, and SEF (see below) (Cisek & Kalaska,
2010; Cisek, 2007; Medendorp, Buchholz, Van Der Werf, & Leoné, 2011) – function as
behavioral priority maps, structuring information about behaviorally-relevant
environmental (perceptual) variables with cognitive, motivational and, crucially, motoric
information unique to the animal. These representations in motor-compatible spatial
coordinates serve to fluidly control behavior and adapt the animal to a complex, time-
varying, spatially-distributed environment (see Figure 2.17). In our model, the
behavioral priority maps serve to integrate cognitive and motivational information and
represent the combined behavioral priority of making particular actions to particular
targets in space – namely, the touch-screen monitor. These representations interact with
explicitly motor regions that can execute the given target-directed action that is selected
through a competitive decision process – mediated perhaps by basal ganglia loops
(Bhutani et al., 2013). In this way, our implementation of behavioral priority maps should
not be identified as a model of PPC, but as an abstraction of various posterior parietal and
pre- and supplementary-motor regions, each possibly contributing unique functions,
while none likely 'does it all' (Balan & Gottlieb, 2009).
Figure 2.17. Affordance Competition Hypothesis. Overall, we take a position similar to
Cisek and others of multiple, competing plans distributed across large cortical areas
representing features of the environment, expectations of reward and structure in the
world, and possible action or action sets that can achieve goals (motivated states). Note
that distributed regions of the brain interact in serial, parallel and recurrent ‘looped’
pathways. [Adapted from (Cisek, 2007).]
Varieties of learning
Our model leverages multiple, concurrent learning processes to master the
Simultaneous Chaining Paradigm, though it is worth pointing out that multiple other
learning processes are neglected in this iteration of the model. The monkeys tested by
Terrace and colleagues mastered numerous lists, and in the course of learning were
exposed to upwards of a hundred or more visual patterns. So far as we can tell, no real
attempt was made to study the perceptual discrimination abilities of the monkeys and
how visual interference possibly contributed to error rates. Additionally, (Terrace, Son,
& Brannon, 2003) reported an increase in task expertise over the course of many trials, as
indicated by a reduction in uninformative errors and in the number of total trials required
to reach criterion levels. We have shown partial results that support generalization, but
note that a full accounting of the development of task expertise is lacking in our model.
Our generalization involves learning to span the space of visual features, applying the
learning encountered across a subset of features to the full set – thus, later lists become
less difficult to master. However, more sophisticated ‘meta-learning’ is also possible, as
in the models of (Doya, 2002; Schweighofer & Doya, 2003). There, the reinforcement
learning formalisms provide a set of parameters – learning rates, time-discounting, etc. –
that can be controlled through learning, and which provide a more adaptive learning
framework. Higher-order processing mechanisms are also possible to model. For
example, a simple mechanism can account for the high performance on wild card lists:
because previously encountered items have been uniquely associated with a particular list
position – at the expense of other possible list positions – the novel item’s baseline
association with the vacant position is enough for selection to proceed correctly (on
average). However, it is known that monkeys are capable of inference processes, from
transitive inference (Jensen, Altschul, Danly, & Terrace, 2013; Merritt & Terrace, 2011)
to exclusive inference (Pan et al., 2014), which is the relevant domain here. We do not
model any higher-order processing in this respect.
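The 'simple mechanism' for wild-card lists can be illustrated with a toy calculation (invented magnitudes; the model's actual competitive rule is more elaborate): training concentrates each familiar item on its own position at the expense of the other positions, so a novel item's untouched baseline association wins the vacant position by default.

```python
import numpy as np

# Rows: list positions (temporal context states); columns: items A, B, C, X.
baseline = 0.25
W = np.full((4, 4), baseline)      # uniform baseline associations

# Training on three familiar items: each becomes uniquely associated with its
# own position, at the expense of its association with the other positions.
for item in range(3):                          # columns for A, B, C
    W[item, item] += 0.6                       # strengthen the trained pairing
    W[np.arange(4) != item, item] -= 0.15      # suppress it at other positions

# A novel wild-card item X (column 3) retains its baseline everywhere, so at
# the vacant position 3 it now wins "by default", without explicit inference.
print(W[3])           # [0.10 0.10 0.10 0.25]
print(W[3].argmax())  # -> 3, the novel item
```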
The shaping procedure undergone by the monkeys – both the incremental shaping
(starting with 1-, 2- or 3-item-long increments) and the implicit shaping (ending a
trial immediately following an error) – assisted in the acquisition of lists and affected the
strategies used to master the lists. (Krueger & Dayan, 2009) modeled shaping effects on
the acquisition of a complex task by a neural network, showing how learning of a smaller
dimensional space of the task contributed to the later learning of the whole task. This
result is important to keep in mind when analyzing data such as the SCP. The authors in
fact stated that a major confound in human serial learning tasks – apart from the obvious
verbal skill possessed by those subjects – was that subjects often had previously
encountered "… thousands of lists [without which] they may have lacked essential skills
for performing the task…" (Swartz et al., 1991). This is further supported by studies of
chimpanzee working memory and visual discrimination tasks that have shown impressive
serial learning (Inoue & Matsuzawa, 2007): when humans are provided substantial
training on the task, performance rates approach those of the chimps (Cook & Wilson,
2010; Silberberg & Kearns, 2009).
Lastly, learning at the level of an individual list, holistically, is not explicitly
treated in our model (though see Supplementary material for Chapter 2). (Chersi et al.,
2011) employed neuronal chains to model sequential performance in a task. Their model
did not employ learning, but assumed associative links between action elements – an
account we explicitly argued against here, at least as a sufficient explanation of the results. It is likely
that item-item associations are made during the course of learning, especially as the
incremental shaping protocol emphasizes the mere appending of a new item to an
existing structure. Our solution was to model temporal context states to which
associative biases can be made, but this does not preclude parallel learning of item-item
associations. There are suggestions (Terrace, 2001) that chunking processes may be
employed by the monkeys, in which case chaining may be a plausible model to explain
these effects. Alternatively, (Botvinick et al., 2009) modeled higher-order
representations (‘options’) in a reinforcement learning framework. Instead of item-item
associations, consolidated sequences of actions were learned in relation to these
‘options’. (See the Supplementary Material section for preliminary results with chunking
for our model.)
Chapter 3: Modeling observational learning
Social learning and cognition have long been recognized as exerting strong
evolutionary pressures on primates (Byrne & Whiten, 1988), and it has been suggested
that the pressure to acquire manual skills through imitation, among other skills, has
served to ignite significant brain evolution in hominids, even to the point of facilitating
the development of the ‘language-ready brain’ (Arbib, 2012). Behaviorally, different
species of primates display a significant difference in ‘imitative’ skills, at least as
observed in controlled, experimental settings. For example, (Dean et al., 2012) have
shown that monkeys, apes and humans solve ‘puzzle box’ tasks differently, with only
human children showing both pedagogy and the ability to directly imitate the successful
solutions of ‘demonstrators’. In these tasks, rewards of increasing ‘desirability’ depend
on more complex manipulations of the box, with monkeys and apes limited, most often,
to solving the first ‘tier’ of complexity, while human children are capable of acquiring the
most desirable reward following a demonstrator’s performance. These results largely
comport with data from (Horner & Whiten, 2005) comparing similar tasks between just
apes and human children. However, this is not to say that monkeys are incapable of
‘imitation’, as naturalistic observations have shown palm nut cracking skill is maintained
within a particular population due to some manner of social learning (Fragaszy et al.,
2013). It is noted, however, that going from ‘naïve’ to skilled takes upwards of many
years, and substantial trial-and-error learning must contribute to the gradual acquisition of
these skills. Similarly, ape nettle processing, while predicated on observation of skilled
adults, requires years of failure before juveniles can master the appropriate procedure for
folding the leaves to avoid the harsh stingers (Byrne & Russon, 1998; Byrne, 2003) –
though see also (Tennie, Hedwig, Call, & Tomasello, 2008) for a critique. Thus, the
picture of primate social learning is more complex than either 'non-human primates
cannot imitate' – they appear capable, and capable of learning through other
'social learning means' (e.g., emulation, facilitation, etc.) – or 'non-human primates
learn skills directly from observation' – they may, though substantial trial-and-error and
other learning processes are likely involved.
The discovery of mirror neurons (di Pellegrino, Fadiga, Fogassi, Gallese, &
Rizzolatti, 1992; Gallese, Fadiga, Fogassi, & Rizzolatti, 1996) reinvigorated interest in
probing the neural mechanisms coordinating social learning and social cognition in
primates, and neuro-computational models have been offered to explain the formation of
these functional response types and their role in learning about one’s own actions, as well
as possible roles in re-mapping others’ performances onto one’s own motor repertoire.
Moreover, the models clarify the dependence of mirror neurons on interaction with neural
systems “beyond the mirror.” For example, the Mirror Neuron System, or MNS (Oztop
& Arbib, 2002), and MNS2 (Bonaiuto et al., 2007) models offered a scenario whereby
the parity of response could be learned; namely, through associative learning between
efferent copy of one’s own action, and the corresponding visual feedback of that action
(or, in the case of MNS2, the action and the additional corresponding acoustic feedback
should there be any). Thus, in the absence of acting oneself, visual (or acoustic) stimuli
from another’s actions may still drive the mirror neurons and facilitate action recognition.
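A minimal sketch of that associative scheme (our notation and toy dimensions, not the MNS implementation itself): during self-generated action a Hebbian update links visual feedback to the efferent copy, after which visual input from another's action alone activates the corresponding 'mirror' unit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_visual, n_motor = 20, 5
W = np.zeros((n_motor, n_visual))       # visual-to-mirror association weights

# Each motor program has a characteristic visual "hand-state" signature.
signatures = rng.standard_normal((n_motor, n_visual))

# Self-observation: executing action m yields an efferent copy (one-hot) and
# noisy visual feedback; Hebbian learning associates the two.
lr = 0.1
for _ in range(20):
    for m in range(n_motor):
        efference = np.eye(n_motor)[m]
        feedback = signatures[m] + 0.1 * rng.standard_normal(n_visual)
        W += lr * np.outer(efference, feedback)

# Observation of ANOTHER's action: visual input alone, no efferent copy.
observed = signatures[3] + 0.1 * rng.standard_normal(n_visual)
mirror_activity = W @ observed
print(mirror_activity.argmax())         # -> 3: the observed action is recognized
```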
The ACQ model subsumed the mirror neuron system while offering a functional
role for the module in learning new sequences of action: instead of being driven by the
observation of others’ actions, the mirror neuron system in the ACQ model responded,
not to what the intended action was, but to what it visually appeared to be (Bonaiuto &
Arbib, 2010). In this way, the mirror system was claimed to participate in learning novel
actions or novel sequences of actions based on analysis of what one’s actions appeared to
be, which was shown to facilitate performance in acquiring new sequences of motor
actions. (Chersi et al., 2011)’s model seeks to explain the dataset from (Fogassi et al.,
2005), whereby mirror neuron activities in inferior parietal lobule (IPL) appear ‘gated’ by
task goals, as indicated in the experiment by the presence or absence of some stimuli. The
model separates different pools of neurons within ‘chains’ for each of the two possible
conditions, and so upon observational conditions, the corresponding populations are
residually activated – as in the data – specific to the indicated goal. Thus, the mirror
neuron activities in IPL are ‘sequence-specific’. (Sauser & Billard, 2006)’s dual-route
model of imitation – in humans – suggests mirror neuron based mechanisms map others’
performances onto one’s own motor repertoire, thus having a ‘resonant’ activation of the
observed action, and so presumably the capacity to imitate. (Erlhagen et al., 2006)
similarly implemented a model to explain the role of mirror neuron (F5) populations and
PFC populations in imitation-versus-emulation behavior in robots.
Monkey neurophysiology, of course, goes beyond examinations of classic mirror
neuron type responses, and many networks of interest to physiologists are also important
in understanding and modeling social learning and cognition. For example, other tasks
have found neurons in lateral intra-parietal regions (LIP) that appear to be mirror neurons
for visual attention (Shepherd et al., 2009), while F5 mirror neurons have been found
sensitive to the reward potential of observed actions (Caggiano et al., 2012).
Additionally, the responses described by (Fogassi et al., 2005) above appeared more
complex than the classic responses specific to action: the mirror responses they described
were ‘sequence-specific’.
Furthermore, the task designs from many of the above were quite limited, in that
the observing animal did not have to differentially respond as a function of the
observation. Instead, the animals were ‘passively’ observing, and thus the recordings
were unable to say anything about how that information was used. In a departure from
these non-interactive designs, (Yoshida, Saito, Iriki, & Isoda, 2011) revealed 'other-
responsive' neurons in medial prefrontal populations, apparently important for managing
turn-taking during the task, and neurons encoding the errors of others (Yoshida, Saito,
Iriki, & Isoda, 2012). Other (medial) frontal structures have been implicated in the
monitoring of others' behavior, with anterior cingulate cortex (ACC) and orbito-frontal
cortex (OFC) neurons observed to be sensitive to the rewards accrued by others (Chang,
Gariépy, & Platt, 2013). Orbito-frontal (Azzi, Sirigu, & Duhamel, 2012), prefrontal
(Fujii, Hihara, Nagasaka, & Iriki, 2008) and striatal (Santos, Nagasaka, Fujii, &
Nakahara, 2011) populations have also been found modulated by social context. In all of
these experiments, we see that much more is involved in observing and/or interacting
with others than activating populations of mirror neurons (or that mirror populations
possibly encode more than just an observed action).
Thus, there are many open questions regarding the role of mirror neurons in
action observation, action understanding, and/or social learning, and computational
models may prove useful in exploring these open questions.
1) Few studies have explored what downstream targets mirror neurons project to –
that is, in what ways do the observed responses, either experimentally (as in (Gallese et
al., 1996)) or in model simulations (as in (Bonaiuto et al., 2007)), influence one's future
behavior? Elsewhere, we have suggested one possible answer, that action recognition
systems may modulate one's goal state to adapt one’s behavior as a function of what one
observes or recognizes from another (Arbib, Ganesh, & Gasser, 2014).
2) Few models offer insight into (possible) comparative differences in primates,
which should be important considering the apparent differences behaviorally observed
(Dean et al., 2012; Horner & Whiten, 2005). For example, the model of (Lopes, Melo,
Kenward, & Santos-Victor, 2009) (see Figure 3.1) shows how imitative versus emulative
learning – reproducing the means over the ends, or vice versa, respectively – can be
controlled by a parameter, which weighs 'means' and 'ends' information on a spectrum, so
that more or less imitative behavior can be achieved. Such a model, which is removed
from neural detail, can reproduce different observational learning tendencies, but cannot
address the differences in neural machinery that would correlate with these different
tendencies. (For example, consider (Hecht et al., 2013)’s data on differences in
macaques, chimpanzees and humans as revealed by diffusion tensor imaging.) Claims of
imitative behavior in the wild in apes, for example, suggest a role for long-term trial-and-
error learning, with social learning seen as 'parsing' the behavior of an adult and guiding
one's own attempts at reproducing the sequence of demonstrated behaviors, but with
learning occurring incrementally in time to both structure the performance, and learn new
motor skills (Byrne, 2003).
3) Learners need to pay attention to the effects of others' performances, lest they
imitate obviously harmful behaviors. Thus, social learning involves more than just
decomposing others' actions into one's own motor repertoire, but processing relevant
context and the feedback and/or reward implications of the actions (that is, perhaps that
what was observed was 'bad').
Figure 3.1. Modeling a ‘spectrum’ of social learning. The model of Lopes et al.
demonstrates differing social learning strategies, with a continuum from imitative /
‘means’ to emulative / ‘ends’. We seek a model with more neural detail, and with a role
for reward processing mechanisms. [Adapted from (Lopes et al., 2009).]
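As a toy rendering of that spectrum (our own sketch, not Lopes et al.'s implementation), the λ weights of Figure 3.1 can be treated as mixture coefficients over baseline preferences, 'means' evidence, and 'ends' evidence:

```python
import numpy as np

def social_choice(lam_B, lam_I, lam_E, baseline, action_match, effect_match):
    """Blend baseline preferences with 'means' (imitation) and 'ends'
    (emulation) evidence; all inputs are scores over candidate actions."""
    assert abs(lam_B + lam_I + lam_E - 1.0) < 1e-9
    score = lam_B * baseline + lam_I * action_match + lam_E * effect_match
    return int(np.argmax(score))

baseline     = np.array([0.5, 0.2, 0.3])  # learner's own preferences
action_match = np.array([0.1, 0.8, 0.1])  # how well each action copies the
                                          # demonstrator's observed movements
effect_match = np.array([0.1, 0.2, 0.7])  # how well each action reproduces the
                                          # demonstrator's observed effect

print(social_choice(1.0, 0.0, 0.0, baseline, action_match, effect_match))  # 0: non-social
print(social_choice(0.0, 1.0, 0.0, baseline, action_match, effect_match))  # 1: imitation
print(social_choice(0.0, 0.0, 1.0, baseline, action_match, effect_match))  # 2: emulation
```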
3.1 THE SIMULTANEOUS CHAINING PARADIGM
The Simultaneous Chaining Paradigm (SCP) (Terrace, 2005) violates the stimulus-
response (S-R) mappings implicit in many experiments probing serial behavior in
animals (see Figure 3.2). Terrace and colleagues probed monkeys’ skill at list-learning
and the extent to which they (i) can learn the serial structure of individual lists, (ii) can
gradually become more competent at the task, (iii) are sensitive to the serial structure of
learned lists, and (iv) are sensitive to observation of others’ actions (see companion
article). Items (photographs) intended to be learned as a list are presented simultaneously
on a touch-screen monitor for the duration of a trial, and a pre-specified order of selecting
the items must be discovered through trial-and-error (e.g., it is not given by their order of
presentation; compare with, for example, (Barone & Joseph, 1989), the basis for the
model of (Dominey et al., 1995).) The locations of the items on the touch-screen monitor
are randomized across trials, and thus do not contribute to learning the ordering, making
the task quite difficult. During training, the animal is incrementally shaped on the list,
first mastering smaller sub-sequences of the list, before more items are presented on the
screen, until the full list is learned – giving implicit order information, at least at this
stage of learning. Once performance criteria are met for the current iteration – typically,
consecutive blocks of trials below a certain error rate, which may vary between
experiments – items are appended to the end of the list, such that lists are incrementally
presented – and learned – as follows:
List 1, Increment 1: A
List 1, Increment 2: A-B
List 1, Increment 3: A-B-C
List 1, Increment 4: A-B-C-D
Still, the data show full, 4-item-long lists can be learned in the absence of this
form of incremental shaping by monkeys that have already learned a number of such
incrementally-shaped lists (Swartz et al., 2000), in which case all 4 items of a new list are
shown simultaneously (essentially, “List X, Increment 4” as denoted above; other data
suggest naïve monkeys can actually learn 3-item-long lists – “List X, Increment 3” –
from the beginning without any prior incremental shaping (Terrace et al., 2003), but it
has not been demonstrated, so far as we are aware, that naïve animals can learn 4-item-
long lists without at least some incremental shaping.) Following selection of each correct
item – other than the last – the only feedback the animal receives is a briefly highlighted
outline of the item selected, while incorrect selections are immediately followed by a
‘time-out’ (Swartz et al., 1991) during which an overhead light goes out, eventually
followed by a new arrangement of the current set of items (e.g., new trial). Food delivery
(reward) follows successful completion of the current full increment. Repeat selections of
an item are tolerated, but are not ‘correct’. The configuration on the monitor does not
change during a single trial, and so differential feedback offers little to facilitate learning.
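To fix the protocol in code, the following is a schematic trial loop under our reading of the rules above; the policies, the 9-slot screen, and the trial counts are illustrative placeholders, not the published training parameters.

```python
import random

def run_trial(order, policy):
    """One SCP trial: any out-of-order selection ends the trial (time-out)."""
    _slots = random.sample(range(9), len(order))  # items land in random screen
                                                  # positions each trial (unused
                                                  # by these toy policies)
    done = 0                                      # correct selections so far
    while done < len(order):
        choice = policy(order, done)
        if choice == order[done]:
            done += 1             # correct: only a brief highlight, no reward
        elif choice in order[done:]:
            return False          # out-of-order selection: time-out, trial ends
        # re-selecting an already-correct item is tolerated but does nothing
    return True                   # full increment completed: food reward

random_policy = lambda order, done: random.choice(order)   # naive guessing
expert_policy = lambda order, done: order[done]            # knows the order

results = [run_trial(list("ABCD"), random_policy) for _ in range(1000)]
print(sum(results) / 1000)                     # ~0.04: chance-level success
print(run_trial(list("ABCD"), expert_policy))  # True
```

The chance-level figure (about 1 success in 24 trials on a 4-item list) is one way to see why incremental shaping and learned biases matter so much here.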
Figure 3.2. The Simultaneous Chaining Paradigm (SCP). Subjects must learn to press
items on the screen in a pre-specified order. Each item can appear in nine possible
positions on the screen, but these positions vary randomly across trials, thus removing
sensorimotor associations that might facilitate learning. During training, any error
immediately halts the trial, while successful selections are followed by a brief signal
confirming that the selection has been registered. In the above, each column is a new
trial for the same underlying sequence of patterns, with the first row showing the
appearance of the touch-screen monitor on a specific trial, while the bottom row shows
the proper sequence of motor responses the monkey must make. The lettering is included
for the reader, not for the monkey. For the current model, we only consider lists up to 4
items long. [Adapted from (Terrace, 2005).]
Having introduced the experimental paradigm the monkeys operated within, we
will now report the major behavioral findings as related to the observational learning
effects.
Behavioral Data
For the substantial background data important for contextualizing the following,
please see the Behavioral Data section from the previous chapter.
Observational Learning Effects
The protocol for the observation conditions is the same as in prior conditions,
with the addition that two monkeys, in two neighboring chambers, are involved. One
monkey, having had prior experience on the tested list -- that is, having reached criterion
performance levels on the list (~75% accuracy) -- is designated the 'teacher' and will
perform the list for one block. The other monkey, having had considerable prior
experience with the task, but not the current list, is designated the 'observer': the observer
can see through his/her chamber to the other monkey and to the other monkey's touch-
screen monitor (and to the slot where rewards are delivered), and so can observe all of the
teacher's performance and feedback. The teacher performs each trial and is rewarded or
punished as in the basic protocol. Following a block of 50 trials for the teacher, the
observer's touch-screen monitor is activated and the observer is given an opportunity to master the list.
Control conditions involve a 'social facilitation' condition whereby the teacher is present
for the observer's block of trials, but the observer is given a different list than the teacher;
and a 'computer feedback' condition, whereby the tested list is performed by a computer
in the neighboring chamber -- selections being indicated by the brief highlight of each
item.
In the observational learning condition, the results suggest a modest but
significant increase in acquisition rate for the observation condition compared to baseline
rates, supporting a claim of 'cognitive imitation' (Subiaul, Cantlon, Holloway, & Terrace,
2004) (and see Figure 3.3). In particular, the relevant measure was total trials until first
correct performance, for each tested list. The authors argue this is the most sensitive
measure, as learning and performance rates after the first correct trial would 'mix' trial-
and-error processes with social learning processes. (Note, however, that 'implicit' trial-
and-error information still exists in this measure, in that 'wrong' selections are negatively
reinforced, and non-terminal correct selections receive null feedback – possibly implicitly
reinforcing that selection.) The data showed that, on average, the observation condition
facilitated a lower number of ‘responses until first correct trial’ (see Figure 3.3). It should
be noted, however, that there existed substantial variation between the two tested
monkeys, in that the overall effects for each appeared driven by a greater sensitivity to
only a subset of the actual items / serial positions. That is, one monkey appeared
exclusively facilitated by only selections 1 and 4 – fine-grained error rate analyses
showed these positions were exclusively facilitated, while positions 2 and 3 showed no
facilitation above baseline error-rates – and the other monkey was facilitated only by
selections 2 and 3. Thus, each displayed an idiosyncratic pattern of facilitation, though
importantly both were still significantly quicker to reach first correct trial than in the
baseline condition. The idiosyncratic pattern was inexplicable by the authors ("At this
stage of our research, we attach no importance to our subjects idiosyncratic selection of
the items while observing an expert execute the list.") – and no eye-tracking to measure
attention, for example, was employed.
Figure 3.3. Observational learning effects from the SCP. (A) The social learning
condition saw significantly improved performance relative to baseline, for both monkeys,
by the measure of ‘responses until first correct trial’. (B) There was no observed
facilitation effect, for either monkey, during the computer feedback condition, in which
no ‘teacher’ monkey was present, and instead the computer selected the items (with
briefly highlighted borders around each item signifying to the observer the selection
made). (C) There was no observed facilitation effect, for either monkey, during the social
facilitation condition, in which the mere presence of a conspecific – without any
interaction with the monitor – was measured. [Adapted from (Subiaul et al., 2004).]
3.2 MODEL DESIGN
The model presented here is based largely on the machinery described in the
previous chapter (and see Figure 3.4). Building off of that model, which we briefly
review below – though see the primary article for the technical details left out here – we
also implement an action recognition system and additional reward processing machinery
to respond to reward received by others (vicarious reinforcement). Most importantly, we
implement the uni-directional flow of information: from teacher to student, in the case of
processing others’ action; or, from feedback (directed towards teacher) to student, in the
case of processing the outcomes of others’ action. Methodologically, this is a simple case
of dyadic brain modeling, which seeks to describe the neural and cognitive systems
involved in social learning, cognition and interaction (Arbib et al., 2014; Gasser,
Cartmill, & Arbib, 2014).
We briefly tour the basic processing and learning mechanisms and then detail the
extensions elaborated upon here to model the observational learning effects. As seen in
Figure 3.4, the model – for both the teacher and observer – processes the touch screen
monitor, modeled as a 3x3 image. Visual processing streams extract the spatial
configuration of the items (dorsal pathway) and recognize each individual item as a
unique photograph (ventral stream). Eventually, a composite representation is achieved
which maintains both item representations and spatial configuration. Visual working
memory activates a temporal memory layer that maintains a representation of the state of
execution of the sequence. Since the temporal memory layer receives differing inputs
according to the distribution of visual patterns on the screen, and since state transitions
are deterministic, the ‘trajectory’ of activation states within this layer of neurons must be
learned across many trials (where the features are randomly configured), and indeed lists
(where the features themselves change as new items are presented). Learning processes
generalize from visual working memory, to temporal memory, and so essentially select a
particular trajectory of states so that learning of biases, and then retrieval of these biases
as competence increases, remain consistent. Additionally, each state is then associated
with an item on the screen, such that as new temporal context states are reached, new
biases are computed and selection follows. These context-based signals complement
value-based signals, learned through temporal difference learning. Reward and feedback
are processed specifically to influence the learning processes to achieve more adaptive
performance.
Figure 3.4. High-level model schematic. Visual input in the form of a 3x3 array
provides input to the model. Immediately, a dorsal and ventral path process the input,
computing a binary representation of target locations preserving spatial relations
between items (dorsal), and visually discriminating the items based on their unique
feature-vector representation (ventral). Integrated visuo-spatial information is
represented in a visual working memory layer. This layer informs downstream structures
of the content and visuo-spatial features of the monitor. The temporal memory layer
maintains the internal state of execution for the current list and projects, complementary
to the value-based signal provided by the current motivational state, a temporal context
signal that informs the behavioral priority map, which is composed of multiple layers not
shown here. The major representations used in the model are visible, as are the learning
pathways (dotted lines) that contribute to model performance. Learning itself is managed
by an outcome processing module that manages the feedback of the environment and
interacts with relevant learning pathways (circled cross-sections).
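Schematically, one selection step of the base model can be caricatured as below. This is a deliberately simplified sketch of the dataflow in Figure 3.4; the layer sizes, the identity "ventral" recognizer, and the scalar value signal are our own placeholder choices, not the model's actual machinery.

```python
import numpy as np

rng = np.random.default_rng(2)

def forward_pass(screen, W_ctx, context_state, value):
    """One selection step: screen is a 3x3 array of item ids (0 = empty);
    returns the chosen screen location."""
    where = (screen > 0).astype(float)            # dorsal: occupied locations
    what = screen                                 # ventral: item identity
                                                  # (trivially, the id itself)
    # Visual working memory: joint what/where representation.
    vwm = {loc: what[loc] for loc in zip(*np.nonzero(where))}
    # Temporal context supplies a learned bias per item id at this state,
    # and the motivational value signal scales everything uniformly here.
    priority = {loc: W_ctx[context_state, item] * value
                for loc, item in vwm.items()}     # behavioral priority map
    return max(priority, key=priority.get)        # competitive selection

n_states, n_items = 8, 5
W_ctx = rng.random((n_states, n_items))           # learned context+item weights
screen = np.zeros((3, 3), dtype=int)
screen[0, 1], screen[2, 2], screen[1, 0] = 1, 2, 3   # three items displayed
print(forward_pass(screen, W_ctx, context_state=0, value=1.0))
```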
In the specific simulations for the observational learning condition (see Figure
3.5), following each selection by the teacher, the action recognition system of the
observer processes the selection and activates its own representation for that item. In
parallel to this, the observer model must maintain its own evolving representation of the
temporal context, so that the context+item association can be made. That is, the observer
model must both recognize each selection made by the teacher, but then also update its
temporal context state vicariously. Finally, the observer model must process the outcome
of the trial to determine whether the selections were successful (rewarded) or not (time-
out), in order to appropriately upgrade (reward) or downgrade (time-out) the relevant
context+item association.
Figure 3.5. Dyadic brain simulation method for observational learning condition.
The monitor serves as the visual input to both teacher and observer brain models,
resulting in a representation in visual working memory, in each brain model, of the
monitor’s composition (the dashed lines represent intervening data processing not
shown). The teacher ultimately makes a selection based on the visual input and learned
associations. The mirror neuron action recognition module processes the outputted
teacher action (in particular, the selected visual pattern), computes the appropriate
internal representation for that selection (note that each model represents features and
states differently), and outputs the representation to the learning module. (In addition,
the action recognition module manages updating of the temporal memory layer for the
observer, since efferent feedback is not available.) The resulting representation, the
current temporal memory state, and any rewarded outcomes (positive or negative) are
translated to eventually enact synaptic weight changes.
Recall that in the base model from the previous chapter, the signal that determines
item selection is the behavioral priority signal, which itself is composed of a value-based
component and a context-based component. In order to succeed at this task, each
successive state of the temporal memory layer must be associated with the next item of
the list. When observing others perform this task, we hypothesize that it is this
association that is extracted from correct performances. Firstly, though, successful
recognition of each selection must be made by the observing model. Mirror neurons (di
Pellegrino et al., 1992; Gallese et al., 1996; Rizzolatti, Fadiga, Gallese, & Fogassi, 1996)
are hypothesized to participate in the processing and recognition of others’ actions, and
several models have been advanced to explain the ontogeny of particular responses
correlated with manual reach-to-grasp actions (Bonaiuto et al., 2007; Oztop & Arbib,
2002). We do not model mirror neuron action recognition networks here but instead
assume some circuitry – more or less similar to the models above – is capable of
processing the reaching actions of the teacher to deliver a representation of that particular
selection. (We review mirror neuron data and models in the Discussion.) The observing
model takes as an input the visual pattern selected by the teacher at time “t”, and
transforms it into the internal item representation “o(t)” correlated to that visual pattern.
In the base model, the agent performing the task learns the temporal context
associations according to the following:
ΔW_c = α_c × r_e(t)     (1)

where α_c is the learning rate, and r_e(t) is the effective reinforcement, which is equal
to "1" unless the selection is incorrect, in which case it is equal to "-0.1" – which also
leads to trial termination. The learning in these weights is also competitive, in that
learning of one item+context association comes at the expense of other possible
item+context pairings, for that given temporal context state.
The learning rule that manages the observational learning is based on equation (1),
with two exceptions: an 'agency' term, gating learning when the teacher is
non-biological, is added, as is a term gating learning on explicit reward feedback.
Updated, the learning rule is:
ΔW_c(o) = α_o × r_e × (A_b × r_p(t)),  for o ∈ S     (2)

where α_o is the observational learning rate, A_b is a Boolean representing the agency
of the demonstrator, r_p(t) is primary reward receipt, and S is the set of
selections observed (thus, the weights for all selections in a trial are appropriately
updated).
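As a concrete reading of equations (1) and (2), the sketch below applies the observational rule to every selection of an observed trial. The competitive term and the handling of unrewarded trials (where the reward gate simply zeroes the update) are our own simplifications of what the text describes.

```python
import numpy as np

n_states = n_items = 4
W = np.zeros((n_states, n_items))    # context+item association weights
alpha_c = alpha_o = 0.3              # own and observational learning rates

def own_update(state, item, correct):
    """Eq. (1), plus a simplified competitive term of our own."""
    r_e = 1.0 if correct else -0.1               # effective reinforcement
    W[state, item] += alpha_c * r_e
    others = np.arange(n_items) != item
    W[state, others] -= alpha_c * r_e / (n_items - 1)  # competition within
                                                       # this context state

def observed_update(selections, rewarded, biological):
    """Eq. (2): vicarious updates for every selection in an observed trial."""
    A_b = 1.0 if biological else 0.0   # agency gate: no learning from computers
    r_p = 1.0 if rewarded else 0.0     # gate on explicit reward receipt
    r_e = 1.0                          # observed selections treated as correct
    for state, item in enumerate(selections):  # context advances vicariously
        W[state, item] += alpha_o * r_e * (A_b * r_p)

observed_update([0, 1, 2, 3], rewarded=True, biological=True)
print(W.diagonal())   # each context state now biased toward the observed item
```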
3.3 METHODS AND SIMULATION RESULTS
We ran 10 total simulations of our model to arrive at simulation results we could
test for statistical significance (since random noise and random initialization of, e.g.,
weight matrices, were involved). To test if our model could learn from observation, we
trained two copies of the model as in the previous chapter, so both performed ‘baseline’
lists at high rates. Subsequently, a novel list was constructed and one model was trained
on that list and designated the ‘teacher’. Following a block of 20 trials the performance of
which was available to the observer model, the observer model was then tasked with
performing this new list. We measured both the time to first correct trial, and then overall
performance levels as in Part 1.
Figure 3.6. Model behavioral results. Performance measures across list types are
shown. For all lists and for all performance measures, only the 4-item-long increment is
shown. The incrementally-shaped lists (Lists 1-4) and the simultaneous lists (List 5) are
more difficult to master than the observation condition preceded by substantial training
(List 6), which shows facilitation following a teacher’s performance. However, there is
no facilitation effect when observation is not preceded by training, i.e., when the observer
is task-naïve (List 7). In this condition, the model is unable to master any 4-item-
long list. (Note that in the figures on the left, List 7 is omitted so as not to skew the figure's
scaling.)
For the measure of ‘time to first correct trial’, the observer model, on average,
achieved a correct trial in approximately 3.9 trials, a significantly more rapid acquisition
than the baseline rate of 21.3 total trials to first correct trial (p<0.01). By other
measures as well it is clear the observation condition can lead to facilitation: blocks until
criterion and percent correct after 50 or 100 trials (see Figure 3.6).
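In effect, the comparison is a two-sample test over per-run acquisition counts. Schematically (the per-run numbers below are made up to match the reported means, and the thesis does not specify which test was used):

```python
import numpy as np
from scipy import stats

# Hypothetical per-run 'trials to first correct' values standing in for the
# actual outputs of the 10 simulations described above.
baseline = np.array([18, 25, 22, 19, 24, 20, 23, 21, 22, 19])  # mean 21.3
observed = np.array([4, 3, 5, 4, 3, 4, 5, 4, 3, 4])            # mean 3.9

t, p = stats.ttest_ind(observed, baseline)
print(observed.mean(), baseline.mean(), p)   # facilitation if p < 0.01
```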
Lastly, we tested whether the task experience each animal had prior to the testing
of the effects of observation influenced the behavioral results. To do so, we trained a
single model to criterion levels (the teacher), but for the observer model, we provided no
training – so that it was not only ‘list naïve’, but ‘task naïve’ as well. We then followed
the above procedure and tested for possible facilitation. The observer model in this
condition was unable to ever acquire the list, showing no facilitation from observation
(see Figures 3.6 and 3.7).
Figure 3.7. List-naïve versus task-naïve models during observation. (Top) Weight
changes following observation period for both observation conditions. The condition
preceded by training (left) shows focused weight modifications following an observation
blocks (as indicated by the subset of possible temporal memory states affected). The
condition without any prior exposure to the task (right) shows random weight updates.
(Bottom) The pattern of initial activation in the temporal memory layer (x-axis) over a
block of trials (y-axis) immediately following the observation block. On the left, the
pattern is consistent throughout, owing to prior learning. On the right, the activations
are random.
Predictions
Our model suggests the observational learning effects from (Subiaul et al., 2004)
are dependent on prior experience on the part of the observing monkey. Our tests under
different training regimes – or lack thereof – showed that at least some prior experience
with the task was necessary for facilitation from observation to occur. The simulation
conditions wherein no training was provided to the observing model prior to observation
of the teacher model’s performance showed no facilitation. This is easily understood by
reflecting on the fact that, as shown in Chapter 1, the temporal memory layer – the layer
of neurons managing the internal temporal signals, i.e. the evolving state of execution – is
at first randomly activated by the varied visuo-spatial inputs it receives, and only over
substantial learning does this layer stabilize to facilitate both the establishment of
associative biases, and then later the retrieval of these biases. Since the naïve model has
not generalized across this varied input-space, any associations it is able to establish on a
particular trial are not likely retrievable in future trials – for both future observation trials
and for the trials wherein it is tested.
Additionally, the model suggests that some neural circuit must integrate
observation of another’s action with at least information on the value feedback of the
action (was the action good, or bad?) – and also, as in our case, subtle contextual
information. Little is known about how such information may be integrated or what
circuits are involved, though studies are beginning to examine particular circuits during
interactive or otherwise social experimental setups. We discuss these in
the Discussion.
3.4 DISCUSSION
Our model incorporates action recognition and reward processing elements in
order to reproduce the pattern of results from (Subiaul et al., 2004) showing sensitivity to
observational conditions. The MNS model of (Oztop & Arbib, 2002) showed how mirror
neurons can be trained to associate patterns of visual feedback representing the evolving
‘hand-state’ within dorsal visuo-motor streams with efferent copies of one’s own reach-
to-grasp motor program, while the extension to that model, MNS2 (Bonaiuto et al.,
2007), showed how these responses could persist in the absence of explicit visual
information, such as reaching behind an opaque partition (Umiltà et al., 2001). In both
cases, the system was 'bootstrapped' through training occurring during self-observation –
the models would not ‘begin’ with mirror neurons, but would show how internal learning
mechanisms could facilitate the associations that are indicated by their physiological
response. However, few studies have examined the downstream effects of mirror neuron
activity, or what function such responses serve for the animal (though for hypotheses
from modeling work, see (Bonaiuto & Arbib, 2010; Chersi et al., 2011; Erlhagen et al.,
2006; Oztop, Wolpert, & Kawato, 2005)). The hypothesis of ACQ for the role of mirror
system activity was to grant access to learning centers to adaptively modulate estimates
of the ‘executability’ of actions, as determined through comparisons of the efferent copy
of an intended action to MNS-registered recognition of the visual form of the action, or
what the action ‘looked like’. We take a similar perspective here and claim that action
recognition in the observing monkey grants access to the learning centers for the
representations of what it ‘looks like’ the teacher monkey is doing. This differential
access of representations to the learning centers can modulate the item-rank associations,
improving the effectiveness of subsequent trial-and-error learning.
Additionally, ‘mirror neurons’ have been reported beyond the classic ‘grasping-
type’ neurons first reported (Rizzolatti et al., 1996). For example, ‘sequence-type’
neurons have been reported by Fogassi et al (Fogassi et al., 2005), which seem to be
sensitive to contextual environmental cues suggesting which goal an actor monkey will
fulfill (placing a reward in one’s mouth, or in a bin – to receive a larger reward). These
responses, in fact, appear predicated on ‘task knowledge’. Interestingly, it is also
important to point out the existence of ‘ingestive-type’ neurons in monkey F5 (Ferrari,
Gallese, Rizzolatti, & Fogassi, 2003), which exhibit mirror properties to mouth
movements, including sucking juice from a cannula. This suggests that mirror systems
may participate in the various action recognition processes necessary for linking
outcome feedback and the appropriate motor acts, as our model shows. Lastly, ‘social
gaze’ mirror neurons have been described in LIP (Shepherd et al., 2009), suggesting links
between objects of attention can be made in PPC – and with implications for extending
the notion of BPMs to social interaction.
However, there are two important points to make here: (i) the relevance of
considering task knowledge, and (ii) the importance of being in the right state. As our
simulations showed, task-naïve monkeys showed no improved learning during
observation blocks, only list-naïve monkeys did. This is not because learning centers
were not engaged during these blocks for the task-naïve observer (in fact, granting this
may even be a generous assumption); instead, what was ‘learned’ was random and not
consistent across trials, as the temporal context memory would begin in random states for
each new trial. In the case of the list-naïve observer, however, the model was previously
able to train the mapping from visual features (even those features that are novel) to the
temporal context memory and to facilitate subsequent performance. In this way, the
monkeys are more likely learning through facilitation – with the recognition that such
effects are expertise-dependent. (In fact, in Subiaul et al., the tested monkeys had been
trained on upwards of 40 novel lists prior to the observational conditions.)
The second point concerns the importance of being properly motivated to attend
to the actor monkey. Subiaul et al. in fact sought to test against facilitation, by testing two
additional manipulations: (i) a social component without demonstration, and (ii) a
demonstration, without a social component. The first manipulation tested whether
another monkey being merely nearby could explain the better performance in their list-
naive monkeys. (The thought is that mere social motivation may contribute to better
learning and performance.) In this condition, a monkey was in the next chamber as
before, but no demonstration occurred, and the list-naïve subjects started ‘from scratch’,
as in regular SCP Lists (see above). In this manipulation, no facilitation effects were
seen. The second manipulation replaced the actor monkey’s demonstration with a
computer-generated performance. This eliminates the potential for an obvious socially-
based motivation, but preserves the information available to the observing monkey.
Interestingly, when the computer was ‘correct’, rewards were still presented, but to an
empty chamber. However, in this manipulation the effects were again null.
What these additional manipulations – tested in Subiaul et al., and simulated here
– show is that social learning is more complicated than being capable of processing
particular information. The animal must be in the appropriate ‘state’ – motivationally
and cognitively – and must be able to glean the value-outcome of observed actions. Few
direct studies have investigated this, though the data that are available seem to converge on
this assessment as well. (Azzi et al., 2012) showed OFC neurons that differentially
responded to reward allocations exclusively to oneself, and when shared with a ‘partner’
monkey, apparently encoding the motivational salience as a function of social context.
Interestingly, there have also been tests in macaques assessing how LPFC responses
engaged during social interaction conditions vary when interacting with a conspecific, or
a computer, during competitive video games (Hosokawa & Watanabe, 2012), showing
modulation specific to each type of partner. The modulations also correlate with
behavioral patterns, with the monkeys apparently more engaged and motivated when
competing against a conspecific. Other ‘interactive’ task designs have elucidated
response profiles of neurons in other regions, including striatal neurons signaling the
value of social information (Klein & Platt, 2013) and medial PFC neurons processing
others’ actions (Yoshida et al., 2011) and errors (Yoshida et al., 2012).
As we discussed above, socio-cognitive skill has been argued to be the prime
driver of human neural and cognitive growth. Despite this central role these skills likely
have played in human evolution – including in linguistic skill – few computational
models have explicitly explored social behavior in primates, or the neural mechanisms
that coordinate social learning and behavior. Steels and colleagues have simulated
multi-agent systems in language learning games, though with little emphasis on actual
neural processing (Steels, 2003). (Oztop et al., 2005) have built on the MNS models
described above to examine neural mechanisms involved in intention decoding, though
they did not offer an account of how these decoding processes affected downstream
processing. (Chersi, 2011), building on the 'sequence-dependent' mirror responses
described by Fogassi et al. (Fogassi et al., 2005), offered a model of ‘joint action’, but did
not simulate learning routines. Here, we simulate two brains in ‘uni-directional’
information flow, as in the Chersi model, but also include learning processes that
transform mirror system output – along with recognition systems for processing feedback
provided to others – into learning signals. Elsewhere, we have described the potential for
this setup in truly ‘interactive’ social behaviors (Arbib et al., 2014; Gasser et al., 2014), to
which we now turn.
Chapter 4: Dyadic brain modeling of ape gestural learning
Gestural communication is widely accepted to be more flexible and expressive
than vocal communication in the apes (chimpanzees, bonobos, gorillas and orangutans)
with the implication that these data inform and support theories on human language
evolution (Aboitiz, 2012; Arbib, Gasser, & Barrès, 2014; Arbib, 2012; Gillespie-Lynch,
Greenfield, Lyn, & Savage-Rumbaugh, 2014; Pollick & de Waal, 2007). Additionally,
social learning behavior in general in the apes is often cited as more extensive (Dean et
al., 2012; Horner & Whiten, 2005) than in other primate relatives (besides humans), with
‘cultural’ traditions amongst populations of individuals having been cited (Whiten,
Hinde, Laland, & Stringer, 2011; Whiten, 2005) – though note that certain species of
monkeys have been shown to have idiosyncratic behaviors transmitted through some
social means too (Fragaszy et al., 2013). Moreover, comparative neuroanatomical (Hecht
et al., 2013) and functional (Hecht et al., 2013) studies have even found suggestions that
particular connectivity and gross response profiles, respectively, are more similar
between apes and humans than between apes and macaques. Together, study
and analysis of the social and especially communicative behaviors of ape species greatly
impact our understanding of both human language evolution and human social
cognitive skill (Byrne & Whiten, 1988). It is thus important for integrative accounts
spanning neural, cognitive and behavioral analyses to be put forward to assist in the
understanding of these data.
Previously, we have described computational models of observational learning in
macaques, and of direct, dyadic interaction and mutually-shaped behavior in apes
(ontogenetic ritualization – OR – believed to be a process through which gestural signs
can be learned; see Figure 4.1) (Arbib, Ganesh, & Gasser, 2014). Our goal for this
chapter is to build off of the model of ape OR from (Arbib et al., 2014) and offer a more
integrative account of ape gestural learning and production by considering alternative
hypotheses (beyond just ritualization) on the ontogeny and use of intentional gestural
signs and showing how our model can handle these accounts. We offer model extensions
that define a ‘base’ model for both gestural signaler and gestural recipient and show how
differing roles and differing motivational and learning states can contribute to variation in
behavior, both in terms of how gestures are learned and how they are differentially
responded to. We then offer conceptual analyses of other cases of possible gestural
learning not confronted by our model here, as well as consider data on social influences
in learning manual tasks, vocalization behavior and other social behaviors long neglected
by computational modeling analyses. We consider how the model developed here and
the model of macaque observational learning – wherein learning is in one-direction, and
not interactive – can be leveraged to better understand these data and to offer some key
questions for future modeling and experimental / observational field work.
Figure 4.1. Dyadic brain modeling of apes. We have previously described a modeling
goal to simulate social interaction and communication between apes (Arbib et al., 2014;
Gasser, Cartmill, & Arbib, 2014). The goal is to model an ape brain based on analyzing
relevant neurophysiological, neuroanatomical and behavioral data and to implement the
brain models in such a way as to drive interactive behavior from which both agents can
learn. Here, we describe preliminary efforts to expand the previous model and to offer a
view towards an integrative understanding of gestural learning. [Figure adapted from
(Arbib et al., 2014).]
4.1 SOCIAL LEARNING INFLUENCES ON GESTURAL USAGE IN APES
As is now well known, apes are generally thought to communicate more flexibly
through manual gestures than through vocal communication. Manual gestures are
flexible, intentional acts that seek to influence the behavior of others. The production of
gestures takes into account the attentional state of the recipient, the gesturer monitors the
recipient’s comprehension, and future gestural bouts are often a function of past success
or failure with using a particular gesture (Arbib et al., 2008; Cartmill & Byrne, 2007;
Hobaiter & Byrne, 2011, 2014). When species-level and group-level population analyses
are done on the variation of gestural usage – according to a particular operational
definition of gesture, though there is no standard definition of ‘gesture’ or ‘goal’ in the
community and other primatologists define these terms differently – it is suggested that
the patterns of variation largely support an hypothesis that gestures are innate, and in
which variation is explained as differences in learning at the individual level (Genty,
Breuer, Hobaiter, & Byrne, 2009; Hobaiter & Byrne, 2011a). It is observed, for example,
that significant variation exists in gestural repertoires, both between individuals and
across individual developmental periods (Hobaiter & Byrne, 2011b; Tomasello, Gust, &
Frost, 1989). Not surprisingly, too, variation exists across ape species, though significant
overlap for ‘family-typical’ gestures are observed (Genty, et al., 2009; Hobaiter & Byrne,
2011a). Nonetheless, these gross patterns of variation are said to support the innateness
hypothesis, with the observed variation explained by individual-level learning –
'repertoire tuning', for example (Hobaiter & Byrne, 2011b).
Even accepting the notion that all or most gestures are innate, it is clear that some
manner of learning is involved, but what those processes are, how the role of the agent
factors in (gesturer, recipient) and what that implies for the neural and cognitive
machinery managing their learning and use remain unclear. As noted above, past
success or failure of a particular gesture in a particular context predicts its future usage,
showing modulation as a result of feedback – specifically, the other’s actions. Since
monitoring others’ behavior is involved in the learning process, the learning is ‘social’ in
at least this minimal sense (Gariépy et al., 2014). Detailed studies, reviewed below, show
more precisely what forms this social learning can take.
(Luef & Liebal, 2012) describe how non-infant gorillas – when engaged with
infants – adjust their communicative strategies in a sort of ‘motherese’ in which gestural
sequences are extended and use of tactile gestures increases as a way to facilitate
comprehension. The interactions are infant-specific in that these elaborated sequences
are not employed when communicating with non-infants. Additional data from ape
infant development suggests a general progression in the patterns of initiations of, for
example, play behavior. (Bard et al., 2014) showed that chimpanzee infants proceeded
through a stage where, at first, their interactions were initiated by others (in this study, the
human caregivers), and only later did the chimp infants themselves initiate, and then
request, particular interactions, like tickle play. (Schneider, Call, & Liebal, 2012a) also
showed a developmental pattern across ape species (though not orangutans) where tactile
gestures preceded visual-only gestures in usage, suggesting that at first, infants that
remained in close proximity to the mother used tactile gestures accordingly, but as
independence from mother increased, use of gesture from a distance – that is, visual-form
gestures – gained prominence. In general, the emerging notion seems to be that a
multitude of factors impact socio-cognitive development (Bard & Leavens, 2011), and
that while innate programs for some gestures undoubtedly influence the developmental
pattern for infant gesturing, the very fact that communication involves social partners
means that the learning processes involve processing social variables to adapt one’s own
behavior (Schneider, Call, & Liebal, 2012b). Disentangling what these social influences
consist of, how these vary with development, and how others’ responses during
development – actively via ‘scaffolding’ behavior, or otherwise – influence one’s future
communicative behavior become important questions.
4.2 MODEL DESIGN
We have previously made the case for the emergence of ritualized gestures from a
particular scenario of ape mother-child interactions (Arbib et al., 2014; Gasser et al.,
2014). We have demonstrated above that a number of learning processes influence
gestural usage in ape populations, and that social learning influences are found more
widely within ape behavior. It is clear, then, that modeling efforts must seek a more
integrative account of gestural learning processes and, in time, accounts more generally
of the social learning of various skills. Here, we present our integrative model of gestural
learning, which covers both the ritualization of gestures through concurrent learning in a
pair of agents and the learning of usage patterns for innate gestural schemas.
There are a number of variations to the model that we simulated to get a sense of
learning processes under different circumstances, but the architecture of the model is the
same (see Figure 4.2). Indeed, both agents inherit the same architecture, but each
instantiation differs according to their respective roles, motivational states,
action/gestural repertoire, etc. Inputs to the model consist of visual and haptic
information from the environment, and internal proprioceptive information. (Haptic
information is computed since there is no mechanical sensation in the simulations.)
Visual and proprioceptive information is given for each of: shoulder joints, elbow joints,
wrist position and head position in the form of absolute (x,y,z) coordinates. Appropriate
coordinate transformations are made when needed.
For each episode of interaction, the agents enter a perception-action loop wherein
perceptual information is translated into updates of action priority and behavioral output
is computed, before re-assessing perceptual data and so on. Visual information is
processed by two streams: one for action-recognition and another to assess affordances
for action (or, to determine those actions that are available to the agent given the context).
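To make this loop concrete, the following is a minimal Python sketch of the per-episode perception-action cycle (Python being the language of the implementation, which uses the pybrain library). Every name here – World, Agent, observe, apply, goals_satisfied – is an illustrative placeholder of our own, not the actual simulation code.

import numpy as np

class World:
    """Toy stand-in for the simulated environment (illustrative only)."""
    def __init__(self):
        self.steps = 0
    def observe(self, agent_name):
        rng = np.random.default_rng(self.steps)
        return {'context': rng.integers(0, 2, 6).astype(float),  # afforded schemas
                'value': rng.random(6)}                          # goal-derived values
    def apply(self, agent_name, action):
        self.steps += 1      # the chosen action would update the shared scene here
    def goals_satisfied(self):
        return self.steps >= 20

class Agent:
    """Schematic agent: perceive, re-prioritize, then act on the winning schema."""
    def __init__(self, name, n_schemas=6):
        self.name = name
        self.priority = np.zeros(n_schemas)
    def step(self, world):
        percept = world.observe(self.name)
        self.priority = percept['context'] * percept['value']   # priority update
        world.apply(self.name, int(np.argmax(self.priority)))   # select and act

def run_episode(world, agents):
    """Both agents alternate perceiving and acting until a goal state is reached."""
    while not world.goals_satisfied():
        for agent in agents:
            agent.step(world)

run_episode(World(), [Agent('child'), Agent('mother')])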
The action-recognition system consists of a recurrent neural network for analyzing
reaching movements and an algorithmic module for assessing attentional states and other
actions like walking. The recurrent neural network, implemented through the pybrain
library, consists of 6 linear input units corresponding to (x,y,z) coordinates of wrist and
elbow, 8 hidden, looped sigmoidal units and 3 linear output units, “m_o(t)”. Output unit
activity is normalized via the hyperbolic tangent function:

(1) y_r(t) = tanh(m_o(t))

Training of the recurrent network is achieved through backpropagation-through-time
(BPTT), and consists of 3 distinct actions, one for each output unit. (The specific actions
vary according to simulation protocol, but are provided for each scenario below.) Thus,
the network can output responses that correlate with arm motions that have been
previously provided as training data. The normalized output response “y_r(t)” is mapped
one-to-one via a weight layer “w_s” to a 3-unit layer representing social context. The
units in this layer integrate activity from the recurrent neural network over time, and if
they cross an activation threshold they can toggle goal states off and on, which then leads
to a new distribution of activity across motor schemas, as explained below. The units’
membrane activations are given by:

(2) m_s(t) = a × m_s(t−1) + b × (y_r(t) × w_s) + c_h

where “a” and “b” are constants that sum to 1, and “c_h” is any haptic input. The
activation threshold is given by “h_s”, and so the firing rate for this layer can be given as:

(3) y_s(t) = 1 if m_s(t) > h_s; 0 if m_s(t) <= h_s

Crossing threshold activates the corresponding unit in the goal layer, because the
activation in “m_s(t)” is transient. Thus, crossing threshold even briefly activates a new
permanent goal (permanent until achieved).

Learning can modify “w_s” to facilitate more rapid goal switching following
activations in “y_s(t)”. The weights are updated according to:

(5) Δw_i^s = α_g × r

where “r” is the feedback and is 1 if the outcome is successful and −1 if unsuccessful,
“α_g” is the learning rate for updating these weights, and “i” indexes the relevant unit in
“m_s(t)” that passed threshold.
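As a concrete illustration of equations (2), (3) and (5), a minimal numpy sketch follows. The variable names mirror the text; the constants a, b, h_s and α_g are illustrative assumptions, not the values used in the simulations.

import numpy as np

a, b = 0.7, 0.3          # integration constants, a + b = 1 (illustrative values)
h_s = 0.5                # activation threshold h_s (illustrative)
alpha_g = 0.1            # learning rate alpha_g (illustrative)

w_s = np.full(3, 0.2)    # one-to-one weights from RNN outputs to social-context units
m_s = np.zeros(3)        # membrane activations of the integrator layer

def integrate(y_r, c_h=0.0):
    """Equations (2)-(3): leaky integration of recognition output plus haptic input."""
    global m_s
    m_s = a * m_s + b * (y_r * w_s) + c_h
    return (m_s > h_s).astype(float)    # transient firing; a 1 toggles the matching goal

def update_weights(i, r):
    """Equation (5): reward-modulated update for the unit i that passed threshold."""
    w_s[i] += alpha_g * r               # r = +1 on success, -1 on failure

# e.g. repeated recognition of a reach, then haptic contact, pushes unit 0 past threshold:
for t in range(5):
    y_s = integrate(np.array([0.9, 0.1, 0.0]), c_h=(0.4 if t == 4 else 0.0))
print(y_s)                              # [1. 0. 0.]

This mirrors the behavior shown later in Figure 4.5: visual recognition alone integrates below threshold until haptic input arrives.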
Action selection is performed by selecting the maximally-active motor schema in
a 6-unit layer that receives value-related and context-related information, which together
combine to give a behavioral priority signal. Context-related information is a function of
the affordances of the world, such that, for example, a reach cannot be made unless
within reaching distance, and gestures cannot be performed if the other’s attention is not
engaged. This socio-cognitive layer’s activity is given by:

(6) y_j^c(t) = 1 for all j that obtain; 0 for all j that do not obtain

where “j” indexes the relevant affordances. Activation here is transmitted via modifiable
weights to the action selection layer, which is given by:

(7) y_p(t) = y_c(t) × w_c + y_g(t) × w_v

where “y_g(t)” is the goal layer activation, and “w_v” is the modifiable weight matrix
mapping goal states to the action selection layer, which gives the value signal for each
action. Learning for both context-related and value-related weights is computed
following action selection and the evaluation of the outcome. Associative learning
modifies the context-related weights:

Δw_k^c = α_c × r

where “α_c” is the relevant learning rate. Temporal-difference reinforcement learning
(Sutton & Barto, 1998) updates the value-related weights according to:

δ = γ × v(T−1) − v(T) + r

where “v(T)” is the predicted value of the current action, “v(T−1)” is the predicted value
of the previous action, and “δ” is the difference term. “δ” updates the value-related
weights by:

Δw^v = α_v × δ

where “α_v” is the relevant learning rate.
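A compact sketch of these selection and learning rules follows, applying the text’s update equations literally; the action count, learning rates and discount factor are illustrative assumptions, as is our choice to apply both updates to the weights of the selected action k.

import numpy as np

n_actions = 6
alpha_c, alpha_v, gamma = 0.1, 0.1, 0.9   # learning rates and discount (illustrative)
w_c = np.zeros(n_actions)                 # context-related weights
w_v = np.zeros(n_actions)                 # value-related weights

def behavioral_priority(y_c, y_g):
    """Equation (7): combine context and goal (value) inputs for each action."""
    return y_c * w_c + y_g * w_v

def learn_from_outcome(k, r, v_prev, v_curr):
    """Post-outcome updates for the selected action k, as given in the text."""
    w_c[k] += alpha_c * r                 # associative update of context weights
    delta = gamma * v_prev - v_curr + r   # temporal-difference term delta
    w_v[k] += alpha_v * delta             # value-related weight update

# usage: k = int(np.argmax(behavioral_priority(y_c, y_g))); act, observe r, then learn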
Motor coordination is managed by an inverse-kinematics solver, but the model
computes a target for reaching (and pulling, and walking). Reach target “x_r” is a
function of a weighted average between praxic (affordance-driven) and postural
(proprioceptive-driven) signals:

x_r = a_p × p_po + (1 − a_p) × p_pr

where “p_po” is the (x,y,z) coordinate given by the postural control mechanism, “p_pr” is
the (x,y,z) coordinate given by the praxic control mechanism (i.e., the (x,y,z)
corresponding to the location of the mother’s wrist), and “a_p” controls the influence of
either signal. “a_p” can be increased following successful sequences in which the mother
anticipated the child’s behavior: namely, those episodes where the child did not have to
mechanically act on the mother. In this way, learning is only possible for the child if the
mother is responsive enough to anticipate the reach-to-grasp. The relative weighting is
updated by:

a_p = a_p + α_a × r

where “α_a” is a learning rate. In each of these instances, the postural control mechanism
records the target posture of the successful action and seeks to reproduce that same
posture – in shoulder-centered coordinates – in future episodes. Finally, following
substantial learning wherein “a_p >= h_g”, where “h_g” is a parameter controlling gesture
consolidation, a gesture is consolidated and can be selected by the action selection
module. The process to spontaneously consolidate a gesture in this way is managed
algorithmically.
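The ritualization mechanism itself reduces to a few lines. Below is a hedged sketch in which the learning rate, the threshold and the clipping of a_p to [0, 1] are our illustrative assumptions (the text does not specify bounds on a_p).

import numpy as np

alpha_a, h_g = 0.1, 0.9    # learning rate and consolidation threshold (illustrative)
a_p = 0.0                  # postural weighting; zero at the start of learning

def reach_target(p_po, p_pr):
    """x_r = a_p * p_po + (1 - a_p) * p_pr: blend postural and praxic targets."""
    return a_p * np.asarray(p_po) + (1 - a_p) * np.asarray(p_pr)

def update_weighting(r):
    """a_p = a_p + alpha_a * r after an episode; consolidate once a_p >= h_g."""
    global a_p
    a_p = float(np.clip(a_p + alpha_a * r, 0.0, 1.0))   # clipping is our assumption
    return a_p >= h_g      # True signals that a gesture schema can be consolidated

# e.g. ten successful, mother-anticipated episodes drive a_p to ceiling:
for _ in range(10):
    consolidated = update_weighting(r=1)
print(a_p, consolidated)   # 1.0 True

As a_p grows, the executed movement shifts from the full praxic reach toward the truncated, posture-defined form – the ritualized gesture.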
Figure 4.2. Model of gestural learning, representation, production and
comprehension. Model inputs are given by external (visual, haptic) and internal
(proprioceptive) information. These influence action selection, along with motivational
state influences, and action performance. Learning processes can modulate connection
weights in multiple areas, and are dependent on processing relevant feedback. See also
Figure 4.5.
4.3 METHODS AND SIMULATION RESULTS
We performed a multitude of simulations under differing conditions, varying
parameter values, the distribution of goal state activation, the postures of the agents and
the presence of third-party observers. The results show that the model is highly adaptive
to circumstances and reacts according to variation in the agents’ internal states, but also
to the other’s behavior, which can be shown to influence how each agent learns. We
discuss how to interpret
these results in terms of their implications for primatological and neuroscientific research.
(Note too that the names or descriptions of the gestures used in the simulations below do
not necessarily correspond to what is described in the primatological literature (e.g.,
Hobaiter & Byrne, 2011a). In time, it would be useful to have a close correspondence
between what can be simulated and what is actually meant by these terms, but for many
reasons that is not possible here.)
Simulation run          1    2    3    4    5    6    7    8    9    10   11
Child learning rate     .2   .3   .2   .3   .4   .2   .4   .3   .3   .3   .3
Child threshold         .8   .8   .8   .8   .8   .8   .8   .8   .7   .7   .8
Mother learning rate    .2   .2   .3   .3   .4   .4   .2   .3   .3   .3   .3
Mother threshold        .42  .42  .42  .42  .42  .42  .42  .45  .35  .35  .55
Emergence of gesture    12   10   10   9    6    10   9    7    7    5    X
Figure 4.3. Parametric variation leads to varied gestural learning. By varying certain
parameters (learning rates, thresholds for activation changes, and prior training data,
not shown) the progression of gestural learning changes and influences when, or even if,
a gesture can emerge. It is clear above that both agents influence the gestural learning
process, and so gestural learning in this way is not driven solely by the gesturer (here,
the child). Parameters that vary an agent’s responsiveness to others’ actions greatly
influence the emergence of a ritualized gesture, as do the learning rates
involved. It is also evident that under particular circumstances – such as when the
mother is unresponsive to others’ actions, as indicated by a higher threshold for goal
switching – a gesture will not become ritualized (indicated by an ‘X’), providing
suggestions of how dyadic interactions may subtly influence gestural behavior.
1) Ritualizing a reach-to-grasp gesture through repeated interactions
Because the model described here is updated in relation to the previous iteration
(Arbib et al., 2014), we reproduced the general result of successfully ritualizing a reach-
to-grasp gesture through mutual shaping of behavior. We then performed similar
parametric variation of both models and recorded the number of discrete episodes until a
gesture was spontaneously used by the child (see Figures 4.3 and 4.4). As shown, there is
a wide range in the progression of learning to ritualize, and this effect can be driven by
either agent, mother or child, thus demonstrating that both are key actors in the process
and that the learning involved cannot be attributed to one alone.
Figure 4.4. Dyadic interaction across time. Mock activation in the child model is on the
left, and mock activation in the mother model is on the right (only relevant portions of
each model are shown). (Top) The child is in a motivational state that seeks direct
contact with the mother for social bonding, leading to selection of a sequence of actions
(here, reaching towards her arm) known to lead to the goal state (an appropriate gesture
is not available, as indicated by the dotted circle in the action selection layer).
Meanwhile, the mother is in a different motivational state, and so her actions only reflect
that goal state. However, action-recognition activation tracks the child’s reaching
performance, but is not strongly associated enough to cross threshold for toggling the
goal state. (Second panel from top) The child grasps the mother’s arm, leading to haptic
sensation in the mother that is only provided as input the following time-step (feedback
leading to the panel below). (Second panel from bottom) Haptic information, combined
with visual analysis from the recurrent network, is enough to cross threshold and flip the
motivational state of the mother, leading to re-prioritization of behavioral plans and,
finally, to actions that mutually-satisfy each agent’s goal state. Upon recognizing the
mother’s response, the child aborts his grasp-and-pull action. Each agent’s learning
system is then engaged and relevant connection weights are updated. (Bottom)
Following a number of interactions, a gesture is consolidated by the child, as indicated
by the dotted circle now filled and activated, which leads to new behavior for the same
goal state. Meanwhile, the mother has also learned the child’s behavior and
appropriately responds to his gesture.
We additionally can disrupt the mother’s recognition of the child’s gesturing
behavior, to observe how the child responds to failed communicative acts. Following a
variable number of bouts of producing the ritualized gesture, the child can revert to
the original action sequence and mechanically interact with the mother. Failed
communicative attempts are met with a learning signal to downgrade the new schema for
the gesture, but the original action sequence is unaffected. We will return to
circumstances where ritualization fails below.
Figure 4.5. RNN and integrator activation in mother model. (top) Normalized firing
rates for the output neurons of the recurrent neural network are shown. In both cases,
rapid recognition of the arm movement of the child is reached: in (A), the reach-to-grasp,
and in (B) an arm-raise gesture. (bottom) Membrane activations of the integrator layer
neurons are shown. In (C) recognition of the reach-to-grasp action is rapid, but weakly
associated with goal-switching, and so not until haptic input is also provided – as shown
by the rapid spike – is threshold crossed. In (D), we see integrator layer activity
following learning, so that RNN-output neurons quickly toggle goal state neurons,
leading to changes in behavior.
Lastly, we can demonstrate successful recognition of the reach-to-grasp action via
the mother’s recurrent neural network’s output neurons, and via the activations of the
integrator neurons (Figure 4.5). Output unit responses correlate with the child’s actions
(top row), both reach-to-grasp actions (left) and innately-specified gestures (right; and see
below). Integrator neurons (bottom row) respond as a function of these activations and
the strength of the connection weights. If the associations are weak, as they are initialized
in these simulations, mechanical interaction by the child is necessary, which then causes
haptic input that finally drives these neurons above threshold (bottom left). When
associations have been strengthened, the mapping from recognition neurons to goal-state
switching is rapid.
2) Ritualizing a variant reach-to-grasp gesture through repeated interactions
Having shown that reach-to-grasp gestures can be ritualized under at least some
circumstances, we now show that the particular posture of the mother, and the particular
target of the child’s grasp, can be varied and still result in a ritualized form of a gesture.
Indeed, the gestural form ritualized in these circumstances takes a different posture owing
to its causal relation with the ‘seeded’ reach-to-grasp gesture of the child: since the
child’s praxic form varies as a function of the mother’s posture, the resultant gestural
form inherits the varied postural form from which it derives. The mother is initialized in
a different posture, with her arm above the child’s head, thus forcing the child to both
move closer to be within reaching distance, and to reach higher to grasp her arm.
Following a variable number of interactions as above, the child is capable of
spontaneously using the gesture communicatively, but since his motor learning is
dependent on the posture of his reaching movements, the form of this ritualized gesture is
varied as compared to (1). In fact, a wide range of variable, ritualized gestural forms is
possible in this way, none tied to one canonical initialized posture of either agent.
Indeed, this is further evidence that such a simple form of learning is plausibly involved
in a wide range of interactive behaviors.
3) Ritualization fails when mother too unresponsive
As Figure 4.3 showed, there are circumstances where ritualization does not occur,
and analyzing these circumstances may potentially be informative. Two parameters can
be varied that result in, effectively, an inability for the child to ritualize the reach-to-grasp
action, though the child remains capable of achieving his goal state through praxic
means. By increasing the parameter for how responsive the mother is to the child – the
threshold for switching goal states as a function of the child’s behavior – the mother will
never complete the action of the child, and can only be mechanically acted upon for her
to bond with the child. In fact, with too high a value, even mechanical interaction fails
to result in goal-state switching in the mother. Additionally, if the mother is a slow
learner in this respect she fails to adapt to the child, and so to anticipate his behavior,
leading to a lack of responsiveness. For the child, increasing the parameter that controls
consolidation of the learned postural form of the reach-to-grasp action can effectively
prevent him from ever ritualizing in this way.
4) Ritualization fails when mother too proactive in bonding with child
We showed above that an unresponsive mother may prevent ritualization from
occurring. More interestingly, we can assess how varying the mother’s proactivity – by
varying her initial goal state activation – influences the progression of gestural learning.
When the mother is initialized in the appropriate motivational state – to achieve physical
bonding with the child – she immediately moves to embrace the child, with the result that
no learning about the reach-to-grasp occurs in the child. While this seems trivial, it is
important to note that when interactions are limited in number, influences on a mother’s
engagement with her infant can change how, or whether, the child has the opportunity to
ritualize a gesture. There have been recent longitudinal studies of mother-infant dyads
and of how their interactions influence future socio-emotional and gestural behaviors
(Bard et al., 2014; Schneider et al., 2012); we take these up in the Discussion.
5) Learning usage patterns for innate gestural schemas
We can now preliminarily show that when the child model is initialized with a set
of innate gestural schemas, it can develop patterns of gestural usage as a function of
learning experiences. These learning experiences can be driven by either agent, as above.
In the following simulations, we innately program three schemas for different visual,
arm-based gestures that the child model has access to. The three gestures include an arm
raise (the arm is raised above the shoulder, near the head of the gesturer), an arm-to-
ground (the arm moves downward and the hand makes contact with the ground), and an
arm swing (the arm is rapidly moved back-and-forth at the side of the gesturer).
(Hobaiter & Byrne, 2011b) suggested that infants, whom they found to gesture
in sequence bouts more frequently than individuals of other ages, produce a wide
variation in gestures at first, only to prune their gestural selection to fit contexts
appropriately, thus becoming more efficient. We can preliminarily show the same effect,
by varying both the innate
biases of the child gesturer – to prefer one or another gesture, or to select amongst them
randomly – and by varying the mother’s recognition of these gestures (i.e., whether the
gesture will be a success or not) as a function of training experience. From this, we can
see the child, for example, be biased to select a gesture (e.g., arm raise) that will not
result in a response from the mother. Upon getting no response, the child downgrades
that gesture schema and selects the new highest schema (e.g., arm-to-ground). If this
gesture is a success, the child will produce that gesture in future circumstances when in
that goal state. However, when the mother is unresponsive to this new gesture as well,
the child is still capable of reverting to a praxic interaction. And under the appropriate
circumstances (see above), the child can still spontaneously ritualize from this
interaction. Thus, we can see the child explore multiple possible (communicative)
actions to achieve a goal and proceed through multiple learning opportunities to arrive at
a successful gestural form.
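The exploration-and-pruning dynamic just described can be conveyed by a deliberately simple sketch. The gesture names follow the text; the initial priorities, the learning rate and the rule for dropping exhausted schemas are illustrative assumptions.

priorities = {'arm_raise': 0.9, 'arm_to_ground': 0.6, 'arm_swing': 0.5}

def try_gestures(mother_responds, alpha=0.2):
    """Select the highest-priority gesture; downgrade it whenever it is ignored."""
    while priorities:
        g = max(priorities, key=priorities.get)
        if mother_responds(g):
            priorities[g] += alpha        # success: reinforce for this goal state
            return g
        priorities[g] -= alpha            # failure: downgrade and re-select
        if priorities[g] <= 0:
            priorities.pop(g)             # schema exhausted for this context
    return None                           # fall back to praxic interaction

print(try_gestures(lambda g: g == 'arm_to_ground'))   # 'arm_to_ground'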
4.4 DISCUSSION
We have presented an integrative model of ape gestural learning demonstrating
gestural acquisition as a function of multiple learning processes across innate and
ritualized gestures. The gross behavioral results of the model comport with data on ape
gesturing behavior, though necessarily are of a level abstracted away from certain details.
We summarize how our model relates to existing data below, and how the model may be
extended to make further contact with available data. We follow with a short review of
relevant neurophysiological results from monkeys and suggest ways these data, too,
could be brought within the modeling efforts here. Finally, we suggest linkages with the
model of macaque observational learning in the previous chapter and analyze a few
salient instances of social learning in non-human primates, and how models of uni-
directional and of direct dyadic interaction may assist in understanding these behaviors.
Social influences on gestural learning in apes
Based on analyses of the gestural repertoires of gorillas (Genty, et al., 2009) and
chimpanzees (Hobaiter & Byrne, 2011a), Byrne and colleagues argue that the data
suggest a negligible role for social learning, and instead offer that apes have access to
family-typical and species-specific gestural repertoires from birth, and that the analyses
above of the variation of gestural forms across isolated geographic sites support this. (We
again note that other studies do claim to observe group-specific or idiosyncratic gestures
(Halina, et al., 2013; Liebal, et al., 2006; Pika, et al., 2003).) As an example, (Hobaiter &
Byrne, 2011a) recorded 66 unique gestural types, as operationally defined in their study.
(It is important to note that researchers do not necessarily agree on, or at least
operationally define, what are considered unique gestures. Is ‘beckoning’ with the left
hand different than ‘beckoning’ with the right?) Of these 66, the pattern of variation of
their usage across subjects did not match criteria to be considered idiosyncratic, nor did
the two ‘potentially-ritualized’ gestures observed sufficiently match the physical acts OR
may have predicted. (The conclusions of (Genty, et al., 2009) on gorilla gesturing are
similar.) Still, the granularity of the definition of “gesture”, the potential for ritualizing
‘common’ gestures, and the results of other studies do not definitively answer this
question. Regardless of the claims of Byrne and colleagues, it seems apparent that an
analysis of the role of learning is nonetheless important here.
Elsewhere, Hobaiter and Byrne argue that ‘pruning’ occurs over development as
infants transition from gesturing in sequence rapidly, to being more efficient in their
sampling and use of gestures, as observed in non-infant chimps, though especially adults
(Hobaiter & Byrne, 2011b). For instance, they observed that as age increased, the number
of sequence ‘bouts’ decreased, while the percentage of successful communicative bouts
increased. Longitudinal data support this trend (Schneider, Call, & Liebal, 2012). Other
data support the notion that these interactions are critical not just to gestural development,
but more generally to motor, cognitive and socio-emotional development (Bard, et al.,
2013; Schneider, et al., 2012). (Luef & Liebal, 2012) presented data suggesting gorillas
interacting with infants modify their expressions in infant-specific ways. (Schneider, et
al., 2011) confirmed that infants are more alike one another in their gestural repertoires
than they are like adults, while adults are likewise more alike one another than they are
like infants. Most interestingly, (Bard,
et al., 2013) showed how infants first engage in, then initiate, and finally request
particular social interactions as gestures are learned specific to each context.
Since gestures can be used in different contexts, and different gestures may ‘mean’ the
same thing, we will address what socio-ecological conditions, and/or unique prior
experiences, explain the individual variation in gesture use. The claim by Byrne and colleagues seems
to support the notion that motor pattern generators (MPG), specific for each gesture, are
acquired genetically, and that appropriate environmental ‘releasers’ are also encoded to
give the gesture its meaning. But this doesn’t address whether social learning processes
are obscured by the ubiquity of certain gestures – that is, whether observation ‘primes’
the execution of one gesture over an alternative, semantically-similar gesture (Tomasello, et
al., 1989). Also, in what ways is the learning ‘social’ – due to the unique nature of the
interaction – and in what ways is the learning non-social (e.g., motor, visual)? As an
example, (Oztop, Bradley, & Arbib, 2004) showed that universal grasp types for humans
need not be encoded genetically, and that simple learning mechanisms can explain the
rise of common behaviors. Similarly, (Oudeyer, 2005) offered a model of phoneme
learning, suggesting that self-organizing principles help explain how speech sounds are
learned and organized, though in our case there may not be a ‘culture’ of existing
gestures from which infants can learn directly, thus requiring novel learning during
repertoire pruning.
Social learning of manual tasks in primates
Comparative behavioral task designs – comparing either apes and human children
(Horner & Whiten, 2005), or macaques, apes and human children (Dean et al., 2012) –
have found species differences in the extent to which learning from others is possible,
and in what learning strategies are employed. For example, (Horner & Whiten, 2005)
tasked apes and human children with learning from an adult demonstrator how to open,
through a series of manipulations of lever bars and small ‘doors’, an ‘artificial fruit’ box
(attempting to simulate a naturalistic behavior involving extracting a food item, for
example). Whereas human children appeared to imitate more directly the actions of the
demonstrator, including through the imitation of the action ‘means’ – more closely
reproducing the actual demonstrated action, like the turning of a bar – chimps appeared to
‘emulate’ the ‘ends’ of the demonstrated action, achieving more or less the same result
but through different, functionally equivalent actions. (Dean et al., 2012) found
differences across all species, but with humans again achieving much closer
correspondence to the demonstrated set of actions, as well as being unique among the
tested subjects in being capable of pedagogical instruction: children would assist others
in solving the complex task.
In the wild, (Byrne & Russon, 1998) described gorillas capable of hierarchical
action procedures to process nettles in such a way as to avoid harsh stingers. Infant
gorillas observed adults processing the leaves and eventually – over years of observation
and trial-and-error practice – became capable of avoiding the stingers too. Importantly, it
was contended that the relevant features learned by the young gorillas were not the
actions themselves, but their organization. Trial-and-error, too, contributed to their
competency, and so we can see it is not that observation across a limited time is enough
for learning from others, but a combination of observation – over long periods of time –
and individual trial-and-error that assists in the learning of a complex, hierarchically-
structured task. This is further supported by studies of monkey palm nut cracking
(Fragaszy et al., 2013). Due to the use of stone anvils to assist in the cracking of hard nut
shells, artifacts endure that young capuchins can interact with and learn from, facilitating
future trial-and-error experiences and the gradual accumulation of the skill required to
extract the nuts. These studies show that it is not as simple as imitation following from
observation of others – at least in difficult tasks – but rather that social learning can
follow from a number of possible influences (Gariépy et al., 2014).
Conclusion
Our modeling here has aimed to provide an integrative account of gestural
acquisition, showing how both mutual shaping of behavior and the pruning of innate
gestures are supported. The model is preliminary, however, and more work would be
needed to examine, for example, finer motor learning on the part of the child and finer
perceptual discrimination on the part of the mother (or of third-party observers). And as
shown by (Rossano, 2014), for example, these interactions can be highly structured and
complex; the simple scenarios explored here are useful first steps, but would need to be
fleshed out in future work. Additionally, expanding the model – in number of goal states,
number of actions and number of ‘different’ individuals, to begin simulations of
‘populations’ of agents – would be beneficial. More theoretical and conceptual work is
needed to unite this model with the uni-directional dyadic model of SCP observational
learning, and then to apply this united framework to analyze other instances of social
learning, for example the learning of vocal communicative signals in vervets (Seyfarth &
Cheney, 1986). Additionally, the artificial fruit tasks of (Horner & Whiten, 2005) –
showing social learning effects, but also the over-imitation effect – and the nettle
processing of (Byrne & Russon, 1998), each involving goal-subgoal structures, are
problematic for current models. Nut-smashing behaviors (Elisabetta, Haslam,
Spagnoletti, & Fragaszy, 2013) and other behaviors involving tool-use are apparently
influenced by social learning effects, though whether these can be said to be ‘imitative’
or not is unknown. In some respects, this question is secondary until more is understood
about how these putatively different social learning effects are managed by the neural
and cognitive machinery – which is why modeling is important. And in developing and
assessing our models, it is important to recognize the emergence of neurophysiological
techniques examining the computation of social variables in interacting primates (Azzi et
al., 2012; Klein & Platt, 2013; Santos et al., 2011; Yoshida et al., 2011, 2012). Finally,
though the efforts detailed here are novel for brain modeling, it is important to at least be
aware of the contributions from others’ work, including for example (Spranger & Steels,
2014; Steels, 2003).
Chapter 5: Conclusion: a computational approach to neuro-primatology
We have reviewed important data that both challenge and constrain our models,
from macaque behavioral and neurophysiological experiments, to ape behavioral
observations. We have also reviewed past computational models involving primate motor
control, decision-making, and action recognition among others. We have shown the need
for more modeling work to address questions of primate behavioral planning and
sequential learning and representation, including (i) how complex sequences are
acquired, (ii) how the structure of tasks is learned and represented, and (iii) how
learning and planning at multiple levels are coordinated. Whereas much modeling work
on social learning processes has neglected key data regarding mirror system responses
and reward processing and (social) motivation, we offer a framework to tackle these data
and simulate conditions in which macaques are able to learn from others’ performances.
In complementary work we have made the move ‘from monkey to ape’ to consider how
the models from Chapters 2 and 3 can offer a basis for ape models (Chapter 4) which
suggest where new mechanisms (or novel assemblages) must have evolved since the last
monkey-ape common ancestor, including how intransitive actions are learned and
recognized, as in ape, but not monkey, gestural behavior. The models of ape gestural
acquisition address both ontogenetic and phylogenetic ritualization claims. Our models
and methodology for dyadic brain modeling promise to contribute to many other efforts
characterizing primate general skills, and skills unique to humans, including
neurolinguistics and social learning.
To aid in this research project, it is necessary for informatics and other tools to be
available for researchers to store, share, manage, and analyze data across a number of
disciplines (Gasser et al., 2014). Modeling is only possible by integrating across wide
bodies of data and making those data available, and the resultant models, should be a
primary focus going forward. Only through a computationally-anchored and integrative
approach spanning disciplines – and aided by informatics tools and resources – will we
resolve the key questions of comparative neuro-primatology.
Chapter 6: Supplementary material for Chapters 2 and 3
We report preliminary simulation results on the consolidation of well-learned lists and
the leveraging of higher-order representations to facilitate list acquisition and execution.
6.1 MODEL DESIGN
We begin by detailing preliminary model extensions to support list consolidation
and chunking of lists and list sub-sequences (in the case of 7-item-long lists: recall from
Chapter 1 that monkeys are capable of learning lists of this length (Terrace et al., 2003)),
short-term memory structures for managing learning, and integrating mirror neuron
action recognition systems to both assist in learning about one’s own actions (as in
Bonaiuto & Arbib, 2010) and learning about others’ actions (e.g., observational learning).
The extensions do not alter the architecture of the model as detailed in the main paper,
but instead are built on top of that structure or else would tentatively replace certain
modules (as indicated).
The model can be made to consolidate well-learned lists by both training a
dedicated chunk node specific to that list (or list sub-sequence, see below), and by
training that node to generate patterns of activation across the planning layer of the
competition network to facilitate list acquisition and execution (in terms of both
performance and timing). Firstly, for a given list, the visual features of the monitor are
trained to map to a given chunk node such that that node, over time, becomes specifically
activated upon any arrangement of those sets of items. (Note that we ‘allow’ chunking
here only during 4-long or 7-long increments, and in fact, only during simultaneous lists,
not incrementally-shaped lists. We discuss the limitations of this below.) The associative
learning here, operating on the weight matrix “W^l” which maps the visual features
“y_v(t)” to chunk nodes “l”, is similar to the associative learning that maps these features
to the temporal memory layer:

(1) ΔW_{r,l}^l = α_l × r_p(t)

where “r” is the set of features that obtain for that trial, “l” is the relevant chunk node,
“α_l” is the learning rate for these weights, and “r_p(t)” is the primary feedback. In this
way, chunk node “l” becomes strongly associated with a particular list following many
correct trials.
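As a sketch of equation (1), the associative mapping from visual features to a chunk node could look as follows; the feature and chunk counts and the learning rate are illustrative assumptions.

import numpy as np

n_features, n_chunks = 16, 4
alpha_l = 0.05                             # learning rate alpha_l (illustrative)
W_l = np.zeros((n_features, n_chunks))     # visual features -> chunk nodes

def associate(features, l, r_p):
    """Equation (1): strengthen W_l for the features present, gated by feedback r_p."""
    W_l[features, l] += alpha_l * r_p

def chunk_activations(features):
    x = np.zeros(n_features)
    x[features] = 1.0
    return x @ W_l                         # the trained chunk dominates after training

# e.g. many correct trials on a list whose items activate features [1, 4, 7, 9]:
for _ in range(20):
    associate([1, 4, 7, 9], l=0, r_p=1.0)
print(int(np.argmax(chunk_activations([1, 4, 7, 9]))))   # 0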
In parallel to this learning, there are two additional learning processes that
facilitate consolidation. Over time, the model will weigh the activations emanating from
this chunk node more so than the ‘default’ priority computation discussed up through
now. We model a competitive input space wherein a weighted average of activations
from the default priority computation, and from the chunk node, are inputted to the
planning layer of the competition network. We recall that the behavioral priority signal is
the currency of item selection: items with a higher behavioral priority are selected in lieu
of items with a lower behavioral priority. Equation (11) from Chapter 2 gives this
signal as:
(3) y_p(t) = a_p × y_p(t−1) + b_p × (c_p × p(t)) + σ_p

where “p(t)” is the computed priority, which is a function of value-based “v(t)” and
context-based “c(t)” inputs (equation (8) from Chapter 2):

(4) p(t) = v(t) × c(t) + σ_b(t)

We expand this integration of differing signals to also involve inputs from the chunk
node. We show below what the inputs from the chunk node “l” look like, but here show
how arbitration between these two sources of ‘priority’ is managed:

(5) p(t) = A × (v(t) × c(t)) + (1 − A) × l(t) + σ_b(t)

where “A” is a variable that weighs the contributions from both sources of input. At the
start of a new list, “A = 1”, but through learning processes that increase the influence of
the chunk node, “A” is updated as follows:

(6) A = A − α_a × r_p(t)

where “α_a” is the learning rate for this learning process.
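A minimal sketch of the arbitration in equations (5) and (6) is given below; the learning rate and the clipping of A to [0, 1] are illustrative assumptions.

import numpy as np

alpha_a = 0.05          # learning rate for A (illustrative)
A = 1.0                 # the default computation is fully in control for a new list

def combined_priority(v, c, l, sigma=0.0):
    """Equation (5): p(t) = A*(v*c) + (1 - A)*l + noise."""
    return A * (v * c) + (1 - A) * l + sigma

def update_A(r_p):
    """Equation (6): correct trials (r_p = 1) shift control toward the chunk node."""
    global A
    A = float(np.clip(A - alpha_a * r_p, 0.0, 1.0))   # clipping is our assumption

# e.g. after ten correct trials the chunk node carries half of the priority signal:
for _ in range(10):
    update_A(1.0)
print(A)   # 0.5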
Up to this point, we can see how a chunk node becomes selectively activated
upon presentation of a particular list – at least following enough training. Additionally,
we can see how that node can come to dominate the inputs to the planning layer, at the
expense of the ‘default’ priority computation. Now, we show what the inputs projected
from this chunk node look like. To begin, during the acquisition of a list, we model a
competitive buffer that takes as input the efferent copy of each selection, over time (see
Figure 6.1). The buffer contains ordered slots that inhibit each other proportionally to
their activation. The first selection’s efferent copy is inputted into the first slot, which –
due to it being first – rises in activation to maximal levels. Each successive selection gets
inputted, but must compete against the pre-existing inhibition and so only reaches an
activation level proportional to the total activation in the buffer. The buffer is modeled
as:
(7) b(T) = 1 + a_b × (b(T−1) × W_b)

where “T” is discrete-level timing (i.e., ‘selection 1’, ‘selection 2’, etc.), “a_b” is a
constant, and “W_b” is a weight matrix with inhibitory one-to-all connections and no
self-recurrence. For example, for 4 items, it would be:

(8) W_b =
    [  0  −1  −1  −1 ]
    [ −1   0  −1  −1 ]
    [ −1  −1   0  −1 ]
    [ −1  −1  −1   0 ]

The resultant activations in “b(T)” at the end of a single list encode the order of selections
in the relative activations of each item, just as in CQ models (e.g., Bullock & Rhodes,
2003; Davelaar, 2007). Finally, it is these activations that determine the behavioral
priority inputs that are projected from the chunk node “l”:

(9) l = α_t × (l − b(T))

where “α_t” is the learning rate for this process.
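One reading of equations (7) and (8) – in which only the newly entered slot is updated against the inhibition already present in the buffer – can be sketched as follows; the value of a_b is an illustrative assumption.

import numpy as np

n = 4
a_b = 0.2                                   # constant a_b (illustrative value)
W_b = -(np.ones((n, n)) - np.eye(n))        # equation (8): one-to-all inhibition, no self-recurrence
buf = np.zeros(n)                           # buffer slot activations b(T)

def record_selection(slot):
    """Equation (7): a new efferent copy competes against pre-existing inhibition."""
    inhibition = buf @ W_b                  # summed inhibition from earlier entries
    buf[slot] = 1 + a_b * inhibition[slot]  # later entries settle at lower levels

for slot in range(n):                       # selections A, B, C, D in order
    record_selection(slot)
print(buf)                                  # [1.0, 0.8, 0.64, 0.512]: a CQ-style gradient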
We algorithmically manage chunks and chunk sub-sequences when list length
exceeds 4 items. Thus, for a 7-item-long sequence, the model generates a list-level chunk
and two chunk sub-sequence nodes, the first to manage the list items ‘A-B-C-D’ and the
second to manage items ‘E-F-G’ (see Figure 6.2). The list-level chunk node projects
activations to both sub-sequence nodes with the first sub-sequence node projecting
inhibition to the second sub-sequence node, with the effect that only one sub-sequence node is
activated at a time. Upon selection of a sub-sequence’s list items, the sub-sequence node
is inhibited: in the case of the 7-item-long list, the first sub-sequence node becomes
inhibited after item selection ‘D’, which then releases the second sub-sequence node from
inhibition and so selection of items ‘E-F-G’ immediately follows. In this way a nested
set of chunk nodes manages a sequence of seven items fluidly.
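Since the text notes that this management is algorithmic, the nesting can be conveyed by a deliberately simple sketch; the callback select_item is a placeholder for the competition network’s selection process.

def execute_chunked_list(select_item):
    """Run sub-sequence chunks in order: the first inhibits the second until done."""
    subsequences = [['A', 'B', 'C', 'D'], ['E', 'F', 'G']]
    for chunk in subsequences:       # inhibition is released chunk by chunk
        for item in chunk:           # the active chunk drives its items to selection
            select_item(item)        # feedback after the last item inhibits the chunk

execute_chunked_list(print)          # prints A through G in order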
Figure 6.1. Chunk buffer and learning to plan sequential actions. As selections are
made based on computing a behavioral priority for each visual pattern (‘default’
priority, left column), efferent copy from each selection enters into a competitive
chunking buffer (middle column). The units in the buffer laterally inhibit other items
proportional to their activation. As new items enter the buffer, a gradient emerges as a
function of each item’s order: more recent entries to the buffer must compete against
strong inhibition, and so their resultant activation is relatively reduced. This gradient
encodes, spatially, the temporal information pertaining to list execution. Following
sufficient training, when a chunk node is selected it projects this learned gradient of
activation onto the competition network, facilitating more efficient, and more rapid (see
below) list execution.
Figure 6.2. Chunk hierarchy in the service of a 7-item-long list. Lists upwards of 7
items long can be handled by the chunk network. Upon selection of the appropriate list
chunk, if the unit is a higher-order list chunk (as here) it in turn activates its set of sub-
sequence chunk nodes (top right). Because of lateral inhibition from the first sub-
sequence node to the second, only the first is activated to begin. Finally, this sub-
sequence node projects to the competition network (not shown). The chunk ‘loads’ in
parallel (top left) the sub-sequence, encoding in the spatial gradient the temporal
information. As the model makes selections (successive item selections, as shown by the
discrete timing “T”), and as the sub-sequence is executed, feedback inhibits the sub-
sequence chunk node, releasing from inhibition the second node. Finally, this node
similarly projects to the competition network, loading in parallel the sub-sequence it
manages and driving the indicated items to selection, until the full sequence is executed.
We now show how the action recognition system of our observing model can be
situated amongst the modules used during individual learning. The ACQ model sought to
define a role for the mirror neuron action recognition system during self-action: while
mirror neurons are often described to assist in the ‘understanding’ of others’ actions
(Rizzolatti, Fogassi, & Gallese, 2001; Rizzolatti & Fogassi, 2014), it was
advanced by this model that such representations can facilitate learning about one’s own
actions, by differentially processing what action was intended, versus what action it
appeared (visually, for example) was actually performed. In this way, more rapid
reorganization of behavior can follow, since mirror systems processing just the visual
form of one’s own action can discover a novel sequence for achieving some goal.
Whereas in the ACQ model the intent of an action may diverge from ‘what it
looked like’ the action actually was, we do not model such discrepancies here.
Nonetheless, we can situate the action recognition system to process the visual form of
both another’s action (when in the observation condition) and one’s own action (see
Figure 6.3).
Additionally, the temporal credit assignment for this task, where explicit reward
feedback is removed from the initial selections of list items, may demand short-term
memory resources. We can model a short-term memory buffer for both individual and
observational learning conditions, holding in the buffer the order of selected items and
the temporal context state associated with each selection; once reward is finally delivered
(to self, or to another), that feedback information is used to modulate the relevant weights
(see Figures 6.3 and 6.4). The learning algorithms are as described in their respective
chapters.
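A sketch of such a short-term memory buffer for delayed credit assignment follows; the structure and names are our illustrative assumptions, since the text leaves the implementation open.

stm_buffer = []                            # ordered (context_state, item) pairs

def record(context_state, item):
    """Store the efferent copy (or recognized action) with its temporal context."""
    stm_buffer.append((context_state, item))

def on_reward(r, update_weights):
    """Once reward arrives (to self or to another), credit every buffered pair."""
    for context_state, item in stm_buffer:
        update_weights(context_state, item, r)
    stm_buffer.clear()

# usage: record(s, k) after each selection; on_reward(+1, some_update_rule) at trial end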
Figure 6.3. Supplementary model extension for action recognition system. We show
here how the action recognition system is to be situated during both observational
learning, and individual learning. Whereas (Bonaiuto & Arbib, 2010) modeled such a
system participating in the trial-and-error learning of instrumental action sequences, we
do not offer a principled role for the network here, but still show that it is capable of
being conceptualized amongst the diverse modules of our model. Additionally, due to the
temporal credit assignment problem, we suggest a short-term memory buffer can
maintain past state+item associations – for both individual learning and observational
learning – until explicit feedback is received before modulating the correlated synaptic
weights as a function of the feedback.
Figure 6.4. Short-term memory buffer for individual and observational learning.
During individual learning, as selections are made, efferent copy and visual form
analysis of the action are stored in conjunction with the coincident temporal context state
until explicit feedback can upgrade or downgrade the relevant connections. During
observational learning, the same holds, except no efferent copy is available and so only
the action as recognized by the action recognition module is available for learning.
6.2 METHODS AND SIMULATION RESULTS
We simulated 4-item-long and 7-item-long lists with chunk nodes available to
assist in the acquisition and execution of learned lists. We assessed the differences
between timing and simulated neural responses between ‘default’ conditions as presented
in the primary text, and here. Figure 6.5 shows differences in reaction-times and inter-
response intervals. Figure 6.6 shows differences in simulated neural activity. Both
results comport with the notion that chunking in this way can facilitate planning of
actions in advance of execution, through the use of parallel spatial codes translated
through competitive interactions into a temporal series of actions. The activations reveal
planning of actions in advance (and even reveal ‘chunk boundaries’ between list items
‘D’ and ‘E’), and the timing shows that, despite slower reaction times, more rapid overall
execution of lists is achieved – driven by shorter inter-response intervals.
Figure 6.5. Timing results between the ‘default model’ and the ‘chunk-based’ model.
(Top) The average simulation time at each item selection is shown. The time taken until
the selection of the first item, “A”, is the reaction-time. (Bottom) For each selection after
the first item, the inter-response intervals are given by subtracting the reaction-time of
the appropriate column from the values in the top row. The reaction-time for the
‘default’ model is significantly shorter than for the ‘chunk-based’ model, though the
inter-response intervals (the bottom row’s values) are significantly shorter for the
‘chunk-based’ model as compared to the ‘default’ model. We suggest that due to early
strong competition as a result of significant parallel activations in the competition
network when driven by a chunk node, the ‘chunk-based’ model is slower to make an
initial selection, but is more rapid in driving successive selections due to parallel,
planning-related activations. This results in overall more rapid list execution, and
shorter inter-response intervals.
Figure 6.6. Simulated neural responses between the ‘default model’ and the ‘chunk-
based’ model. (Top) The top set of figures correspond to the ‘default’ model based on
the computation of behavioral priority outlined in Chapter 1. (Bottom) The bottom set of
figures correspond to the ‘chunk-based’ model we have detailed here. (Top rows) The
activations in the priority map are visualized here, with time along the x-axis and the 2-D
surface of the map rendered as a 1-D surface, arranged along the y-axis. Warmer
colors indicate greater activation, and cooler colors indicate less activation. The
planning-related activations in the ‘chunk-based’ model are clearly seen, while the lack
of planning-related activations in the ‘default’ model are also observed. (Middle rows)
Firing rates for the competition layer neurons, with time along the x-axis, and firing-rate
along the y-axis. (Bottom rows) Firing rates for the planning layer neurons, with time
along the x-axis and firing rate along the y-axis. In both of these traces, more planning-
related activations are observed in the ‘chunk-based’ model than in the ‘default’ model.
Note that the priority map activations are simply the planning layer activations,
identified in a topographically-defined map.
6.3 DISCUSSION
Planning and Serial Behavior
The notion that the brain can ‘plan in parallel’ is composed of two related, but
distinct, hypotheses. On the one hand, multiple potential actions may be planned
simultaneously, with a decision process selecting only one for execution. On the other
hand, a learned action sequence may be planned ‘in advance’, as the CQ models above
demonstrated, and as (Lashley, 1951) predicted, with ‘action elements’ executed serially.
It is now evident that both of these appear to be true. For example, (Cisek & Kalaska,
2002, 2005) showed that dPM can simultaneously encode in a directionally-specific
manner the possible location of a target when a monkey must move a manipulandum,
while (Klaes, Westendorff, Chakrabarti, & Gail, 2011) also showed that dPM and PRR
code possible goal locations for reaching targets. In both of these instances, spatially-
tuned neurons show activities correlating with these possible goal locations, with only a
single selection made following task cues. These activities are anticipatory, facilitate
faster reaction times and improved performance, and have been the focus of a recent
computational model (Cisek, 2006). Complementing the parallel ‘anticipatory’
activations above, (Baldauf, Cui, & Andersen, 2008) has shown that PRR can encode
(e.g., plan) in parallel well-learned multi-movement sequences prior to cue onset. These
findings from PRR during reaching tasks are similar to findings with planning of saccadic
eye movements, as demonstrated in LIP populations concurrently with PRR recordings
(Cui & Andersen, 2007). Similar patterns of results have also been shown in M1 (Lu &
Ashe, 2005) and LPFC (Averbeck et al., 2002; Mushiake et al., 2006; Saito, Mushiake,
Sakamoto, Itoyama, & Tanji, 2005) (for example, see Figure 6.7).
Figure 6.7. Planning of multi-movement sequences. LPFC neurons in macaques were
isolated during a multi-movement task performed with a joystick. The neurons encoded a
motor act and their relative activations were predictive of their order in the unfolding
sequence of movements. [Adapted from (Averbeck et al., 2002).]
Overall, our framework for understanding these results is similar to the
Affordance Competition Hypothesis (Cisek, 2007) that holds that multiple plans for
action can be maintained in parallel across spatially-anchored behavioral maps (and see
(Cisek & Kalaska, 2010; Cisek, 2006; Hikosaka et al., 1999) for related
neurophysiological, behavioral and modeling data). (See Figure 6.8.) Representations
coding for possible plans of action, given (i) the affordances in the world, (ii) the goals of
the animal, and (iii) possible contextual (task) information, may be combined in spatially-
tuned behavioral priority maps throughout parietal and frontal cortices. What remains to
be seen is whether and to what extent parallel plans are maintained prior to having well-
rehearsed sequences encoded in memory – that is, how much ‘planning’ activity in these
areas is seen before the level of task expertise often developed in trained laboratory
animals. As we show in our simulation results, and predicted by (Hikosaka et al., 1999),
we expect the gradual accumulation of planning related activity as competency increases
and ‘chunks’ for executing sequences become accessible (see below).
Figure 6.8. Affordance Competition Hypothesis. Overall, we take a position similar to
Cisek and others of multiple, competing plans distributed across large cortical areas
representing features of the environment, expectations of reward and structure in the
world, and possible action or action sets that can achieve goals (motivated states). Note
that distributed regions of the brain interact in serial, parallel and recurrent ‘looped’
pathways. [Adapted from Cisek (2007).]
Chunking and hierarchical representations
We show how learning higher-order features of a task can initialize, and then
evolve, a temporal signal that facilitates learning and performance of complex tasks.
Rougier, Noelle, Braver, Cohen, and O’Reilly (2005) have already shown how PFC
networks may learn, represent, and generalize the different rules of a WCST, and how
error feedback can re-organize these networks to implement a different rule – and so a
different response strategy. This model was followed by a more detailed model of basal
ganglia (BG)–PFC interactions in learning hierarchical tasks, showing how BG loops
and reinforcement learning can ‘train’ PFC networks by leveraging fortuitous
associations (O’Reilly & Frank, 2006). Hierarchical behavior has also been treated by
Cooper and Shallice (2000, 2006), and earlier by Dehaene and Changeux (1997), who
used localist, trajectory-level networks to model routine, hierarchical behaviors like
‘making coffee’, and to solve hierarchical tasks like the Tower of London, respectively.
These localist schema hierarchies have been criticized by Botvinick and colleagues
(Botvinick & Plaut, 2004, 2006b), who have reported results using distributed
connectionist networks in an event-level model. Among other results, they show that
representations for actions are modulated by contextual signals (e.g., distal goals),
whereby an action like ‘add sugar’ may be represented by overlapping, but partially
separable, populations in the context of ‘make coffee’ versus ‘make tea’. Elsewhere,
however, Botvinick and colleagues have taken a more ‘constitutive’ hierarchical
approach (Botvinick, 2007), examining Hierarchical Reinforcement Learning (HRL)
treatments of structured behavioral problems, including the learning and use of ‘chunks’
(Botvinick, Niv, & Barto, 2009). They implicate several PFC regions, including OFC and
LPFC, in the learning and management of hierarchical representations. Lastly, some
modelers have approached these questions from more mathematically rigorous starting
points, including Bayesian structure learning (Braun, Waldert, Aertsen, Wolpert, &
Mehring, 2010) and the formalisms of reinforcement learning (Doya, 2002).
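To make the HRL notion of a ‘chunk’ concrete, the sketch below – our own construction, not a model from any of the papers cited above – treats a chunk as a temporally extended macro-action in the options style and learns its value alongside a primitive action with SMDP-style Q-learning. The task, the action names, and all constants are illustrative assumptions.

```python
import random

# Tiny options-style example: a 'chunk' is a macro-action whose value is
# learned alongside the primitive it is composed of.
STATES = range(4)                 # linear task; state 3 is the goal
PRIMITIVES = ["step"]             # one primitive: advance one state
CHUNK = ("step", "step", "step")  # a macro that reaches the goal from state 0

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = PRIMITIVES + ["chunk"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def run_option(state, action):
    """Execute a primitive or a chunk; return (next state, reward, duration)."""
    steps = CHUNK if action == "chunk" else (action,)
    for _ in steps:
        state = min(state + 1, 3)
    return state, (1.0 if state == 3 else 0.0), len(steps)

for episode in range(500):
    s = 0
    while s != 3:
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2, r, k = run_option(s, a)
        # SMDP backup: discount by gamma**k for a k-step option.
        target = r + (GAMMA ** k) * max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print("value from start - chunk:", round(Q[(0, "chunk")], 2),
      " single step:", round(Q[(0, "step")], 2))
```

After training, the chunk's value from the start state exceeds the single step's, so a greedy agent comes to deploy the whole sequence as one unit – the behavioral signature of chunking that our simulations address at the neural level.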
References
Aboitiz, F. (2012). Gestures, vocalizations, and memory in language origins. Frontiers in
Evolutionary Neuroscience, 4(2), 1–15.
Andersen, R. A. (1997). Multimodal integration for the representation of space in the
posterior parietal cortex. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences, 352(1360), 1421–1428.
Andersen, R. A., Snyder, L. H., Bradley, D. C., & Xing, J. (1997). Multimodal
representation of space in the posterior parietal cortex and its use in planning
movements. Annual Review of Neuroscience, 20, 303–330.
Arbib, M. A. (2012). How the brain got language: the mirror system hypothesis. Oxford:
Oxford University Press.
Arbib, M. A., Gasser, B., & Barrès, V. (2014). Language is handy but is it embodied?
Neuropsychologia, 55(1), 57–70.
Arbib, M. A., Liebal, K., & Pika, S. (2008). Primate vocalization, gesture, and the
evolution of human language. Current Anthropology, 49(6), 1053–1076.
Arbib, M. A., Ganesh, V., & Gasser, B. (2014). Dyadic brain modelling, mirror systems
and the ontogenetic ritualization of ape gesture. Philosophical Transactions of the
Royal Society of London. Series B, Biological Sciences, 369(1644), 20130414.
Arcizet, F., Mirpour, K., & Bisley, J. W. (2011). A pure salience response in posterior
parietal cortex. Cerebral Cortex, 21(11), 2498–2506.
Averbeck, B. B., Chafee, M. V., Crowe, D. A., & Georgopoulos, A. P. (2002). Parallel
processing of serial movements in prefrontal cortex. Proceedings of the National
Academy of Sciences of the United States of America, 99(20), 13172–13177.
Azzi, J. C. B., Sirigu, A., & Duhamel, J.-R. (2012). Modulation of value representation
by social context in the primate orbitofrontal cortex. Proceedings of the National
Academy of Sciences, 109(6), 2126–2131.
Balan, P. F., & Gottlieb, J. (2009). Functional significance of nonspatial information in
monkey lateral intraparietal area. The Journal of Neuroscience, 29(25), 8166–8176.
Baldauf, D., Cui, H., & Andersen, R. A. (2008). The posterior parietal cortex encodes in
parallel both goals for double-reach sequences. The Journal of Neuroscience,
28(40), 10081–10089.
Bard, K. A., Dunbar, S., Maguire-Herring, V., Veira, Y., Hayes, K. G., & Mcdonald, K.
(2014). Gestures and social-emotional communicative development in chimpanzee
infants. American Journal of Primatology, 76(1), 14–29.
Bard, K. A., & Leavens, D. A. (2011). Socio-emotional factors in the development of joint
attention in human and ape infants. Journal of Cognitive Education and Psychology,
10(1), 9–31.
Barone, P., & Joseph, J. P. (1989). Prefrontal cortex and spatial sequencing in macaque
monkey. Experimental Brain Research, 78(3), 447–464.
Bendiksby, M. S., & Platt, M. L. (2006). Neural correlates of reward and attention in
macaque area LIP. Neuropsychologia, 44(12), 2411–2420.
Berdyyeva, T. K., & Olson, C. R. (2009). Monkey supplementary eye field neurons
signal the ordinal position of both actions and objects. The Journal of Neuroscience,
29(3), 591–599.
Berdyyeva, T. K., & Olson, C. R. (2010). Rank signals in four areas of macaque frontal
cortex during selection of actions and objects in serial order. Journal of
Neurophysiology, 104(1), 141–159.
Berdyyeva, T. K., & Olson, C. R. (2011). Relation of ordinal position signals to the
expectation of reward and passage of time in four areas of the macaque frontal
cortex. Journal of Neurophysiology, 105(5), 2547–2559.
Bhutani, N., Sureshbabu, R., Farooqui, A. A., Behari, M., Goyal, V., & Murthy, A.
(2013). Queuing of concurrent movement plans by basal ganglia. Journal of
Neuroscience, 33(24), 9985–9997.
Bisley, J. W., & Goldberg, M. E. (2003). Neuronal activity in the lateral intraparietal area
and spatial attention. Science (New York, N.Y.), 299(5603), 81–86.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal
lobe. Annual Review of Neuroscience, 33, 1–21.
Bonaiuto, J., & Arbib, M. A. (2010). Extending the mirror neuron system model, II: What
did I just do? A new role for mirror neurons. Biological Cybernetics, 102(4), 341–
359.
Bonaiuto, J., & Arbib, M. A. (2015). Learning to grasp and extract affordances: the
integrated learning of grasps and affordances (ILGA) model. Biological
Cybernetics, 109(6), 1–31.
Bonaiuto, J., Rosta, E., & Arbib, M. A. (2007). Extending the mirror neuron system
model, I: Audible actions and invisible grasps. Biological Cybernetics, 96(1), 9–38.
Botvinick, M. M. (2007). Multilevel structure in behaviour and in the brain: a model of
Fuster’s hierarchy. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences, 362(1485), 1615–1626.
Botvinick, M. M., Niv, Y., & Barto, A. C. (2009). Hierarchically organized behavior and
its neural foundations: a reinforcement-learning perspective. Cognition, 113(3),
262–280.
Botvinick, M. M., & Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent
connectionist approach to normal and impaired routine sequential action.
Psychological Review, 111(2), 395–429.
Botvinick, M. M., & Plaut, D. C. (2006a). Short-term memory for serial order: a recurrent
neural network model. Psychological Review, 113(2), 201–233.
Botvinick, M. M., & Plaut, D. C. (2006b). Such stuff as habits are made on: a reply to
Cooper and Shallice (2006). Psychological Review, 113(4), 917–927.
Botvinick, M. M., & Watanabe, T. (2007). From numerosity to ordinal rank: a gain-field
model of serial order representation in cortical working memory. The Journal of
Neuroscience, 27(32), 8636–8642.
Braun, D. A., Waldert, S., Aertsen, A., Wolpert, D. M., & Mehring, C. (2010). Structure
learning in a sensorimotor association task. PLoS ONE, 5(1), 3–10.
Bullock, D., & Grossberg, S. (1988). Neural dynamics of planned arm movements:
emergent invariants and speed-accuracy properties during trajectory formation.
Psychological Review, 95(1), 49–90.
Bullock, D., & Rhodes, B. (2003). Competitive queuing for serial planning and
performance. In M. A. Arbib (Ed.), Handbook of brain theory and neural networks,
Vol 2 (pp. 241–244). Cambridge, MA: MIT Press.
Byrne, R. W. (2003). Imitation as behaviour parsing. Philosophical Transactions of the
Royal Society of London. Series B, Biological Sciences, 358(1431), 529–536.
Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: a hierarchical approach.
Behavioral and Brain Sciences, 21(5), 667–684.
Byrne, R. W., & Whiten, A. (1988). Machiavellian intelligence: social expertise and the
evolution of the intellect in monkeys, apes, and humans. Oxford: Clarendon Press.
Caggiano, V., Fogassi, L., Rizzolatti, G., Casile, A., Giese, M. A., & Thier, P. (2012).
Mirror neurons encode the subjective value of an observed action. Proceedings of
the National Academy of Sciences of the United States of America, 109(29), 11848–
11853.
Call, J. (2006). Inferences by exclusion in the great apes: the effect of age and species.
Animal Cognition, 9(4), 393–403.
Campos, M., Breznen, B., & Andersen, R. A. (2010). A neural representation of
sequential states within an instructed task. Journal of Neurophysiology, 104(5),
2831–2849.
Campos, M., Breznen, B., Bernheim, K., & Andersen, R. A. (2005). Supplementary
motor area encodes reward expectancy in eye-movement tasks. Journal of
Neurophysiology, 94(2), 1325–1335.
Carpenter, A. F., Georgopoulos, A. P., & Pellizzer, G. (1999). Motor cortical encoding
of serial order in a context-recall task. Science, 283(5408), 1752–1757.
Cartmill, E. A., & Byrne, R. W. (2007). Orangutans modify their gestural signaling
according to their audience’s comprehension. Current Biology, 17(15), 1345–1348.
Chang, S. W. C., Gariépy, J.-F., & Platt, M. L. (2013). Neuronal reference frames for
social decisions in primate frontal cortex. Nature Neuroscience, 16(2), 243–50.
Chen, S., Swartz, K. B., & Terrace, H. S. (1997). Knowledge of the ordinal position of
list items in rhesus monkeys. Psychological Science, 8(2), 80–86.
Chersi, F. (2011). Neural mechanisms and models underlying joint action. Experimental
Brain Research, 211(3-4), 643–653.
Chersi, F., Ferrari, P. F., & Fogassi, L. (2011). Neuronal chains for actions in the parietal
lobe: A computational model. PLoS ONE, 6(11), 1–15.
Cisek, P. (2006). Integrated neural processes for defining potential actions and deciding
between them: a computational model. The Journal of Neuroscience : The Official
Journal of the Society for Neuroscience, 26(38), 9761–9770.
Cisek, P. (2007). Cortical mechanisms of action selection: the affordance competition
hypothesis. Philosophical Transactions of the Royal Society of London. Series B,
Biological Sciences, 362(1485), 1585–1599.
Cisek, P., & Kalaska, J. F. (2002). Simultaneous encoding of multiple potential reach
directions in dorsal premotor cortex. Journal of Neurophysiology, 87(2), 1149–1154.
Cisek, P., & Kalaska, J. F. (2005). Neural correlates of reaching decisions in dorsal
premotor cortex: specification of multiple direction choices and final selection of
action. Neuron, 45(5), 801–814.
Cisek, P., & Kalaska, J. F. (2010). Neural mechanisms for interacting with a world full of
action choices. Annual Review of Neuroscience, 33, 269–298.
Clower, W. T., & Alexander, G. E. (1998). Movement sequence-related activity
reflecting numerical order of components in supplementary and presupplementary
motor areas. Journal of Neurophysiology, 80(3), 1562–1566.
Colby, C. L., Duhamel, J. R., & Goldberg, M. E. (1996). Visual, presaccadic, and
cognitive activation of single neurons in monkey lateral intraparietal area. Journal of
Neurophysiology, 76(5), 2841–2852.
Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual
Review of Neuroscience, 22, 319–349.
Cook, P., & Wilson, M. (2010). Do young chimpanzees have extraordinary working
memory? Psychonomic Bulletin & Review, 17(4), 599–600.
Cooper, R. P., & Shallice, T. (2006). Hierarchical schemas and goals in the control of
sequential behavior. Psychological Review, 113(4), 887–916.
Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine
activities. Cognitive Neuropsychology, 17(4), 297–338.
Cromwell, H. C., & Schultz, W. (2003). Effects of expectations for different reward
magnitudes on neuronal activity in primate striatum. Journal of Neurophysiology,
89(5), 2823–2838.
Cui, H., & Andersen, R. A. (2007). Posterior parietal cortex encodes autonomously
selected motor plans. Neuron, 56(3), 552–559.
D’Amato, M. R., & Colombo, M. (1989). Serial learning with wild card items by
monkeys (Cebus apella): implications for knowledge of ordinal position. Journal of
Comparative Psychology, 103(3), 252–261.
Davelaar, E. J. (2007). Sequential retrieval and inhibition of parallel (re)activated
representations: a neurocomputational comparison of competitive queuing and
resampling models. Adaptive Behavior, 15(1), 51–71.
Dean, L. G., Kendal, R. L., Schapiro, S. J., Thierry, B., & Laland, K. N. (2012).
Identification of the social and cognitive processes underlying human cumulative
culture. Science, 335(6072), 1114-1118.
Dehaene, S., & Changeux, J. P. (1997). A hierarchical neuronal network for planning
behavior. Proceedings of the National Academy of Sciences of the United States of
America, 94(24), 13293–13298.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992).
Understanding motor events: a neurophysiological study. Experimental Brain
Research, 91(1), 176–180.
Ding, L., & Hikosaka, O. (2006). Comparison of reward modulation in the frontal eye
field and caudate of the macaque. The Journal of Neuroscience, 26(25), 6695–6703.
Dominey, P., Arbib, M. A., & Joseph, J.-P. (1995). A model of corticostriatal plasticity
for learning oculomotor associations and sequences. Journal of Cognitive
Neuroscience, 7(3), 311–336.
Dominey, P. F., & Arbib, M. A. (1992). A cortico-subcortical model for generation of
spatially accurate sequential saccades. Cerebral Cortex, 2, 153–175.
Dorris, M. C., & Glimcher, P. W. (2004). Activity in posterior parietal cortex is
correlated with the relative subjective desirability of action. Neuron, 44(2), 365–378.
Doya, K. (2002). Metalearning and neuromodulation. Neural Networks, 15(4-6), 495–
506.
Visalberghi, E., Haslam, M., Spagnoletti, N., & Fragaszy, D. (2013). Use of stone hammer
tools and anvils by bearded capuchin monkeys over time and space: construction of
an archeological record of tool use. Journal of Archaeological Science, 40(8), 3222–
3232.
Erlhagen, W., Mukovskiy, A., Bicho, E., Panin, G., Kiss, C., Knoll, A., … Bekkering, H.
(2006). Goal-directed imitation for robots: A bio-inspired approach to action
understanding and skill learning. Robotics and Autonomous Systems, 54, 353–360.
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons
responding to the observation of ingestive and communicative mouth actions in the
monkey ventral premotor cortex. European Journal of Neuroscience, 17(8), 1703–
1714.
Ferreira, F., Erlhagen, W., & Bicho, E. (2011). A dynamic field model of ordinal and
timing properties of sequential events. In T. Honkela, W. Duch, M. Girolami, & S.
Kaski (Eds.), Lecture Notes in Computer Science (pp. 325–332). Berlin: Springer-Verlag.
Fogassi, L., Ferrari, P. F., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005).
Parietal lobe: from action organization to intention understanding. Science,
308(5722), 662–667.
Fragaszy, D. M., Biro, D., Eshchar, Y., Humle, T., Izar, P., Resende, B., & Visalberghi,
E. (2013). The fourth dimension of tool use: temporally enduring artefacts aid
primates learning to use tools. Philosophical Transactions of the Royal Society of
London. Series B, Biological Sciences, 368(1630), 20120410.
Freedman, D. J., & Assad, J. A. (2006). Experience-dependent representation of visual
categories in parietal cortex. Nature, 443(7107), 85–88.
Fujii, N., Hihara, S., Nagasaka, Y., & Iriki, A. (2008). Social state representation in
prefrontal cortex. Social Neuroscience, 4(1), 73–84.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition in the
premotor cortex. Brain, 119, 593–609.
Gariépy, J.-F., Watson, K. K., Du, E., Xie, D. L., Erb, J., Amasino, D., & Platt, M. L.
(2014). Social learning in humans and other animals. Frontiers in Neuroscience,
8(58).
Gasser, B., Cartmill, E. A., & Arbib, M. A. (2014). Ontogenetic ritualization of primate
gesture as a case study in dyadic brain modeling. Neuroinformatics, 12(1), 93–109.
Genty, E., Breuer, T., Hobaiter, C., & Byrne, R. W. (2009). Gestural communication of
the gorilla (Gorilla gorilla): repertoire, intentionality and possible origins. Animal
Cognition, 12(3), 527–546.
Gillespie-Lynch, K., Greenfield, P., Lyn, H., & Savage-Rumbaugh, S. (2014). Gestural
and symbolic development among apes and humans: support for a multimodal
theory of language evolution. Frontiers in Psychology, 5, 1228.
Gottlieb, J. (2007). From thought to action: the parietal cortex as a bridge between
perception, action, and cognition. Neuron, 53(1), 9–16.
Hartley, T., & Houghton, G. (1996). A linguistically constrained model of short-term
memory for nonwords. Journal of Memory and Language, 35, 1–31.
Hauser, M. D., Carey, S., & Hauser, L. B. (2000). Spontaneous number representation in
semi-free-ranging rhesus monkeys. Proceedings of the Royal Society of London.
Series B, Biological Sciences, 267(1445), 829–833.
Hauser, M. D., MacNeilage, P., & Ware, M. (1996). Numerical representations in
primates. Proceedings of the National Academy of Sciences of the United States of
America, 93, 1514–1517.
Hecht, E. E., Gutman, D. A., Preuss, T. M., Sanchez, M. M., Parr, L. A., & Rilling, J. K.
(2013). Process versus product in social learning: comparative diffusion tensor
imaging of neural systems for action execution-observation matching in macaques,
chimpanzees, and humans. Cerebral Cortex, 23(5), 1014–1024.
Hecht, E. E., Murphy, L. E., Gutman, D. A., Votaw, J. R., Schuster, D. M., Preuss, T. M.,
… Parr, L. A. (2013). Differences in neural activation for object-directed grasping in
chimpanzees and humans. Journal of Neuroscience, 33(35), 14117–14134.
Hikosaka, O., Nakahara, H., Rand, M. K., Sakai, K., Lu, X., Nakamura, K., … Doya, K.
(1999). Parallel neural networks for learning sequential procedures. Trends in
Neurosciences, 22(10), 464–471.
Hobaiter, C., & Byrne, R. W. (2011a). The gestural repertoire of the wild chimpanzee.
Animal Cognition, 14(5), 745–767.
Hobaiter, C., & Byrne, R. W. (2011b). Serial gesturing by wild chimpanzees: Its nature
and function for communication. Animal Cognition, 14(6), 827–838.
Hobaiter, C., & Byrne, R. W. (2014). The meanings of chimpanzee gestures. Current
Biology, 24(14), 1596–1600.
Horner, V., & Whiten, A. (2005). Causal knowledge and imitation/emulation switching
in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition,
8(3), 164–181.
Hosokawa, T., & Watanabe, M. (2012). Prefrontal neurons represent winning and losing
during competitive video shooting games between monkeys. Journal of
Neuroscience, 32(22), 7662–7671.
Inoue, S., & Matsuzawa, T. (2007). Working memory of numerals in chimpanzees.
Current Biology, 17(23), 1004–1005.
Ipata, A. E., Gee, A. L., Goldberg, M. E., & Bisley, J. W. (2006). Activity in the lateral
intraparietal area predicts the goal and latency of saccades in a free-viewing visual
search task. The Journal of Neuroscience, 26(14), 3656–3661.
Ipata, A. E., Gee, A. L., Gottlieb, J., Bisley, J. W., & Goldberg, M. E. (2006). LIP
responses to a popout stimulus are reduced if it is overtly ignored. Nat Neurosci,
9(8), 1071–1076.
Isoda, M., & Tanji, J. (2003). Contrasting neuronal activity in the supplementary and
frontal eye fields during temporal organization of multiple saccades. Journal of
Neurophysiology, 90(5), 3054–3065.
Jaeger, H., Maass, W., & Principe, J. (2007). Special issue on echo state networks and
liquid state machines. Neural Networks, 20(3), 287–289.
Jensen, G., Altschul, D., Danly, E., & Terrace, H. S. (2013). Transfer of a serial
representation between two distinct tasks by rhesus macaques. PLoS ONE, 8(7).
Kennerley, S. W., Dahmubed, A. F., Lara, A. H., & Wallis, J. D. (2009). Neurons in the
frontal lobe encode the value of multiple decision variables. Journal of Cognitive
Neuroscience, 21(6), 1162–1178.
Kennerley, S. W., & Wallis, J. D. (2009a). Evaluating choices by single neurons in the
frontal lobe: outcome value encoded across multiple decision variables. European
Journal of Neuroscience, 29(10), 2061–2073.
Kennerley, S. W., & Wallis, J. D. (2009b). Reward-dependent modulation of working
memory in lateral prefrontal cortex. The Journal of Neuroscience, 29(10), 3259–
3270.
Klaes, C., Westendorff, S., Chakrabarti, S., & Gail, A. (2011). Choosing goals, not rules:
deciding among rule-based action plans. Neuron, 70(3), 536–548.
Klein, J. T., & Platt, M. L. (2013). Social information signaling by neurons in primate
striatum. Current Biology, 23(8), 691–696.
Klein, R. M. (2000). Inhibition of return. Trends in Cognitive Sciences, 4(4), 138–147.
Krueger, K. A., & Dayan, P. (2009). Flexible shaping: how learning in small steps helps.
Cognition, 110(3), 380–394.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.),
Cerebral mechanisms in behavior (pp. 112–131). New York, NY: Wiley.
Lopes, M., Melo, F. S., Kenward, B., & Santos-Victor, J. (2009). A computational model
of social-learning mechanisms. Adaptive Behavior, 17(6), 467–483.
Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes
memorized movement sequences. Neuron, 45(6), 967–973.
Luef, E. M., & Liebal, K. (2012). Infant-directed communication in lowland gorillas
(Gorilla gorilla): do older animals scaffold communicative competence in infants?
American Journal of Primatology, 74(9), 841–852.
Medendorp, W. P., Buchholz, V. N., Van Der Werf, J., & Leoné, F. T. M. (2011).
Parietofrontal circuits in goal-oriented behaviour. European Journal of
Neuroscience, 33(11), 2017–2027.
Merritt, D. J., & Terrace, H. S. (2011). Mechanisms of inferential order judgments in
humans (Homo sapiens) and rhesus macaques (Macaca mulatta). Journal of
Comparative Psychology, 125(2), 227–238.
Mushiake, H., Saito, N., Sakamoto, K., Itoyama, Y., & Tanji, J. (2006). Activity in the
lateral prefrontal cortex reflects multiple steps of future events in action plans.
Neuron, 50(4), 631–641.
Nieder, A., & Dehaene, S. (2009). Representation of number in the brain. Annual Review
of Neuroscience, 32, 185–208.
Nieder, A., Diester, I., & Tudusciuc, O. (2006). Temporal and spatial enumeration
processes in the primate parietal cortex. Science, 313(5792), 1431–1435.
Nieder, A., & Merten, K. (2007). A labeled-line code for small and large numerosities in
the monkey prefrontal cortex. The Journal of Neuroscience, 27(22), 5986–5993.
Nieder, A., & Miller, E. K. (2004). A parieto-frontal network for visual numerical
information in the monkey. Proceedings of the National Academy of Sciences of the
United States of America, 101(19), 7457–7462.
O’Reilly, R. C., & Frank, M. J. (2006). Making working memory work: a computational
model of learning in the prefrontal cortex and basal ganglia. Neural Computation,
18(2), 283–328.
Orlov, T., Yakovlev, V., Amit, D., Hochstein, S., & Zohary, E. (2002). Serial memory
strategies in macaque monkeys: behavioral and theoretical aspects. Cerebral Cortex,
12(3), 306–317.
Orlov, T., Yakovlev, V., Hochstein, S., & Zohary, E. (2000). Macaque monkeys
categorize images by their ordinal number. Nature, 404(6773), 77–80.
Oztop, E., & Arbib, M. A. (2002). Schema design and implementation of the grasp-
related mirror neuron system. Biological Cybernetics, 87(2), 116–140.
Oztop, E., Wolpert, D., & Kawato, M. (2005). Mental state inference using visual control
parameters. Cognitive Brain Research, 22(2), 129–151.
Page, M., & Norris, D. (1998). The primacy model: a new model of immediate serial
recall. Psychological Review, 105(4), 761–781.
Pan, X., Fan, H., Sawa, K., Tsuda, I., Tsukada, M., & Sakagami, M. (2014). Reward
inference by primate prefrontal and striatal neurons. Journal of Neuroscience, 34(4),
1380–1396.
Pan, X., Sawa, K., Tsuda, I., Tsukada, M., & Sakagami, M. (2008). Reward prediction
based on stimulus categorization in primate lateral prefrontal cortex. Nature
Neuroscience, 11(6), 703–712.
Peck, C. J., Jangraw, D. C., Suzuki, M., Efem, R., & Gottlieb, J. (2009). Reward
modulates attention independently of action value in posterior parietal cortex. The
Journal of Neuroscience, 29(36), 11182–11191.
Platt, M. L., & Glimcher, P. W. (1999). Neural correlates of decision variables in parietal
cortex. Nature, 400(6741), 233–238.
Pollick, A. S., & de Waal, F. B. M. (2007). Ape gestures and language evolution.
Proceedings of the National Academy of Sciences of the United States of America,
104(19), 8184–8189.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the
recognition of motor actions. Cognitive Brain Research, 3(2), 131–141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms
underlying the understanding and imitation of action. Nature Reviews Neuroscience,
2(9), 661–670.
Rizzolatti, G., & Fogassi, L. (2014). The mirror mechanism: recent
findings and perspectives. Philosophical Transactions of the Royal Society of
London. Series B, Biological Sciences, 369(1644), 20130420.
Roesch, M. R., & Olson, C. R. (2003). Impact of expected reward on neuronal activity in
prefrontal cortex, frontal and supplementary eye fields and premotor cortex. Journal
of Neurophysiology, 90(3), 1766–1789.
Roesch, M. R., & Olson, C. R. (2005). Neuronal activity in primate orbitofrontal cortex
reflects the value of time. Journal of Neurophysiology, 94, 2457–2471.
Roitman, J. D., Brannon, E. M., & Platt, M. L. (2007). Monotonic coding of numerosity
in macaque lateral intraparietal area. PLoS Biology, 5(8), 1672–1682.
Rorie, A. E., Gao, J., McClelland, J. L., & Newsome, W. T. (2010). Integration of
sensory and reward information during perceptual decision-making in lateral
intraparietal cortex (LIP) of the macaque monkey. PLoS ONE, 5(2).
Rossano, F., & Liebal, K. (2014). “Requests” and “offers” in orangutans and human
infants. In P. Drew & E. Couper-Kuhlen (Eds.), Requesting in social interaction
(pp. 335–363). Amsterdam: John Benjamins.
Rougier, N. P., Noelle, D. C., Braver, T. S., Cohen, J. D., & O’Reilly, R. C. (2005).
Prefrontal cortex and flexible cognitive control: rules without symbols. Proceedings
of the National Academy of Sciences of the United States of America, 102(20),
7338–7343.
Saga, Y., Iba, M., Tanji, J., & Hoshi, E. (2011). Development of multidimensional
representations of task phases in the lateral prefrontal cortex. The Journal of
Neuroscience, 31(29), 10648–10665.
Saito, N., Mushiake, H., Sakamoto, K., Itoyama, Y., & Tanji, J. (2005). Representation of
immediate and final behavioral goals in the monkey prefrontal cortex during an
instructed delay period. Cerebral Cortex, 15(10), 1535–1546.
Santos, G. S., Nagasaka, Y., Fujii, N., & Nakahara, H. (2011). Encoding of social state
information by neuronal activities in the macaque caudate nucleus. Social
Neuroscience, 7(1), 42–58.
Sauser, E. L., & Billard, A. G. (2006). Parallel and distributed neural models of the
ideomotor principle: An investigation of imitative cortical pathways. Neural
Networks, 19(3), 285–298.
Sawamura, H., Shima, K., & Tanji, J. (2002). Numerical representation for action in the
parietal cortex of the monkey. Nature, 415(6874), 918–922.
Sawamura, H., Shima, K., & Tanji, J. (2010). Deficits in action selection based on
numerical information after inactivation of the posterior parietal cortex in monkeys.
Journal of Neurophysiology, 104(2), 902–910.
Schneider, C., Call, J., & Liebal, K. (2012a). Onset and early use of gestural
communication in nonhuman great apes. American Journal of Primatology, 74(2),
102–113.
Schneider, C., Call, J., & Liebal, K. (2012b). What role do mothers play in the gestural
acquisition of bonobos (Pan paniscus) and chimpanzees (Pan troglodytes)?
International Journal of Primatology, 33(1), 246–262.
Schweighofer, N., & Doya, K. (2003). Meta-learning in reinforcement learning. Neural
Networks, 16(1), 5–9.
Seyfarth, R. M., & Cheney, D. L. (1986). Vocal development in vervet monkeys. Animal
Behaviour, 34(6), 1640–1658.
Shadlen, M. N., & Newsome, W. T. (2001). Neural basis of a perceptual decision in the
parietal cortex (area LIP) of the rhesus monkey. Journal of Neurophysiology, 86(4),
1916–1936.
Shepherd, S. V., Klein, J. T., Deaner, R. O., & Platt, M. L. (2009). Mirroring of attention
by neurons in macaque parietal cortex. Proceedings of the National Academy of
Sciences of the United States of America, 106(23), 9489–9494.
Shima, K., Isoda, M., Mushiake, H., & Tanji, J. (2007). Categorization of behavioural
sequences in the prefrontal cortex. Nature, 445(7125), 315–318.
Silberberg, A., & Kearns, D. (2009). Memory for the order of briefly presented numerals
in humans as a function of practice. Animal Cognition, 12(2), 405–407.
Silver, M. R., Grossberg, S., Bullock, D., Histed, M. H., & Miller, E. K. (2012). A neural
model of sequential movement planning and control of eye movements: item-order-
rank working memory and saccade selection by the supplementary eye fields.
Neural Networks, 26, 29–58.
Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the
posterior parietal cortex. Nature, 386(6621), 167–170.
Spranger, M., & Steels, L. (2014). Discovering communication through ontogenetic
ritualisation. IEEE ICDL-EPIROB 2014 - 4th Joint IEEE International Conference
on Development and Learning and on Epigenetic Robotics, (3), 14–19.
Steels, L. (2003). Evolving grounded communication for robots. Trends in Cognitive
Sciences, 7(7), 308–312.
Subiaul, F., Cantlon, J. F., Holloway, R. L., & Terrace, H. S. (2004). Cognitive imitation
in rhesus macaques. Science, 305(5682), 407–410.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction.
Cambridge: MIT Press.
Swartz, K. B., Chen, S. F., & Terrace, H. S. (1991). Serial learning by rhesus monkeys: I.
Acquisition and retention of multiple four-item lists. Journal of Experimental
Psychology: Animal Behavior Processes, 17(4), 396–410.
Swartz, K. B., Chen, S., & Terrace, H. S. (2000). Serial learning by rhesus monkeys: II.
Learning four-item lists by trial and error. Journal of Experimental Psychology:
Animal Behavior Processes, 26(3), 274–285.
Tanji, J., & Shima, K. (1994). Role for supplementary motor area cells in planning
several movements ahead. Nature, 371, 413–416.
Tennie, C., Hedwig, D., Call, J., & Tomasello, M. (2008). An experimental study of
nettle feeding in captive gorillas. American Journal of Primatology, 70(6), 584–593.
Terrace, H. S. (2001). Chunking and serially organized behavior in pigeons, monkeys and
humans. In R. G. Cook (Ed.), Avian visual cognition. Medford, MA: Comparative
Cognition Press.
Terrace, H. S. (2005). The simultaneous chain: a new approach to serial learning. Trends
in Cognitive Sciences, 9(4), 202–210.
Terrace, H. S., Son, L. K., & Brannon, E. M. (2003). Serial expertise of rhesus macaques.
Psychological Science, 14(1), 66–73.
Thura, D., Beauregard-Racine, J., Fradet, C.-W., & Cisek, P. (2012). Decision-making by
urgency-gating: theory and experimental support. Journal of Neurophysiology,
2912–2930.
Tomasello, M., & Call, J. (2011). Methodological challenges to the study of primate
cognition. Science, 334(6060), 1227.
Tomasello, M., Gust, D., & Frost, G. T. (1989). A longitudinal investigation of gestural
communication in young chimpanzees. Primates, 30(1), 35–50.
Tudusciuc, O., & Nieder, A. (2009). Contributions of primate prefrontal and posterior
parietal cortices to length and numerosity representation. Journal of
Neurophysiology, 101(6), 2984–2994.
Umiltà, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., & Rizzolatti,
G. (2001). I know what you are doing: a neurophysiological study. Neuron, 31(1),
155–165.
Wallis, J. D. (2007). Orbitofrontal cortex and its contribution to decision-making. Annual
Review of Neuroscience, 30, 31–56.
Wallis, J. D., & Kennerley, S. W. (2011). Contrasting reward signals in the orbitofrontal
cortex and anterior cingulate cortex. Annals of the New York Academy of Sciences,
1239(1), 33–42.
Wallis, J. D., & Miller, E. K. (2003). Neuronal activity in primate dorsolateral and orbital
prefrontal cortex during performance of a reward preference task. European Journal
of Neuroscience, 18(7), 2069–2081.
Watanabe, M. (2007). Role of anticipated reward in cognitive behavioral control. Current
Opinion in Neurobiology, 17(2), 213–219.
Watanabe, M., & Sakagami, M. (2007). Integration of cognitive and motivational context
information in the primate prefrontal cortex. Cerebral Cortex, 17, 101–109.
Whiten, A. (2005). The second inheritance system of chimpanzees and humans. Nature,
437(7055), 52–55.
Whiten, A., Hinde, R. A., Laland, K. N., & Stringer, C. B. (2011). Culture evolves.
Philosophical Transactions of the Royal Society of London. Series B, Biological
Sciences, 366(1567), 938–48.
Yoshida, K., Saito, N., Iriki, A., & Isoda, M. (2011). Representation of others’ action by
neurons in monkey medial frontal cortex. Current Biology, 21(3), 249–253.
Yoshida, K., Saito, N., Iriki, A., & Isoda, M. (2012). Social error monitoring in macaque
frontal cortex. Nature Neuroscience, 15(9), 1307–1312.
Zelinsky, G. J., & Bisley, J. W. (2015). The what, where, and why of priority maps and
their interactions with visual working memory. Annals of the New York Academy of
Sciences, 1339, 154–164.
Abstract
Neuroscientific computational modeling has been successful in characterizing neural circuits for vision, motor control and decision making, but has rarely been applied to behaviors exemplary of the primates, namely social learning, social cognition and communication. We describe two main efforts here: the modeling, via computer simulation, of (1) brain mechanisms of serial learning in monkeys and their transfer through observation of others’ performances, and (2) brain mechanisms of social learning and interaction that support the development of gestural repertoires in apes. The first effort analyzes previously published behavioral and neurophysiological data in monkeys to construct novel computational models that explain these data, challenge existing interpretations, and generate testable hypotheses. We show how monkeys may learn sequences of items through trial-and-error mechanisms that manage multiple, concurrent learning processes, including learning of temporal order and of reward-predictive value, and that can yield behavioral patterns qualitatively matching those in the literature, while simulated neural responses predict response profiles for a variety of cortical and sub-cortical areas in macaques. Further data from the literature suggest how learning a list may be aided by observing another monkey generate that list. A crucial innovation in our work is to go beyond simulating the mechanisms coordinating an individual’s actions or decisions by having multiple simulated agents learn from each other’s behavior. We show how action-recognition and feedback-processing elements can be situated to process others’ performances and yield facilitated performance when the observer is later simulated in isolation.

In our second main effort, we go further by simulating (i) ape brains, (ii) interaction and gestural communication, and (iii) dynamic exchange of information, back-and-forth and not just one-way, between modeled apes. This ‘dyadic brain modeling’ can show how the ‘mutual shaping of behavior’ between apes may lead to novel gestural forms that serve as communicative signals between individuals, as hypothesized in the literature. Further, we show how an alternative hypothesis of gestural acquisition – the pruning of an innate gestural repertoire – can be handled by our integrative model, and how multiple learning pathways leading to varied gestural repertoires can be explained within a single, unified and computationally specific model.

These efforts help to clarify brain mechanisms managing social learning in primates – mechanisms likely highly conserved in humans – as well as mechanisms managing social cognition and interaction. Additionally, we show how competing hypotheses of gestural learning and usage in apes can be explained by one model, and offer hypotheses about how varying usage patterns may be understood, with additional suggestions for brain mechanisms whose evolution in apes and humans yields the flexibility in gestural communication observed in the wild. Finally, the methodological innovation of dyadic brain modeling – simulating brain models in interaction – moves computational neuroscience toward modeling more complex behaviors.