COMPUTATIONAL MODEL OF STROKE THERAPY
AND LONG TERM RECOVERY
by
Yukikazu Hidaka
_____________________________________
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
December 18th, 2013
Copyright 2013 Yukikazu Hidaka
Acknowledgements
First of all, I thank my adviser, Dr. Nicolas Schweighofer, for providing the opportunity to go through Ph.D. academic life, and for his insights and guidance on my research. Without his encouragement and passion, I would not have completed my Ph.D.
I would like to show my gratitude to Dr. Carolee Winstein for serving on my committee, and to Dr. Steve Wolf for providing an invaluable dataset from many years of clinical trials. I thank Dr. Gordon for his insightful advice during MBNL meetings, from which I learned the scientific thinking needed to analyze data.
I would like to thank Dr. Stefan Schaal for serving on my committee, and for his robotics, machine learning, and reinforcement learning courses, in which I obtained in-depth knowledge and developed an interest in machine learning.
I also thank Dr. Yan Liu and Dr. Fei Sha for serving on my qualification committee and for their data analytics and machine learning classes; I enjoyed the paper readings and presentations. I appreciate Dr. Jinchi Lv for his high-dimensional statistics class and the great discussions in every lecture. I would like to thank Dr. Michael Arbib for the philosophical and neuroscientific discussions during his brain theory class.
Many thanks to my colleagues at the Computational Neuro Rehabilitation Learning Lab: Dr. Cheol Han, Dr. Younggeun Choi, Dr. Jeongyoun Lee, Amarpreet Bains, Sungshin Kim, Feng Qi, Sujin Kim, Youngmin Oh, Hyeshin Park, Yupeng Xiao, and Chunji Wang, for sharing time in my academic life. I enjoyed lots of interesting discussions and collaborations with them. Similarly, I thank my colleagues in the Motor Behavior and Neurorehabilitation Lab: Dr. Shuya Chen, Dr. Eric Wade, Ya-yun Lee, Bobby Charalambos, Yu-chen Chung, Yi-an Chen, Matt Konersman, Dr. Shailesh Kantak, and Yi-Hsuan Lai. I appreciate our multidisciplinary and diverse group of helpful and positive minds.
Next, I thank Dr. Rieko Osu and Toshinori Yoshioka for helping design and conduct the robot-guided experiments during my stay at ATR. I thank Michael Lei for his leadership during my internship, where I had excellent on-the-job training in massive and scalable data analysis. I am thankful to Dr. Masashi Sugiyama and Dr. Makoto Yamada for their research advice. I thank my new colleagues Dony Ang, Dr. Moira Regelson, and Harris Chiu for welcoming me on board. I like the creative and inspiring atmosphere, and I enjoy my career as a data scientist and software engineer.
I would not have achieved this milestone without the chance of meeting Dr. Hisako Murakawa, who inspired me to pursue my Ph.D., Dr. Toshiyasu Kunii, who provided an outstanding computer science program early in my career, and Dr. Alexander Belyaev, who was my first research advisor. I appreciate Dr. Evangelos Theodorou for introducing me to people at the University of Southern California. I would like to thank my former colleagues for their encouragement to pursue my academic career.

Finally, I would like to thank my parents and family for their unconditional support throughout my life.
Table of Contents
Acknowledgements ....................................................................................................................................... 2
List of Figures ................................................................................................................................................ 7
Abstract ....................................................................................................................................................... 11
Chapter 1 Introduction ............................................................................................................................ 13
1.1 Chapter organization .................................................................................................................. 14
Chapter 2 Modeling long term recovery: Use it and improve it or lose it, interactions between arm
function and use in humans post-stroke .................................................................................................... 16
2.1 Summary ..................................................................................................................................... 16
2.2 Introduction ................................................................................................................................ 16
2.3 Methods ...................................................................................................................................... 19
2.3.1 Data for model parameter fit and model selection ............................................................ 19
2.3.2 Quantitative models of arm function and use interaction ................................................. 21
2.3.3 Model fit, immediate and delayed groups .......................................................................... 24
2.3.4 Model comparison, immediate group ................................................................................ 29
2.3.5 Model fit before and after therapy, delayed group ............................................................ 33
2.4 Results ......................................................................................................................................... 33
2.4.1 Model selection and fit analyses......................................................................................... 33
2.4.2 Model parameter analysis .................................................................................................. 37
2.4.3 Effects of therapy on model parameters ............................................................................ 38
2.4.4 Model Simulations .............................................................................................................. 39
2.5 Discussion .................................................................................................................................... 40
2.6 Supplementary materials ............................................................................................................ 51
Chapter 3 Modeling therapy training: Task interrelation in motor training post-stroke revealed by
graph structure learning ............................................................................................................................. 66
3.1 Summary ..................................................................................................................................... 66
3.2 Introduction ................................................................................................................................ 66
3.3 Clinical Methods.......................................................................................................................... 70
3.3.1 Task selection and task schedule ........................................................................................ 70
3.3.2 Task contents ...................................................................................................................... 71
3.3.3 Expert rated similarity score ............................................................................................... 71
3.3.4 Performance index .............................................................................................................. 72
3.4 Computational Methods ............................................................................................................. 72
3.4.1 Gaussian Graphical Model (GGM) ...................................................................................... 72
3.4.2 Bayesian linear model ......................................................................................................... 73
3.4.3 Sparse partial concentration matrix ................................................................................... 74
3.4.4 Simulations .......................................................................................................................... 77
3.5 Results ......................................................................................................................................... 78
3.5.1 Training task structure from EXCITE performance data ..................................................... 78
3.5.2 Task structure and expert-rated similarity ......................................................................... 78
3.5.3 Sensitivity analysis .............................................................................................................. 79
3.5.4 Simulation results using synthetic data .............................................................................. 79
3.6 Discussion .................................................................................................................................... 80
3.6.1 Comparisons with other computational methods .............................................................. 81
3.7 Limitations and future work ....................................................................................................... 82
Chapter 4 Modeling memory structure: context switching mechanism of motor output from multi-
timescale multi-tasks memory process, using sparse regression with 𝒍𝟏 regularization ........................... 94
4.1 Summary ..................................................................................................................................... 94
4.2 Introduction ................................................................................................................................ 94
4.3 Method ....................................................................................................................................... 96
4.3.1 Computational model ......................................................................................................... 96
4.3.2 Dataset ................................................................................................................................ 98
4.4 Results ......................................................................................................................................... 99
4.4.1 Memory process ................................................................................................................. 99
4.4.2 Sparse regression ................................................................................................................ 99
4.4.3 Memory structure ............................................................................................................. 100
4.4.4 Interference and transfer effects ...................................................................................... 101
4.5 Discussion .................................................................................................................................. 102
4.5.1 Memory structure ............................................................................................................. 102
4.5.2 Interference and transfer effects ...................................................................................... 103
4.6 Conclusion ................................................................................................................................. 103
Chapter 5 Conclusion ............................................................................................................................ 104
References ................................................................................................................................................ 105
List of Figures
FIGURE 1: LONGITUDINAL ARM AND HAND USE DATA (AS MEASURED BY THE MAL AOU TEST, NORMALIZED) FOR
48 PARTICIPANTS OF THE IMMEDIATE GROUP IN EXCITE ILLUSTRATING HOW USE CAN INCREASE (A),
DECREASE (B), OR NOT CHANGE (C) IN THE 24 MONTHS FOLLOWING THERAPY. CLASSIFICATION IN THE THREE
CATEGORIES WAS BASED ON THE SIGNIFICANCE OF THE SLOPE PARAMETER OF A LINEAR MODEL FIT OF USE
AS A FUNCTION OF TIME, WITH A LENIENT CRITERION TO TEST THE HYPOTHESIS THAT THE SLOPE IS NOT
DIFFERENT FROM ZERO (P < 0.25). USE INCREASE CATEGORY, N = 14; USE DECREASE CATEGORY, N = 12; NO
CHANGE IN USE CATEGORY, N = 22. 46
FIGURE 2: EXAMPLES OF MODEL FIT FOR UPPER EXTREMITY FUNCTION AND USE OVER 24 MONTHS POST THERAPY FOR THREE SUBJECTS IN THE IMMEDIATE GROUP USING THE MODEL OF EQUATIONS (2.1) AND (2.2) IN THE MAIN TEXT (AND CORRESPONDING EQUATIONS IN BOLD FONTS IN TABLE 1). THE BLUE LINES SHOW THE ACTUAL DATA. THE RED LINES ARE GENERATED BY THE MODEL WITH THE MEAN MODEL PARAMETERS, TRAINED WITH 7 DATA POINTS. (A) BOTH ARM FUNCTION AND USE IMPROVE (MEAN MODEL PARAMETERS w_1 = 0.76, w_2 = 2.98 AND w_3 = 0.42). (B) ARM FUNCTION IS MORE OR LESS CONSTANT, WHILE ARM USE SHOWS “NON-USE” (MEAN MODEL PARAMETERS w_1 = 0.14, w_2 = 3.36 AND w_3 = 3.03). (C) ARM FUNCTION SLIGHTLY DECREASES, WHILE ARM USE RISES AFTER 4 MONTHS AND KEEPS THE LEVEL (MEAN MODEL PARAMETERS w_1 = 0.19, w_2 = 3.48 AND w_3 = 1.88). SEE HOW THE MODEL FIT IS IN GENERAL GOOD FOR USE OVER THE 24 MONTHS AND FOR FUNCTION IN THE FIRST YEAR BUT THEN IS GETTING WORSE FOR FUNCTION IN THE SECOND YEAR (SEE TABLE 3 FOR A SYSTEMATIC EVALUATION OF MODEL FIT). 46
FIGURE 3: HISTOGRAMS OF THE MEANS OF PARAMETERS w_1, w_2, AND w_3 OF THE MODEL ESTIMATED WITH DATA OF THE IMMEDIATE GROUP IN THE EXCITE TRIAL. BLUE AND RED: SUBJECTS WITH ALL ESTIMATED MEAN PARAMETERS. BLUE: SUBJECTS WITH MEAN PARAMETERS AFTER APPLICATION OF CONVERGENCE CRITERIA (SEE RESULTS). THE NUMBERS N’S INDICATE THE NUMBERS OF SUBJECTS WITH GOOD CONVERGENCE FOR EACH PARAMETER. NOTE THAT FOR w_1, THE MEANS OF ALL PARAMETERS WITH GOOD CONVERGENCE ARE IN THE RANGE [0; 1], SUPPORTING THE “USE IT AND IMPROVE IT, OR LOSE IT” MODEL. SIMILARLY, FOR w_2, THE MEANS OF MOST PARAMETERS WITH GOOD CONVERGENCE ARE POSITIVE, SUPPORTING AN ACTUAL INFLUENCE OF FUNCTION ON USE (REFER TO EQUATIONS (2.1) AND (2.2) IN METHODS FOR THE ROLE OF THESE PARAMETERS IN THE MODEL). 47
FIGURE 4: EFFECTS OF THERAPY ON MEAN MODEL PARAMETERS FOR PARTICIPANTS OF THE DELAYED GROUP IN THE EXCITE TRIAL. A. EFFECT ON w_1. B. EFFECT ON w_2. C. EFFECT ON w_3. ONLY THE MEAN PARAMETER w_2 OF EQUATION 2 VARIES FROM BEFORE (BE) TO AFTER (AF) THERAPY. THIS PARAMETER CONTROLS THE EFFECT OF FUNCTION ON USE FOR THE AFFECTED ARM. THE HORIZONTAL LINE IN B INDICATES P < 0.05. 48
FIGURE 5: COMPUTER SIMULATIONS OF ARM FUNCTION SHOWING DEPENDENCE ON MODEL PARAMETERS. A. SIMULATIONS OF THE EFFECT ON USE AFTER HYPOTHETICAL CHANGES IN THE CONFIDENCE PARAMETER w_2 AS A RESULT OF THERAPY. INITIAL PARAMETER VALUES: w_1 = 0.6, w_2 = 3, w_3 = 3. FOR SIMPLICITY, WE ASSUMED HERE THAT THERAPY HAS ONLY AN EFFECT ON THE PARAMETER w_2 AND NOT ON USE AND PERFORMANCE (WHICH IT DID IN ACTUAL PARTICIPANTS OF THE EXCITE TRIAL (WOLF, WINSTEIN ET AL. 2006)). THE INCREASE IN PARAMETER w_2 FROM BEFORE TO AFTER THERAPY PARALLELS THE INCREASE IN THIS PARAMETER IN THE DELAYED GROUP OF THE EXCITE TRIAL (SEE FIGURE 4). B. PARAMETER SENSITIVITY ANALYSIS SHOWING THE ASYMPTOTIC VALUE OF ARM FUNCTION F AS A FUNCTION OF PARAMETER w_2 FOR A NUMBER OF VALUES OF w_3. LP: LIMIT POINT. THE LINE LABELED w_3 = 3 IS GENERATED BY THE SAME MODEL AS IN A. FOR VALUES OF w_3 > 3 THE SYSTEM BEHAVIOR EXHIBITS A NON-STABLE RANGE BETWEEN THE TWO LIMIT POINTS. FOR w_3 = 3.5 AND w_2 = 5, FOR INSTANCE, ARM FUNCTION F CONVERGES TO EITHER A LOW OR A HIGH VALUE. 49
FIGURE 6 TASK SELECTION FOR 2 SUBJECTS DURING THE 2 WEEKS OF EXCITE TRIALS. A: THE SUBJECT PRACTICED 14 TASKS. B: THE SUBJECT PRACTICED 12 TASKS IN TOTAL DURING THE THERAPY. THE X-AXIS IS THE DAY OF VISIT, AND THE Y-AXIS THE EXCITE TASK ID. THE INTENSITY SHOWS THE NUMBER OF REPETITIONS PER DAY. 86
FIGURE 7 EXAMPLE OF PERFORMANCE AND LINEAR FIT AS A FUNCTION OF TRIALS FOR TASKS TK AND AR FOR 1 SUBJECT ON DAY 1. THE X AXIS IS THE TRIALS WITHIN A SET, AND THE Y AXIS IS THE NUMBER OF REPETITIONS IN A UNIT TIME. BLACK DOTS SHOW THE NUMBER OF REPETITIONS ON EACH TRIAL; THE GRAY LINE SHOWS PROGRESS IN ONE SET; THE BLACK LINE SHOWS THE FITTED LINE TO THE DATA. THE INTERCEPT OF THIS LINE GIVES THE "PERFORMANCE INDEX" THAT WE USED TO STUDY TASK INTERRELATION VIA GRAPHICAL MODELS. 87
FIGURE 8 GRAPH STRUCTURE OF TASK INTERRELATION FOR THE 6 MOST PRACTICED TASKS TRAINED FROM 542 PAIRWISE MEASUREMENTS FROM 46 PARTICIPANTS AND TWO VALUES OF THE SPARSITY PARAMETER λ. THICKER EDGES REPRESENT A HIGHER ABSOLUTE VALUE OF THE CONCENTRATION MATRIX BETWEEN TASKS. A: A LESS SPARSE GRAPH WITH λ = 1.00 × 10^-3, AND B: A SPARSER GRAPH WITH λ = 5.0 × 10^-3. FOR DISPLAY PURPOSES, THE OFF-DIAGONAL ELEMENTS OF THE OUTCOME W WERE NORMALIZED TO MEET THE EDGE THICKNESS (0-15 LINE WIDTH PROPERTY IN MATLAB PLOT) AND EDGES WITH THICKNESS LESS THAN 1 WERE REMOVED. 88
FIGURE 9 CORRELATION BETWEEN OFF-DIAGONAL ELEMENTS OF THE RATER-DERIVED AND THE DATA-DERIVED COVARIANCES. THE DATA-DERIVED COVARIANCE STRUCTURE WAS FOR λ = 1.00 × 10^-3. PEARSON CORRELATION: R = -0.69; P = 0.0043. 89
FIGURE 10 SENSITIVITY ANALYSIS ON λ FOR 10^-10 ≤ λ ≤ 10^1. A: KL LOSS BETWEEN THE RATER-DERIVED AND THE DATA-DERIVED COVARIANCES. B: P-VALUE OF THE CORRELATION BETWEEN OFF-DIAGONAL ELEMENTS OF THE RATER-DERIVED AND THE DATA-DERIVED COVARIANCES. NOTE THAT IN THE RANGE λ > 0.0215, IT WAS NOT FEASIBLE TO COMPUTE THE P-VALUE BECAUSE ALL THE OFF-DIAGONAL ELEMENTS OF THE CONCENTRATION MATRIX W CONVERGED TO ZERO. C: OFF-DIAGONAL TERMS OVER THE RANGE OF λ. 90
FIGURE 11: DIAGRAM OF THE PROPOSED MODEL 98
FIGURE 12: MEMORY PROCESS. THICK BLACK LINE SHOWS THE PERFORMANCE ON TASK 1 WITH THE RED DOT AS
ACTUAL MEASUREMENTS. THE OTHER LINES SHOW THE MEMORY STATES WITH DIFFERENT TIME SCALE FOR
CONTROL (A) AND STROKE (B) SUBJECT. 99
FIGURE 13: SPARSE REGRESSION. BLACK LINE SHOWS THE ACTUAL PERFORMANCE OF THE TASK, AND RED LINE
SHOWS THE REGRESSION LINE. (A) CONTROL AND (B) STROKE SUBJECT. 100
FIGURE 14 HISTOGRAM OF MEMORY PROCESS TIME SCALES FOR (A) CONTROL AND (B) STROKE SUBJECT. 101
FIGURE 15 HISTOGRAM OF MEMORY STRUCTURE 102
Abstract
Understanding the effects of task practice on the long term recovery of arm function post-stroke could
allow effective motor training at a reduced cost. There is, however, little systematic understanding of
what constitutes effective practice and how arm function changes during and outside of training.
In this light, we first studied the long-term effects of therapy, and whether there is supportive evidence for the much discussed "use it or lose it" axiom at the behavioral level in the chronic phase post-stroke. For this, we used longitudinal data from a Phase III clinical trial, compared different hypotheses using Bayesian model evidence, and showed that sufficient arm use in daily life can indeed further improve function beyond the gains due to therapy.
Next, we aimed to understand the effects of task selection during task-oriented training, as it is at present not known how to determine the task schedules that maximize recovery, notably because of possible learning transfer and interference between tasks. Here, we quantified such task interrelations by forming a graphical model from task performance data obtained during constraint induced movement therapy. Because the data are partial and not missing at random, we developed a novel method to learn Gaussian graph structure using the Bayesian Linear Model. We found that the output of the graphical model correlated with similarity scores between tasks as rated by experts.
Finally, we focused on the mechanism of learning and the memory structure in motor tasks. We hypothesized that the context-switching mechanism for motor output is not perfect, unlike the assumption in previously proposed models. To find a parsimonious motor output model and its memory components, we ran sparse regression on a multi-timescale, multi-task memory process. Our results showed that output from short-timescale memory is the most influential on motor output, and that context interference or transfer was observed in the middle- and long-timescale memories. We also compared the memory output between healthy individuals and individuals with stroke.
Chapter 1 Introduction
According to the American Heart Association, every year more than 795,000 people in the United States
experience a new or recurrent stroke (Schaefer, Patterson et al. 2013), an estimated 6.8 million American adults have experienced a stroke, and this number is projected to grow by 4 million more people by 2030. The associated economic cost was $38.6 billion in 2009.
Among all strokes, up to 85% cause hemiparesis, often leaving stroke survivors with
predominantly unilateral motor impairments (Wolf, Winstein et al. 2006). Improving use of the more
affected arm is important, because difficulty in using this arm in daily tasks has been associated with
reduced quality of life (Mayo, Wood-Dauphinee et al. 2002).
There is now definitive evidence that intensive Task-Oriented Training (TOT) is effective for improving
upper extremity function and use after stroke (Butefisch 1995, Kwakkel 1999, Wolf 2002, Wolf, Winstein
et al. 2006). It is effective in restoring function post-stroke (Fisher and Sullivan 2001, Van Peppen,
Kwakkel et al. 2004, Winstein, Rose et al. 2004, Wolf, Thompson et al. 2005), and such training
reverses, at least partially, the loss of cortical representation due to lesion through recruitment of
adjacent brain areas in animals (Nudo, Wise et al. 1996, Kleim, Barbay et al. 1998) and in humans
(Liepert, Bauder et al. 2000). This reorganization lasts several years (Taub, Uswatte et al. 2003), and has
been linked to improved performance (Conner, Culberson et al. 2003) and increased use of the affected
limb (Taub, Uswatte et al. 2002). On the contrary, lack of training has been associated with further loss
of cortical representation (Nudo, Wise et al. 1996, Plautz, Milliken et al. 2000). TOT has emerged as the
dominant approach to motor rehabilitation post-stroke (NINDS Stroke Progress Review Group 2006, Winstein and Wolf 2008).
This thesis aims to understand the effects of TOT, especially its learning and retention aspects, through a computational modeling approach. Recent advances in machine learning, data analytics, and artificial intelligence have allowed us to model the complex recovery processes of humans post-stroke and to evaluate different hypotheses. Given appropriate measurements, we can identify important factors, estimate
parameters, and predict future trends.
1.1 Chapter organization
The remaining chapters of this dissertation are organized as follows. Each chapter has its own summary,
introduction, method, discussion, and conclusion, so that it can be read as an independent paper.
In Chapter 2, we focus on the long-term effects of intensive therapy. We explore putative non-linear
interactions between upper extremity function and use by developing a first-order dynamical model of
stroke recovery with longitudinal data from participants receiving constraint induced movement therapy
(CIMT) in the EXCITE clinical trial.
In Chapter 3, we focus on understanding the effects of task training during an intensive motor therapy. In particular, we formulate task interrelation as graph structure learning with l1 regularization, where edges between tasks indicate statistical conditional dependencies. We propose a graph structure learning method based on the Bayesian Linear Model to estimate the covariance of unobserved values. We apply this method to the dataset from the Extremity Constraint Induced Therapy Evaluation (EXCITE) trial. Our performance-based inter-task dependencies are compared with movement-attribute-based task similarity, rated by physical therapists.
In Chapter 4, we model grasping task performance as multiple time series within multi-task learning. We hypothesized that the context-switching mechanism for motor output is not perfect, and we identified ranges of timescales for the memory states. To find a parsimonious motor output model, we ran sparse regression on a multi-timescale, multi-task memory process. Our results showed that the output from short-timescale memory is the most influential, and that context interference occurs more in the middle- and long-timescale memories. We compare the memory output model between healthy individuals and individuals with stroke.
Chapter 2 Modeling long term recovery: Use it and improve it or lose it,
interactions between arm function and use in humans post-stroke
2.1 Summary
“Use it and improve it, or lose it” is one of the axioms of motor therapy after stroke. There is, however,
little understanding of the interactions between arm function and use in humans post-stroke. Here, we
explored putative non-linear interactions between upper extremity function and use by developing a
first-order dynamical model of stroke recovery with longitudinal data from participants receiving
constraint induced movement therapy (CIMT) in the EXCITE clinical trial. Using a Bayesian regression
framework, we systematically compared this model with competitive models that included, or not,
interactions between function and use. Model comparisons showed that the model with the predicted
interactions between arm function and use was the best fitting model. Furthermore, by comparing the
model parameters before and after CIMT intervention in participants receiving the intervention one year
after randomization, we found that therapy increased the parameter that controls the effect of arm
function on arm use. An increase in this parameter, which can be thought of as the confidence to use the arm for a given level of function, led to an increase in spontaneous use after therapy compared to before therapy.
2.2 Introduction
Stroke often leaves patients with predominantly unilateral motor impairments. Although the affected
upper extremity is often not completely paralyzed, the recovery of upper extremity function is often
achieved solely by compensatory use, i.e., choice of the less-affected arm (Nakayama, Jorgensen et al.
1994). Improving use of the more affected arm is important, however, because difficulty in using this arm
in daily tasks has been associated with reduced quality of life (Mayo, Wood-Dauphinee et al. 2002).
There is now definitive evidence that intensive task-specific practice is effective for improving upper
extremity function and use after stroke (Butefisch 1995, Kwakkel 1999, Wolf 2002, Wolf, Winstein et al.
2006). Such training reverses, at least partially, the loss of cortical representation due to lesion through
recruitment of adjacent brain areas in animals (Nudo, Wise et al. 1996, Kleim, Barbay et al. 1998) and in
humans (Liepert, Bauder et al. 2000). This reorganization lasts several years (Taub, Uswatte et al. 2003),
and has been linked to improved performance (Conner, Culberson et al. 2003) and increased use of the
affected limb (Taub, Uswatte et al. 2002). On the contrary, lack of training has been associated with
further loss of cortical representation (Nudo, Wise et al. 1996, Plautz, Milliken et al. 2000).
Thus, the axiom “Use it and improve it, or lose it” (Kleim and Jones 2008), seems appropriately
applicable to the training period, when the individual is “forced” to use the affected upper extremity.
But, what happens outside of therapy, when the individual is free to use, or not use, the affected limb?
In some individuals, function and use further improve in the years following therapy (Winstein, Rose et
al. 2004, Winstein and Wolf 2008, Fujiwara, Kasashima et al. 2009) (see Figure 1A). For other individuals,
function and use decrease in the years following therapy (see Figure 1B). We previously hypothesized
that the repeated decisions to use the affected limb in daily activities may be a form of motor practice
that can lead to further improvements (Winstein, Rose et al. 2004). Similarly, repeated, failed, attempts
to use the affected limb have been hypothesized to underlie worsening of the impairment in a process
termed "learned non-use" (Uswatte, Taub et al. 2006).
In our previous neuro-computational model of stroke recovery, we attempted to shed light on the
interactions between function and use in general and learned non-use in particular (Han, Arbib et al.
2008). Our model contained two independent motor cortices, each controlling the contralateral arm,
with one being affected by stroke. Before each movement, one motor cortex was selected by an
adaptive decision-making system, tentatively located in cortico-striatal networks. Arm performance
improved via neural reorganization in the motor cortex, which learned both to minimize directional
errors (via supervised learning) and to maximize neuronal activity for desired movement directions (via
Hebbian learning). Furthermore, the decision to use one limb or the other was made by comparing the
“action value” of each limb in the adaptive decision-making system. The values for each arm were
updated based on reward prediction errors (via reinforcement learning). If performance-based rewards
were greater than expected, the arm was chosen more often for this particular movement. Thus, the
model predicted that function of the affected arm depends on prior use and that, in turn, arm use
depends on non-linear competition between prior functions of the affected and the non-affected arm.
The model also predicted that if spontaneous recovery, or motor training, or both, brings performance
above a certain threshold, the repeated spontaneous arm use provides a form of motor learning that
further bootstraps performance and spontaneous use. Below this threshold, spontaneous arm use after
training decreases (thus the model exhibits “learned non-use”), and compensatory movements with the
less affected hand are reinforced. We previously provided clinical evidence for such a threshold at the
group level (Schweighofer, Han et al. 2009).
Here, our principal aim was to test the hypothesis that, in individuals in the chronic phase post-stroke,
function of the affected arm depends on prior use of that arm and arm use, in turn, depends non-
linearly on function, as predicted from our previous model. For this purpose we developed a new data-
driven quantitative first-order dynamical model of stroke recovery that links arm function and use with a
small number of parameters, which can be directly adjusted from actual data. We obtained data on
upper extremity function and use for a two-year period starting from 3 months or more after stroke
from the database of the Phase III randomized controlled clinical EXtremity Constraint Induced Therapy
Evaluation (EXCITE) trial (Wolf, Winstein et al. 2006), which aimed at demonstrating the efficacy of a
rehabilitative intervention for upper extremity. Arm function was derived from the time score of the
Wolf Motor Function Test (WMFT) (Wolf 1989, Wolf, Thompson et al. 2005) and arm use data was
derived from the Motor Activity Log Amount of Use (MAL AOU) (Uswatte, Taub et al. 2005, Uswatte,
Taub et al. 2006). Because of the sparsity of the data, we used Bayesian regression to fit the model. In
addition, Bayesian regression allowed us to systematically compare our model with alternative models
to test our hypothesis. We validated the model by computing the prediction errors of the model with a
leave-one-out method.
Our secondary aim was to investigate whether motor therapy can change the hypothesized relationship
between arm function and use by examining the model parameters before and after therapy. Besides
improving both function and use, therapy may increase the confidence to use the arm (Hellstrom,
Lindmark et al. 2003, Salbach, Mayo et al. 2006). We thus predicted that the relationship that links arm function to arm use can be modified by therapy, and that, controlling for the level of function, arm use can increase after therapy compared to before therapy.
2.3 Methods
2.3.1 Data for model parameter fit and model selection
In EXCITE, two groups of participants 3 months or more post-stroke were randomly assigned to either an
immediate or a delayed Constraint Induced Movement Therapy (CIMT) group (Winstein, Miller et al.
2003, Wolf, Winstein et al. 2006, Wolf, Thompson et al. 2010). After 3 months, changes in function can
be attributed more to learning and adaptation rather than to significant physiological modifiers that
dominate the initial recovery period. The immediate group received two weeks of therapy from time
Pre1 (t = 0) to Post1; the delayed group received two weeks of therapy after a one-year delay, from Pre2
(t = 1 year) to Post2.
The measure of function that we used to develop our model was the negative of the logarithm of the
WMFT time score, normalized between 0 and 1. The WMFT time score (Wolf 1989, Wolf, Thompson et
al. 2005) has been used as either a primary or a secondary outcome in more than 70 published studies
including the EXCITE trial. The test determines the time required for patients with stroke to perform 15
everyday tasks with each upper extremity. Tasks are sequenced so that the first six tasks involve simple
limb movements, primarily of the proximal musculature; the next nine tasks require manipulation and
distal control. The time score is computed by adding the times of the tasks that the subject can perform
within 120 seconds. For each task that the subject cannot perform, 120 sec are added. The WMFT time
score has good reliability, validity, and no learning effect (Wolf, Thompson et al. 2005). Note that
because the simpler tasks can normally be performed quickly, the distribution of the WMFT time
score has a long-tail. The natural logarithm of the WMFT time score is therefore taken to transform the
distribution into a normal distribution (Wolf, Winstein et al. 2006). To readily incorporate the time score
of the WMFT (after logarithm transformation) into our model, we negated the logarithm transformed
WMFT score such that a good (low) WMFT time score corresponds to good (high) arm function. We then
normalized the range by dividing by the difference between the highest score and the lowest score in
the data set, and subtracting the lowest score in the data set from each point. Thus, a normalized score
of 1 corresponds to excellent function and 0 to very poor function.
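To make the transformation above concrete, here is a minimal sketch in Python/NumPy (an illustration only; the function names, the NaN convention for tasks that could not be performed, and the use of NumPy are assumptions, not the implementation used in this work):

```python
import numpy as np

def wmft_time_score(task_times_sec, cutoff=120.0):
    """Sum the task times; tasks that could not be performed (NaN) count as 120 s."""
    t = np.asarray(task_times_sec, dtype=float)
    return float(np.where(np.isnan(t), cutoff, np.minimum(t, cutoff)).sum())

def normalized_function_scores(wmft_scores):
    """Negate the log WMFT time scores and rescale over the data set so that
    1 corresponds to excellent function and 0 to very poor function."""
    f = -np.log(np.asarray(wmft_scores, dtype=float))
    return (f - f.min()) / (f.max() - f.min())
```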
The measure of arm use that we incorporated to develop our models was the average MAL AOU score,
normalized between 0 and 1. In the MAL AOU (Uswatte, Taub et al. 2005, Uswatte, Taub et al. 2006), the
participants (or their caregivers) rate how much the paretic arm is used spontaneously to accomplish 30
activities of daily living outside of the laboratory. Each item on the MAL AOU is ranked from 0 (no use) to
5 (normal) via increments of 0.5. Validity and reliability of the MAL AOU has been established (Uswatte,
Taub et al. 2006). The MAL AOU has been used extensively in studies with a few repeated
measurements, including in the EXCITE trial.
Participants were tested with the WMFT and the MAL AOU at Pre1 (t = 0 week), Post1 (t = 2 weeks),
Pre2 (t = 1 year), and Post 2 (t = 1 year + 2 weeks). All participants were also tested at 4 months, 8
months, 16 months, 20 months, and 24 months. In the immediate group, because we only studied the
participants’ behavior after therapy, we excluded data at Pre1. Furthermore because little change in
function or use is likely to happen within a 2-week-period one year after CIMT for the immediate group
(Wolf, Winstein et al. 2008), we averaged the data at between Pre2 and Post2 for this group. Thus, for
each subject of the immediate group, a total of 7 data points were available, each spaced by 4 months
(at Post1, 4, 8, 12, 16, 20, and 24 months), as shown in Figure 1. In the delayed group, we compared the
participants’ behavior after therapy to the behavior before therapy. Because little change in function or
use is likely to happen in two weeks between Pre1 and Post1 for this group (Wolf, Winstein et al. 2006),
we averaged the function and use data at these two data points. Thus, for each subject of the delayed
group, 4 data points were available before therapy (at 0, 4, 8 months, and Pre2) and 4 data points
available after therapy (at Post2, and 16, 20, and 24 months).
Because of the very limited number of time points in our study, we only analyzed the data of
participants with full data sets, that is, each participant had a full complement of WMFT and MAL AOU
data. In the immediate group, 48 participants had a full data set. In the delayed group, 45 participants
had a full dataset.
2.3.2 Quantitative models of arm function and use interaction
We investigated the simplest possible model that best accounted for four essential characteristics of our
previous neuro-computational model (Han, Arbib et al. 2008): 1) Time varying changes in arm function
and use reflecting the dynamic of stroke recovery. 2) Effect of use on function, with high use leading to
higher future function, and low use leading to lower future function. 3) Effect of function on decision to
use the arm, with higher function leading to higher future use, and lower function leading to lower
future use. 4) Decision to use the affected arm or the non-affected arm based on competition between
prior function of the affected and function of the non-affected arm.
We specifically hypothesized that a first order non-linear dynamical system, with two equations, can
account for the interactions between arm function and spontaneous use in individuals post-stroke. The
first (state-space) equation updates the function of the affected arm; the second equation updates the
use of that arm.
Characteristics (1) and (2) above can be encapsulated by the evolution of arm function at time step t in
terms of arm function and use at the previous time step as:
F_affected(t) = w_a F_affected(t-1) + w_b U_affected(t-1) + w_c,

where arm function at t, F_affected(t), is updated based on arm function and use at the previous time step t-1, F_affected(t-1) and U_affected(t-1); w_a is a decay rate, w_b a ‘use effect’ rate, and w_c a constant input.
Given the very few data points at our disposal (7 points in the immediate group), it is unlikely however
that such a complex model with 3 free parameters would provide both good fit and good generalization
(See sub-section “Model fit” below). Although we consider the 3-parameter model above and a simpler
2-parameter model with w_c = 0 for model comparison (see below), we take as our reference model the simplest model, the 1-parameter model given by:

F_affected(t) = (1 - w_1) F_affected(t-1) + w_1 U_affected(t-1),    (2.1)

where w_1 is a free parameter. Equation (2.1) represents a condensed version of “Use it and improve it, or lose it”, on the condition that 0 ≤ w_1 ≤ 1: if U_affected(t) is zero or small, F_affected(t) decreases. If U_affected(t) is large, then F_affected(t) increases. The parameter w_1 can be considered as a ‘use effect’ rate; the larger this rate, the greater the effect of spontaneous arm use on function. The term (1 - w_1) is a decay rate of arm function: with zero use, arm function would decay exponentially with time constant D/w_1, where D is the time step of 4 months.
Characteristics (1), (3) and (4) above can be encapsulated by the update of arm use at time step t, U_affected(t), in terms of arm function at the previous time step, F_affected(t-1), as:

U_affected(t) = 1 / (1 + exp(-(w_2 F_affected(t-1) - w_3))),    (2.2)

where w_2 and w_3 are free parameters. Equation (2.2) is a sigmoidal equation that arises from common decision-making models in the reinforcement-learning framework (Sutton and Barto 1998), in which the probability of taking an action is computed by comparing the values of the actions, with the action with the highest “value” being the most probable. Here, we assumed that the “action value” of each limb is proportional to the function of that limb at the previous time step. The slope parameter w_2 thus controls the sensitivity of arm use to arm function and can tentatively be considered a “confidence parameter”: for equal function, a greater or smaller w_2 leads to more or less use, respectively. The parameter w_3 encapsulates the function of the non-affected arm F_unaffected(t) together with any non-modeled bias for preferred use of one arm versus the other, such as arm dominance or side of stroke.

We did not include F_unaffected(t) in the model because the average changes in function of the unaffected arm following therapy are relatively small compared to the average changes observed in the affected arm. Among participants of the immediate group, the average log time WMFT score is 8.62 ± 9.78 (SD) for the affected arm, and 1.99 ± 0.82 for the unaffected arm. The median percentage change in the score for the unaffected arm from just after therapy (Post1, t = 2 weeks) to 24 months, normalized by the score of the affected arm just after therapy, is -3.6%, with an interquartile range of 7.1%. By comparison, the median percentage change of the score for the affected arm from just after therapy to 24 months, normalized by the score just after therapy (Post1, t = 2 weeks), is -23.2%, with an interquartile range of 55.5%. We thus considered the function of the non-affected arm constant over the two years following therapy; only the function of the affected arm F_affected(t) enters Equation (2.2) (henceforth, we drop the subscript “affected”).
Note that because of the simple 1-parameter model of function, arm function converges to the same
value as use in the steady state (although after transformations to the original WMFT and AOU score,
the values would be of course different). This is simply due to our choice of a single parameter function
model, and there is no reason why this should happen in actual individuals post-stroke. Nevertheless,
our model may still be adequate given 1) that the variables may not converge to their asymptotic values
within two years because of long-time constants, and 2) the trade-off between fit and complexity that
favors simpler models.
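As an illustration of the dynamics implied by the reference model, the following sketch (Python/NumPy; an illustrative re-implementation, not the code used in this work, with example parameter values only) iterates equations (2.1) and (2.2) forward in 4-month steps:

```python
import numpy as np

def simulate_reference_model(w1, w2, w3, F0, U0, n_steps=6):
    """Iterate F and U (Eqs. 2.1 and 2.2); each step corresponds to one 4-month interval."""
    F, U = [F0], [U0]
    for _ in range(n_steps):
        F_next = (1.0 - w1) * F[-1] + w1 * U[-1]             # Eq. (2.1): uses F(t-1), U(t-1)
        U_next = 1.0 / (1.0 + np.exp(-(w2 * F[-1] - w3)))    # Eq. (2.2): uses F(t-1)
        F.append(F_next)
        U.append(U_next)
    return np.array(F), np.array(U)

# Example run with parameter values of the same order as those reported in the Results
F, U = simulate_reference_model(w1=0.6, w2=3.0, w3=3.0, F0=0.5, U0=0.4)
```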
2.3.3 Model fit, immediate and delayed groups
We estimated w_1, w_2, and w_3 from function and use data of the EXCITE trial participants in both the
immediate and the delayed group. We also aimed at testing our hypothesis of interactions between arm
function and use as encapsulated in equations (2.1) and (2.2), against a number of alternative
hypotheses, as we now describe.
Because we have only 7 data points (immediate group) and 4 data points (delayed group for each before
and after-therapy) for each arm function and use, we must ensure that the model does not overfit the
data, that is, the model should describe the underlying relationship, not the random error or noise.
Overfitting generally occurs when a model is excessively complex, such as having too many parameters
relative to the number of data points. For instance, in frequentist (maximum likelihood) linear
regression, a minimum of 10 or 15 points per predictor is usually considered necessary.
In contrast, Bayesian regression is the method of choice in our case, as it does not overfit the data for
very small data sets (see (Bishop 2006) and below for rationale). The Bayesian regression framework has
the additional advantage of allowing principled model comparison based on the training data alone, that
is, without the need for cross-validation, which “wastes” training data. In light of these qualities, we
used Bayesian regression to determine the parameters of all the candidate models based on
(normalized) WMFT and MAL AOU data in the immediate group following therapy for each individual
participant (N = 48).
Here we illustrate Bayesian regression for our reference model of Equations (2.1) and (2.2). Similar
methods are used for the alternative models. We first reformulated Equations (2.1) and (2.2) to form
equations linear in model parameters:
F(t) - F(t-1) = (U(t-1) - F(t-1)) w_1,    (2.3)

log( U(t) / (1 - U(t)) ) = w_2 F(t-1) - w_3.    (2.4)

We then transformed these into a linear regression form:

y_1(t) = φ_1(t) w_1,    (2.5)

y_2(t) = φ_2(t) w_2 + φ_3(t) w_3,    (2.6)

where (2.5) and (2.6) correspond to (2.3) and (2.4) respectively. y_1(t) and y_2(t) are the dependent (target) variables, representing the left-hand sides of (2.3) and (2.4) respectively, and φ_1, φ_2, and φ_3 are basis functions (φ_1(t) = U(t-1) - F(t-1), φ_2(t) = F(t-1), and φ_3(t) = -1). Note that we can decouple y_1 and y_2 for the purpose of model parameter estimation; hence, we use (2.6) as an example in the following discussion.
In vector form, equation (2.6) gives the regression model given the model parameters:

y = Φ w,    (2.7)

where y = [y_2(1), y_2(2), ..., y_2(L)]^T, with L the number of measurements, w = [w_2, w_3]^T, and Φ is the design matrix

Φ = [ φ_2(1)  φ_3(1)
        ...     ...
      φ_2(L)  φ_3(L) ].

We need two consecutive measurements to estimate our regression model; therefore L = 7 - 1 = 6 for the immediate group, and L = 4 - 1 = 3 for the delayed group. Hence Φ is an L × M matrix, where M is the number of model parameters (i.e., M = 2 for the arm use model (2.2)).
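For illustration, here is a short sketch (Python/NumPy, with hypothetical function names) of how the target vectors and design matrices of equations (2.5)-(2.7) could be built from the observed series of normalized function F and use U at consecutive visits; it assumes U has been kept strictly inside (0, 1) so that the logit in (2.4) is defined:

```python
import numpy as np

def function_model_design(F, U):
    """y and Phi for the 1-parameter function model (Eqs. 2.3 and 2.5)."""
    F, U = np.asarray(F, float), np.asarray(U, float)
    y = F[1:] - F[:-1]                      # F(t) - F(t-1)
    Phi = (U[:-1] - F[:-1])[:, None]        # phi_1(t) = U(t-1) - F(t-1), an L x 1 matrix
    return y, Phi

def use_model_design(F, U):
    """y and Phi for the use model (Eqs. 2.4 and 2.6), so that y ~ Phi @ [w2, w3]."""
    F, U = np.asarray(F, float), np.asarray(U, float)
    y = np.log(U[1:] / (1.0 - U[1:]))       # logit of U(t), t = 1..L
    Phi = np.column_stack([F[:-1],                 # phi_2(t) = F(t-1)
                           -np.ones(len(F) - 1)])  # phi_3(t) = -1
    return y, Phi
```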
Measurements from the EXCITE clinical data contain noise, and we assume that this noise is Gaussian, added to the linear regression model y. The data distribution z is thus assumed to be drawn independently from a Gaussian distribution with mean y = Φw and variance β^-1:

z ~ N(z | Φw, β^-1) = (β / 2π)^(L/2) exp( -(β/2) (z - Φw)^T (z - Φw) ),    (2.8)
where β is a data accuracy hyper-parameter (inverse of variance). In Bayesian regression, we treat
model parameters as a probability distribution. We assume that the prior distribution of model
parameters is also an independent, identically distributed Gaussian:

p(w) = N(w | μ_0, α^-1 I) = (α / 2π)^(M/2) exp( -(α/2) (w - μ_0)^T (w - μ_0) ),    (2.9)

where μ_0 is the mean of the model parameters, and α the model accuracy hyper-parameter.
The goal of Bayesian regression is to maximize the Bayesian model evidence, which is the probability of the data distribution given the prior mean and the hyper-parameters: p(z | μ_0, α, β). Using the sum rule and product rule of probability from (2.8) and (2.9), and taking the logarithm, we obtain the log of the model evidence (see (Bishop 2006), page 167, for the derivation with zero-centered priors, and Supplementary material: Text S1):

log p(z | μ_0, α, β) = (M/2) log α + (L/2) log β - (β/2) ||z - Φm||^2 - (α/2) ||m - μ_0||^2 - (1/2) log|A| - (L/2) log 2π,    (2.10)

where

A = α I_{M×M} + β Φ^T Φ, and    (2.11)

m = A^{-1} (β Φ^T z + α μ_0),    (2.12)

where |A| is the determinant of A, and ||·|| is the Euclidean norm. As shown in Equations (S24) to (S26) (Supplementary Material: Text S1), m is actually the mean of the posterior distribution of the model parameters, and A is the accuracy (inverse of covariance) of the distribution. Note how m reduces to the frequentist regression solution for α = 0.
Equation (2.10) illustrates how Bayesian regression implements a trade-off between data fitting and
model complexity. With larger M (more model parameters), Φm can better approximate the data distribution z, and the error between z and Φm, ||z - Φm||^2, decreases. On the other hand, because the size M of the vector m also scales up with a larger number of parameters, the regularization term ||m - μ_0||^2 may increase. Similar trade-offs are found in (2.11) and (2.12) in the form of a weighted average between prior knowledge and data. Note that, since the design matrix Φ utilizes all data points, we do not need to set aside testing data points for evaluating model fit, unlike cross-validation (e.g., leave-one-out).
We maximized the model evidence with respect to the two hyper-parameters: α, which controls the model parameter distribution (2.9), and β, which controls the data distribution (2.8). Note that m (2.12) and A (2.11) are also functions of α and β. We used an iterative method (Bishop 2006), in which we fixed m and A in the first step and optimized α and β, and then updated m and A with the new α and β in the second step. We provide here a summary of the algorithm to compute the model evidence (see Supplementary material: Text S1 for details).

1. Set μ_0, and initial α and β.
2. Compute α and β using Equations (S21) and (S23):
   2.1 α ← γ / ||m - μ_0||^2
   2.2 β ← (L - γ) / ||z - Φm||^2
   where γ = Σ_i λ_i / (α + λ_i) and λ_i is the i-th eigenvalue of β Φ^T Φ.
3. Update m and A using (2.11) and (2.12):
   3.1 A ← α I_{M×M} + β Φ^T Φ
   3.2 m ← A^{-1} (β Φ^T z + α μ_0)
4. Repeat 2 and 3 until convergence.
5. Set α* ← α and β* ← β, and compute the model evidence (2.10) and the posterior model parameter distribution, Equation (S24).
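The iterative procedure above is the standard evidence approximation for Bayesian linear regression (Bishop 2006, ch. 3). A minimal sketch in Python/NumPy is given below; it is an illustrative re-implementation of the stated update rules, not the code used for the analyses, and the iteration cap and tolerance are arbitrary choices:

```python
import numpy as np

def evidence_maximization(Phi, z, mu0, alpha=1e-11, beta=1e-8, n_iter=500, tol=1e-10):
    """Maximize the log evidence (Eq. 2.10) over alpha and beta.

    Returns the posterior mean m, posterior precision A, the optimized
    hyper-parameters, and the log model evidence."""
    L, M = Phi.shape
    PtP = Phi.T @ Phi
    for _ in range(n_iter):
        A = alpha * np.eye(M) + beta * PtP                        # Eq. (2.11)
        m = np.linalg.solve(A, beta * Phi.T @ z + alpha * mu0)    # Eq. (2.12)
        lam = np.linalg.eigvalsh(beta * PtP)                      # eigenvalues of beta * Phi^T Phi
        gamma = np.sum(lam / (alpha + lam))
        alpha_new = gamma / np.sum((m - mu0) ** 2)                # step 2.1
        beta_new = (L - gamma) / np.sum((z - Phi @ m) ** 2)       # step 2.2
        converged = abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    A = alpha * np.eye(M) + beta * PtP
    m = np.linalg.solve(A, beta * Phi.T @ z + alpha * mu0)
    _, logdetA = np.linalg.slogdet(A)
    log_ev = (M / 2) * np.log(alpha) + (L / 2) * np.log(beta) \
        - (beta / 2) * np.sum((z - Phi @ m) ** 2) \
        - (alpha / 2) * np.sum((m - mu0) ** 2) \
        - 0.5 * logdetA - (L / 2) * np.log(2 * np.pi)             # Eq. (2.10)
    return m, A, alpha, beta, log_ev
```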
2.3.4 Model comparison, immediate group
Our hypothesis of “Use it and improve it, or lose it” is encapsulated in our reference model of arm
function, which assumes that current arm function F(t) depends on a weighted sum of previous arm
function F(t-1) and previous arm use U(t-1). We compared this model with alternative hypotheses in
which F(t) does not depend on use, but only on previous arm function F(t-1) (i.e. use has no effect on
function), or conversely, in which F(t) depends solely on previous use U(t-1), not on previous function.
As noted above, our hypothesis is not specific to the exact model given in equation (2.1), but other
models containing a linear combination of F(t-1) and U(t-1) also fall under “Use it and improve it or lose
it”. Thus, we also considered more complex linear stable models with 2 and 3 parameters. Table 1A
shows the 7 possible models of function that we considered, with our “reference model” marked.
Our reference model of arm use assumes that current use of the affected arm U(t) depends via a
sigmoidal function on previous function of the affected arm F(t-1) and a constant representing the
function of the non-affected arm. We compared this model with alternative models in which U(t)
depends linearly on previous arm function F(t-1). In simulations of our previous neuro-computational
model, the values for each arm were updated based on reward prediction errors at a much higher rate
than the update of performance. Since our time step in the current model is 4 months, it is thus possible
that the decisions to use the arm are updated much faster than performance. We therefore also
compared the model of Equation (2.2) to models in which the current arm use U(t) depends on current
arm function F(t), either via a sigmoid or linearly. Table 1B shows the 4 possible models of use that we considered, with our “reference model” marked.
A

Regressors           1-parameter model                                   2-parameter model          3-parameter model
F(t-1)               w_a F(t-1)                                          w_a F(t-1) + w_c           --
U(t-1)               w_b U(t-1)                                          w_b U(t-1) + w_c           --
F(t-1) and U(t-1)    (1 - w_1) F(t-1) + w_1 U(t-1)  (reference model)    w_a F(t-1) + w_b U(t-1)    w_a F(t-1) + w_b U(t-1) + w_c

B

Regressors    Model
F(t)          w_2 F(t) + w_3
F(t-1)        w_2 F(t-1) + w_3
F(t)          1 / (1 + exp[-(w_2 F(t) - w_3)])
F(t-1)        1 / (1 + exp[-(w_2 F(t-1) - w_3)])  (reference model)
Table 1 (A) Model comparison candidates for predicting arm function F(t). (B) Model comparison candidates for predicting
arm use U(t).
Initial means of the parameter distributions were taken as the values found with maximum likelihood
regression of all entries of the immediate group, except the weighted average model (the reference model in Table 1), with an initial mean value of w_1 at 1. We reflected our emphasis on data and lack of prior knowledge by setting the ratio of the initial values of the prior accuracy α_0 and the data accuracy β_0 to α_0/β_0 = 10^-3 and choosing almost flat priors with α_0 = 10^-11, for both the function and use models. Note that these initial parameter values were taken equal for all subjects. We verified in simulations that when α_0 < 10^-11, the results of model comparison are qualitatively the same as those presented. We set β_0 = 10^-8 for model fitting. We also performed a sensitivity analysis (i.e., a systematic variation) on the initial data accuracy β_0 (see Supplementary Figure S1).
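In terms of the illustrative evidence_maximization sketch given earlier, this initialization would look roughly as follows (mu0_use stands for the maximum-likelihood initial means described above; all names are hypothetical):

```python
alpha0, beta0 = 1e-11, 1e-8                 # almost flat prior; alpha0 / beta0 = 1e-3
y, Phi = use_model_design(F_subject, U_subject)   # hypothetical per-subject series
m, A, alpha, beta, log_ev = evidence_maximization(Phi, y, mu0_use,
                                                  alpha=alpha0, beta=beta0)
```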
The Bayesian model evidence for each model was used to compare models by computing the Bayes factor (BF), which is the ratio of the model evidence probability of the reference model to that of a competitive model (Kass and Raftery 1995). Thus, given the model evidence probability p_r(z | μ_0, α, β) for our reference model and the model evidence probability p_c(z | μ_0, α, β) for a competitive model, the Bayes factor is given by BF = p_r(z | μ_0, α, β) / p_c(z | μ_0, α, β). The Bayes factor has a role similar to the p-value in frequentist
statistics and is used to accept or reject the hypothesis (Raftery 1995). If BF < 1, there is negative
evidence for the hypothesis, and the hypothesis should be rejected. If 1≤ BF<3, the evidence is “barely
worth mentioning”. If 3≤BF<10, there is then substantial evidence for the hypothesis, and BF = 3 is a
threshold for accepting the hypothesis similar to p = 0.05 in classical statistics. Then for BF > 10, 30, and
100 there is strong, very strong, and decisive evidence for the hypothesis, respectively.
To compare the models over groups of subjects, a “group Bayes factor” can be computed by multiplying
the individual Bayes factors (Stephan, Harrison et al. 2007). However, such group Bayes factor is
misleading in the presence of the strong outliers, which are present in our analysis due to poor
convergence of the models for a number of individuals (as a result of our very limited data set).
Therefore, we evaluated the number of comparisons for which BF > 3 for either of the compared models
to compute the “positive evidence ratio”, which serves as a measure of which model is optimal at the
group level (Stephan, Harrison et al. 2007). Positive evidence ratios read as x:y, where x is the number of
subjects for which the Bayes factor of the reference model is greater than 3, and y the number of
subjects for which the Bayes factor of the alternative model is greater than 3. For N – (x+y) subjects, no
conclusion can be drawn.
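For illustration, the Bayes factors and the positive evidence ratio could be computed from the per-subject log evidences as in the sketch below (hypothetical variable names; the threshold of 3 is the one stated above):

```python
import numpy as np

def positive_evidence_ratio(log_ev_ref, log_ev_alt, threshold=3.0):
    """x:y, where x (y) is the number of subjects with BF > 3 for the reference (alternative) model."""
    bf = np.exp(np.asarray(log_ev_ref) - np.asarray(log_ev_alt))   # BF = p_r / p_c per subject
    x = int(np.sum(bf > threshold))
    y = int(np.sum(1.0 / bf > threshold))
    return x, y
```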
2.3.5 Model fit before and after therapy, delayed group
For this analysis, we hypothesized that motor training in the EXCITE trial, besides improving
function and use, also had a “meta-learning” effect (e.g., (Doya 2002, Schweighofer and Doya
2003)). According to this hypothesis, CIMT has an effect not only on arm function and use, but
also on the relationships between function and use. In our model, such meta-learning would
translate into different values of the parameters w_1, w_2, and w_3 before and after therapy. In particular, we hypothesized that training increases the confidence to use the arm for a specific level of function, which in the model translates into an increase in the parameter w_2. Using data from the delayed group in the EXCITE trial (N = 45), we used Bayesian regression for our reference model of equations (2.1) and (2.2), and we compared the means of the parameters for each subject before and after therapy. The initial values of the hyper-parameters were the same as for the immediate group analysis.
2.4 Results
2.4.1 Model selection and fit analyses
We first computed the Bayes factors to test the two hypotheses encapsulated in equations (2.1) and (2.2).
Then we computed the positive evidence ratio for each model from the individual Bayes factors. Table
2A shows that our reference arm function model weighting previous arm function and previous use with
a single parameter is strongly preferred over all other models with 2 or 3 parameters. This is presumably
because of the sparsity of data in our database. Our reference arm function model is preferred over the
model that depends only on previous arm function for 27 subjects out of 48 subjects. For 1 subject this
alternative model is preferred, and for 20 subjects, no conclusion can be drawn. Similarly, our reference
arm function model is preferred over the model that depends only on previous arm use for 25 subjects.
For 5 subjects this alternative model is preferred, and for 18 subjects, no conclusion can be drawn.
Table 2B shows that our reference use model, with its sigmoidal dependence on arm function, is strongly preferred over the
two linear models. However, our reference model is not preferred over an alternative model in which
arm use depends on current function; there is indeed a small advantage to the model that computes use
based on current function.
A
Regressors | 1-parameter model | 2-parameter model | 3-parameter model
F(t-1) | 27:1 | 46:1 | --
U(t-1) | 25:5 | 45:2 | --
F(t-1) and U(t-1): w_1 U(t-1) + (1 - w_1) F(t-1) | (reference) | 47:1 | 47:1

B
Regressors | Positive evidence ratio (reference model vs. listed model)
F(t) (linear) | 38:2
F(t-1) (linear) | 38:2
F(t) (sigmoidal) | 14:19
F(t-1) (sigmoidal): 1/(1 + exp[-(w_2 F(t-1) - w_3)]) | (reference model)
Table 2 (A) Positive Evidence Ratio for function model (48 subjects, immediate group). The models correspond to the models
of Table 1A. (B) Positive Evidence Ratio for use model (48 subjects, immediate group). The models correspond to the models
of Table 1B.
Figure 2 shows examples of fits with our model for both arm function and use, using the mean
parameters for three subjects in the immediate group. In Figure 2A, both function and use continue to
increase after therapy (mean model parameters w_1 = 0.76, w_2 = 2.98, and w_3 = 0.42). In Figure 2B, arm
use initially largely decreases post-therapy despite relatively high function. This subject thus exhibits
"learned non-use" (mean model parameters w_1 = 0.14, w_2 = 3.36, and w_3 = 3.03). In Figure 2C,
conversely, arm use increases after therapy, while function is relatively high. Because arm function
slightly decreases in the months following therapy, so does arm use, which returns to immediately post-
therapy levels after 2 years (mean model parameters w_1 = 0.19, w_2 = 3.48, and w_3 = 1.89). These figures
illustrate the dynamic, nonlinear nature of arm function and use post-therapy, and show that our model
adequately captures these dynamical interactions and provides a reasonably good fit to the data,
although the use model appears to fit the data better than the function model, particularly soon
after therapy.
To systematically evaluate the goodness of fit, we trained the model on 6 of the 7 data points available
in the immediate group and compared the prediction of the model to the actual held-out data point
(thus performing a leave-one-out model fit). Note that we always kept the first and the 7th points, since we used
them as the initial and final values of our model. Table 3A shows the average absolute errors of prediction
among subjects of the immediate group. The average absolute errors over all 2nd to 6th leave-one-out
predictions were 0.16 for arm function and 0.091 for arm use, on data ranging between 0 and 1. The
models thus reasonably fit the data, especially in the first year after therapy, although the prediction
errors of the use model are lower than those of the function model overall (p < 0.0001, t-test). As a
comparison, the average absolute errors of randomized models were 0.22 for arm function and 0.26 for
arm use (Table 3B). Here, the randomized model generates each prediction point from a randomly selected
subject at the corresponding time step. A repeated-measures ANOVA confirmed that the mean prediction
errors of the proposed arm function model are smaller overall than those of the randomized arm
function model (p = 0.01), although the prediction errors of the proposed model increase with time
(model x time interaction: p < 0.0001; one-way repeated ANOVAs, effect of time, proposed function
model: p < 0.0001, randomized function model: p > 0.1). Similarly, the prediction errors of the proposed
arm use model are smaller overall than those of the randomized arm use model (p < 0.001; no
model x time interaction, p > 0.5).
A
Leave-one-out point | 2 (4 months) | 3 (8 months) | 4 (12 months) | 5 (16 months) | 6 (20 months) | Ave.
Average absolute error (arm function) | 0.097 | 0.12 | 0.18 | 0.18 | 0.19 | 0.16
Average absolute error (arm use) | 0.055 | 0.080 | 0.088 | 0.12 | 0.12 | 0.091

B
Leave-one-out point | 2 (4 months) | 3 (8 months) | 4 (12 months) | 5 (16 months) | 6 (20 months) | Ave.
Average absolute error (arm function) | 0.23 | 0.22 | 0.21 | 0.22 | 0.21 | 0.22
Average absolute error (arm use) | 0.25 | 0.22 | 0.26 | 0.27 | 0.31 | 0.26
Table 3 (A) Prediction error of proposed model. (B) Prediction error of randomized model
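To make the randomized baseline of Table 3B concrete, the sketch below draws, for each subject and each interior time point, the value of a randomly chosen other subject as the "prediction" and reports the average absolute error. This is a minimal illustration, the array of scores is hypothetical, and the exact sampling scheme used in the original analysis may differ.

import numpy as np

rng = np.random.default_rng(0)

def randomized_baseline_errors(data):
    """Prediction errors of the randomized baseline: for each subject and each
    interior time point (the 2nd to 6th), the 'prediction' is the value of a
    randomly selected other subject at the same time point.

    data: (n_subjects, 7) array of normalized function (or use) scores."""
    data = np.asarray(data, float)
    n, T = data.shape
    errors = np.empty((n, T - 2))
    for i in range(n):
        for k, t in enumerate(range(1, T - 1)):            # interior points 2..6
            j = rng.choice([s for s in range(n) if s != i])  # another subject
            errors[i, k] = abs(data[j, t] - data[i, t])
    return errors.mean(axis=0)   # average absolute error per leave-one-out point

# Hypothetical normalized arm function scores for 5 subjects at 7 time points
scores = rng.uniform(0, 1, size=(5, 7))
print(np.round(randomized_baseline_errors(scores), 3))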
2.4.2 Model parameter analysis
Histograms of the mean parameters w_1, w_2, and w_3 for the models of equations (2.1) and (2.2) are
shown in Figure 3. Because of the very few and noisy data points, Bayesian regression did not exhibit
adequate convergence of the model parameter distributions for all subjects; that is, the parameter
distributions were relatively flat for some subjects. In a first approximation, we defined good
convergence as follows: the standard deviation of the final parameter distributions after convergence
should be less than one standard deviation of the distribution of the parameter means. This criterion
resulted in the following cut-off standard deviations: 0.316 for w_1, 6.38 for w_2, and 3.67 for w_3. As
shown in Figure 3, all negative mean parameters w_1 were removed after applying this criterion. Thus,
for all 27 subjects with good convergence of the Bayesian regression for the function model, the mean
parameter w_1 was positive and in the range [0, 1], with median 0.64. This indicates a positive effect of
arm use at the previous time step upon arm function at the next time step.
Similarly, mean parameters w_2 and w_3 with large absolute values were removed by the cut-off
procedure. The median of the mean of w_2 for the 32 subjects with good convergence was 2.20. The
median of the mean of w_3 for the 33 subjects with good convergence was 1.40. Positive parameters w_2
indicate that arm function has a positive effect on arm use, as hypothesized. Positive parameters w_3
indicate competition between function of the affected limb and (constant) function of the non-affected
limb, as predicted by models of decision-making based on comparisons of "values". Note that we
verified with surrogate data derived from the model that our Bayesian regression method can indeed
retrieve the parameters of the original model (see Supplementary material: Text S2 and Figure S2).
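For concreteness, the small sketch below applies the convergence criterion described above to a single parameter; the posterior standard deviations and parameter means are hypothetical.

import numpy as np

def good_convergence(posterior_sds, mean_params):
    """Flag subjects whose posterior SD for a parameter is below the cut-off,
    defined as one SD of the across-subject distribution of the parameter means.

    posterior_sds, mean_params: (n_subjects,) arrays for one parameter."""
    cutoff = np.std(mean_params)   # e.g., 0.316 for w_1, 6.38 for w_2, 3.67 for w_3
    return np.asarray(posterior_sds) < cutoff

# Hypothetical posterior SDs and means of w_1 for 5 subjects
keep = good_convergence([0.10, 0.50, 0.20, 0.90, 0.25], [0.6, -0.3, 0.7, 1.5, 0.5])
print(keep)   # boolean mask of subjects with good convergence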
2.4.3 Effects of therapy on model parameters
We then examined whether CIMT had an effect on the model parameters in the delayed group by
comparing before- and after-therapy models. Before-therapy model parameters were trained with arm
function and use in the year before therapy. After-therapy model parameters were trained with arm
function and use in the year after the therapy period. The standard deviation cut-off values were the
same as above, and only parameters with good convergence both before and after therapy were analyzed.
Among the three model parameter means, only the mean of w_2 was significantly different between
before and after therapy (Figure 4B, mean of w_2 before therapy 2.95 ± 0.32; after therapy 4.58 ± 0.49;
p = 0.041; N = 22; two-tailed paired t-test). There was no difference in w_1 (Figure 4A, before therapy
0.759 ± 0.044; after therapy 0.825 ± 0.036; p = 0.55; N = 27; two-tailed paired t-test) or in w_3 (Figure 4C,
before therapy 2.21 ± 0.16; after therapy 2.10 ± 0.20; p = 0.54; N = 28; two-tailed paired t-test).
2.4.4 Model Simulations
Our previous neuro-computational model of stroke recovery (Han, Arbib et al. 2008) exhibited non-
linear and bi-stable behavior of stroke recovery: the model predicted that if natural recovery, motor
training or both, brings performance above a certain threshold, training can be stopped, as the repeated
spontaneous arm use provides a form of motor learning that further bootstraps performance and
spontaneous use.
Here, we simulated our model made of equations (2.1) and (2.2) to study whether the simplified model
of the present study also exhibits such threshold and bi-stable behavior, and to study the effect of the
increase in the "confidence" parameter w_2 from before to after therapy, under the simplifying
assumption that therapy does not directly increase function and use. For this purpose, we performed a
parameter sensitivity analysis using the continuation and bifurcation toolbox Matcont
(http://sourceforge.net/projects/matcont/).
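As a rough, illustrative stand-in for the Matcont continuation analysis, the sketch below simply iterates equations (2.1) and (2.2) to a long horizon for several values of w_2, with w_1 = 0.6 and w_3 = 3 as in Figure 5A; the initial values of F and U are hypothetical.

import numpy as np

def asymptotic_function(w1, w2, w3, F0=0.5, U0=0.3, n_steps=500):
    """Iterate the use/function model until it settles and return the final F."""
    F, U = F0, U0
    for _ in range(n_steps):
        F_next = (1 - w1) * F + w1 * U                   # eq. (2.1)
        U_next = 1.0 / (1.0 + np.exp(-(w2 * F - w3)))    # eq. (2.2)
        F, U = F_next, U_next
    return F

# Increasing the confidence parameter w2 moves the system to the high attractor
for w2 in (3.0, 4.0, 5.0):
    print(f"w2 = {w2}: asymptotic F = {asymptotic_function(0.6, w2, 3.0):.2f}")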
The sensitivity analysis of Figure 5B shows that for w_3 ≤ 3 and low values of w_2, asymptotic function and
use are low. However, by increasing w_2, therapy can "move" the participants from a low-attractor to a
high-attractor region, with convergence to different arm function values, as shown in the simulation
results of Figure 5A and B. Thus, if therapy increases the confidence to use the arm, the greater
spontaneous arm use will lead to greater function, in a virtuous cycle (Figure 5A, w_2 = 4 or w_2 = 5). In
contrast, for a low value of the parameter w_2, the simulated patient is in a vicious cycle and use
decreases (as in Figure 5A for w_2 = 3). Because of the competition between the function of each arm in
computing use, high values of w_3 lead to greater non-use compared to smaller values of w_3 (see left
side of Figure 5B). This is illustrated by comparing arm use for the two subjects in Figure 2B and C. The
main difference in parameters between the subjects of Figure 2B and C is the value of w_3. Because w_3 is
relatively large in 2B, arm use decreases to a low level; in contrast, use stays relatively high in 2C. However,
for w_3 > 3, a sufficient increase in the parameter w_2 brings the system into a truly bi-stable mode.
Depending on the initial conditions (i.e., the values of F(t) and U(t) just after therapy), function and use can
either remain near low values or near high values, delimited by the low Limit Point (LP) and high LP in
Figure 5B. Thus, the model exhibits a "threshold" in function, as we proposed in our previous work (Han,
Arbib et al. 2008, Schweighofer, Han et al. 2009).
2.5 Discussion
Stroke recovery is, by definition, a time-varying process. Although our dynamical “state-space” model
naturally accounts for the time-varying nature of stroke recovery, this paper represents, to our
knowledge, the first effort to use such an approach to quantitatively model recovery of individuals post-stroke.
The stroke recovery model proposed here depicts a time-evolving process with interactions between
arm function and use. The model, which is composed of two sub-models, one that updates arm function
(Equation 1) and the other that updates arm use (Equation 2), has only three free parameters, which
were estimated with repeated measurements of upper extremity function and use obtained in a phase
III randomized controlled clinical trial, the EXCITE trial.
For a majority of the participants in the immediate group of the EXCITE trial that we studied, arm
function depends both on prior function and prior use. Presumably because of the very limited amount
of data that penalizes models with more parameters, the preferred arm function model performs a
weighted average of previous arm function and use with a single parameter. This model is preferred for
27 subjects out of 48 over a competitive model in which arm function is not dependent on previous use,
and is preferred for 25 subjects out of 48 over a model in which function is solely based on use. The
alternative models are preferred for 1 and 5 subjects respectively; for the remainder of the subjects, no
conclusion can be drawn. Furthermore, parameter analysis showed a positive effect of arm use at the
previous time step upon arm function at the current time step, thus truly capturing the phenomenon of
“Use it and improve it, or lose it” for a majority of the participants we studied. Although this
phenomenon may be taken for granted by stroke rehabilitation specialists, this is, to our knowledge, the
first systematic demonstration of the effect of upper extremity use on changes in function and vice
versa in stroke recovery in individual subjects (in our previous study (Schweighofer, Han et al. 2009) we
only studied this effect of function immediately following therapy upon future use at the group level).
We further showed that for the large majority of the participants we studied, models of spontaneous
arm use based on a sigmoidal dependency of arm function are preferred over linear models. This result
indicates that the non-linear dependency of use on function has a strong effect on the fit of the use data.
Furthermore, parameter analysis showed that arm function has a positive effect on arm use with
competition between function of the affected limb and (constant) function of the non-affected limb, as
predicted by models of decision-making based on comparisons of “values” e.g. (Daw, O'Doherty et al.
2006). Time has a lesser effect: Our reference model of Equation (2.2) in which use depends on previous
function F(t-1) is only preferred for 14 subjects over a model in which use depends on current function
F(t). Conversely, this alternative model is preferred for 19 subjects over the reference model (no
conclusion can be drawn for the remaining 15 subjects). This inconclusive effect of time on arm use
suggests that update of the arm choice (presumably via the learning of “values”) is fast compared to the
update of arm function.
Our previous neuro-computational model of stroke recovery (Han, Arbib et al. 2008) exhibited bi-stable
behavior of stroke recovery. Here, our simpler data-driven model also exhibits bi-stable behavior,
although for relatively large values of the parameters w_2 and w_3 of the use model (see Figure 5B).
However, even for lower values of the parameters (around the mean of the estimated parameters),
therapy can, by increasing the parameter w_2, "move" the participants from a low-attractor to a high-
attractor region, as shown in the simulation results of Figure 5A. This simulation of the model made of
equations (2.1) and (2.2) illustrates the effect of the increase in the "confidence" parameter w_2 from
before to after therapy, under the simplifying assumption that therapy does not directly increase function and
use. Simulations show that if therapy increases the confidence to use the arm, the greater spontaneous arm
use will lead to greater performance, in a virtuous cycle (Figure 5, w_2 = 4 or w_2 = 5). In contrast, for a
low value of the parameter, the patient is in a vicious cycle and use decreases (as in Figure 5 for w_2 = 3).
Unfortunately, because of the limited data set, the sustainability of this increase in confidence in
participants of the EXCITE trial is unclear. Since the median w_2 post-therapy in the immediate group
(2.20) is lower than the median w_2 post-therapy in the delayed group (3.90), this increase may be
relatively short-lasting post-therapy.
In sum, our results suggest that learned non-use results, at least in part, from three non-mutually
exclusive factors: 1) a decrease in function of the affected arm; 2) a relative increase in function of the
non-affected arm (if for instance stroke affects the right arm and the right-hand dominant subject is
learning how to use her left arm); 3) reduced “confidence levels” in using the arm for a given function
(as a result of spilling a hot coffee on someone else for instance). Since our study is only a model of
changes in behavior, we can only speculate on the causes of non-use at the neural level. Reduced use
may lead to contraction of motor cortical maps leading to decreased performance and further reduced
use (Han, Arbib et al. 2008); contrarily forced use (i.e. practice) may lead to map expansion and increase
performance (Nudo, Wise et al. 1996). If such improvements in function together with confidence levels
are sufficient, then use of the affected arm in daily activities may increase sufficiently such that function
will improve spontaneously, effectively reversing non-use, as shown by our simulations in Figure 5. The
median of w_1 across subjects with good convergence was 0.64; given a time step of 4 months, this is
equivalent to a median time constant of forgetting of 1/0.64 × 4 months = 6.25 months. This appears
reasonable in light of the long-lasting cortical reorganization after training, e.g. [10].
Our model assumes the existence of independent measures of arm function and use across individuals
at specific times. So does the MAL AOU reflect arm use that does not depend on arm function? We
found a moderate but significant correlation (r = 0.58, p < 0.0001) between the normalized MAL AOU vs.
WMFT at t = 0 for all 93 patients (48 in immediate and 43 in delayed group). However, there is no
correlation between arm function and use for those 54 patients with medium to low function
(normalized WMFT < 0.5, r = 0.10, p = 0.45). For this sub-group, normalized MAL AOU ranges between 0
and 0.64. This indicates that, within this sub-group, some patients have relatively high use with low
function, and vice-versa, and that function and use are independent variables across subjects. Model
comparisons for this sub-group of subjects with medium to low function still largely favor our
hypothesized models over competitive models (See Supplementary material: Text S3).
The results of the present study need to be replicated with to-be-developed databases that contain
dozens of repeated measurements of upper extremity function and use before, during, and after
therapy. In particular, our model provides only “substantial” (in a Bayesian model comparison
terminology) evidence for the “use and improve it or lose it” hypothesis for a majority but not for all the
EXCITE participants we studied. Because of the sparcity of the data, the models did not fit the data in a
satisfactory manner for large subgroups of subjects, and no conclusion can be drawn for these subjects.
Furthermore, the predictions from our model, quite accurate in the first year, became worse with time
across subjects (See Table 3). A possible interpretation of this result is that the influence of function on
use and vice versa is stronger soon after therapy, but that this influence is reduced due to the myriad of
other un-modeled factors that influence use after stroke (the patient could for instance go back to work,
start to exercise, hire a caregiver, etc., all of which could affect the rate of recovery). Thus, our model is
currently best viewed as a prototype against which one can develop further time dependent models of
stroke recovery. Future models, based on a richer longitudinal data set of arm function and use,
including measurements just after the stroke, and that include neural measurement variables such as
lesion size, location, excitability of the corticospinal tract etc., might better characterize the time course
of stroke recovery. Our assumptions of two independent cortices, equal roles of each arm, and pure uni-
manual actions are also clear oversimplifications. Also, while motor (re-) learning after stroke can be
understood at least in part as practice-dependent reduction of kinematic and dynamic performance
errors (Krakauer 2006), no such error data were available in our data set, and we therefore did not
include a corresponding error-based (supervised) learning term in this simplified model (unlike in our
previous model (Han, Arbib et al. 2008)). Instead, the present model only includes a trivial form of
unsupervised learning in the update of arm function, and a degenerate form of reinforcement learning,
with “values” simply equal to functions. Finally, our model cannot predict the time course of
spontaneous recovery in the acute phase post-stroke. Here again, more longitudinal data points,
including early after stroke, are needed for viable extensions of the model.
Nonetheless, our model, although preliminary and despite its important limitations, is a first step in the
direction of the development of an accurate recovery model that can predict the time course of
recovery post-stroke. Our long-term goal is to validate and test a method based on such dynamical
models to compute the dose of arm and hand motor therapy for individual patients and provide treating
therapists with such a method to be used in the clinic. A well-validated model of upper extremity
recovery that generates accurate predictions of long-term use and performance, and the confidence
intervals of the predictions, could be highly valuable because the clinician, patient, or provider (if
applicable) will be able to make informed decisions about treatment and potentially determine the
critical dose of motor therapy for an individual patient. If for instance the model predicts that no
amount of recovery can increase use, rehabilitation may be in “vain”, and compensatory strategy should
be emphasized. On the other hand, if therapy is predicted to be effective, a well-validated and accurate
model could be used to determine minimally effective dose of therapy to maximize the benefit/cost
ratio of therapy.
Figure 1: Longitudinal arm and hand use data (as measured by the MAL AOU test, normalized) for 48 participants of the
immediate group in EXCITE illustrating how use can increase (A), decrease (B), or not change (C) in the 24 months following
therapy. Classification in the three categories was based on the significance of the slope parameter of a linear model fit of
use as a function of time, with a lenient criterion to test the hypothesis that the slope is not different from zero (p < 0.25).
Use increase category, N = 14; Use decrease category, N = 12; No change in use category, N = 22.
Figure 2: Examples of model fit for upper extremity function and use over 24 months post-therapy for three subjects in the
immediate group, using the model of Equations (2.1) and (2.2) in the main text (and the corresponding equations in bold font in
Table 1). The blue lines show the actual data. The red lines are generated by the model with the mean model parameters,
trained with 7 data points. (A) Both arm function and use improve (mean model parameters w_1 = 0.76, w_2 = 2.98, and w_3 =
0.42). (B) Arm function is more or less constant, while arm use shows "non-use" (mean model parameters w_1 = 0.14, w_2 =
3.36, and w_3 = 3.03). (C) Arm function slightly decreases, while arm use rises after 4 months and maintains this level (mean model
parameters w_1 = 0.19, w_2 = 3.48, and w_3 = 1.88). Note that the model fit is in general good for use over the 24 months and
for function in the first year, but worsens for function in the second year (see Table 3 for a systematic evaluation
of model fit).
Figure 3: Histograms of the means of parameters w_1, w_2, and w_3 of the model estimated with data of the immediate group
in the EXCITE trial. Blue and red: subjects with all estimated mean parameters. Blue: subjects with mean parameters after
application of the convergence criteria (see Results). The numbers N indicate the numbers of subjects with good convergence
for each parameter. Note that for w_1, the means of all parameters with good convergence are in the range [0, 1], supporting
the "Use it and improve it, or lose it" model. Similarly, for w_2, the means of most parameters with good convergence are
positive, supporting an actual influence of function on use (refer to Equations (2.1) and (2.2) in Methods for the role of these
parameters in the model).
Figure 4: Effects of therapy on mean model parameters for participants of the delayed group in the EXCITE trial. A. Effect on
w_1. B. Effect on w_2. C. Effect on w_3. Only the mean parameter w_2 of equation (2.2) varies from before (Be) to after (Af)
therapy. This parameter controls the effect of function on use for the affected arm. The horizontal line in B indicates p < 0.05.
Figure 5: Computer simulations of arm function showing dependence on model parameters. A. Simulations of the effect on
use after hypothetical changes in the confidence parameter w_2 as a result of therapy. Initial parameter values: w_1 = 0.6,
w_2 = 3, w_3 = 3. For simplicity, we assumed here that therapy has an effect only on the parameter w_2 and not on use and
performance (which it did in actual participants of the EXCITE trial (Wolf, Winstein et al. 2006)). The increase in parameter
w_2 from before to after therapy parallels the increase in this parameter in the delayed group of the EXCITE trial (see Figure
4). B. Parameter sensitivity analysis showing the asymptotic value of arm function F as a function of parameter w_2 for a
number of values of w_3. LP: limit point. The line labeled w_3 = 3 is generated by the same model as in A. For values of w_3 > 3,
the system behavior exhibits a non-stable range between the two limit points. For w_3 = 3.5 and w_2 = 5, for instance, arm
function F converges to either a low or a high value.
2.6 Supplementary materials
Text S1: Bayesian regression and model parameter optimization
Here, we detail the formulation of 1) the Bayesian regression for the arm function and arm use model, 2)
the model evidence criterion for model fit evaluation, and 3) the iterative parameter optimization
techniques to tune the model parameters.
Our state-space model of Equations (2.1) and (2.2) in the main text is restated here:

$F(t) = (1 - w_1)\,F(t-1) + w_1\,U(t-1)$  (S1)

$U(t) = \dfrac{1}{1 + \exp\left(-(w_2\,F(t-1) - w_3)\right)}$  (S2)

where $w_1$, $w_2$, and $w_3$ are model parameters to be estimated for each subject ($w_1$ is a "use effect" rate,
$w_2$ a "confidence parameter" of using the affected arm, and $w_3$ the function of the non-affected arm, as we
discussed in the main text), and t is a time index, t = 1, ..., T (T = 7 for the immediate group, and T = 4 for the
delayed group). We omit the subscript "affected" in F and U for simplicity of notation, but F(t) is still the arm
function of the stroke-affected arm at time t, measured by the normalized negative log WMFT, and U(t) is the arm
use of the affected arm at time t, measured by the normalized MAL AOU. Here, we describe how to optimize
these model parameters under the Bayes model evidence, given the F and U measurements.
We reformulate Equations (S1) and (S2) to obtain equations that are linear in the model parameters:

$F(t) - F(t-1) = w_1 \left( U(t-1) - F(t-1) \right)$  (S3)

$\log \dfrac{U(t)}{1 - U(t)} = w_2\,F(t-1) - w_3$  (S4)

Then, we apply a standard linear regression form:

$y_1(t) = \varphi_1(t)\, w_1$  (S5)

$y_2(t) = \varphi_2(t)\, w_2 + \varphi_3(t)\, w_3$  (S6)

where (S5) and (S6) correspond to (S3) and (S4), respectively. Namely, $y_1(t)$ and $y_2(t)$ are the
dependent (target) variables, representing the left-hand sides of (S3) and (S4), respectively. The variables $\varphi_1$,
$\varphi_2$, and $\varphi_3$ are the basis functions ($\varphi_1 = U(t-1) - F(t-1)$, $\varphi_2 = F(t-1)$, and $\varphi_3 = -1$). This
formulation, with different basis functions and different numbers of model parameters, is repeated for the
other competitive models in Table 1. Note that we can decouple $y_1$ and $y_2$ for the purpose of model
parameter estimation; hence, we use (S6) as an example in the following discussion.
We now stack the time-evolving measurements into a design matrix for the arm use regression model (S6), in
vector form:

$\mathbf{y} = \boldsymbol{\Phi} \mathbf{w}$  (S7)

where $\mathbf{y} = [y_2(1)\; y_2(2)\; \dots\; y_2(L)]^{\mathrm{T}}$, $\mathbf{w} = [w_2\; w_3]^{\mathrm{T}}$, and the design matrix $\boldsymbol{\Phi}$ is given by

$\boldsymbol{\Phi} = \begin{bmatrix} \varphi_2(1) & \varphi_3(1) \\ \vdots & \vdots \\ \varphi_2(L) & \varphi_3(L) \end{bmatrix}$

Since our model is a first-order Markov chain, we need two consecutive measurements, U(t) and U(t+1), for the
linear regression form (S6). Therefore, L is one less than T: L = T - 1 (L = 6 for the immediate group, and L = 3
for the delayed group). Hence $\boldsymbol{\Phi}$ is an L×M matrix, where M is the number of model parameters (i.e., M = 2 for
the arm use model (S2)).
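As a concrete illustration of (S4), (S6), and (S7), the sketch below builds the target vector y and the design matrix Φ for the arm use model from a hypothetical sequence of F and U measurements; it is illustrative code, not the original analysis scripts.

import numpy as np

def use_model_design(F, U):
    """Build targets y and design matrix Phi for the arm use model (S4)/(S6).

    F, U: arrays of length T with arm function and use measurements in (0, 1).
    Returns y of length L = T-1 and Phi of shape (L, 2), for parameters [w2, w3]."""
    F, U = np.asarray(F, float), np.asarray(U, float)
    y = np.log(U[1:] / (1.0 - U[1:]))              # logit of U(t), for t = 2..T
    Phi = np.column_stack([F[:-1],                 # basis phi_2 = F(t-1)
                           -np.ones(len(F) - 1)])  # basis phi_3 = -1
    return y, Phi

# Hypothetical measurements for one subject (T = 7 time points)
F = [0.30, 0.35, 0.42, 0.47, 0.50, 0.52, 0.55]
U = [0.20, 0.28, 0.40, 0.48, 0.55, 0.58, 0.60]
y, Phi = use_model_design(F, U)
print(Phi.shape, y.shape)   # (6, 2) (6,)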
The regression models cannot perfectly capture the dynamics, and we assume that the target variable
y varies with a Gaussian distribution:

$\mathbf{z} = \mathbf{y} + \boldsymbol{\eta} = \boldsymbol{\Phi}\mathbf{w} + \boldsymbol{\eta}$

$\mathbf{z} \sim N\!\left(\mathbf{z} \mid \boldsymbol{\Phi}\mathbf{w}, \beta^{-1}\right) = \left(\dfrac{\beta}{2\pi}\right)^{L/2} \exp\!\left(-\dfrac{\beta}{2} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w})^{\mathrm{T}} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w})\right)$  (S8)

where $\boldsymbol{\eta}$ is independent and identically distributed (i.i.d.) Gaussian noise, so that $\mathbf{z}$ has mean $\boldsymbol{\Phi}\mathbf{w}$ and
accuracy (inverse of covariance) hyper-parameter $\beta$. We set the initial $\beta$ to $\beta_0 = 10^{-8}$ for model fitting and
varied it for the sensitivity analysis (see below).
In Bayesian regression, we treat the model parameters as a probability distribution, not as single point
values. We assume that the prior distribution of the model parameters is also an independent and identically
distributed Gaussian:

$p\!\left(\mathbf{w} \mid \boldsymbol{\mu}_0, \alpha^{-1}\right) = N\!\left(\mathbf{w} \mid \boldsymbol{\mu}_0, \alpha^{-1}\right) = \left(\dfrac{\alpha}{2\pi}\right)^{M/2} \exp\!\left(-\dfrac{\alpha}{2} (\mathbf{w} - \boldsymbol{\mu}_0)^{\mathrm{T}} (\mathbf{w} - \boldsymbol{\mu}_0)\right)$  (S9)

where $\boldsymbol{\mu}_0$ is the mean of the model parameters and $\alpha$ their accuracy. Batch least-squares solutions over all
subjects' data from the immediate or delayed group were used for the initial mean of the model parameters $\boldsymbol{\mu}_0$
(except for $\mu_0 = 1$ in the weighted-average arm function model). Since we have little prior knowledge of the
model parameters, we set $\alpha$ to a very small value ($\alpha_0 = 10^{-11}$).
We reflected our emphasis on the individual data and our lack of prior knowledge by setting the ratio of the
initial prior accuracy $\alpha_0$ to the initial data accuracy $\beta_0$ to $\alpha_0 / \beta_0 = 10^{-3}$ and choosing almost flat
priors with $\alpha_0 = 10^{-11}$. We verified in simulations that when $\alpha_0 < 10^{-11}$, the results of the model comparison
are qualitatively the same as those presented. We set $\beta_0 = 10^{-8}$ for model fitting and performed a sensitivity
analysis (i.e., a systematic variation) on the initial precision $\beta_0$.
From (S8) and (S9), we notice that the mean $\boldsymbol{\Phi}\mathbf{w}$ of $\mathbf{z}$ in (S8) is not a deterministic function, since $\mathbf{w}$ is
itself Gaussian-distributed, as described in (S9). This means that we cannot evaluate the model fit using a
point-to-point distance error metric; therefore, we use a Bayesian approach. We introduce a metric
called the Bayesian model evidence, which is the probability density function of the data given the
hyper-parameters: $p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta)$. Using the sum rule and the product rule,

$p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta) = \int p(\mathbf{z} \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha, \boldsymbol{\mu}_0)\, \mathrm{d}\mathbf{w}$  (S10)

Substituting (S8) and (S9) into (S10), we have

$p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta) = \left(\dfrac{\alpha}{2\pi}\right)^{M/2} \left(\dfrac{\beta}{2\pi}\right)^{L/2} \int \exp(-E(\mathbf{w}))\, \mathrm{d}\mathbf{w}$  (S11)

where

$E(\mathbf{w}) = \dfrac{\beta}{2} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w})^{\mathrm{T}} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w}) + \dfrac{\alpha}{2} (\mathbf{w} - \boldsymbol{\mu}_0)^{\mathrm{T}} (\mathbf{w} - \boldsymbol{\mu}_0)$  (S12)

This is a Gaussian distribution with mean $\mathbf{m}_z = \boldsymbol{\Phi}\boldsymbol{\mu}_0$ and covariance $\mathbf{C}_z = \beta^{-1}\mathbf{I}_{L\times L} + \alpha^{-1}\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}$.
However, here, we directly evaluate the integral term for the purpose of the derivation.
We separate the terms with model parameters from those without:

$2E(\mathbf{w}) = \beta (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w})^{\mathrm{T}} (\mathbf{z} - \boldsymbol{\Phi}\mathbf{w}) + \alpha (\mathbf{w} - \boldsymbol{\mu}_0)^{\mathrm{T}} (\mathbf{w} - \boldsymbol{\mu}_0)$
$\qquad\;\; = \mathbf{w}^{\mathrm{T}} \mathbf{A} \mathbf{w} - 2\alpha \boldsymbol{\mu}_0^{\mathrm{T}} \mathbf{w} - 2\beta \mathbf{z}^{\mathrm{T}} \boldsymbol{\Phi}\mathbf{w} + \beta \mathbf{z}^{\mathrm{T}}\mathbf{z} + \alpha \boldsymbol{\mu}_0^{\mathrm{T}} \boldsymbol{\mu}_0$

where

$\mathbf{A} = \alpha \mathbf{I}_{M\times M} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}$  (S13)

We complete the square in the model parameter terms:

$2E(\mathbf{w}) = (\mathbf{m} - \mathbf{w})^{\mathrm{T}} \mathbf{A} (\mathbf{m} - \mathbf{w}) + \alpha \boldsymbol{\mu}_0^{\mathrm{T}} \boldsymbol{\mu}_0 + \beta \mathbf{z}^{\mathrm{T}}\mathbf{z} - \mathbf{m}^{\mathrm{T}} \mathbf{A} \mathbf{m}$

where

$\mathbf{m} = \mathbf{A}^{-1} \left( \alpha \boldsymbol{\mu}_0 + \beta \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{z} \right)$  (S14)
Further, we complete the square over $\boldsymbol{\mu}_0$ and $\mathbf{z}$:

$E(\mathbf{w}) = \dfrac{1}{2} (\mathbf{w} - \mathbf{m})^{\mathrm{T}} \mathbf{A} (\mathbf{w} - \mathbf{m}) + \dfrac{\alpha}{2} \lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2} + \dfrac{\beta}{2} \lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2}$  (S15)

where $\lVert \cdot \rVert$ is the Euclidean norm. We substitute (S15) into (S11), and use the fact that the integral of a
Gaussian distribution is equal to 1:

$\int \dfrac{|\mathbf{A}|^{1/2}}{(2\pi)^{M/2}} \exp\!\left(-\dfrac{1}{2}(\mathbf{w} - \mathbf{m})^{\mathrm{T}} \mathbf{A} (\mathbf{w} - \mathbf{m})\right) \mathrm{d}\mathbf{w} = 1$

where $|\mathbf{A}|$ is the determinant of the matrix $\mathbf{A}$. The resulting log model evidence is given by

$\log p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta) = \dfrac{M}{2}\log\alpha + \dfrac{L}{2}\log\beta - \dfrac{\alpha}{2}\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2} - \dfrac{\beta}{2}\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2} - \dfrac{1}{2}\log|\mathbf{A}| - \dfrac{L}{2}\log(2\pi)$  (S16)
Although (S16) looks complicated, besides constant terms related to the data size L and the number of
model parameters M, we can identify a trade-off between a regularization term $\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2}$ and a data-
fitting error term $\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2}$, which are balanced by $\alpha$ and $\beta$. Roughly speaking, a complicated, over-
fitting model is penalized through the model dimension M. With larger M (more model parameters),
$\boldsymbol{\Phi}\mathbf{m}$ can better approximate the data $\mathbf{z}$, and the error $\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2}$ decreases. On the other hand,
because the size M of the vector $\mathbf{m}$ also scales up with a larger number of parameters, the regularization
term $\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2}$ may increase. Similar trade-offs are found in (S13) and (S14), in the form of a weighted
average between prior knowledge and data. Note that, since the design matrix $\boldsymbol{\Phi}$ utilizes all data points,
unlike cross-validation (e.g., leave-one-out), we do not need to set aside test data points to evaluate the
model fit.
We now maximize this model evidence with respect to $\alpha$, which controls the model parameter
distribution (S9), and $\beta$, which controls the data distribution (S8). Note that $\mathbf{m}$ (S14) and $\mathbf{A}$ (S13) are also
functions of $\alpha$ and $\beta$. We use an iterative method in which we fix $\mathbf{m}$ and $\mathbf{A}$ in the first step and optimize
$\alpha$ and $\beta$, and then update $\mathbf{m}$ and $\mathbf{A}$ with the new $\alpha$ and $\beta$ in the second step (Bishop 2006). Note that this
iterative method can be formulated equivalently with the EM algorithm (Bishop 2006). Taking the derivative of
(S16) with respect to $\alpha$ is trivial for most terms except the $\log|\mathbf{A}|$ term. The determinant of the matrix $\mathbf{A}$ can be
expressed as the product of the eigenvalues of $\mathbf{A}$: $|\mathbf{A}| = \prod_{i=1}^{M} \mathrm{eig}_i(\mathbf{A})$. Since $\mathbf{A} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$, the
eigenvalues of $\mathbf{A}$ are given by $\mathrm{eig}_i(\mathbf{A}) = \alpha + \lambda_i$, where $\lambda_i$ is the i-th eigenvalue of $\beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$. Hence,

$\dfrac{\mathrm{d}}{\mathrm{d}\alpha}\log|\mathbf{A}| = \dfrac{\mathrm{d}}{\mathrm{d}\alpha}\log\prod_{i=1}^{M}\mathrm{eig}_i(\mathbf{A}) = \dfrac{\mathrm{d}}{\mathrm{d}\alpha}\sum_{i=1}^{M}\log(\alpha + \lambda_i) = \sum_{i=1}^{M}\dfrac{1}{\alpha + \lambda_i}$  (S17)
Using this, we take the derivative of (S16) with respect to $\alpha$ and set it to zero:

$\dfrac{\mathrm{d}\log p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta)}{\mathrm{d}\alpha} = \dfrac{M}{2\alpha} - \dfrac{1}{2}\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2} - \dfrac{1}{2}\sum_{i=1}^{M}\dfrac{1}{\alpha + \lambda_i} = 0$  (S18)

Rearranging this equation, and defining $\gamma \equiv \alpha\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2}$,

$\gamma \equiv \alpha\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2} = M - \sum_{i=1}^{M}\dfrac{\alpha}{\alpha + \lambda_i}$  (S19)

Alternatively, we can express $\gamma$ as follows:

$\gamma = M - \sum_{i=1}^{M}\dfrac{\alpha}{\alpha + \lambda_i} = \sum_{i=1}^{M}\dfrac{\alpha + \lambda_i}{\alpha + \lambda_i} - \sum_{i=1}^{M}\dfrac{\alpha}{\alpha + \lambda_i} = \sum_{i=1}^{M}\dfrac{\lambda_i}{\alpha + \lambda_i}$  (S20)

From (S19),

$\alpha = \dfrac{\gamma}{\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2}}$  (S21)
Similarly, taking the derivative of (S16) with respect to $\beta$,

$\dfrac{\mathrm{d}\log p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha, \beta)}{\mathrm{d}\beta} = \dfrac{L}{2\beta} - \dfrac{1}{2}\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2} - \dfrac{1}{2\beta}\sum_{i=1}^{M}\dfrac{\lambda_i}{\alpha + \lambda_i} = \dfrac{L}{2\beta} - \dfrac{1}{2}\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2} - \dfrac{\gamma}{2\beta} = 0$  (S22)

where we used the following expression for the derivative of the $\log|\mathbf{A}|$ term with respect to $\beta$:

$\dfrac{\mathrm{d}}{\mathrm{d}\beta}\log|\mathbf{A}| = \dfrac{\mathrm{d}}{\mathrm{d}\beta}\sum_{i=1}^{M}\log(\alpha + \lambda_i) = \dfrac{1}{\beta}\sum_{i=1}^{M}\dfrac{\lambda_i}{\alpha + \lambda_i}$

This is similar to (S17), but $\lambda_i$ is a linear function of $\beta$ in $\mathrm{eig}_i(\mathbf{A}) = \alpha + \lambda_i$, where $(\beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})\mathbf{v}_i = \lambda_i\mathbf{v}_i$ and
$\mathbf{v}_i$ is the eigenvector associated with $\lambda_i$. Taking the derivative, $(\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})\mathbf{v}_i\,\mathrm{d}\beta = \mathbf{v}_i\,\mathrm{d}\lambda_i$, and multiplying
the left-hand side by $\beta/\beta$, we have $\frac{1}{\beta}(\beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi})\mathbf{v}_i\,\mathrm{d}\beta = \frac{1}{\beta}\lambda_i\mathbf{v}_i\,\mathrm{d}\beta = \mathbf{v}_i\,\mathrm{d}\lambda_i$. Hence, the derivative of an
eigenvalue $\lambda_i$ with respect to $\beta$ is their ratio: $\frac{\mathrm{d}\lambda_i}{\mathrm{d}\beta} = \frac{\lambda_i}{\beta}$. Solving (S22) for $\beta$, we have

$\beta = \dfrac{L - \gamma}{\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2}}$  (S23)
Our iterative method computes $\alpha$ and $\beta$ using (S21) and (S23) in the first step, and then uses the
updated $\alpha$ and $\beta$ to re-compute $\mathbf{A}$ and $\mathbf{m}$ using (S13) and (S14) in the second step. We repeat these
steps until $\alpha$ and $\beta$ converge. Suppose that after several iterations we have the converged values of
$\alpha$ and $\beta$, denoted by $\alpha^{*}$ and $\beta^{*}$. The log model evidence is then maximized at
$\log p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha^{*}, \beta^{*})$.

Now, we can use this model evidence to compute the posterior probability of the model parameters. Recall
that the prior distribution of the model parameters is Gaussian, $\mathbf{w} \sim N(\mathbf{w} \mid \boldsymbol{\mu}_0, \alpha^{-1})$, and that the data
distribution is also Gaussian, $\mathbf{z} \sim N(\mathbf{z} \mid \boldsymbol{\Phi}\mathbf{w}, \beta^{-1})$, in which the mean is a linear combination of $\mathbf{w}$.
Applying the linear Gaussian formula (Roweis & Ghahramani, 1999; Bishop 2006) for $\mathbf{w}$ given $\mathbf{z}$, we have
$p(\mathbf{w} \mid \mathbf{z}, \boldsymbol{\mu}_0, \alpha^{*}, \beta^{*}) = N(\mathbf{w} \mid \boldsymbol{\mu}_{\mathbf{w}}, \mathbf{C}_{\mathbf{w}})$  (S24)

with mean and covariance

$\boldsymbol{\mu}_{\mathbf{w}} = \mathbf{C}_{\mathbf{w}}\left(\alpha^{*}\boldsymbol{\mu}_0 + \beta^{*}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{z}\right) = \mathbf{m}$  (S25)

$\mathbf{C}_{\mathbf{w}} = \left(\alpha^{*}\mathbf{I}_{M\times M} + \beta^{*}\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}\right)^{-1} = \mathbf{A}^{-1}$  (S26)

where $\mathbf{m}$ and $\mathbf{A}$ are given by (S14) and (S13) with the converged values of $\alpha$ and $\beta$ after
optimization ($\alpha \to \alpha^{*}$ and $\beta \to \beta^{*}$). From this, we see that the $\mathbf{m}$ and $\mathbf{A}$ used in the model evidence
computation above are the mean and accuracy (inverse of covariance) of the model parameter probability
distribution. Note that, as in the EM algorithm, $\alpha^{*}$ and $\beta^{*}$ are only locally optimal, and hence so is the model
evidence.
Here is a summary of the algorithm to compute the Bayesian evidence with $\alpha$ and $\beta$ optimization:

1. Set $\boldsymbol{\mu}_0$, and initial $\alpha$ and $\beta$.
2. Compute $\alpha$ and $\beta$ using (S21) and (S23):
   2.1 $\alpha \leftarrow \dfrac{\gamma}{\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2}}$
   2.2 $\beta \leftarrow \dfrac{L - \gamma}{\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2}}$
   where $\gamma = \sum_{i=1}^{M} \dfrac{\lambda_i}{\alpha + \lambda_i}$ and $\lambda_i$ is the i-th eigenvalue of $\beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$.
3. Update $\mathbf{m}$ and $\mathbf{A}$ using (S13) and (S14):
   3.1 $\mathbf{A} \leftarrow \alpha\mathbf{I}_{M\times M} + \beta\boldsymbol{\Phi}^{\mathrm{T}}\boldsymbol{\Phi}$
   3.2 $\mathbf{m} \leftarrow \mathbf{A}^{-1}\left(\alpha\boldsymbol{\mu}_0 + \beta\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{z}\right)$
4. Repeat steps 2 and 3 until convergence.
5. Set $\alpha^{*} \leftarrow \alpha$ and $\beta^{*} \leftarrow \beta$, and compute the model evidence (S16) and the posterior model parameter distribution (S24):
   5.1 $\log p(\mathbf{z} \mid \boldsymbol{\mu}_0, \alpha^{*}, \beta^{*}) = \dfrac{M}{2}\log\alpha^{*} + \dfrac{L}{2}\log\beta^{*} - \dfrac{\alpha^{*}}{2}\lVert \mathbf{m} - \boldsymbol{\mu}_0 \rVert^{2} - \dfrac{\beta^{*}}{2}\lVert \mathbf{z} - \boldsymbol{\Phi}\mathbf{m} \rVert^{2} - \dfrac{1}{2}\log|\mathbf{A}| - \dfrac{L}{2}\log(2\pi)$
   5.2 $p(\mathbf{w} \mid \mathbf{z}, \boldsymbol{\mu}_0, \alpha^{*}, \beta^{*}) = N(\mathbf{w} \mid \mathbf{m}, \mathbf{A}^{-1})$
This discussion is based on Bishop (2006, Chap. 3.5) and MacKay (2002, Chap. 28). For more
detail, please refer to the books. It should be noted that Bayesian regression fully utilizes the data set for
both model fitting and evaluation, without partitioning data for cross-validation.
We used the log model evidence as the objective of the model fitting optimization, and applied this
algorithm to individual subject data (48 subjects for the immediate group, and 45 for the delayed group) for all
the different models (9 models for arm function, and 4 models for arm use). The evidence for each model
was then used to compare the models by computing the Bayes factor (BF). Given the model evidence
$p_r(\mathbf{z} \mid \alpha, \beta, \boldsymbol{\mu}_0)$ for our reference model and $p_c(\mathbf{z} \mid \alpha, \beta, \boldsymbol{\mu}_0)$ for a competitive model, the
Bayes factor is given by $p_r(\mathbf{z} \mid \alpha, \beta, \boldsymbol{\mu}_0) / p_c(\mathbf{z} \mid \alpha, \beta, \boldsymbol{\mu}_0)$. This Bayes factor was computed for each subject.
We then evaluated the Positive Evidence Ratio (PER) (Stephan et al., 2007), which compares the number of
subjects for which the evidence is positive (BF ≥ 3) versus negative (BF ≤ 1/3). Note that this
criterion excludes the "barely worth mentioning" evidence (1/3 < BF < 3). Since it simply counts per-subject
Bayes factors, this metric is robust against outliers. The resulting PERs are shown in Table 2 of the
main text.
References for Text S1
Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics).
Springer.
MacKay, D. J. C. (2002). Information Theory, Inference & Learning Algorithms. Cambridge University
Press.
Roweis, S., & Ghahramani, Z. (1999). A Unifying Review of Linear Gaussian Models. Neural Computation,
11(2), 305-345.
Stephan, K. E., Marshall, J. C., Penny, W. D., Friston, K. J., & Fink, G. R. (2007). Interhemispheric
integration of visual processing during task-driven lateralization. The Journal of Neuroscience, 27(13),
3512-3522.
Text S2: Simulations with surrogate data
The goal of these simulations was to show that our inference procedure is able to identify which model
the data came from reliably (i.e., that the model comparison works), and to recover the parameters of
the true underlying model well (i.e., that the model fitting works). We first generated large surrogate
data sets by choosing model parameters around the means of the estimated parameters. Specifically,
we selected fitted models of arm function for 27 subjects and the fitted models of arm use for 29
subjects, as these fitted models showed good convergence in parameters (see Figure 3 in main text). For
each fitted model, we generated 100 surrogate data sets with parameters means drawn from normal
distributions centered on the original parameter mean with standard deviation SD= 0.2.
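A minimal sketch of this surrogate-data generation is shown below. The initial F and U values and the fitted parameter means are hypothetical, and any observation noise added to the surrogate trajectories in the original procedure is omitted here.

import numpy as np

rng = np.random.default_rng(0)

def surrogate_datasets(F0, U0, w_mean, n_sets=100, sd=0.2, T=7):
    """Generate surrogate (F, U) trajectories by perturbing the fitted parameter
    means (w1, w2, w3) with Gaussian noise of SD 0.2 and simulating equations
    (2.1) and (2.2) from the subject's initial values."""
    datasets = []
    for _ in range(n_sets):
        w1, w2, w3 = rng.normal(w_mean, sd)   # perturbed parameter means
        F, U = [F0], [U0]
        for t in range(1, T):
            F.append((1 - w1) * F[-1] + w1 * U[-1])             # eq. (2.1)
            U.append(1.0 / (1.0 + np.exp(-(w2 * F[-2] - w3))))  # eq. (2.2)
        datasets.append((np.array(F), np.array(U)))
    return datasets

# Hypothetical fitted parameter means for one subject
sets = surrogate_datasets(F0=0.4, U0=0.3, w_mean=np.array([0.6, 3.0, 2.0]))
print(len(sets), sets[0][0].shape)   # 100 (7,)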
Using these surrogate data, we performed model comparison using the same method as with the real
data. As we see in Table S1 below, there is strong evidence that our proposed model performs better
than the others on the surrogate data set, with qualitatively similar results to the actual data set
(compare to Table 2 in the main text). Furthermore, histograms of the model parameters derived from
surrogate data compare favorably with those derived from actual data in Figure 3 – see Supplementary
Figure S2.
Text S3: Model comparison for subjects with medium and low WMFT scores
We conducted model fitting on the low to medium function (normalized WMFT<0.5) subset of
participants in the immediate group (N=22). Model comparison results are shown in Table S2. As with the
original data set, the proposed models show better performance than competitor models.
Table S1: Positive evidence ratio of the simulation as described in Text S2. This table shows strong
evidence that our proposed model performs better than the others on the surrogate data set (2700
datasets for arm function and 2900 datasets for arm use). For more detail of surrogate data set, please
refer to Text S2.
A. Positive evidence ratio of arm function models estimated from surrogate data (2700 surrogate subjects).

Regressors | 1-parameter model | 2-parameter model | 3-parameter model
F(t-1) | 1461:498 | 2228:472 | --
U(t-1) | 1346:149 | 2617:81 | --
F(t-1) and U(t-1): (1 - w_1) F(t-1) + w_1 U(t-1) | (reference) | 2549:59 | 2624:75

B. Positive evidence ratio of arm use models estimated from surrogate data (2900 surrogate subjects).

Regressors | Positive evidence ratio (reference model vs. listed model)
F(t) (linear) | 2689:86
F(t-1) (linear) | 2424:336
F(t) (sigmoidal) | 1771:486
F(t-1) (sigmoidal): 1/(1 + exp[-(w_2 F(t-1) - w_3)]) | (reference model)
Table S2: A Positive evidence ratio of arm function for subjects with medium to low arm functions, as
described in Text S3. This table shows model comparison results of arm function for subjects with
medium to low arm function (normalized WMFT<0.5, N=22). Model comparisons for this sub-group of
subjects with medium to low function still largely favor our hypothesized models over competitive
models.
A. Positive evidence ratio of arm function models for subjects with medium to low arm function.

Regressors | 1-parameter model | 2-parameter model | 3-parameter model
F(t-1) | 12:3 | 19:3 | --
U(t-1) | 11:5 | 18:4 | --
F(t-1) and U(t-1): (1 - w_1) F(t-1) + w_1 U(t-1) | (reference) | 19:3 | 19:3

B. Positive evidence ratio of arm use models for subjects with medium to low arm function.

Regressors | Positive evidence ratio (reference model vs. listed model)
F(t) (linear) | 16:1
F(t-1) (linear) | 16:1
F(t) (sigmoidal) | 4:13
F(t-1) (sigmoidal): 1/(1 + exp[-(w_2 F(t-1) - w_3)]) | (reference model)
Figure S1: Sensitivity analysis of the initial value of the data accuracy β_0 for the model of arm function (A)
and the model of arm use (B). For this analysis, the group median of the log evidence among all
subjects was used to represent each model's performance; we then compared the models by computing a
Bayes factor from the group median evidences. The x axis is the range of β_0 in powers of 10, and the y axis
is the group median of the log Bayes factor for each model. Our reference model is Equation (2.1) for arm
function and Equation (2.2) for arm use; see Table 1 in the main text for the other model entries. We varied
β_0 (10^-8 ≤ β_0 ≤ 10^-3), with fixed α = 10^-11. The Bayes factor BF = 3 is shown by the black dashed lines in
log scale. A: Sensitivity analysis of β_0 for the arm function model. The bluish lines correspond to the models
of the 1st row of Table 1, which are regression models with the F(t-1) regressor. The light blue line shows a
model with a single parameter, and the dark blue line shows a model with two parameters. Similarly, the
grayish lines correspond to the models of the 2nd row of Table 1 with regressor U(t-1). The reddish lines
correspond to the models of the 3rd row (with regressors F(t-1) and U(t-1)). Darker lines indicate models with
more parameters. This graph shows that our reference model outperforms the others, although the differences
with some models are barely worth mentioning in a small range. B: Sensitivity analysis of β_0 for the arm use
model. The bluish lines correspond to the linear regression models: light blue with regressor F(t), and dark
blue with regressor F(t-1). The reddish lines correspond to the sigmoidal regression models with regressor
F(t) (light red) and with regressor F(t-1) (dark red). This figure shows that for all β_0 < 10^-1, the two
sigmoidal arm use models largely outperform the linear models, with little difference between the sigmoidal
models on the one hand, and between the linear models on the other.
Figure S2: Histograms of model parameters derived from surrogate data, as described in Text S2. These
histograms of the model parameters trained by surrogate data sets (2700 datasets for arm function and
2900 datasets for arm use) compare favorably with those derived from actual data in Figure 3. For more
detail of surrogate data set, please refer to Text S2.
Chapter 3 Modeling therapy training: Task interrelation in motor training
post-stroke revealed by graph structure learning
3.1 Summary
Although manipulating the schedule of motor tasks during task-oriented training has the potential to
improve recovery of arm and hand function post-stroke, it is at present not known how to determine
the task schedules that maximize recovery, notably because of possible learning transfer and
interference between tasks. Here, we aimed at quantifying such task interrelations by forming a
graphical model from task performance data obtained during task-oriented training collected in the
Extremity Constraint Induced Therapy Evaluation (EXCITE) clinical trial. At each session, post-stroke
participants practiced a few tasks that are selected by both the therapist and the patient out of a large
task bank. As a result, task performance data contains a large amount of missing data that are not
missing at random. To form a graphical model from such partial observations, we developed a new
method to learn Gaussian graph structure using the Bayesian Linear Model. We found that the output of
the graphical model correlated with similarity scores between tasks as rated by experts. Our results thus
suggest that our method can characterize task interrelations between multiple complex and ecological
motor tasks, typical of task-oriented training post-stroke.
3.2 Introduction
Task-Oriented Training (TOT) is effective in restoring function post-stroke (Fisher and Sullivan 2001, Van
Peppen, Kwakkel et al. 2004, Winstein, Rose et al. 2004, Wolf, Thompson et al. 2005) and has emerged
as the dominant approach to motor rehabilitation post-stroke (NINDS Stroke Progress Review Group,
2006, (Winstein and Wolf 2008)). In each TOT session, the rehabilitation therapist, in collaboration with
the patient, selects a number of functional tasks to practice. But how should the therapist (or a TOT
robot, see (Schweighofer, Choi et al. , Choi, Gordon et al. 2009) for instance) determine the task practice
schedule within a session? Although manipulating the schedule of motor tasks during motor training has
the potential to improve recovery of arm and hand function post-stroke (Krakauer 2006, Winstein and
Wolf 2008), there is little understanding on how to determine effective and efficient task schedules
(Schweighofer, Lee et al. 2011). A general training guideline recommends that the order of different
tasks should be randomized (Hubbard, Parsons et al. 2009). However, such random schedule may not
maximally increase long-term outcomes, as learning one task can induce “transfer”, or “interference”, in
learning a second task. In transfer, learning task A facilitates learning of a subsequent task B. There are
two types of interferences: anterograde and retrograde, which have been largely studied in the motor
adaptation field, e.g., (Brashers-Krug, Shadmehr et al. 1996, Tong, Wolpert et al. 2002, Miall, Jenkinson
et al. 2004, Krakauer, Ghez et al. 2005, Imamizu, Sugimoto et al. 2007, Krakauer 2009, Lee and
Schweighofer 2009). In anterograde interference, learning task A impairs learning of the subsequent
task B. In retrograde interference, learning task B impairs relearning of task A. Here, we considered only
anterograde interferences between tasks for two reasons. First, anterograde interferences appear to
have larger effects than retrograde interference - see (Sing and Smith). Second, retrograde interferences
may be less applicable in a single session of rehabilitation, where task A is typically only presented in
one block during a session, the time frame considered here.
Compared to motor adaption paradigms, TOT can contain a large number of different and complex
motor tasks, and it is unknown which combination of ecological tasks causes transfer and interference.
To study this problem, we used data from the EXCITE trial, in which 65 adaptive tasks were designed to
mimic daily life activities and emphasize different movements of upper extremity, and each task
practice were approved for training. In a sub-set of patients of the EXCITE trial (N = 48), the treating
therapist recorded a measure of performance for each task, as the number of repetitions per unit time.
In motor adaptation paradigms, anterograde interference is often measured by the increase in the
initial error in task B compared to task A (Krakauer 2009). Here, similarly, we use an estimate of initial
task performance, which we call “performance index”, to measure interference and transfer. The
performance index is taken as the intercept of a linear regression of the performance as a function of
trials in each block. If task A leads to a decrease in the performance index of B, then we assume
(anterograde) interference. If conversely A leads to an increase in the performance index of task B, then
we assume transfer. Thus, we define a single measure of interrelation between tasks, and assume that
transfer and interference form a continuum of this measure, with positive values indicating transfer,
negative values indicating interference, and zero no effect of task A upon task B.
How can we quantify this measure of interrelations between tasks in TOT? A naïve approach to find such
interrelations would be to compute the sample covariance of the performance indices, with positive
covariance representing transfer, and negative covariance representing interference. However, this
would not represent the true interrelations between tasks, as shown by the following example. Suppose
that the performance of tasks A and B appears highly correlated, given that all other conditions
are the same across subjects. But further investigation shows that the correlation between the performance of
the two tasks is zero among subjects with the same performance on a task C. This probably indicates that the
high correlation between A and B is due to the performance on a common movement found in C. In this
case, we can say that the task A and B are conditionally independent given performance data about C.
Thus, instead of covariance (or correlation) between tasks, we estimate the concentration matrix, the
inverse of the covariance matrix. An advantageous property of the concentration matrix is that a zero
entry at (i,j) element indicates conditional independence between tasks i and j given all the other
variables (Yuan and Lin 2007). Conditional independence indicates that two random variables (i.e., task
performance) may show some dependencies at first glance but turn out to be independent after a third
variable is observed.
However, the concentration matrix is difficult to estimate from actual TOT data such as EXCITE data, for
at least three reasons: the data can be “high dimensional” and “sparse,” and the measurements are
“partial”. The problem can be “high dimensional”, because the number of possible tasks can be
relatively large, and as a result, the number of task interrelations to describe can be very large. For
example, the 65 tasks in the EXCITE trial give a total of 2080 possible interrelations (assuming
a symmetrical effect of one task on the other). The problem can be "sparse", because many tasks will
presumably show no transfer or interference on each other. For example, chalkboard cleaning involves
gross grasp and shoulder flexion, while turning coins requires fine motor control of fingers, so these
tasks may be independent from each other. Finally, the measurements are “partial”, because only a few
tasks are selected from a pool of tasks at each session, and task selection varies between subjects and
between days for individual subjects. This is problematic since we cannot measure performance on the
unpracticed tasks and the influence to the performance of other tasks (e.g., the transfer and
interference effects).
Here, we propose to solve these three difficulties by learning an inter-task graph structure derived from
the task performance index. The graph structure is a graphical representation of the concentration
matrix: the tasks are the “nodes”, and the interrelations between tasks are the “edges” between tasks.
Recent graph structure learning methods have been developed to deal with partial data. The
MissGLasso is an Expectation-Maximization (EM) method for Gaussian Graphical Model (Städler and
Bühlmann 2010). In the mGLasso, the estimation of sample covariance and the optimization of graph
structure are decoupled (Kolar and Xing 2012). However, MissGLasso and mGLasso respectively rely on
assumptions about the missing-data distribution: Missing At Random (MAR), in which missingness may
depend on observed variables but is independent of unobserved variables, and Missing Completely
At Random (MCAR), in which missingness is independent of both observed and unobserved
variables. These assumptions, however, are violated in post-stroke TOT, in which patients and their
physical therapists select tasks from a task bank, based on both observed (e.g, task performance
progress) and unobserved (e.g., motivation to practice specific tasks) performance.
To form task structure from partial observations that are not missing at random, we propose a novel
graph structure learning method with a decoupled estimator approach, in which we use Bayesian Linear
Model (BLM) (Kay 1993) to estimate the sample covariance. This method has three main advantages.
First, our method needs no assumption on data missingness (e.g., MCAR and MAR). Instead, we fuse
partial measurements by Bayes theorem. Second, while mGLasso uses only pairwise samples to estimate
each variance term, our method can use any linear combination of measurements. Finally, like the
mGLasso, the estimation of sample covariance and the optimization of graph structure are decoupled in
our method, resulting in fast computations. To test the feasibility of our method, we compared the
output of the model trained by performance data collected in the EXCITE clinical trial (Wolf,
Winstein et al. 2006), with expert-rated similarity scores between tasks. In addition, we compared
performance of our new graph structure learning method with that of MissGLasso (Städler and
Bühlmann 2010) and mGLasso (Kolar and Xing 2012) on synthetic data with missing-at-random
violation.
3.3 Clinical Methods
3.3.1 Task selection and task schedule
Participants in the EXCITE trial received motor training for 2 weeks, 6 hours a day, during which
they practiced functional motor tasks that emphasized different arm and hand movements (Wolf,
Winstein et al. 2006). Among 113 registered tasks, we considered here the 65 adaptive tasks designed to
mimic daily life activities and emphasize different movements of upper extremity (e.g., pinch grasp,
wrist extension, and shoulder flexion) (Winstein and Wolf 2008). Physical therapists selected tasks from
the task bank at each session, with the selection based on interest of the patient and on progress in
performance of previously practiced tasks. Figure 6 shows examples of the between-day task selection
and schedule for two participants. One subject practiced 14 tasks (Figure 6 A), and the second subject 12
tasks (Figure 6 B) in total during the therapy. Notice how the task selections and schedules are unique
to each subject.
3.3.2 Task contents
Among the 65 registered adaptive tasks, we selected the 6 tasks that have the largest number of
repetitions across all subjects. This choice allowed us to have at least several pairwise measurements for
all task combinations (see below). These 6 tasks were Task VP (Vertical pegboard), Task WE (Weighted
extensions), Task TK (Telegraph key), Task AR (Arc and ring), Task CB (Cotton balls), and Task PC (Poker
chips), as described in Table 4. Volunteers (students at the University of Southern California) examined
the clinical records and coded the number of repetitions for 46 participants. One task practice consists of 10
contiguous trials (called 1 set), and the performance in each trial was recorded as the number of
repetitions of task completion per unit time.
3.3.3 Expert rated similarity score
We compared the outcome of task structure from our method with expert-rated similarity scores
computed from task movement attributes. The tasks in the EXCITE trial were coded with 22 attributes
characterizing the movement associated with each task. The 22 attributes were derived from a
classification scheme developed through a consensus process from a previous analysis (Winstein,
Blanton et al. 2007). The following instructions were given to each of three raters: “To rate each task,
use the following 4-level rating scale: 0 (not present), 1 low (0-33%), 2, moderate (34-66%), 3 high (67-
100%) for the listed 22 attributes”. The three rater scores were averaged. Table 5 shows the mean
attribute scores of our 6 tasks. To quantify the task similarity from the expert rating, we first normalized
the similarity scores task-wise and then computed between-task covariance based on the 22 movement
attributes.
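As a sketch of that computation, the task-wise normalization and between-task covariance could be formed as follows; the `scores` matrix below is a random placeholder standing in for the actual 6 × 22 table of averaged rater scores (Table 5), so this illustrates the procedure rather than reproducing the real data.

```python
import numpy as np

# Hypothetical 6 x 22 matrix of averaged expert ratings (tasks x attributes).
scores = np.random.default_rng(0).integers(0, 4, size=(6, 22)).astype(float)

# Normalize task-wise (z-score each task's attribute profile), then compute
# the 6 x 6 between-task covariance over the 22 movement attributes.
norm = (scores - scores.mean(axis=1, keepdims=True)) / scores.std(axis=1, keepdims=True)
A = np.cov(norm)   # rater-derived task similarity covariance (rows = tasks)
```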
3.3.4 Performance index
As discussed above, the performance index of a set of trials was taken as the offset of the linear fit of
repetitions versus trials within a set (= 10 trials). Figure 7 shows an example of performance data and its
linear fit for 2 tasks (1 set each) for a subject.
We formed pairwise measurements $\mathbf{Z}^{(i)} = [z_{t_a}, z_{t_b}]^T$, where $t_a$ and $t_b$ are tasks practiced within a day, and $z_{t_a}$ and $z_{t_b}$ are the corresponding performance indices. When more than two tasks were practiced within
a day, we formed exhaustive combinations of tasks as pairwise measurements. Table 6 shows the
number of paired measurements across subjects and days for the 6 tasks. There were 542 pairwise
measurements in total.
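To make the construction of the performance index and the pairwise measurements concrete, here is a minimal Python sketch; the `sets` dictionary and its repetition counts are invented for illustration and stand in for the coded clinical records.

```python
import itertools
import numpy as np

def performance_index(repetitions):
    """Intercept of a linear fit of repetitions vs. trial number within one set (10 trials)."""
    trials = np.arange(1, len(repetitions) + 1)
    slope, intercept = np.polyfit(trials, repetitions, deg=1)
    return intercept

# Hypothetical sets practiced by one subject on one day, keyed by task code.
sets = {
    "VP": [12, 13, 13, 14, 15, 15, 16, 16, 17, 17],
    "TK": [30, 31, 33, 33, 34, 35, 35, 36, 36, 37],
    "CB": [8, 9, 9, 10, 10, 11, 11, 12, 12, 13],
}

index = {task: performance_index(reps) for task, reps in sets.items()}

# Exhaustive pairwise measurements Z^(i) = [z_ta, z_tb]^T for tasks practiced that day.
pairwise = [
    (ta, tb, np.array([index[ta], index[tb]]))
    for ta, tb in itertools.combinations(sorted(index), 2)
]
for ta, tb, z in pairwise:
    print(ta, tb, np.round(z, 2))
```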
3.4 Computational Methods
3.4.1 Gaussian Graphical Model (GGM)
A GGM is a special case of a pairwise undirected graphical model in which the random variables follow a multivariate Gaussian pdf:

$$p(\mathbf{X}) = \frac{1}{C} \exp\left(-\frac{1}{2}(\mathbf{X}-\mathbf{b})^T W (\mathbf{X}-\mathbf{b})\right)$$

where $\mathbf{X} = [x_1, x_2, \dots, x_p]^T$ is a $p$-dimensional vector of random variables, with each element representing the performance index of one task, $W$ is the concentration matrix (the inverse of the covariance), $\mathbf{b}$ is the mean of the distribution, and $C$ is a normalization constant. The corresponding graph structure (i.e., the task interrelation structure) $G = (V, E)$ is defined by a set of nodes $V = \{x_1, x_2, \dots, x_p\}$ and a set of edges $E = \{e_{ij} \mid x_i \in V,\, x_j \in V\}$, where the weight of edge $e_{ij}$ is given by the $(i, j)$ element of $W$.
Our goal is to find the concentration matrix $W$ from samples of $\mathbf{X}$. However, simply inverting the sample covariance of $\mathbf{X}$ fails in the sparse setting. An elegant way to estimate the concentration matrix is the graphical Lasso, which does not require inversion of the sample covariance (Friedman, Hastie et al. 2008):

$$\min_{\widehat{W} \succ 0} \; -\log|\widehat{W}| + \operatorname{trace}\!\left(\Sigma \widehat{W}\right) + \lambda_{\widehat{W}} \|\widehat{W}\|_1 \qquad (1)$$

where $\widehat{W}$ is the concentration matrix to be estimated under the positive definiteness constraint $\widehat{W} \succ 0$, $\Sigma$ is the sample covariance of $\mathbf{X}$, $\lambda_{\widehat{W}}$ is a parameter controlling the sparsity of the task structure, $|\cdot|$ is the determinant, and $\|\cdot\|_1$ is the element-wise $l_1$ norm. Coordinate descent provides a numerical solution to this optimization problem (Friedman, Hastie et al. 2008).
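For reference, the objective in Eq. (1) is available, for example, through scikit-learn's graphical Lasso solver; the sketch below assumes random data in place of the real performance indices and an illustrative value for the sparsity parameter.

```python
import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                # illustrative samples: 200 observations, p = 6 tasks

Sigma = empirical_covariance(X)              # sample covariance of X
cov_hat, W_hat = graphical_lasso(Sigma, alpha=0.05)   # alpha plays the role of lambda in Eq. (1)
print(np.round(W_hat, 3))                    # sparse concentration (inverse covariance) estimate
```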
3.4.2 Bayesian linear model
A simple possibility to estimate the sample covariance with partial data would be to use pairwise samples to estimate each term, as in the mGLasso. However, the mGLasso relies on the MCAR assumption. Since our data violate this assumption, as discussed above, we applied the BLM to compute the sample covariance estimate $\widehat{\Sigma}$ of the task performance from the partial measurements. Importantly, the BLM makes no assumption about the missing data. Suppose that the estimate of the task performance index $\mathbf{X}$ is given by $\widehat{\mathbf{X}}$ with a (prior) Gaussian pdf $N(\widehat{\mathbf{X}}, \widehat{\Sigma})$. When a partial measurement $\mathbf{Z}$ (the observed task practice indices) is provided, we update the estimate $\widehat{\mathbf{X}}$ to $\widehat{\mathbf{X}}_{|\mathbf{z}}$ with the posterior Gaussian pdf $N(\widehat{\mathbf{X}}_{|\mathbf{z}}, \widehat{\Sigma}_{|\mathbf{z}})$. With the measurement model $\mathbf{Z} = H\mathbf{X} + \mathbf{w}$, the Bayesian linear model prediction of the measurement is

$$\widehat{\mathbf{Z}} = H\widehat{\mathbf{X}} \qquad (2)$$

where $H$ is an $m \times p$ observation matrix and $\mathbf{w}$ is an $m \times 1$ measurement noise vector with distribution $N(\mathbf{0}, R)$, independent of $\mathbf{X}$. We call $R$ the measurement error covariance. $H$ has a specific form given our partial measurements: the $i$-th row of $H$ corresponds to the $i$-th element of the measurement vector $\mathbf{Z}$, where
$$h_{ij} = \begin{cases} 1 & \text{if the } j\text{-th task performance is measured} \\ 0 & \text{otherwise} \end{cases}$$

We have $H = [h_{ij}]_{m \times p}$ when $m$ tasks are practiced out of $p$ tasks within a session ($m = 2$ and $p = 6$ in our pairwise measurement setting).
Then, assuming that $\mathbf{Z}$ and $\mathbf{X}$ are jointly Gaussian, the posterior distribution (Kay 1993) is

$$\widehat{\mathbf{X}}_{|\mathbf{z}} = \widehat{\mathbf{X}} + \widehat{\Sigma} H^T U \left(\mathbf{Z} - \widehat{\mathbf{Z}}\right) \qquad (3)$$

$$\widehat{\Sigma}_{|\mathbf{z}} = \widehat{\Sigma} - \widehat{\Sigma} H^T U H \widehat{\Sigma} = \left(H^T R^{-1} H + \widehat{\Sigma}^{-1}\right)^{-1} \qquad (4)$$

The partial measurement covariance $S$ is defined as

$$S = H \widehat{\Sigma} H^T + R \qquad (5)$$

Its inverse, the partial concentration matrix, is given by $U = S^{-1}$.
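A minimal numerical sketch of Eqs. (2)-(5), assuming p = 6 tasks, a pairwise measurement of two of them, and invented prior, noise covariance, and measurement values:

```python
import numpy as np

p = 6                                   # number of tasks
x_hat = np.zeros(p)                     # prior mean of the task performance indices (illustrative)
Sigma_hat = np.eye(p)                   # prior covariance (illustrative)
R = 0.1 * np.eye(2)                     # measurement error covariance (illustrative)

# Observation matrix H: here tasks 0 and 2 are measured (m = 2 out of p = 6).
H = np.zeros((2, p))
H[0, 0] = 1.0
H[1, 2] = 1.0

z = np.array([1.3, 0.7])                # illustrative pairwise measurement Z

S = H @ Sigma_hat @ H.T + R             # partial measurement covariance, Eq. (5)
U = np.linalg.inv(S)                    # partial concentration matrix

z_hat = H @ x_hat                                               # predicted measurement, Eq. (2)
x_post = x_hat + Sigma_hat @ H.T @ U @ (z - z_hat)              # posterior mean, Eq. (3)
Sigma_post = Sigma_hat - Sigma_hat @ H.T @ U @ H @ Sigma_hat    # posterior covariance, Eq. (4)
```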
3.4.3 Sparse partial concentration matrix
The above BLM method requires computing the partial concentration matrix $U$ by inverting the partial covariance $S$. This computation may suffer from numerical instability, especially when eigenvalues of the measurement noise covariance $R$ are close to 0 (i.e., when the condition number is large). More importantly, the covariance estimate $\widehat{\Sigma}_{|\mathbf{z}}$ is not consistent, that is, the estimate does not converge to the true covariance in the limit, as we show here. With recursive application of up to $k$ measurements with the same form of $H$, the estimate $\widehat{\Sigma}_{|\mathbf{z}}^{-1}(k)$ is given by

$$\widehat{\Sigma}_{|\mathbf{z}}^{-1}(k) = \widehat{\Sigma}_{|\mathbf{z}}^{-1}(0) + k\, H^T R^{-1} H$$

(Dissanayake, Newman et al. 2001, Terra, Ishihara et al. 2007), where $\widehat{\Sigma}_{|\mathbf{z}}^{-1}(0)$ is the concentration matrix of the initial prior distribution and $\widehat{\Sigma}_{|\mathbf{z}}^{-1}(k)$ is the concentration matrix of the posterior distribution. Thus, by the matrix inversion lemma, in the limit where $k$ goes to infinity, the covariance estimate $\widehat{\Sigma}_{|\mathbf{z}}$ converges to the initial prior covariance.
Here, taking advantage of the repeated partial measurements (i.e., the same combinations of pairwise measurements), we propose a simple but effective method to solve the inconsistency problem by using a sample estimator of the partial measurement covariance. Suppose that we have $k$ measurements with the same form of $H$: $\mathbf{Z}^{(i)}$ $(i = 1, \dots, k)$. Instead of repeating the posterior update $k$ times, or stacking all $k$ measurements into a batch form of $H$ (Zhang, Qi et al. 2012), we compute the sample mean and covariance of the measurements:

$$\bar{\mathbf{Z}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{Z}^{(i)} \qquad (6)$$

$$\bar{S} = \frac{1}{k-1} \sum_{i=1}^{k} \left(\mathbf{Z}^{(i)} - \bar{\mathbf{Z}}\right)\left(\mathbf{Z}^{(i)} - \bar{\mathbf{Z}}\right)^T \qquad (7)$$
To obtain the partial concentration matrix $U$, we apply the graphical Lasso again:

$$\min_{\widehat{U} \succ 0} \; -\log|\widehat{U}| + \operatorname{trace}\!\left(\bar{S}\widehat{U}\right) + \lambda_{\widehat{U}} \|\widehat{U}\|_1 \qquad (8)$$
Then, the posterior update corresponding to Eqs. (3) and (4) is

$$\widehat{\mathbf{X}}_{|\mathbf{z}} = \widehat{\mathbf{X}} + \widehat{\Sigma} H^T \widehat{U} \left(\mathbf{Z} - H\widehat{\mathbf{X}}\right) \qquad (9)$$

$$\widehat{\Sigma}_{|\mathbf{z}} = \widehat{\Sigma} - \widehat{\Sigma} H^T \widehat{U} H \widehat{\Sigma} \qquad (10)$$
This sample estimator has good properties. First, it no longer requires directly inverting $S = H\widehat{\Sigma}H^T + R$, so there is no instability issue in computing the partial concentration matrix. Second, $\bar{\mathbf{Z}}$ and $\bar{S}$ are unbiased and consistent, as they converge to the true mean and covariance of the partial measurements, $E[\mathbf{Z}]$ and $\mathrm{Cov}[\mathbf{Z}]$, as the number of measurements goes to infinity (DasGupta 2008). Finally, in the limit, the concentration matrix $\widehat{U}$ converges to the true concentration matrix $U$, even for growing $p$ (Friedman, Hastie et al. 2008, Lam and Fan 2009, Ravikumar, Wainwright et al. 2011).
Here is a summary of our algorithm to estimate the state concentration matrix from partially observed data.

1. From the repeated measurements $\mathbf{Z}^{(i)}$ $(i = 1, \dots, k)$, compute the sample partial measurement covariance

$$\bar{S} = \frac{1}{k-1} \sum_{i=1}^{k} \left(\mathbf{Z}^{(i)} - \bar{\mathbf{Z}}\right)\left(\mathbf{Z}^{(i)} - \bar{\mathbf{Z}}\right)^T, \qquad \text{where} \qquad \bar{\mathbf{Z}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{Z}^{(i)}$$

2. Run the graphical Lasso to estimate $\widehat{U}$, the inverse of the innovation matrix:

$$\min_{\widehat{U} \succ 0} \; -\log|\widehat{U}| + \operatorname{trace}\!\left(\bar{S}\widehat{U}\right) + \lambda_{\widehat{U}} \|\widehat{U}\|_1$$

3. Using $\widehat{U}$, compute the posterior state covariance estimate:

$$\widehat{\Sigma}_{|\mathbf{z}} = \widehat{\Sigma} - \widehat{\Sigma} H^T \widehat{U} H \widehat{\Sigma}$$

4. Repeat steps 1 to 3 with the different pairs of measurements.

5. Run the graphical Lasso again to estimate $\widehat{W}$, the concentration matrix of $\mathbf{X}$:

$$\min_{\widehat{W} \succ 0} \; -\log|\widehat{W}| + \operatorname{trace}\!\left(\widehat{\Sigma}_{|\mathbf{z}}\widehat{W}\right) + \lambda_{\widehat{W}} \|\widehat{W}\|_1$$

Note that there are two sparsity parameters, $\lambda_{\widehat{U}}$ and $\lambda_{\widehat{W}}$. For simplicity, we use the same value for both ($\lambda_{\widehat{U}} = \lambda_{\widehat{W}} = \lambda$) when tuning them via cross-validation (see below).
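The five steps could be strung together as in the sketch below, which substitutes scikit-learn's graphical Lasso solver for our own implementation; the `groups` dictionary (mapping each measured task pair to its stacked repeated measurements) and all numerical values are hypothetical.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def estimate_concentration(groups, Sigma_hat, p, lam=1e-3):
    """Steps 1-5: update the state covariance with each group of repeated
    pairwise measurements, then run the graphical Lasso on the result.

    `groups` maps a tuple of measured task indices, e.g. (0, 2), to a
    (k, 2) array of repeated measurements Z^(i) sharing that observation matrix H.
    """
    for tasks, Z in groups.items():
        # Step 1: sample covariance of the repeated measurements (unbiased, 1/(k-1)).
        S_bar = np.cov(Z, rowvar=False)
        # Step 2: sparse partial concentration matrix U_hat via the graphical Lasso.
        _, U_hat = graphical_lasso(S_bar, alpha=lam)
        # Step 3: posterior state covariance update.
        H = np.zeros((len(tasks), p))
        H[np.arange(len(tasks)), list(tasks)] = 1.0
        Sigma_hat = Sigma_hat - Sigma_hat @ H.T @ U_hat @ H @ Sigma_hat
    # Step 5: graphical Lasso on the updated state covariance.
    _, W_hat = graphical_lasso(Sigma_hat, alpha=lam)
    return W_hat

# Hypothetical usage with p = 6 tasks and two groups of repeated pairs.
rng = np.random.default_rng(1)
groups = {(0, 2): rng.normal(size=(40, 2)), (1, 3): rng.normal(size=(40, 2))}
W_hat = estimate_concentration(groups, np.eye(6), p=6)
```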
3.4.4 Simulations
We conducted numerical simulations of graph structure learning with partial observation data. We compared the performance of our graph structure learning method based on the Bayesian Linear Model to the EM-based method MissGLasso (Rothman, Bickel et al. 2008, Städler and Bühlmann 2010) and to the pair-only estimation method mGLasso (Kolar and Xing 2012), for different ratios of observations to the number of nodes. Our datasets were generated from a multivariate Gaussian, as in (Rothman, Bickel et al. 2008, Städler and Bühlmann 2010). The covariance of the Gaussian distribution was given by Ω = B + δI, where B is a symmetric matrix whose diagonal elements are zero and whose off-diagonal elements are 0.5 with probability 0.1 and 0 otherwise, I is the identity matrix, and δ is a constant chosen so that the condition number of Ω equals p. The mean of the multivariate Gaussian was drawn from the exponential distribution with mean 1. We generated n samples of p-dimensional vectors from this distribution.
We selected the $m$ entries in a partial observation using biased sampling from the exhaustive $\binom{p}{m}$ combinations, where $m$ is the number of observations out of $p$. The sampling weight was set proportional to the signal strength of the instance values, so that the selection of the partial observations depends on both observed and unobserved variables, i.e., a Missing Not At Random (MNAR) condition. We varied the triple $(p, m, n)$: the number of nodes, the number of entries observed in a measurement, and the total number of measurements, respectively. We used the Kullback-Leibler (KL) loss as the performance metric.
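One possible way to generate the synthetic data described above is sketched here; the choice of δ follows the stated condition-number constraint, while the biased MNAR selection only roughly approximates our procedure, taking the absolute value of each entry as a proxy for its "signal strength".

```python
import numpy as np

def make_dataset(p, m, n, seed=0):
    rng = np.random.default_rng(seed)
    # B: symmetric, zero diagonal, off-diagonals 0.5 with probability 0.1.
    B = 0.5 * (rng.random((p, p)) < 0.1)
    B = np.triu(B, k=1)
    B = B + B.T
    # delta chosen so that the condition number of Omega = B + delta*I equals p.
    eig = np.linalg.eigvalsh(B)
    delta = (eig.max() - p * eig.min()) / (p - 1)
    Omega = B + delta * np.eye(p)            # used as the covariance, as in the text
    mu = rng.exponential(scale=1.0, size=p)  # mean drawn from Exp(1)
    X = rng.multivariate_normal(mu, Omega, size=n)
    # MNAR-style biased selection of the m observed entries per sample:
    # weight proportional to |value|, a rough proxy for "signal strength".
    observed = np.zeros((n, p), dtype=bool)
    for i in range(n):
        w = np.abs(X[i])
        idx = rng.choice(p, size=m, replace=False, p=w / w.sum())
        observed[i, idx] = True
    return X, observed, Omega

X, observed, Omega = make_dataset(p=12, m=3, n=300)
```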
3.5 Results
3.5.1 Training task structure from EXCITE performance data
The task performance indices passed the Lilliefors normality test at the 5% significance level, except for tasks WE and AR. As training data, we used pairwise measurements of the performance index across subjects during the 2-week therapy for the 6 tasks. The initial Gaussian prior distribution of $\mathbf{X}$ was given by the mean of the task performance data and the covariance of the mean-filled performance data.
Figure 8 shows the final task graph structures for two values of the sparsity parameter $\lambda$ (Figure 8A: $\lambda = 1.00 \times 10^{-3}$, and Figure 8B: $\lambda = 5.00 \times 10^{-3}$). Note how the interrelation between VP and PC seen for the low sparsity $\lambda = 1.00 \times 10^{-3}$ disappears for the higher sparsity $\lambda = 5.00 \times 10^{-3}$, replaced by strong interrelations between VP and TK, and between PC and TK. In addition, with the increased value of $\lambda$, task WE (weighted extension) became independent from all the other tasks.
3.5.2 Task structure and expert-rated similarity
We compared the task structure obtained from our method with pairwise measurements to the task similarity obtained from the expert ratings. We selected $\lambda = 1.00 \times 10^{-3}$, which corresponded to the minimum p-value of the Pearson correlation between the rater-derived and the data-derived covariances (Figure 8A shows the corresponding task structure). We then inverted the concentration matrix $\widehat{W}$ to compute the covariance matrix. To compare the rater-derived and the data-derived covariances, we computed the Pearson correlation between the off-diagonal elements of the two covariances. The performance-based covariance derived from our method showed a significant negative correlation with the attribute-based covariance derived from the expert ratings (r = -0.69, p = 0.0043; Figure 9).
3.5.3 Sensitivity analysis
We conducted a sensitivity analysis on the single free parameter $\lambda$ by running our graph structure learning algorithm for $10^{-10} \le \lambda \le 10^{1}$, which covers the range of task structures from fully connected to no edges (self-pointing edges only). We first evaluated the task structure using the expert-rated similarity and the KL loss (Rothman, Bickel et al. 2008, Städler and Bühlmann 2010):

$$\operatorname{trace}\!\left(A\widehat{W}\right) - \log\det\!\left(A\widehat{W}\right) - p$$

where $A$ is the task similarity covariance from the raters, $\widehat{W}$ is the estimated concentration matrix from the graph structure learning algorithm, and $p$ is the number of tasks. As shown in Figure 10A, the KL loss stays near constant over a large range of $\lambda$ and then starts increasing from about $\lambda > 10^{-3}$.

The KL loss takes into account all terms of the concentration matrix, including the diagonal terms, which are not related to task interrelations. Thus, to study the effect of $\lambda$ on the pairwise task elements only, we computed the Pearson correlation coefficient between the off-diagonal elements of $A$ and those of the inverse of $\widehat{W}$, as above. Figure 10B shows the p-values of this correlation over the range of $\lambda$. In Figure 10C, we plotted the off-diagonal terms over the range of $\lambda$. As expected, all terms converge to 0 as $\lambda$ increases. In contrast, for small values of $\lambda$, many terms are non-zero, with a few large terms in absolute value.
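For completeness, the KL loss used above can be computed in a few lines; `A` and `W_hat` are placeholders for the rater-derived covariance and the estimated concentration matrix.

```python
import numpy as np

def kl_loss(A, W_hat):
    """trace(A W_hat) - log det(A W_hat) - p, assuming A @ W_hat has positive determinant."""
    p = A.shape[0]
    M = A @ W_hat
    sign, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - p
```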
3.5.4 Simulation results using synthetic data
Table 7 compares the average and standard deviation of the KL loss in simulation results obtained with 2-fold cross-validation and 50 runs, for varying ratios of the number of observed variables (m) to the number of nodes (p; the number of tasks in our experiment), and for three methods: our graph structure learning based on the Bayesian Linear Model (BLM), the EM-based method MissGLasso, and the pair-only estimate mGLasso. The results showed that our BLM-based method outperformed the mGLasso and had performance comparable to the MissGLasso. However, the computation time of the MissGLasso was much larger than that of the other two methods, due to the iterations required to reach convergence. For example, for (p, m, n) = (30, 9, 300), one trial took 63.35, 0.02363, and 0.07634 seconds for the MissGLasso, mGLasso, and BLM methods, respectively, on a 3.4 GHz Intel dual-core machine with 4 GB of memory.
3.6 Discussion
The goal of this study was to quantify the interrelations of the tasks presented in TOT post-stroke using graph structure learning. In TOT, the usual assumptions of data missingness (e.g., MCAR and MAR) are largely violated, as the patients and their therapists select tasks from a task bank based on both observed (e.g., progress in task performance) and unobserved (e.g., motivation to try new challenging tasks) factors. To form a task structure from partial observations that are not missing at random, we thus developed a novel method to estimate the concentration matrix from the sample covariance (Friedman, Hastie et al. 2008). Our method learns the GGM based on the Bayesian Linear Model (BLM) and uses $l_1$-norm regularization to enhance the sparsity of the graph structure. Our simulation and clinical results suggest that our novel method can characterize inter-task relations between multiple complex and ecological motor tasks, typical of TOT post-stroke.
Our capacity to learn multiple tasks is limited by the nature of the tasks and the time between the presentations of the tasks (Tong, Wolpert et al. 2002). When tasks are highly similar, e.g., when subjects are required to adapt to equal and opposite position-dependent visuomotor rotations (Krakauer, Ghilardi et al. 1999) or velocity-dependent force fields (Brashers-Krug, Shadmehr et al. 1996) in quick succession, interference occurs that prevents the first task from being consolidated in memory and lowers performance. In contrast, when the tasks are sufficiently dissimilar, such interference is not observed, as between learning a position-dependent visuomotor rotation and an acceleration-dependent force field (Krakauer, Ghilardi et al. 1999). Our analysis supports these interference findings and extends them to the ecological tasks used in rehabilitation post-stroke: when the experts rated two tasks as being similar (as tasks WE and TK, see upper corner of Figure 9), our data-driven analysis of task interrelation showed negative covariance, indicating interference between the two tasks, as the initial performance was lower than it would have been if one task had not been preceded by the other (in the same session).
But how can we interpret the positive task interrelations for low similarity scores? The overall trend seen on the right of Figure 9 is that the greater the dissimilarity between tasks (negative covariance in the rater analysis), the greater the transfer in learning (positive covariance in the data-driven analysis). However, if we do not consider the special case of the task pair (TK, AR), it can be seen that practicing tasks that are somewhat, but not too, dissimilar can lead to some degree of transfer. For instance, the pairs (AR, PC), (WE, PC), and (WE, AR) were all rated somewhat dissimilar, with expert-rated covariance around -1 (see center of Figure 9). These three pairs nevertheless showed some degree of transfer (the data-estimated covariance is positive). Because we do not have kinematic recordings, we can only speculate here, but it is possible that training the wrist components common to these three tasks (expert ratings all equal to 1) increased initial performance on the second task of the pair.
3.6.1 Comparisons with other computational methods
A simpler method of graph structure learning than the graphical Lasso is based on covariance regularization by hard-thresholding the sample covariance (Bickel and Levina 2008). This method is consistent under a uniformity property (a relaxed version of a banded matrix). However, it is in general unlikely that performance data from multiple-task practice can be ordered in a way that satisfies the uniformity property, considering the complex interrelations of the tasks. This was the primary reason why we used the graphical Lasso to learn the graph structure.

The posterior distribution from a partial observation in the BLM can be seen as a one-time update in a state-space model of a dynamic system. In this context, sparse signal recovery using the Kalman filter has been proposed (Vaswani 2008, Carmi, Gurfil et al. 2010). These authors proposed a pseudo-measurement model to tune the measurement matrix, e.g., using the Dantzig selector. We assumed that, with a context switching mechanism between tasks, the performance of a task (and not that of the other tasks) accounts for the measurement of that task's performance. We therefore used an explicit observation model $H$ with 1s and 0s, instead of a pseudo-measurement model with sparsity tuning.
3.7 Limitations and future work
Our approach to quantify task interrelations in TOT post-stroke, despite being highly novel, has at least three important limitations. First, we included in our analysis only the six most practiced tasks in terms of the total number of repetitions across all subjects. This number is relatively small compared to common applications of sparsity-enhanced graph structure learning. However, increasing the number of task entries in our database was not possible, as it led to a decrease in the number of pairwise measurements. Even with only 6 tasks, there were only four pairwise measurements for the task pair AR and CB (Table 6). Note however that simulations validated our algorithm in higher dimensions.
Second, the results of our graph learning method depend on the sparsity parameter $\lambda$, and we relied on the expert-rated score covariance to evaluate the task structure and the sparseness parameter. In practice, the expert rating will not be available, and thus evaluation of the optimal sparsity parameter $\lambda$ via expert knowledge is not possible. Note however that, over the large range of parameters $\lambda < 5.0 \times 10^{-3}$, both the KL loss and the p-value of the correlation between the performance-based covariance derived from the GGM and the attribute-based covariance derived from the expert ratings leveled off, and the correlation was significant over this entire range (Figure 10). Therefore, at least for our small number of tasks, the method is relatively robust with respect to the sparsity parameter.
Third, the approach taken here is highly labor intensive: during training in EXCITE, therapists manually recorded task performance for each task and each subject, which is not practical in an actual rehabilitation setting. In addition, raters spent a large amount of time rating the tasks according to the 22 attributes. However, the current revolution in the use of robotics, wearable sensors, and computer gaming in rehabilitation will allow the collection of vast amounts of kinematic data. Thus, in the near future, automated kinematic analysis will allow the study of task similarity, transfer, and interference with little effort (see (Schaefer, Patterson et al. 2013) for preliminary work in this direction). Our novel graph learning method will thus allow the analysis of task interference and transfer based on kinematic analysis for large task banks, even for individual patients during training.
Acknowledgements. NS and CW acknowledge grant NIH R01 HD 0654438-3. We thank Makoto Yamada and Yuko Takagi for comments on a previous draft.
Figure legends:
Figure 6: Task selection for 2 subjects during the 2 weeks of the EXCITE trial. A: The subject practiced 14 tasks. B: The subject practiced 12 tasks in total during the therapy. The x-axis is the day of visit, and the y-axis the EXCITE task ID. The intensity shows the number of repetitions per day.

Figure 7: Example of performance and linear fit as a function of trials for Tasks TK and AR for 1 subject on day 1. The x-axis is the trial within a set, and the y-axis is the number of repetitions in a unit time. Black dots show the number of repetitions on each trial; the gray line shows progress in one set; the black line shows the line fitted to the data. The intercept of this line gives the "performance index" that we used to study task interrelations via graphical models.

Figure 8: Graph structure of task interrelation for the 6 most practiced tasks, trained from 542 pairwise measurements from 46 participants, for two values of the sparsity parameter $\lambda$. Thicker edges represent higher absolute values of the concentration matrix between tasks. A: a less sparse graph with $\lambda = 1.00 \times 10^{-3}$; B: a sparser graph with $\lambda = 5.0 \times 10^{-3}$. For display purposes, the off-diagonal elements of the outcome $\widehat{W}$ were normalized to the edge thickness (0-15 line width property in the MATLAB plot), and edges with thickness less than 1 were removed.

Figure 9: Correlation between off-diagonal elements of the rater-derived and the data-derived covariances. The data-derived covariance structure was for $\lambda = 1.00 \times 10^{-3}$. Pearson correlation: r = -0.69; p = 0.0043.

Figure 10: Sensitivity analysis on $\lambda$ for $10^{-10} \le \lambda \le 10^{1}$. A: KL loss between the rater-derived and the data-derived covariances. B: p-value of the correlation between off-diagonal elements of the rater-derived and the data-derived covariances. Note that in the range $\lambda > 0.0215$, it was not feasible to compute the p-value because all the off-diagonal elements of the concentration matrix $\widehat{W}$ converged to zero. C: Off-diagonal terms over the range of $\lambda$.

Table legends

Table 4: Task content description for the six tasks selected.

Table 5: Expert-rated scores with descriptions of the 22 attributes for the six tasks selected.

Table 6: Number of measurements of pairwise tasks for the six tasks selected.

Table 7: Comparison of KL loss in simulation results with synthetic data for three different methods of graph structure learning with partial observation data. BLM: our novel graph structure learning based on the Bayesian Linear Model; MissGLasso: EM-based method; mGLasso: pair-only estimate. n: number of samples; m: number of (partial) observations; p: number of nodes. Shown are the mean KL loss and, in parentheses, the standard deviation.
Figure 6: Task selection for 2 subjects during the 2 weeks of the EXCITE trial. A: The subject practiced 14 tasks. B: The subject practiced 12 tasks in total during the therapy. The x-axis is the day of visit, and the y-axis the EXCITE task ID. The intensity shows the number of repetitions per day.
Figure 7: Example of performance and linear fit as a function of trials for Tasks TK and AR for 1 subject on day 1. The x-axis is the trial within a set, and the y-axis is the number of repetitions in a unit time. Black dots show the number of repetitions on each trial; the gray line shows progress in one set; the black line shows the line fitted to the data. The intercept of this line gives the "performance index" that we used to study task interrelations via graphical models.
Figure 8: Graph structure of task interrelation for the 6 most practiced tasks, trained from 542 pairwise measurements from 46 participants, for two values of the sparsity parameter $\lambda$. Thicker edges represent higher absolute values of the concentration matrix between tasks. A: a less sparse graph with $\lambda = 1.00 \times 10^{-3}$; B: a sparser graph with $\lambda = 5.0 \times 10^{-3}$. For display purposes, the off-diagonal elements of the outcome $\widehat{W}$ were normalized to the edge thickness (0-15 line width property in the MATLAB plot), and edges with thickness less than 1 were removed.
Figure 9: Correlation between off-diagonal elements of the rater-derived and the data-derived covariances. The data-derived covariance structure was for $\lambda = 1.00 \times 10^{-3}$. Pearson correlation: r = -0.69; p = 0.0043.
Figure 10: Sensitivity analysis on $\lambda$ for $10^{-10} \le \lambda \le 10^{1}$. A: KL loss between the rater-derived and the data-derived covariances. B: p-value of the correlation between off-diagonal elements of the rater-derived and the data-derived covariances. Note that in the range $\lambda > 0.0215$, it was not feasible to compute the p-value because all the off-diagonal elements of the concentration matrix $\widehat{W}$ converged to zero. C: Off-diagonal terms over the range of $\lambda$.
Task VP, Vertical pegboard: A pegboard and pegs are used with this task. The subject lifts a wooden peg and
places it in a designated hole on the pegboard. The pegboard can be placed flat or more vertically at any angle
depending on the movements desired. For example, the flat pegboard position challenges elbow extension; the
vertical pegboard position challenges elbow extension with shoulder flexion.
Task WE, Weighted extensions: The subject rests their forearm on a table so that their wrist is at the edge and
their hand is dangling off the edge of the table. They are then asked to repeatedly raise their wrist while
holding a weight.
Task TK, Telegraph key: A button (key) is attached to a counter and placed on a small square wooden
platform. The subject is asked to depress the key repeatedly with one finger at a time. Subjects are instructed
to isolate the individual finger movements by keeping their hand as flat as possible against the surface of the
base of the apparatus.
Task AR, Arc and ring: Apparatus is a long board with two wooden pieces sticking up on both ends. Rings
are placed on a piece of plastic tubing and both ends of the tubing are placed on the wooden pieces so that an
arc is formed. The subject is asked to move the rings (one at a time) from the right side of the arc to the left
side or vice versa.
Task CB, Cotton balls: Cotton balls and some type of container are placed on the table. The subject is asked
to pick the cotton balls up off of the table and place them in the container. The subject is instructed to try to
use a pincer grasp.
Task PC Poker chips: A piggy bank, poker chips, and dartboard grid or play-doh are used for this task. The
hole in the piggy bank is made large enough for the poker chips to be placed in the bank. The subject is asked
to pick up the poker chips one at a time and place them in the piggy bank. The poker chips can be arranged on
a dartboard grid or in a lump of Play-Doh so that they stand upright initially to make it easier for the subject to
pick them up.
Table 4 Task content description for the six tasks selected.
Table 5 Expert rated scores with descriptions of the 22 attributes for the six tasks selected.
Table 6 Number of measurements of pairwise tasks for the six tasks selected.
p m n MissGLasso mGLasso BLM
6 3 300 0.46 (0.15) 3.05 ( 2.91) 1.55 ( 8.08)
12 3 300 1.79 (0.54) 11.24 ( 3.71) 1.15 ( 0.58)
12 6 300 2.71 (1.08) 13.37 ( 4.69) 4.35 ( 1.48)
30 3 300 13.77 (2.76) 28.43 ( 8.92) 11.60 ( 2.85)
30 6 300 14.67 (1.89) 28.86 (10.01) 12.92 ( 1.84)
30 9 300 10.71 (2.52) 26.28 (10.72) 10.02 ( 3.17)
Table 7: Comparison of KL loss in simulation results with synthetic data for three different methods of graph structure learning with partial observation data. BLM: our novel graph structure learning based on the Bayesian Linear Model; MissGLasso: EM-based method; mGLasso: pair-only estimate. n: number of samples; m: number of (partial) observations; p: number of nodes. Shown are the mean KL loss and, in parentheses, the standard deviation.
Chapter 4 Modeling memory structure: context switching mechanism of motor output from a multi-timescale, multi-task memory process, using sparse regression with $l_1$ regularization
4.1 Summary
In the motor learning and adaptation literature, several computational models have been proposed to account for observed interference and transfer effects on motor output, using motor memory states with different timescales and error-based state updates (Smith, Ghazizadeh et al. 2006, Lee and Schweighofer 2009). However, these models typically assume a perfect contextual switching mechanism that selectively retrieves the motor memory suitable to the context. Here, we hypothesized that the context switching mechanism acting on motor output is not perfect, and we identified the ranges of timescales of the memory states. To find a parsimonious motor output model, we ran sparse regression on a multi-timescale, multi-task memory process. Our results showed that the output from short-timescale memory is most influential, and that context interference occurs more in the middle- and long-timescale memories. We also compared the memory output model between healthy individuals and individuals with stroke.
4.2 Introduction
What happens to the memory processes underlying synaptic consolidation when a person learns a motor task? Smith et al. proposed a two-timescale state-space memory model to explain motor task learning (Smith, Ghazizadeh et al. 2006). The observed motor output (i.e., the force profile in a reaching task under a force field) is a simple sum of two linear state-space models, one that adapts quickly to motor errors (the fast memory process) and one that adapts slowly (the slow memory process). These two memory processes are updated after each trial from the error between the ideal force and the actual force exerted. This simple model explains several motor memory adaptation phenomena, such as savings and interference in force-field adaptation in error-clamp experiments.
Lee et al. examined whether the fast and slow memory processes are connected in cascade or in parallel, and showed that only the parallel architecture accounts for dual adaptation (Lee and Schweighofer 2009). They proposed a parallel architecture of 1 fast and 2 slow memory processes for two motor adaptation tasks, and showed that this model best explains dual adaptation (e.g., spontaneous recovery and anterograde interference). Furthermore, Schweighofer et al. used this model to study contextual interference in subjects with stroke (Schweighofer, Lee et al. 2011).
Both models, proposed by Smith et al. and Lee et al., are based on a 1-fast, N-slow paradigm, where N is the number of distinct tasks. One fast memory is shared by all N tasks and is updated when any of the N tasks is presented, while each slow process is selectively updated only when the corresponding task is presented. These models assume that a context switching mechanism exclusively selects the slow process corresponding to the task presented; however, it is unclear whether human memory has such a perfect switching mechanism.
Extending the 1-fast, N-slow paradigm to a multi-timescale memory paradigm, Kording et al. proposed a Bayesian learner model consisting of 30 disturbance states with multiple timescales, ranging from 2 seconds to 4 days (Kording, Tenenbaum et al. 2007). Without a context switching mechanism, the disturbance states work as a "shared memory" over 2 tasks, and credit is assigned to the timescales of disturbance that contribute to the motor output; however, it is unknown how these disturbance states adapt to more than two tasks.
Here, we take a hybrid approach: we prepare a multiple-timescale memory process for each task to model adaptation to more than 2 tasks, but we use soft (non-exclusive) context switching on motor output, where the switching assigns credit to the memory states of the multi-timescale, multi-task memory. As we seek the most parsimonious motor output model, sparse regression with $l_1$ regularization is applied, with the (net) motor output as the output variable and the multi-timescale, multi-task memory states as inputs. We analyzed the structure of the context switching (timescale distribution, and interference and transfer effects) and compared the results between the control and stroke groups.
4.3 Method
4.3.1 Computational model
Our motor process is represented by a partitioned vector

$$\mathbf{x} = \left[\mathbf{x}_1^T \,|\, \dots \,\mathbf{x}_m^T \dots \,|\, \mathbf{x}_M^T\right]^T$$

where $\mathbf{x}$ is a $P \times 1$ vector, and the task-wise sub-vector $\mathbf{x}_m$ ($m = 1, \dots, M$, with $M = 3$ in our case) corresponds to the $m$-th task memory process. The memory process for each task consists of memory states with different timescales:

$$\mathbf{x}_m(n+1) = A\,\mathbf{x}_m(n) + B\,c_m(n)\,e(n)$$

$$c_m(n) = \begin{cases} 1 & \text{if the } m\text{-th task is presented} \\ 0 & \text{otherwise} \end{cases}$$

where $n$ is the time step, $n = 1, \dots, N$ ($N = 165$), $\mathbf{x}_m$ is a $K \times 1$ vector ($K = 30$, and thus $P = KM$), $A = \operatorname{diag}(a_1, \dots, a_K)$ is a forgetting rate matrix, $B = \operatorname{diag}(b_1, \dots, b_K)$ is a learning rate matrix, and $e(n) = d - y(n)$ is a scalar error feedback between the desired performance $d$ and the actual motor output on task $m$. The motor process context switching $c$ selects the task-wise memory process that corresponds to the task presented, so that this memory process is updated from the error feedback while the other task-wise memory processes decay according to the forgetting rate.
The timescales of the forgetting and learning rates (Kording, Tenenbaum et al. 2007; Kim et al., unpublished, 2013) are given by

$$a_k = e^{-\Delta/\tau_k}, \qquad b_k = \frac{d}{\tau_k^{\,h}}$$

where $\Delta$ is the inter-trial interval ($\Delta = 12$ sec), $\tau_k$ is the time constant, ranging from 2 seconds (fastest) to 24 hours (slowest), with $K = 30$ different time constants logarithmically scaled, and $d \ge 0$ and $h \ge 0$ are free parameters tuned by constrained optimization (the interior-point method).
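A minimal sketch of this memory process, assuming K = 30 logarithmically spaced time constants between 2 s and 24 h and Δ = 12 s; the task schedule, the values of the free parameters d and h, and the output proxy used to form the error signal are placeholders, not the fitted quantities.

```python
import numpy as np

K, M = 30, 3                                   # timescales per task, number of tasks
delta = 12.0                                   # inter-trial interval (s)
tau = np.logspace(np.log10(2.0), np.log10(24 * 3600.0), K)   # 2 s ... 24 h

d_gain, h = 0.5, 1.0                           # placeholder free parameters (d, h >= 0)
a = np.exp(-delta / tau)                       # forgetting rates a_k
b = d_gain / tau**h                            # learning rates b_k

def simulate(schedule, target=1.0):
    """Simulate the multi-timescale, multi-task memory states.

    `schedule` is a sequence of task indices (0..M-1) presented on each trial;
    `target` plays the role of the desired performance d. Returns the N x (K*M)
    design matrix X and an output proxy y (here, the summed states of the
    presented task, an assumption made only to close the error loop).
    """
    x = np.zeros((M, K))
    X, y = [], []
    for m in schedule:
        out = x[m].sum()                       # simple output proxy for the error signal
        e = target - out
        X.append(x.flatten())
        y.append(out)
        x = a * x                              # all task-wise processes decay
        x[m] += b * e                          # only the presented task is updated from the error
    return np.array(X), np.array(y)

schedule = np.random.default_rng(0).integers(0, M, size=150)   # pseudo-random task order
X, y = simulate(schedule)
```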
The motor output (observed task performance) is given by a linear combination of the multi-timescale, multi-task motor process, with the soft context switching mechanism:

$$y(n) = \mathbf{x}(n)^T \boldsymbol{\beta}$$

where $y(n)$ is the scalar motor output, $\mathbf{x}(n)$ is the $P \times 1$ memory process at the $n$-th trial, and $\boldsymbol{\beta}$ is the $P \times 1$ motor output context switching vector, which selects which memory states contribute to the output.

We then use the memory process to estimate the motor output:

$$\widehat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta} \qquad (1)$$

where $\mathbf{X} = [\mathbf{x}(1), \dots, \mathbf{x}(N)]^T$ is the $N \times P$ design matrix, $\widehat{\mathbf{y}} = [\hat{y}(1), \dots, \hat{y}(N)]^T$ is the motor output vector, and $\boldsymbol{\beta}$ is the $P \times 1$ regression coefficient vector. Note that this $\boldsymbol{\beta}$ serves as the context switching on the retrieved motor output. Figure 11 shows a diagram of this model.
We aim to find a parsimonious model and to identify which memory states contribute to the motor output. We therefore maximize the likelihood while enhancing sparsity with $l_1$ regularization:

$$\min_{\boldsymbol{\beta}} \; \frac{1}{2N}\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_1 \qquad (2)$$

where $\|\cdot\|_1$ and $\|\cdot\|_2$ are the $l_1$ and $l_2$ norms, respectively, and $\lambda$ is the sparsity parameter. We ran the lasso algorithm to solve this minimization problem (Tibshirani 1996, Efron, Hastie et al. 2003). We varied $\lambda$ so that $\boldsymbol{\beta}$ ranged from the least-squares solution to all zeros, and selected the $\lambda$ that minimized the mean squared error under 10-fold cross-validation.
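The cross-validated lasso fit described above could be run, for instance, with scikit-learn; this is a sketch rather than the original fitting code, and it assumes a design matrix `X` of memory states and a measured output vector `y` such as those produced by the simulation sketch earlier.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# X: N x P design matrix of memory states, y: N-vector of measured motor output.
lasso = LassoCV(cv=10, fit_intercept=False)    # 10-fold CV over a path of lambda values
lasso.fit(X, y)

beta = lasso.coef_                             # sparse context-switching weights
nonzero = np.flatnonzero(beta)
print("selected memory states:", nonzero, "lambda:", lasso.alpha_)
```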
Figure 11: Diagram of the proposed model
4.3.2 Dataset
We used the grip force dataset from (Schweighofer, Lee et al. 2011). There were 12 subjects in the stroke group and 12 subjects in the control group. On each trial, one grip force task was selected out of 3 tasks, and the subject was asked to produce a grip force matching the target grip force pattern. 150 trials (50 for each task) were presented in pseudo-random order.
4.4 Results
4.4.1 Memory process
We used memory processes with different timescales from 2 sec to 24 hours, logarithmically spaced, as described in the Method. States with a larger forgetting rate more quickly reflect the performance of a task trial when the task is presented and converge to 0 when the task is not presented; states with a smaller learning rate adapt more slowly to changes in performance. Figure 12 shows an example of the memory process sub-vector $\mathbf{x}_1$ for a control subject and a subject with stroke. The aggregated performance metric was (mean, std, min, max, median) = (4.64, 0.272, 1.91, 5.34, 4.66) for control, and (4.37, 0.448, 0.000482, 5.34, 4.44) for stroke.
Figure 12: Memory process. The thick black line shows the performance on task 1, with red dots as the actual measurements. The other lines show the memory states with different timescales for a control (A) and a stroke (B) subject.
4.4.2 Sparse regression
We ran the lasso algorithm to minimize the squared error with $l_1$ regularization, using the memory processes as basis functions. Figure 13 shows examples of a control and a stroke subject's task performance and its regression fit. The mean absolute error of the regression fit was 0.1707 for control and 0.2658 for stroke subjects.
Figure 13: Sparse regression. The black line shows the actual performance on the task, and the red line shows the regression fit. (A) Control and (B) stroke subject.
4.4.3 Memory structure
Figure 14 shows the histogram of memory process timescales over all subjects. The number of entries in the task-wise memories was almost evenly distributed for both control and stroke groups: (task 1, task 2, task 3) = (25, 20, 27) for control and (26, 22, 21) for stroke. The distributions were almost identical.
Figure 14: Histogram of memory process timescales for (A) control and (B) stroke subjects.
4.4.4 Interference and transfer effects
Here, we regressed the output variable and analyzed the values of the $\boldsymbol{\beta}$ coefficients. We categorized the resulting coefficients into 3 bins (short, middle, and long timescale ranges) and summed positive and negative coefficients separately. The entries were normalized by the number of non-zero coefficients. A difference in the middle-range negative coefficients was observed: while the control group showed zero middle-range negative coefficients, the stroke group showed larger entries in the middle-range negative memory.
Figure 15: Histogram of memory structure for the control and stroke groups.
4.5 Discussion
We built an observation model of task performance based on a multiple-task, multiple-timescale memory process, and identified which memory states are most influential in estimating the measured task performance by running sparse regression.
4.5.1 Memory structure
The results of the memory structure analysis showed that the distribution has two distinct peaks: one at 2 seconds, and the other around 40 seconds. This may suggest that the fast and slow memory model (Smith, Ghazizadeh et al. 2006) correctly depicts the memory structure. The short-range memory (2 seconds) retrieves the motor memory of the task currently presented, and the other memory retrieves previously presented tasks. The task-wise entries are evenly distributed, which is plausible considering that the task schedule is pseudo-random. We found that the distribution is similar for both the control and stroke groups. This suggests that the mechanism of context switching on motor output is intact in our stroke group.
4.5.2 Interference and transfer effects
We analyzed the regression coefficients. Since negative coefficients worsen the performance of a motor task, we considered that they reflect interference with learning of the task. Conversely, positive coefficients are assumed to reflect transfer of learning to a task. Short-range memory interference was observed as the 2nd largest component for the stroke group, while there was no interference for the control group. This may indicate that the stroke group was confused by the context cue to perform the tasks, as these individuals have visual working memory deficits (Schweighofer, Lee et al. 2011).
4.6 Conclusion
We proposed a multiple-timescale, multi-task memory process model with soft context switching on motor output, and trained the model with sparse regression. We analyzed the structure of the context switching (timescale distribution, and interference and transfer effects) and compared the results between the control and stroke groups. The results showed that the memory structure has two distinct peaks, corresponding to the short- and long-term memories of previous research. However, the stroke group showed context interference in the middle range of memory, while we found no interference for the control group. This indicates that the stroke group may suffer from deficits in visual working memory.
Chapter 5 Conclusion
Toward a more effective and efficient rehabilitation methodology, this thesis contributes to the field with a computational modeling approach by supporting the axiom of "use it or lose it" with behavioral data, revealing a measurement-based inter-task graphical structure, and analyzing memory structure and context switching on motor output.

We first showed quantitatively that the axiom "Use it and improve it, or lose it" is correct, and that the relationship between arm function and use is a non-linear (sigmoidal) function, by applying Bayesian regression to the EXCITE clinical trial dataset.

Secondly, we developed a new method to learn Gaussian graph structure using the Bayesian Linear Model. We found that the output of the graphical model correlated with similarity scores between tasks as rated by experts, suggesting that our method can characterize task interrelations between multiple complex and ecological motor tasks, typical of task-oriented training post-stroke.

Finally, we modeled a multi-timescale, multi-task memory process. Our results showed that the output from short-timescale memory is most influential, and that context interference occurs more in the middle- and long-timescale memories. We also compared the memory output model between healthy individuals and individuals with stroke, and showed larger interference in learning multiple tasks in the stroke group compared with the control group.
References
Bickel, P. J. and E. Levina (2008). "Covariance regularization by thresholding." The Annals of Statistics
36(6): 2577-2604.
Bishop, C. M. (2006). Pattern recognition and machine learning, Springer.
Brashers-Krug, T., R. Shadmehr and E. Bizzi (1996). "Consolidation in human motor memory." Nature
382(6588): 252-255.
Butefisch, C., Hummelsheim, H, Denzler, P, Mauritz, KH. (1995). "Repetitive training of isolated
movements improves the outcome of motor rehabilitation of the centrally paretic hand." J Neurol Sci
130: 59-68.
Carmi, A., P. Gurfil and D. Kanevsky (2010). "Methods for Sparse Signal Recovery Using Kalman Norms
and Quasi-Norms." 58(4): 2405-2409.
Choi, Y., J. Gordon, D. Kim and N. Schweighofer (2009). "An Adaptive Automated Robotic Task Practice
System for Rehabilitation of Arm Functions after Stroke." IEEE Transactions in Robotics In press.
Conner, J. M., A. Culberson, C. Packowski, A. A. Chiba and M. H. Tuszynski (2003). "Lesions of the Basal
forebrain cholinergic system impair task acquisition and abolish cortical plasticity associated with motor
skill learning." Neuron 38(5): 819-829.
DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. New York, NY, Springer New York.
Daw, N. D., J. P. O'Doherty, P. Dayan, B. Seymour and R. J. Dolan (2006). "Cortical substrates for
exploratory decisions in humans." Nature 441(7095): 876-879.
Dissanayake, M. W. M. G., P. Newman, S. Clark, H. F. Durrant-Whyte and M. Csorba (2001). "A solution
to the simultaneous localization and map building (SLAM) problem." IEEE Transactions on Robotics and
Automation 17(3): 229-241.
Doya, K. (2002). "Metalearning and neuromodulation." Neural Netw 15(4-6): 495-506.
Efron, B., T. Hastie, I. Johnstone and R. Tibshirani (2003). "Least Angle Regression." 1-44.
Fisher, B. E. and K. J. Sullivan (2001). "Activity-dependent factors affecting poststroke functional
outcomes." Top Stroke Rehabil 8(3): 31-44.
Friedman, J., T. Hastie and R. Tibshirani (2008). "Sparse inverse covariance estimation with the graphical
lasso." Biostatistics (Oxford, England) 9(3): 432-441.
Fujiwara, T., Y. Kasashima, K. Honaga, Y. Muraoka, T. Tsuji, R. Osu, K. Hase, Y. Masakado and M. Liu
(2009). "Motor improvement and corticospinal modulation induced by hybrid assistive neuromuscular
dynamic stimulation (HANDS) therapy in patients with chronic stroke." Neurorehabilitation and neural
repair 23(2): 125-132.
Han, C. E., M. a. Arbib and N. Schweighofer (2008). "Stroke rehabilitation reaches a threshold." PLoS
computational biology 4(8): e1000133-e1000133.
Hellstrom, K., B. Lindmark, B. Wahlberg and A. R. Fugl-Meyer (2003). "Self-efficacy in relation to
impairments and activities of daily living disability in elderly patients with stroke: a prospective
investigation." J Rehabil Med 35(5): 202-207.
Hubbard, I. J., M. W. Parsons, A. S. Unit and J. Hunter (2009). "Task-specific training: evidence for and translation to clinical practice." 16(June): 175-189.
Imamizu, H., N. Sugimoto, R. Osu, K. Tsutsui, K. Sugiyama, Y. Wada and M. Kawato (2007). "Explicit
contextual information selectively contributes to predictive switching of internal models." Exp Brain Res
181(3): 395-408.
Kass, R. E. and A. E. Raftery (1995). "Bayes factors." Journal of the American Statistical Association
90(430): 773-795.
Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall PTR.
Kleim, J. A., S. Barbay and R. J. Nudo (1998). "Functional reorganization of the rat motor cortex following
motor skill learning." J Neurophysiol 80(6): 3321-3325.
Kleim, J. A. and T. A. Jones (2008). "Principles of Experience-Dependent Neural Plasticity: Implications for
Rehabilitation After Brain Damage." Journal of Speech, Language, and Hearing Research 51: S225-S239
Kolar, M. and E. P. Xing (2012). "Estimating Sparse Precision Matrices from Data with Missing Values."
Kording, K. P., J. B. Tenenbaum and R. Shadmehr (2007). "The dynamics of memory as a consequence of
optimal adaptation to a changing body." Nature neuroscience 10(6): 779-786.
Krakauer, J. W. (2006). "Motor learning: its relevance to stroke recovery and neurorehabilitation." Curr
Opin Neurol 19(1): 84-90.
Krakauer, J. W. (2009). "Motor learning and consolidation: the case of visuomotor rotation." Adv Exp
Med Biol 629: 405-421.
Krakauer, J. W., C. Ghez and M. F. Ghilardi (2005). "Adaptation to visuomotor transformations:
consolidation, interference, and forgetting." J Neurosci 25(2): 473-478.
Krakauer, J. W., M. F. Ghilardi and C. Ghez (1999). "Independent learning of internal models for
kinematic and dynamic control of reaching." Nat Neurosci 2(11): 1026-1031.
Kwakkel, G., Wagenaar, RC, Twisk, JW, Lankhorst, GJ, Koetsier, JC. (1999). "Intensity of leg and arm
training after primary middle-cerebral artery stroke: a randomized trial." Lancet 354: 191-196.
Lam, C. and J. Fan (2009). "Sparsistency and Rates of Convergence in Large Covariance Matrix
Estimation." Annals of statistics 37(6B): 4254-4278.
Lee, J. Y. and N. Schweighofer (2009). "Dual adaptation supports a parallel architecture of motor memory." J Neurosci 29(33): 10396-10404.
Liepert, J., H. Bauder, H. R. Wolfgang, W. H. Miltner, E. Taub and C. Weiller (2000). "Treatment-induced
cortical reorganization after stroke in humans." Stroke 31(6): 1210-1216.
Mayo, N. E., S. Wood-Dauphinee, R. Cote, L. Durcan and J. Carlton (2002). "Activity, participation, and
quality of life 6 months poststroke." Arch Phys Med Rehabil 83(8): 1035-1042.
Miall, R. C., N. Jenkinson and K. Kulkarni (2004). "Adaptation to rotated visual feedback: a re-
examination of motor interference." Exp Brain Res 154(2): 201-210.
Nakayama, H., H. S. Jorgensen, H. O. Raaschou and T. S. Olsen (1994). "Compensation in recovery of
upper extremity function after stroke: the Copenhagen Stroke Study." Arch Phys Med Rehabil 75(8):
852-857.
Nudo, R. J., B. M. Wise, F. SiFuentes and G. W. Milliken (1996). "Neural substrates for the effects of
rehabilitative training on motor recovery after ischemic infarct." Science 272(5269): 1791-1794.
Plautz, E. J., G. W. Milliken and R. J. Nudo (2000). "Effects of repetitive motor training on movement
representations in adult squirrel monkeys: role of use versus learning." Neurobiol Learn Mem 74(1): 27-
55.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological methodology. P. V.
Marsden. Cambridge, MA, Blackwell: 111–196.
Ravikumar, P., M. J. Wainwright, G. Raskutti and B. Yu (2011). "High-dimensional covariance estimation
by minimizing ℓ 1 -penalized log-determinant divergence." Electronic Journal of Statistics 5(January
2010): 935-980.
Rothman, A. J., P. J. Bickel, E. Levina and J. Zhu (2008). "Sparse permutation invariant covariance
estimation." Electronic Journal of Statistics 2(January): 494-515.
Salbach, N. M., N. E. Mayo, S. Robichaud-Ekstrand, J. A. Hanley, C. L. Richards and S. Wood-Dauphinee
(2006). "Balance self-efficacy and its relevance to physical function and perceived health status after
stroke." Arch Phys Med Rehabil. 87: 364-370.
Schaefer, S. Y., C. B. Patterson and C. E. Lang (2013). "Transfer of Training Between Distinct Motor Tasks
After Stroke: Implications for Task-Specific Approaches to Upper-Extremity Neurorehabilitation."
Neurorehabil Neural Repair In press.
Schweighofer, N., Y. Choi, C. Winstein and J. Gordon "Task-oriented rehabilitation robotics." Am J Phys
Med Rehabil 91(11 Suppl 3): S270-279.
Schweighofer, N. and K. Doya (2003). "Meta-learning in reinforcement learning." Neural Netw 16(1): 5-9.
Schweighofer, N., C. E. Han, S. L. Wolf, M. a. Arbib and C. J. Winstein (2009). "A functional threshold for
long-term use of hand and arm function can be determined: predictions from a computational model
and supporting data from the Extremity Constraint-Induced Therapy Evaluation (EXCITE) Trial." Physical
therapy 89(12): 1327-1336.
Schweighofer, N., J.-Y. Lee, H.-T. Goh, Y. Choi, S. S. Kim, J. C. Stewart, R. Lewthwaite and C. J. Winstein
(2011). "Mechanisms of the contextual interference effect in individuals poststroke." Journal of
neurophysiology 106(5): 2632-2641.
Sing, G. C. and M. A. Smith "Reduction in learning rates associated with anterograde interference results
from interactions between different timescales in motor adaptation." PLoS Comput Biol 6(8).
Smith, M. A., A. Ghazizadeh and R. Shadmehr (2006). "Interacting Adaptive Processes with Different
Timescales Underlie Short-Term Motor Learning." PLoS Biol 4(6): e179-e179.
Städler, N. and P. Bühlmann (2010). "Missing values: sparse inverse covariance estimation
and an extension to sparse regression." Statistics and Computing 22(1): 219-235.
Stephan, K. E., L. M. Harrison, S. J. Kiebel, O. David, W. D. Penny and K. J. Friston (2007). "Dynamic causal
models of neural system dynamics:current state and future extensions." Journal of biosciences 32(1):
129-144.
Sutton, R. S. and A. G. Barto (1998). Reinforcement Learning: An Introduction, The MIT Press.
Taub, E., G. Uswatte and T. Elbert (2002). "New treatments in neurorehabilitation founded on basic
research." Nature reviews. Neuroscience 3(3): 228-236.
Taub, E., G. Uswatte and D. M. Morris (2003). "Improved motor recovery after stroke and massive
cortical reorganization following Constraint-Induced Movement therapy." Phys Med Rehabil Clin N Am
14(1 Suppl): S77-91, ix.
Terra, M. H., J. Y. Ishihara and A. C. Padoan (2007). "Information Filtering and Array Algorithms for
Descriptor Systems Subject to Parameter Uncertainties." 55(1): 1-9.
Tibshirani, R. (1996). "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical
Society. 58(1): 267-288.
Tong, C., D. M. Wolpert and J. R. Flanagan (2002). "Kinematics and dynamics are not represented
independently in motor working memory: evidence from an interference study." J Neurosci 22(3): 1108-
1113.
Uswatte, G., E. Taub, D. Morris, K. Light and P. A. Thompson (2006). "The Motor Activity Log-28:
assessing daily use of the hemiparetic arm after stroke." Neurology 67(7): 1189-1194.
Uswatte, G., E. Taub, D. Morris, M. Vignolo and K. McCulloch (2005). "Reliability and validity of the
upper-extremity Motor Activity Log-14 for measuring real-world arm use." Stroke 36(11): 2493-2496.
Van Peppen, R. P., G. Kwakkel, S. Wood-Dauphinee, H. J. Hendriks, P. J. Van der Wees and J. Dekker
(2004). "The impact of physical therapy on functional outcomes after stroke: what's the evidence." Clin
Rehabil 18: 833-862.
Vaswani, N. (2008). Kalman filtered compressed sensing. Image Processing, 2008. ICIP 2008. 15th IEEE
International Conference on, IEEE.
Winstein, C. and S. Wolf (2008). Task-oriented training to promote upper extremity recovery. Stroke
recovery & rehabilitation. J. Stein, R. Macko, C. Winstein and R. Zorowitz. New York, Demos Medical
267-290.
Winstein, C. J., S. Blanton, C. Hahn, C. Kushi, J. Wang, L. Horvath, M. Prettyman, P. A. Thompson, Q.
Zhang, D. Nichols-Larsen, S. L. Wolf and S. Rowles (2007). "Task-oriented training: an analysis of the
components of training from the EXCITE trial." Society for Neuroscience.
Winstein, C. J., J. P. Miller, S. Blanton, E. Taub, G. Uswatte, D. Morris, D. Nichols and S. Wolf (2003).
"Methods for a multisite randomized trial to investigate the effect of constraint-induced movement
therapy in improving upper extremity function among adults recovering from a cerebrovascular stroke."
Neurorehabil Neural Repair 17(3): 137-152.
Winstein, C. J., D. K. Rose, S. M. Tan, R. Lewthwaite, H. C. Chui and S. P. Azen (2004). "A randomized
controlled comparison of upper-extremity rehabilitation strategies in acute stroke: A pilot study of
immediate and long-term outcomes." Arch Phys Med Rehabil 85(4): 620-628.
Wolf, S., Blanton S, Baer H, Breshears J, Butler AJ (2002). "Repetitive Task Practice: A critical review of
constraint induced therapy in stroke." The Neurologist 8: 325-338.
Wolf, S., Lecraw DE, Barton LA, Jann BB (1989). "Forced use of hemiplegic upper extremities to reverse
the effect of learned nonuse among chronic stroke and head-injured patients." Exp Neurol 104: 104-132.
Wolf, S. L., P. A. Thompson, D. M. Morris, D. K. Rose, C. J. Winstein, E. Taub, C. Giuliani and S. L. Pearson (2005). "The EXCITE trial: attributes of the Wolf Motor Function Test in patients with subacute stroke." Neurorehabil Neural Repair 19(3): 194-205.
Wolf, S. L., P. a. Thompson, C. J. Winstein, J. P. Miller, S. R. Blanton, D. S. Nichols-Larsen, D. M. Morris, G.
Uswatte, E. Taub, K. E. Light and L. Sawaki (2010). "The EXCITE Stroke Trial. Comparing Early and Delayed
Constraint-Induced Movement Therapy." Stroke; a journal of cerebral circulation.
Wolf, S. L., C. J. Winstein, J. P. Miller, E. Taub, G. Uswatte, D. Morris, C. Giuliani, K. E. Light and D. Nichols-Larsen (2006). "Effect of constraint-induced movement therapy on upper extremity function 3 to 9 months after stroke: the EXCITE randomized clinical trial." JAMA 296(17): 2095-2104.
Wolf, S. L., C. J. Winstein, J. P. Miller, P. A. Thompson, E. Taub, G. Uswatte, D. Morris, S. Blanton, D.
Nichols-Larsen and P. C. Clark (2008). "Retention of upper limb function in stroke survivors who have
received constraint-induced movement therapy: the EXCITE randomised trial." Lancet Neurol 7(1): 33-40.
Yuan, M. and Y. Lin (2007). "Model selection and estimation in the Gaussian graphical model."
Biometrika 94(1): 19-35.
Zhang, P., W. Qi and Z. Deng (2012). "Sequential fusion Kalman filter." Information Fusion (FUSION)(6):
2140-2146.
Abstract

Understanding the effects of task practice on the long-term recovery of arm function post-stroke could allow effective motor training at a reduced cost. There is, however, little systematic understanding of what constitutes effective practice and how arm function changes during and outside of training.

In this light, we first studied the long-term effects of therapy, and whether there is supportive evidence for the much discussed "use it or lose it" axiom at the behavioral level in the chronic phase post-stroke. For this, we used longitudinal data from a Phase III clinical trial, compared different hypotheses using Bayesian model evidence, and showed that sufficient arm use in daily life can indeed further improve function beyond the gains due to therapy.

Next, we aimed to understand the effects of task selection during task-oriented training, as it is at present not known how to determine the task schedules that maximize recovery, notably because of possible learning transfer and interference between tasks. Here, we quantified such task interrelations by forming a graphical model from task performance data obtained during constraint-induced movement therapy. Because the data are partial and not missing at random, we developed a novel method to learn Gaussian graph structure using the Bayesian Linear Model. We found that the output of the graphical model correlated with similarity scores between tasks as rated by experts.

Finally, we focused on the mechanism of learning and memory structure in motor tasks. We hypothesized that the context switching mechanism on motor output is not perfect, unlike the assumption made in previously proposed models. To find a parsimonious motor output model and its memory components, we ran sparse regression on a multi-timescale, multi-task memory process. Our results showed that the output from short-timescale memory is most influential on motor output, and that context interference or transfer was observed in the middle- and long-timescale memories. We also compared the memory output between healthy individuals and individuals with stroke.

Toward a more effective and efficient rehabilitation methodology, this thesis contributes to the field with a computational modeling approach by supporting the axiom of "use it or lose it" with behavioral data, revealing a measurement-based inter-task graphical structure, and analyzing memory structure and context switching on motor output.